Stop Building AI Platforms | Towards Data Science

by softbliss
June 14, 2025
in Machine Learning


While small and medium companies can achieve success in building data and ML platforms, building AI platforms is now profoundly challenging. This post discusses three key reasons to be cautious about building AI platforms and shares my thoughts on more promising directions instead.

Disclaimer: This post is based on personal views and does not apply to cloud providers or data/ML SaaS companies; they should instead double down on AI platform research.

Where I am Coming From

In my previous article From Data Platform to ML Platform on Towards Data Science, I shared how a data platform evolves into an ML platform. That journey applies to most small and medium-sized companies. However, there is no clear path yet for them to continue developing their platforms into AI platforms. Leveling up to an AI platform, the path forks into two directions:

  • AI Infrastructure: The “new electricity” (AI inference) is more efficient when centrally generated. This is a game for big tech and large model providers.
  • AI Application Platforms: You cannot build a “beach house” (an AI platform) on constantly shifting ground. Evolving AI capabilities and emerging development paradigms make lasting standardization hard to find.

However, some directions are likely to remain important even as AI models continue to evolve. They are covered at the end of this post.

High Barrier of AI Infrastructure

While Databricks may be only several times better than your own Spark jobs, DeepSeek can be 100x more efficient than you at LLM inference. Training and serving an LLM requires significantly more investment in infrastructure and, just as importantly, control over the model’s structure.

Image Generated by OpenAI ChatGPT 4o

In this series, I briefly shared the infrastructure for LLM training, which includes parallel training strategies, topology designs, and training accelerations. On the hardware side, besides high-performance GPUs and TPUs, a significant portion of the cost goes to networking setup and high-performance storage services. Clusters require an additional RDMA network to enable non-blocking, point-to-point connections for data exchange between instances. The orchestration services must support complex job scheduling, failover strategies, hardware issue detection, and GPU resource abstraction and pooling. The training SDK needs to facilitate asynchronous checkpointing, data processing, and model quantization.
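To make one of these training-SDK responsibilities concrete, here is a minimal sketch of asynchronous checkpointing: snapshot the training state, then write it to disk in a background thread so the training loop is not blocked by slow storage. This is a toy illustration with a JSON dict standing in for real tensors and a GPU-to-CPU copy; the path and state shape are hypothetical.

```python
import copy
import json
import threading

def async_checkpoint(model_state: dict, path: str) -> threading.Thread:
    """Snapshot the state, then persist it in a background thread."""
    snapshot = copy.deepcopy(model_state)  # stand-in for a device-to-host copy
    def _write():
        with open(path, "w") as f:
            json.dump(snapshot, f)
    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t

# The training loop keeps running while the checkpoint is written.
state = {"step": 100, "weights": [0.1, 0.2, 0.3]}
writer = async_checkpoint(state, "/tmp/ckpt_100.json")
state["step"] += 1   # training continues immediately; snapshot stays at step 100
writer.join()        # later: wait for the write before the next checkpoint
```

The key design point is that the snapshot is taken synchronously (so it is consistent) while only the slow I/O happens off the critical path.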

Regarding model serving, model providers often incorporate inference efficiency during model development stages. Model providers likely have better model quantization strategies, which produce the same model quality with a significantly smaller model size. They are also likely to develop better model-parallel strategies thanks to the control they have over the model structure. This can increase the batch size during LLM inference, which effectively increases GPU utilization. Additionally, large LLM players have logistical advantages that give them access to cheaper routers, mainframes, and GPU chips. More importantly, stronger control over model structure and better model-parallel capability mean model providers can leverage cheaper GPU devices. For model consumers relying on open-source models, GPU deprecation could be a bigger concern.

Take DeepSeek R1 as an example. Say you use a p5e.48xlarge AWS instance, which provides 8 NVLink-connected H200 chips at $35 per hour. Assuming you do as well as NVIDIA and achieve 151 tokens/second, generating 1 million output tokens costs you about $64 (1,000,000 / (151 × 3600) × $35). How much does DeepSeek charge per million output tokens? Just $2! DeepSeek can achieve roughly 60 times the efficiency of your cloud deployment (assuming a 50% margin on DeepSeek’s side).
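The back-of-envelope arithmetic above can be checked in a few lines (the $35/hour price and 151 tokens/second throughput are the article’s assumptions, not measured values):

```python
# Cost of self-hosting DeepSeek R1 on a p5e.48xlarge, per the assumptions above.
HOURLY_COST = 35.0        # USD per hour for the instance
TOKENS_PER_SECOND = 151   # assumed NVIDIA-level throughput

hours_per_million = 1_000_000 / TOKENS_PER_SECOND / 3600
self_hosted_cost = hours_per_million * HOURLY_COST
api_price = 2.0           # DeepSeek's price per million output tokens

print(f"self-hosted: ${self_hosted_cost:.0f} per 1M tokens")      # ~$64
print(f"price gap vs API: {self_hosted_cost / api_price:.0f}x")   # ~32x
# If DeepSeek keeps a 50% margin, its underlying cost is ~$1/1M tokens,
# putting its efficiency advantage around 60x.
```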

So, LLM inference power is indeed like electricity. It reflects the diversity of applications that LLMs can power; it also implies that it is most efficient when centrally generated. Nevertheless, you should still self-host LLM services for privacy-sensitive use cases, just like hospitals have their electricity generators for emergencies.

Constantly Shifting Ground

Investing in AI infrastructure is a bold game, and building lightweight platforms for AI applications comes with its hidden pitfalls. With the rapid evolution of AI model capabilities, there is no aligned paradigm for AI applications; therefore, there is a lack of a solid foundation for building AI applications.

Image Generated by OpenAI ChatGPT 4o

The simple answer to that is: be patient.

If we take a holistic view of data and ML platforms, development paradigms emerge only when the capabilities of algorithms converge.
| Domain | Algorithms Emerge | Solutions Emerge | Big Platforms Emerge |
|---|---|---|---|
| Data Platform | 2004: MapReduce (Google) | 2010–2015: Spark, Flink, Presto, Kafka | 2020–now: Databricks, Snowflake |
| ML Platform | 2012: ImageNet (AlexNet, CNN breakthrough) | 2015–2017: TensorFlow, PyTorch, Scikit-learn | 2018–now: SageMaker, MLflow, Kubeflow, Databricks ML |
| AI Platform | 2017: Transformers (“Attention Is All You Need”) | 2020–2022: ChatGPT, Claude, Gemini, DeepSeek | 2023–now: ?? |

After several years of fierce competition, a few large model players remain standing in the arena. However, AI capabilities have not yet converged. As models advance, existing development paradigms quickly become obsolete. Big players have only just taken their first stab at agent development platforms, and new solutions are popping up like popcorn in an oven. Winners will eventually emerge, I believe. For now, building their own agent standardization is a tricky call for small and medium-sized companies.

Path Dependency of Old Success

Another challenge of building an AI platform is more subtle. It concerns the mindset of platform builders: whether they carry path dependency from their previous success building data and ML platforms.

Image Generated by OpenAI ChatGPT 4o

As shared previously, the data and ML development paradigms have been well aligned since 2017, and the most critical task for an ML platform is standardization and abstraction. However, the development paradigm for AI applications is not yet established. A team that follows its previous success story of building a data and ML platform might end up prioritizing standardization at the wrong time. Possible directions include:

  • Build an AI model gateway: Provide centralized auditing and logging of requests to LLM models.
  • Build an AI agent framework: Develop an in-house SDK for creating AI agents with enhanced connectivity to the internal ecosystem.
  • Standardize RAG practices: Build a standard data indexing flow to lower the bar for engineers to build knowledge services.
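The first of these directions, an AI model gateway, can be sketched as a thin wrapper that forwards prompts to any model client while recording an audit trail. Everything here is illustrative: `call_model` is a hypothetical client function, and the log is an in-memory list standing in for a real audit store.

```python
import time
from typing import Callable, List

def gateway(call_model: Callable[[str], str], audit_log: List[dict]):
    """Wrap an LLM client (prompt -> completion) with centralized audit logging."""
    def wrapped(prompt: str) -> str:
        started = time.time()
        completion = call_model(prompt)
        audit_log.append({
            "prompt": prompt,
            "completion_chars": len(completion),
            "latency_s": round(time.time() - started, 3),
        })
        return completion
    return wrapped

log: List[dict] = []
fake_model = lambda p: p.upper()   # stand-in for a real LLM call
ask = gateway(fake_model, log)
print(ask("hello"))                # HELLO
print(log[0]["prompt"])            # hello
```

The catch the post warns about is visible even in this sketch: nothing stops a team from calling `fake_model` directly and bypassing the gateway entirely.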

These initiatives can indeed be significant, but the ROI depends heavily on the scale of your company. Regardless, you will face the following challenges:

  • Keeping up with the latest AI developments.
  • Driving customer adoption when it is easy for customers to bypass your abstraction.

If builders of data and ML platforms are like “closet organizers”, AI platform builders should now act like “fashion designers”. That requires embracing new ideas, conducting rapid experiments, and even accepting a level of imperfection.

My Thoughts on Promising Directions

Even with so many challenges ahead, it is still gratifying to work on AI platforms right now, because you have substantial leverage that wasn’t there before:

  • AI’s transformative capability is more substantial than that of data and machine learning.
  • The motivation to adopt AI is stronger than ever.

If you pick the right direction and strategy, the transformation you can bring to your organization is significant. Here are my thoughts on directions likely to see less disruption as AI models continue to scale. I consider them just as important as AI platformization:

  • High-quality, rich-semantic data products: Data products with high accuracy and accountability, rich descriptions, and trustworthy metrics will “radiate” more impact with the growth of AI models.
  • Multi-modal data serving: A scalable knowledge service behind an MCP server may require multiple types of databases (OLTP, OLAP, NoSQL, Elasticsearch) for high-performance data serving. Maintaining a single source of truth and consistent performance across constant reverse-ETL jobs is challenging.
  • AI DevOps: AI-centric software development, maintenance, and analytics. Code-gen accuracy has increased greatly over the past 12 months.
  • Experimentation and Monitoring: Given the increased uncertainty of AI applications, the evaluation and monitoring of these applications are even more critical.
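The last direction, evaluation and monitoring, can be illustrated with a minimal harness: score an AI application against a fixed evaluation set, using predicate checks rather than exact string matches, since LLM outputs rarely reproduce a gold answer verbatim. The app and checks below are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

def evaluate(app: Callable[[str], str],
             eval_set: List[Tuple[str, Callable[[str], bool]]]) -> float:
    """Return the fraction of eval cases whose check passes on the app's output."""
    results = [check(app(inp)) for inp, check in eval_set]
    return sum(results) / len(results)

# Hypothetical application and checks, for illustration only.
app = lambda q: "Paris is the capital of France."
eval_set = [
    ("capital of France?", lambda out: "Paris" in out),
    ("capital of Spain?",  lambda out: "Madrid" in out),
]
print(evaluate(app, eval_set))  # 0.5
```

Running the same fixed eval set after every prompt or model change turns the “increased uncertainty” of AI applications into a trackable metric.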

These are my thoughts on building AI platforms. Please share yours as well. Cheers!


© 2025 https://softblissacademy.online/- All Rights Reserved
