If you ask your favorite large language model, Microsoft Fabric appears to be the ultimate solution for any data challenge you can imagine. That’s also the impression many people get from Microsoft’s sales teams. But is it really the silver bullet it’s made out to be? This article takes a closer look.
The first part explores the glossy marketing and sales definition of the platform. The second part looks at Fabric's layers and deconstructs it from a more practical perspective. The third part uncovers what Microsoft Fabric is truly built for, and how it fits into the wider data landscape, especially in comparison to other major players in the data analytics market like Databricks and Snowflake.
What is Microsoft Fabric?
If you listen to Microsoft’s sales and marketing, then Microsoft Fabric is a silver bullet for every use case. Let’s take a two-step approach: first, look at the sales and marketing definition; then, deconstruct it from a more realistic point of view.
GenAI Definition (= Sales and Marketing)
Here’s what happened when I asked my favorite LLM, “What is Microsoft Fabric” (based on sales and marketing content):
“Microsoft Fabric is an end-to-end analytics platform designed to integrate various data services and enable businesses to manage, analyze, and act on their data seamlessly. It was launched as part of Microsoft’s data ecosystem and builds upon key features from platforms like Power BI, Azure Synapse Analytics, and Azure Data Factory. Source: Microsoft
Here are some key aspects of Microsoft Fabric:
- Unified Platform: It combines data engineering, data science, data warehousing, and real-time analytics into a single platform. This helps businesses eliminate the need to use multiple services for data management and analysis.
- Lakehouse Architecture: Fabric is designed around the lakehouse concept, which merges the best of data lakes and data warehouses. It allows for both structured and unstructured data to be stored and processed together.
- Tightly Integrated with Microsoft 365 and Azure: Microsoft Fabric connects seamlessly with other Microsoft services like Microsoft 365, Power BI, and Azure Machine Learning, enabling better collaboration, reporting, and AI-driven insights.
- Low-code/No-code Experience: The platform provides intuitive tools for data analysts, developers, and business users, allowing non-technical users to work with data through drag-and-drop interfaces, while also enabling more complex scenarios for advanced users.
- AI and Machine Learning Integration: Microsoft Fabric incorporates AI tools, making it easier for businesses to build predictive models and automate data-driven decisions.
- End-to-End Security and Governance: The platform supports robust security measures and compliance requirements, offering features like data encryption, role-based access control, and regulatory compliance support.
- Real-time Data Processing: With support for real-time analytics, Fabric enables organizations to derive insights from live data streams, improving decision-making speed and accuracy.
Microsoft Fabric is designed to streamline how businesses use data, combining the power of analytics with cloud-scale capabilities.”
Wow. Just wow. Microsoft Fabric seems to be everything you ever need for your data challenges.
The Microsoft Developer channel offers an excellent 45-minute presentation about OneLake and Microsoft Fabric with more technical details. This video is also the source of the screenshots below.
So let’s dig deeper. What is Microsoft Fabric really? Let’s deconstruct it a bit…
Microsoft Fabric is a Data Analytics Platform ( = NOT for Operational / Transactional Workloads)
Microsoft Fabric is part of Microsoft’s data analytics portfolio. That is already the first red flag if you are considering it for operational workloads. This is not a criticism, but it is important to understand!
Microsoft Fabric is NOT a platform for transactional workloads like payments, fraud detection, order management, or ERP integration. Operational applications, such as an Azure Functions app or a self-managed Spring Boot container, should not be built on Fabric.
Furthermore, within the data analytics layer, the foundation of Microsoft Fabric is (only) an optimized storage layer. This storage layer, called OneLake, is a SaaS offering, which means the storage is part of the Microsoft tenant. Unlike with many other data lakes and lakehouses, such as Databricks, you do not control or own the storage.
While the conversation typically revolves around cloud analytics, Microsoft Fabric is a unified analytics platform that integrates with the Azure cloud but is sold separately. This allows organizations to use it across various environments, including edge and hybrid setups. For instance, Microsoft sells Fabric for hybrid IoT projects where data needs to be processed both locally and in the cloud.
OneLake – Cloud-based Storage Layer on Top of Azure Data Lake Storage (ADLS)
Microsoft OneLake is a unified, cloud-based data lake that acts as the central storage layer within Microsoft Fabric:
Microsoft OneLake is built on top of Azure Data Lake Storage (ADLS), using its scalable and secure data storage capabilities for long-term data retention. OneLake inherits ADLS’s features, such as hierarchical namespaces and advanced security, while adding a unified data lake experience across multiple clouds and deep integration with Microsoft’s analytics and data tools through Microsoft Fabric. Source: Microsoft
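Because OneLake keeps ADLS compatibility, tools that already speak ADLS Gen2 can generally address it with familiar ABFS or HTTPS paths. The following Python sketch builds such paths, assuming the URI layout that, to my understanding, Microsoft documents for OneLake (`<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<ItemType>/...`). The workspace `Sales` and lakehouse `Bronze` are hypothetical, names are assumed to contain no spaces or special characters, and authentication is out of scope.

```python
# Global OneLake DFS endpoint (assumption: regional endpoints are not handled here).
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def _relative_path(item: str, item_type: str, path: str) -> str:
    # Fabric items show up as top-level folders named "<item>.<ItemType>".
    parts = [f"{item}.{item_type}", path.strip("/")]
    return "/".join(p for p in parts if p)


def onelake_abfss_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """ABFS URI in the ADLS Gen2 style abfss://<container>@<host>/<path>,
    with the Fabric workspace taking the container position."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{_relative_path(item, item_type, path)}"


def onelake_https_url(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """DFS-style HTTPS URL for the same location."""
    return f"https://{ONELAKE_HOST}/{workspace}/{_relative_path(item, item_type, path)}"


if __name__ == "__main__":
    # Hypothetical workspace "Sales" containing a lakehouse named "Bronze".
    print(onelake_abfss_uri("Sales", "Bronze", "Lakehouse", "Tables/orders"))
    print(onelake_https_url("Sales", "Bronze", "Lakehouse", "Files/raw/"))
```

A Spark notebook, Azure Databricks, or another ADLS-aware client could then be handed such a path instead of a copy of the data. The design point is that the address stays the same no matter which compute engine reads it.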
The message is obvious: store all data in OneLake and connect your favorite compute engines, such as Microsoft Fabric, Azure Databricks, and Snowflake. Open table formats, such as Delta Lake and Apache Iceberg, allow for simple integration without the need to copy data again.
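Why can several engines share one copy of the data? With Delta Lake, a table is a folder of Parquet files plus a `_delta_log` directory of ordered JSON commits, and any engine that replays those commits arrives at the same table snapshot. (Apache Iceberg follows the same principle with a different metadata layout based on metadata files and manifests.) The Python sketch below shows a heavily simplified replay: it handles only minimal `add` and `remove` actions and ignores checkpoints, schema metadata, protocol versions, and deletion vectors, so it illustrates the idea rather than implementing a Delta reader. The commit contents and file names are made up.

```python
import json
from typing import Iterable, Set


def active_files(commits: Iterable[str]) -> Set[str]:
    """Replay simplified Delta Lake commits, oldest first, and return the live data files.

    Each commit is newline-delimited JSON with one action per line, similar to
    _delta_log/00000000000000000000.json, ...0001.json, and so on.
    """
    live: Set[str] = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue  # tolerate blank lines
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # Other action types (metaData, protocol, commitInfo, ...) are skipped in this sketch.
    return live


if __name__ == "__main__":
    # Hypothetical history: version 0 writes two files, version 1 replaces one of them
    # (as an update or compaction might).
    v0 = '{"add": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0001.parquet"}}'
    v1 = '{"remove": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0002.parquet"}}'
    print(sorted(active_files([v0, v1])))
```

Because the snapshot is derived purely from files in storage, Fabric, Databricks, or Snowflake can each compute it independently, which is what makes "no second copy" possible in principle.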
Microsoft Fabric Connects to Many Existing Azure Services
In addition to the storage layer, OneLake, Microsoft Fabric connects to numerous existing Microsoft Azure services, including Power BI, Data Explorer, various Synapse services, and others. This explains why Microsoft Fabric can magically provide every capability you are looking for just a few months after the initial announcement. Source: Microsoft
Here are a few integrations of Azure services into the unified storage of Microsoft Fabric:
- Power BI: A critical component of Microsoft Fabric, enabling data visualization and business intelligence. It allows users to create interactive dashboards and reports directly from data stored in the lakehouse, providing real-time insights with minimal data movement.
- Azure Data Explorer: Used for analyzing large volumes of streaming and historical data. Microsoft Fabric connects to Data Explorer, allowing users to perform fast, complex, real-time queries on structured and semi-structured data.
- Azure Synapse Analytics: Fabric integrates Synapse’s data engineering capabilities, allowing users to prepare, transform, and orchestrate data pipelines. It provides a unified workspace to manage end-to-end data engineering workflows, reducing the need for complex data movement.
- Synapse Data Warehousing: Fabric integrates with Synapse’s data warehousing services, enabling easy execution of massively parallel processing (MPP) queries for large-scale analytics on structured data.
- Synapse Spark Pools: Fabric integrates with Apache Spark in Synapse, supporting big data processing, AI, and machine learning workloads. Users can leverage Spark’s distributed computing power within Fabric for data transformation, advanced analytics, and machine learning.
- Azure Machine Learning (AML): Enables data scientists to build, train, and deploy machine learning models on data stored within the Fabric lakehouse. Users can perform machine learning experiments, automate ML model training, and deploy models on a unified data platform.
- Azure Data Factory: Used for data ingestion, ETL (extract, transform, load), and data orchestration. Fabric integrates with Azure Data Factory, enabling the creation of data pipelines that seamlessly move and transform data from a wide range of sources, including on-premises databases, cloud storage, and third-party systems.
- Microsoft Purview (formerly Azure Purview): Provides a unified data catalog, allowing users to discover, classify, and govern data assets across the Fabric ecosystem. It also provides compliance and auditing capabilities.
- Azure Event Hubs and Stream Analytics: Real-time data processing and analytics. Event Hubs enables streaming data ingestion from sources like IoT devices, applications, and logs, while Stream Analytics allows for real-time data querying and analysis.
Expect more Azure services to be integrated with Microsoft Fabric in the coming months to provide a “complete lakehouse experience”. Also expect more sophisticated marketing brands, such as the new “Real-Time Intelligence Hub”, which is built by connecting and reusing existing Microsoft Azure services.
So, what is the main idea behind building this lakehouse product and brand within Microsoft’s huge cloud portfolio?
Microsoft Fabric is a Lakehouse Competing with Snowflake and Databricks
A lakehouse is a data architecture that combines the features of data lakes and data warehouses, enabling the storage and processing of both structured and unstructured data. It provides the scalability and flexibility of a data lake with the data management, governance, and performance features of a data warehouse. This unified approach enables real-time analytics and machine learning on diverse types of data, reducing the need for separate infrastructures.
Most analytical data vendors are transitioning to a full-blown lakehouse. While Databricks evolved from a data lake foundation powered by Apache Spark into the lakehouse, Snowflake originated from the data warehouse approach but has incorporated many lakehouse features over time (although Snowflake refers to it as a more general “data cloud”).
Microsoft Fabric competes with platforms like Databricks and Snowflake in the realm of data analytics, data engineering, and data warehousing by providing an integrated, cloud-native solution for data management and analytics.
Microsoft Fabric positions itself as a more holistic and integrated platform, offering a unified solution for businesses that need to handle everything from data ingestion to real-time analytics and AI. Its Microsoft ecosystem integration is a key competitive advantage.
There are also trade-offs. For instance:
- Microsoft Fabric is only available on the Azure cloud
- It is not a mature product yet
- It signals a much more competitive stance toward strategic partners like Databricks
The support for open table formats like Delta Lake and Apache Iceberg is great. However, this is a common trend across all lakehouses, driven by market pressure. It does not mean that data cloud vendors like Databricks, Snowflake, and now Microsoft with Fabric have adopted a new business model. All of these vendors still want to collect all the data, store it forever, and put (their own!) compute services on top.
Microsoft Fabric is Azure’s Future Lakehouse
Microsoft Fabric’s integration with various Azure services enables it to offer a broad range of capabilities, including data ingestion, storage, and transformation, as well as real-time analytics, machine learning, and governance. This interconnected ecosystem enables Fabric to quickly meet diverse enterprise needs by leveraging Microsoft’s existing suite of powerful tools, providing a comprehensive data platform with minimal friction and seamless workflows.
In the end, Microsoft Fabric is a new lakehouse built on top of the optimized cloud storage, OneLake. It directly competes with other lakehouses and data clouds such as Databricks and Snowflake to become the leading unified solution for all things analytics. The future will show where this competition goes. Snowflake and Databricks already have a very strong product and customer base. They will not cede ground to Microsoft Fabric voluntarily.
Microsoft Fabric includes integrations with Azure Event Hubs (which offers a Kafka-compatible endpoint) and is building a brand around real-time intelligence. In the next article of this blog series, I will explore how this new lakehouse on the Azure cloud competes or overlaps with data streaming technologies such as Apache Kafka, Apache Flink, and others. Spoiler: Data streaming and Microsoft Fabric are mostly complementary and have very different sweet spots.
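Because Event Hubs speaks the Kafka protocol, an existing Kafka client can usually be pointed at an Event Hubs namespace by configuration alone, with no code changes. The Python sketch below builds such a configuration as a plain dict using librdkafka-style keys (as accepted by the `confluent-kafka` client). The namespace `contoso-events` and the connection string are hypothetical placeholders, and how Fabric's real-time features then consume those events is not shown.

```python
from typing import Dict


def event_hubs_kafka_config(namespace: str, connection_string: str) -> Dict[str, str]:
    """Kafka client settings for an Azure Event Hubs namespace (librdkafka key names).

    Event Hubs exposes its Kafka endpoint on port 9093 and authenticates with SASL PLAIN,
    using the literal username "$ConnectionString" and the namespace connection string
    as the password.
    """
    return {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


if __name__ == "__main__":
    # Hypothetical namespace and placeholder credentials; never hard-code real secrets.
    cfg = event_hubs_kafka_config(
        "contoso-events",
        "Endpoint=sb://contoso-events.servicebus.windows.net/;"
        "SharedAccessKeyName=send;SharedAccessKey=<placeholder>",
    )
    print(cfg["bootstrap.servers"])
```

The resulting dict could be passed to a Kafka producer or consumer constructor. The design point is that the protocol, not the client library, is what connects Kafka-based applications to the Azure side.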
How do you see the future of Microsoft Fabric? Do you already use it? What is the plan for the future, considering that you likely already have other lakehouses in your enterprise architecture?