Warehouse vs Lakehouse: A Decision Guide¶

Purpose

Microsoft Fabric offers two enterprise-scale, open-standard storage workloads: Warehouse and Lakehouse. Both store data in Delta format on OneLake, but serve different use cases and developer profiles. This guide helps you choose the right platform based on your workload, data type, and analytical needs.

Quick Decision Tree¶

%%{init: { "flowchart": { "useMaxWidth": true, "curve": "basis" }, "theme": "base" } }%%
flowchart TD
    Start["Choose your platform"] --> Q1{"How do you want<br/>to develop?"}
    Q1 -->|Spark / Python| Spark["Use Lakehouse"]
    Q1 -->|T-SQL / SQL| SQL["Use Warehouse"]

    Start --> Q2{"Multi-table<br/>transactions?"}
    Q2 -->|Yes| Warehouse["Use Warehouse"]
    Q2 -->|No| Q3{"Data type?"}

    Q3 -->|Unstructured +<br/>Structured| Lakehouse["Use Lakehouse"]
    Q3 -->|Structured only| Q4{"Workload?"}
    Q4 -->|SQL Analytics<br/>BI queries| Warehouse2["Use Warehouse"]
    Q4 -->|Data exploration<br/>medallion arch| Lakehouse2["Use Lakehouse"]

At a Glance¶

Aspect	Warehouse	Lakehouse
Best for	Enterprise SQL/BI analytics, multi-table ACID transactions	Data engineering, mixed data types, medallion architectures (Bronze/Silver/Gold, see Medallion Architecture)
Developer profile	SQL developers, BI analysts, citizen developers	Data engineers, data scientists, Python/Spark developers
Data loading	SQL, pipelines, dataflows	Spark, pipelines, dataflows, shortcuts, notebooks
Language	T-SQL (full DQL, DML, DDL)	PySpark, Spark SQL (limited T-SQL via SQL endpoint)
Transactions	Full ACID support with multi-table guarantees	No transactions (read-only SQL endpoint)
Unstructured data	❌ Not ideal	✅ Native support
Cost model	Per-warehouse capacity usage	Per-capacity Spark compute + storage

Detailed Comparison¶

1. Warehouse - Enterprise Data Warehouse¶

When to use: - Enterprise-scale analytics with strict ACID compliance - Structured data only, analyzed with T-SQL - Complex workflows with multi-table transactions (updates, inserts, deletes) - BI teams need reliable, performant SQL analytics - Departmental or business-unit data warehousing

Key features: - Full T-SQL support: DQL, DML, DDL, stored procedures, views, functions - ACID transactions: Multi-table consistency guarantees - No configuration: Auto-scaling, no compute/storage tuning needed - Lake-centric: Data stored in Delta format on OneLake for portability - Integrated tooling: Full read/write support for SQL Server Management Studio (SSMS), Azure Data Studio, and third-party tools - Semantic layer: Direct Power BI integration with default semantic models - Cross-database queries: Virtual warehouses for federated analytics

Typical architecture: - Staging zone (bronze) → transform with T-SQL → production tables (gold) - ETL pipelines using SQL or dataflows - Direct BI consumption via semantic models

Example use case:

A retail company needs to consolidate sales, inventory, and customer data into a central warehouse for daily reporting and forecasting. Multi-table transactions ensure data consistency during nightly ETL. SQL developers use SSMS to build procedures; BI analysts connect Power BI for dashboards.

2. Lakehouse - Data Architecture Platform¶

When to use: - Mixed data types (logs, images, raw data + structured tables) - Data engineering and data science workflows (Spark / Python) - Medallion architecture (bronze → silver → gold zones, see Medallion Architecture) - Rapid data exploration and prototyping - Incremental data pipelines with Spark

Key features: - Spark-first development: Native PySpark, SQL, and notebooks - Automatic discovery: Files → tables with minimal effort - SQL analytics endpoint: Read-only T-SQL endpoint for querying Delta tables and shortcuts - Open format: Data in Delta format, accessible via Spark and external tools - Flexible ingestion: Pipelines, dataflows, shortcuts, notebooks, ETL code - Medallion-friendly: Zone-based structure for bronze (raw), silver (cleaned), and gold (analytics) - see Medallion Architecture for how Plainsight maps this to explicit layers - Large-scale processing: Spark compute for transformations and ML

Typical architecture: - Raw zone (bronze) → Spark ETL → curated zone (silver) → optional SQL endpoint for BI - Optional SQL analytics endpoint for read-only T-SQL querying - Notebooks for data exploration and feature engineering

Example use case:

A healthcare organization ingests streaming sensor data (unstructured logs), historical patient records (structured), and external APIs into a Lakehouse. Spark jobs clean and normalize the data into silver tables; data scientists use notebooks for exploratory analysis. A read-only SQL endpoint serves analytical queries to Power BI dashboards.

Decision Framework¶

By Development Language¶

Language	Recommendation
T-SQL	Warehouse (full DDL/DML/DQL support)
PySpark / Spark SQL	Lakehouse (native Spark support)
SQL + Python mix	Lakehouse (Spark) or Warehouse (if T-SQL + pipelines suffice)
Python / Notebooks	Lakehouse

By Data Type¶

Data Profile	Recommendation
Structured only	Warehouse (simpler, optimized for SQL)
Structured + unstructured	Lakehouse (native file + table support)
Logs, images, raw blobs	Lakehouse
Well-defined schemas	Either (prefer Warehouse for compliance)

By Workload Pattern¶

Pattern	Recommendation
BI/Analytics (SQL queries)	Warehouse or Lakehouse SQL endpoint
ETL (bulk transforms)	Warehouse (T-SQL) or Lakehouse (Spark)
Data engineering	Lakehouse
Data science / ML	Lakehouse
Multi-table transactions	Warehouse (required for ACID)
Medallion architecture	Lakehouse

Feature Comparison Table¶

Feature	Warehouse	Lakehouse	Lakehouse SQL Endpoint
ACID compliance	✅ Full	❌	❌
T-SQL (DQL)	✅ Full	✅ Via endpoint	✅ Full
T-SQL (DML/DDL)	✅ Full	❌	❌ Limited (views, TVFs)
Spark support	❌	✅ Native	❌
Unstructured data	⚠️ Via shortcuts	✅ Native	❌
Read Delta tables	✅	✅	✅
Write Delta tables	✅	✅	❌
Data loading	SQL, pipelines, dataflows	Pipelines, dataflows, Spark, shortcuts, notebooks
Notebooks	❌	✅	❌
Transactions	✅ Multi-table	❌	❌
Stored procedures	✅	❌	❌ (views/TVFs only)
Third-party tools	✅ Read/write	⚠️ Read (via endpoint)	⚠️ Read (via endpoint)

Common Patterns¶

Pattern 1: SQL-Only Analytics¶

Use Warehouse
SQL developers load structured data via pipelines or SSMS, build views and procedures, and serve BI via Power BI. No Spark, no notebooks needed.

Pattern 2: Medallion Architecture (Data Engineering)¶

Use Lakehouse
Bronze (raw ingestion) → Silver (Spark cleaning) → Gold (business tables). Optional SQL endpoint for BI consumption without extra replication.

Pattern 3: Hybrid (Analytics + Science)¶

Use Lakehouse + optional Warehouse
Data engineers use Lakehouse for ingestion and transformation; data scientists explore with notebooks. BI teams consume either via Lakehouse SQL endpoint or replicate curated gold tables to Warehouse for stricter governance.

Pattern 4: Multi-Tenant / Data Mesh¶

Use Warehouse
Departments own independent warehouses with strict ACID guarantees and cross-warehouse queries via virtual warehouses for federated analytics.

Cost & Performance Considerations¶

Both Warehouse and Lakehouse consume Fabric capacity (F2–F64 SKUs).

Warehouse: Optimized for SQL query cost; minimal overhead for read-heavy BI workloads.
Lakehouse: Cost depends on Spark compute and storage; medallion patterns with incremental pipelines minimize Spark overhead.

Reserve capacity for predictable workloads; scale up for ad-hoc or peak analytics.

Lakehouse Architecture - Deep dive into medallion design patterns
Capacity Management - Optimize Fabric capacity for both platforms
Workspace Organization - Structure teams and permissions
Data Pipeline Patterns - Design data flows for either platform