Skip to content
plainsight.pro

Landing and Staging Layers

Purpose

Landing and Staging form the Bronze entry point of your data platform. Landing is optional (delta-only preprocessing), while Staging is required (full replica of source data).

Overview

These layers bring source data into your platform with minimal transformation, preserving source fidelity while preparing for downstream integration.

%%{init: { "flowchart": { "useMaxWidth": true } } }%%
graph LR
    subgraph Source Systems
        S[(Source<br/>All data)]
    end

    subgraph Bronze
        L[Landing<br/>delta-only<br/>Optional]:::bronzeOptional
        B[Staging<br/>All data<br/>Required]:::bronze
    end

    subgraph Silver
        I[Intermediate]:::silverOptional
        C[ADS]:::silver
    end

    S -->|Optional| L
    L --> B
    S -.->|Direct| B
    B --> I
    I --> C

    classDef bronze fill:#CD7F32,stroke:#8B4513,stroke-width:1px,color:#FFFFFF;
    classDef bronzeOptional fill:#CD7F32,stroke:#8B4513,stroke-width:2px,stroke-dasharray:5 5,color:#FFFFFF;
    classDef silver fill:#C0C0C0,stroke:#808080,stroke-width:1px,color:#111827;
    classDef silverOptional fill:#C0C0C0,stroke:#808080,stroke-width:2px,stroke-dasharray:5 5,color:#111827;

Landing Layer (Optional)

Purpose: Preprocess incremental data before merging into staging.

Data pattern: Delta-only (increments, changes, new records)

When to use: - External ingestion tools populate external tables (e.g., replication tools, data connectors) - Incremental CDC streams need preprocessing before full history build - Raw files require parsing (JSON, Parquet, XML) - Source data needs filtering or light transformation before staging

When to skip: - Source is directly queryable without preprocessing - No external tools; direct database connections work - Simple incremental patterns can be handled in staging

Landing Characteristics

Aspect Details
Data Volume Incremental only
Transformation Minimal: parse, flatten, filter
Materialization External tables, views, or ephemeral
Naming lnd_<source>_<entity>
Retention Short-term or transient

Common Landing Patterns

1. External tables from ingestion tools - Replication tools land data as external tables - Landing layer filters deleted records, adds metadata - Staging merges increments into full history

2. Raw file parsing - Object storage contains JSON/Parquet/CSV files - Landing parses and flattens nested structures - Staging consolidates parsed data

3. CDC stream preprocessing - Change streams capture INSERT/UPDATE/DELETE operations - Landing captures operations with sequence tracking - Staging applies CDC logic to build full state

Start Without Landing

Most projects don't need landing initially. Add it only when preprocessing complexity (external tools, file parsing, CDC) justifies the extra layer.

Staging Layer (Required)

Purpose: Stable, source-aligned replica with full history.

Data pattern: All data (complete history after ETL load)

Key principle: Staging contains a replica of source data after ETL processing. Data is as close to the source as possible - similar column names, similar table names, minimal corrections. This layer enables reloads to subsequent layers without hitting source systems again.

Staging Characteristics

Aspect Details
Data Volume All data (full history)
Transformation Minimal - rename, cast, light cleaning
Materialization Views or tables
Naming stg_<source>_<entity>
Retention Long-term (foundation for downstream)

Why Staging Matters

Single source of truth: Downstream layers reference staging, not raw sources; consistent data access across the platform.
Reprocessing capability: Rebuild ADS/dimensional layers from staging without impacting source systems.
Testing entry point: Data quality checks start here: catch bad data before propagation.
Source isolation: Each source system gets its own staging area - clear lineage and simplified troubleshooting.