Medallion Architecture
A layered data pipeline pattern organizing data into Bronze (raw), Silver (modeled), and Gold (consumption) layers of progressive refinement.
Last updated: 2026-04-12
Overview
The medallion architecture organizes a data platform into three progressive layers. Databricks popularized the Bronze/Silver/Gold naming; dbt uses staging/intermediate/marts; other teams say raw/curated/refined or landing/transform/serve. The branding varies but the core idea is identical: data flows from source through increasing levels of refinement until it’s ready for consumption.
The pattern answers “where does this model live?” — but deliberately doesn’t prescribe what each layer contains. That loose definition is both its strength (flexible) and its weakness (teams diverge on what belongs where).
The Three Layers
| Layer | Also called | What lives here | Who consumes it |
|---|---|---|---|
| Bronze | Raw / Landing / Staging | Source data as-received; mechanical cleaning only | Data engineers debugging quality issues; auditors tracing original values |
| Silver | Curated / Transform / Intermediate | Business data model — facts, dimensions, conformed entities | Data scientists needing full grain; analysts exploring unmodeled questions |
| Gold | Refined / Serve / Marts | Consumption-ready — governed metrics, pre-joined wide tables, semantic layer | Analysts, dashboards, BI tools, AI tools; anyone who wants pre-defined metrics without writing SQL |
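
Because the pattern only answers “where does this model live?”, teams often make that answer mechanical with a naming convention. The sketch below assumes hypothetical prefixes and suffixes (`stg_`, `fact_`, `_dim`, `obt_`, and so on), chosen to fit this page's example names such as `campaign_dim` and `fact_ad_spend`. The `LAYER_RULES` table and `layer_for` helper are illustrative, not part of any tool.

```python
# Hypothetical naming rules: (layer, name prefixes, name suffixes).
# The patterns are illustrative assumptions, not a standard.
LAYER_RULES: list[tuple[str, tuple[str, ...], tuple[str, ...]]] = [
    ("bronze", ("stg_", "raw_"), ()),
    ("silver", ("fact_", "dim_", "int_"), ("_dim", "_fact")),
    ("gold", ("obt_", "mart_", "metric_"), ()),
]


def layer_for(model_name: str) -> str:
    """Return the medallion layer a model belongs to, judged by its name."""
    # Prefixes win over suffixes, so "stg_campaign_dim" stays in Bronze
    # even though it ends with a Silver suffix.
    for layer, prefixes, _ in LAYER_RULES:
        if model_name.startswith(prefixes):
            return layer
    for layer, _, suffixes in LAYER_RULES:
        if suffixes and model_name.endswith(suffixes):
            return layer
    # Fail loudly so a new model cannot land in an undefined layer.
    raise ValueError(f"no layer rule matches model {model_name!r}")


print(layer_for("campaign_dim"))  # silver
```

Raising on unknown names is a design choice: it forces every new model to be placed deliberately rather than drifting into whichever layer is convenient.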
Combining Medallion with Kimball and a Semantic Layer
The medallion architecture settles project structure, but not modeling methodology or consumption patterns. Three patterns address three different questions:
| Pattern | Question it answers |
|---|---|
| Medallion | Where does this model live? |
| Kimball dimensional modeling | How do I represent the business accurately? |
| Semantic layer | How do I expose governed metrics to consumers? |
When the three are mapped onto each other intentionally:
- Bronze = staging: Normalize column names, cast types, deduplicate, handle nulls. No business interpretation — just a clean representation of what each source provided.
- Silver = Kimball: Fact tables at declared grains, dimension tables, slowly changing dimensions (SCDs), conformed dimensions. The authoritative answer to “what is a customer?”
- Gold = semantic layer: Governed metric definitions (revenue, CAC, ROAS) as first-class objects. BI tools query here. Definitions are centralized, not reimplemented per dashboard.
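
The Bronze bullet above is mechanical enough to sketch directly. This is a minimal, hypothetical staging step over rows held as Python dicts (in practice it is usually SQL or a dataframe job): it snake-cases column names, treats empty strings as nulls, casts values using an assumed `schema` of cast functions, and drops exact duplicates. The names `to_snake_case` and `stage_rows` are illustrative, and timezone handling is left out for brevity.

```python
import re
from typing import Any, Callable

Row = dict[str, Any]


def to_snake_case(name: str) -> str:
    """Normalize a column name, e.g. 'Campaign ID' or 'campaignId' -> 'campaign_id'."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


def stage_rows(rows: list[Row], schema: dict[str, Callable[[Any], Any]]) -> list[Row]:
    """Bronze/staging step: normalize names, handle nulls, cast types, deduplicate.

    `schema` maps normalized column names to cast functions (an assumption of
    this sketch). Nothing here interprets the data in business terms.
    """
    staged: list[Row] = []
    seen: set[tuple] = set()
    for source_row in rows:
        row: Row = {}
        for col, value in source_row.items():
            name = to_snake_case(col)
            if value is None or (isinstance(value, str) and not value.strip()):
                row[name] = None  # empty strings become nulls
            else:
                row[name] = schema.get(name, str)(value)  # unknown columns stay text
        # Exact duplicates (same value in every column) are dropped.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            staged.append(row)
    return staged


export_rows = [
    {"Campaign ID": "c1", "Spend (USD)": "12.50", "Clicks": "3"},
    {"Campaign ID": "c1", "Spend (USD)": "12.50", "Clicks": "3"},
    {"Campaign ID": "c2", "Spend (USD)": "", "Clicks": "0"},
]
schema = {"campaign_id": str, "spend_usd": float, "clicks": int}
staged = stage_rows(export_rows, schema)
print(len(staged), staged[1])  # 2 {'campaign_id': 'c2', 'spend_usd': None, 'clicks': 0}
```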
OBTs and Dimensional Models Coexist
One big tables (OBTs) and dimensional models are not either/or. An OBT is a consumption artifact that belongs in Gold, downstream of Silver’s facts and dimensions. You get the rigor of dimensional modeling and the usability of pre-joined wide tables: the dimensional model feeds the OBT.
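
As a minimal sketch of deriving an OBT from Silver, assuming rows held as dicts and single-column join keys: each fact row is left-joined to its dimensions, so the wide table can always be rebuilt from the dimensional model. The function, table, and column names (`build_obt`, `campaign_key`, and so on) are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def build_obt(facts: list[Row], dims: dict[str, tuple[str, list[Row]]]) -> list[Row]:
    """Left-join each fact row to its dimensions, producing one wide row per fact.

    `dims` maps a dimension name to (join key, dimension rows). Dimension
    attributes are prefixed with the dimension name to avoid column clashes.
    """
    indexed = {}
    for dim_name, (key, rows) in dims.items():
        columns = sorted({col for row in rows for col in row if col != key})
        indexed[dim_name] = (key, columns, {row[key]: row for row in rows})

    obt: list[Row] = []
    for fact in facts:
        wide = dict(fact)  # the fact's own grain and measures are kept as-is
        for dim_name, (key, columns, lookup) in indexed.items():
            # Left join: an unmatched key yields None attributes, not a dropped row.
            match = lookup.get(fact.get(key), {})
            for col in columns:
                wide[f"{dim_name}_{col}"] = match.get(col)
        obt.append(wide)
    return obt


fact_ad_spend = [
    {"campaign_key": 1, "day": "2026-04-01", "spend": 100.0},
    {"campaign_key": 9, "day": "2026-04-01", "spend": 5.0},
]
campaign_dim = [{"campaign_key": 1, "name": "Spring Sale", "platform": "google_ads"}]
obt = build_obt(fact_ad_spend, {"campaign": ("campaign_key", campaign_dim)})
print(obt[1])
# {'campaign_key': 9, 'day': '2026-04-01', 'spend': 5.0, 'campaign_name': None, 'campaign_platform': None}
```

The left join is deliberate: an OBT that silently dropped facts with missing dimension rows would report lower totals than the Silver facts it came from.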
Concrete Example: Marketing Attribution
Bronze ← Ad platform exports, clickstream events, conversion events
         (cast types, normalize names, deduplicate, fix timezones)
Silver ← campaign_dim      (conformed across all ad platforms)
         fact_ad_spend     (campaign/channel/day grain)
         fact_conversions  (event grain)
Gold   ← metric view: total_spend, total_conversions, CAC, ROAS
         (defined once; every dashboard gets the same number)
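
The Gold step in the example, metrics defined once, can be sketched as a small registry in which each metric is one named function over the Silver facts and every consumer reads through the same `query` entry point. This is a hedged sketch, not the API of any semantic-layer product. Assumptions: CAC is computed as spend per conversion (treating each conversion as one acquired customer), ROAS needs a `revenue` field on conversion events (hence the helper metric `total_revenue`), and all names are hypothetical. The query path also checks that `fact_ad_spend` really is at its declared campaign/channel/day grain.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
Facts = dict[str, list[Row]]
MetricFn = Callable[[Facts], Optional[float]]

# Declared grain of fact_ad_spend (campaign/channel/day), as in the example.
AD_SPEND_GRAIN = ("campaign_id", "channel", "day")

# The registry: each governed metric is defined exactly once, by name.
METRICS: dict[str, MetricFn] = {}


def metric(name: str) -> Callable[[MetricFn], MetricFn]:
    """Decorator that registers a metric definition under a single name."""
    def register(fn: MetricFn) -> MetricFn:
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already defined")
        METRICS[name] = fn
        return fn
    return register


def assert_grain(rows: list[Row], grain: tuple[str, ...]) -> None:
    """Raise if two rows share a grain key, i.e. the fact is not at its declared grain."""
    seen: set[tuple] = set()
    for row in rows:
        key = tuple(row[col] for col in grain)
        if key in seen:
            raise ValueError(f"duplicate grain key {key}")
        seen.add(key)


def _ratio(num: float, den: float) -> Optional[float]:
    return num / den if den else None  # undefined ratios become None, not inf


@metric("total_spend")
def total_spend(f: Facts) -> float:
    return sum(r["spend"] for r in f["fact_ad_spend"])


@metric("total_conversions")
def total_conversions(f: Facts) -> float:
    return len(f["fact_conversions"])


@metric("total_revenue")  # assumption: conversion events carry a revenue value
def total_revenue(f: Facts) -> float:
    return sum(r["revenue"] for r in f["fact_conversions"])


@metric("cac")  # assumption: one conversion = one acquired customer
def cac(f: Facts) -> Optional[float]:
    return _ratio(total_spend(f), total_conversions(f))


@metric("roas")
def roas(f: Facts) -> Optional[float]:
    return _ratio(total_revenue(f), total_spend(f))


def query(name: str, facts: Facts) -> Optional[float]:
    """Single entry point every dashboard uses to read a governed metric."""
    assert_grain(facts["fact_ad_spend"], AD_SPEND_GRAIN)
    return METRICS[name](facts)


facts: Facts = {
    "fact_ad_spend": [
        {"campaign_id": "c1", "channel": "search", "day": "2026-04-01", "spend": 200.0},
        {"campaign_id": "c1", "channel": "social", "day": "2026-04-01", "spend": 100.0},
    ],
    "fact_conversions": [
        {"event_id": "e1", "campaign_id": "c1", "revenue": 450.0},
        {"event_id": "e2", "campaign_id": "c1", "revenue": 150.0},
        {"event_id": "e3", "campaign_id": "c1", "revenue": 300.0},
    ],
}
print({name: query(name, facts) for name in ("total_spend", "cac", "roas")})
# {'total_spend': 300.0, 'cac': 100.0, 'roas': 3.0}
```

Registering a second definition under an existing name raises an error, which is the code-level version of keeping one definition per metric instead of one per dashboard.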
Connections
- dimensional-modeling — the modeling methodology that belongs in the Silver layer
- semantic-layer — the consumption pattern that belongs in the Gold layer
- etl-architecture — the pipeline mechanics that move data between layers
- data-warehouse — the broader system this architecture sits within
Sources
- How I Structure My Data Pipelines — added 2026-04-12
- Local clip: How I Structure My Data Pipelines