Data Warehouse
A subject-oriented, integrated, time-variant, non-volatile collection of data supporting management decision-making. (Bill Inmon’s definition)
Last updated: 2026-04-12
Overview
A data warehouse (DWH) is the central repository for an organization’s historical, integrated, analytics-ready data. It is distinct from operational databases (OLTP), which are optimized for transaction processing. The DWH is optimized for OLAP: aggregation, slicing, historical comparison, and reporting.
Inmon’s Four Properties
bill-inmon defined a data warehouse as having four essential properties:
- Subject-oriented: Organized around business subjects (customers, products, sales) rather than application functions (order entry, billing)
- Integrated: Data from disparate sources is cleansed, conformed, and unified — one definition of “customer” enterprise-wide
- Time-variant: Contains history; data is not updated in place. A snapshot is preserved at each point in time
- Non-volatile: Data is loaded and read; not deleted or modified by operational processes
Architecture Layers
| Layer | Role |
|---|---|
| Source systems (OLTP) | Operational databases, APIs, flat files |
| ETL pipeline | Extract, transform, load |
| Staging / Integration | Cleansed, reconciled intermediate layer |
| Enterprise DWH | Integrated, historical data (3NF in Inmon; dimensional in Kimball) |
| Data marts | Subject-specific, user-facing star schemas |
| BI / Reporting | Tools querying the marts (Tableau, Power BI, Looker) |
DWH vs. Data Lake vs. Data Lakehouse
| Data Warehouse | Data Lake | Data Lakehouse | |
|---|---|---|---|
| Format | Structured (tables) | Raw (any format) | Both |
| Schema | Schema-on-write | Schema-on-read | Both |
| Users | BI analysts | Data scientists | Both |
| Examples | Snowflake, Redshift | S3, ADLS | Databricks, Iceberg |
Design Approaches
- Kimball (bottom-up): Build dimensional data marts first; conformed dimensions provide integration. See dimensional-modeling.
- Inmon (top-down): Build normalized enterprise DWH first; derive data marts from it.
- Data Vault: Hub-Satellite-Link model for auditability and flexibility. Developed by Dan Linstedt.
- Medallion: Bronze/Silver/Gold layers of progressive refinement. Maps neatly to Kimball (Silver) + semantic layer (Gold). See medallion-architecture.
Data Vault (brief)
An alternative to Kimball/Inmon for the integration layer:
- Hubs: Business keys (customer_id, product_id)
- Satellites: Descriptive attributes with full history (attached to hubs)
- Links: Relationships between hubs (order contains products)
Data vault is insert-only (full audit trail), highly parallelizable for loading, and resilient to schema change. Typically sits in the integration layer before a Kimball-style presentation layer.
Connections
- medallion-architecture — Bronze/Silver/Gold pipeline structure; combines with Kimball and semantic layer
- semantic-layer — governed metric definitions as first-class objects (Gold layer)
- dimensional-modeling — the primary design methodology for DWH data marts
- star-schema — the physical schema of a dimensional data mart
- etl-architecture — the pipeline that feeds the DWH
- slowly-changing-dimensions — how history is preserved in dimension tables
- bill-inmon — coined the term “data warehouse” and the 4 properties
- ralph-kimball — dimensional modeling methodology
Sources
- Dimensional Modeling — Joakim Dalby — added 2026-04-12