Data Warehouse

A subject-oriented, integrated, time-variant, non-volatile collection of data supporting management decision-making. (Bill Inmon’s definition)

Last updated: 2026-04-12

Overview

A data warehouse (DWH) is the central repository for an organization’s historical, integrated, analytics-ready data. It is distinct from operational databases (OLTP), which are optimized for transaction processing. The DWH is optimized for OLAP: aggregation, slicing, historical comparison, and reporting.

Inmon’s Four Properties

bill-inmon defined a data warehouse as having four essential properties:

  1. Subject-oriented: Organized around business subjects (customers, products, sales) rather than application functions (order entry, billing)
  2. Integrated: Data from disparate sources is cleansed, conformed, and unified — one definition of “customer” enterprise-wide
  3. Time-variant: Contains history; data is not updated in place. A snapshot is preserved at each point in time
  4. Non-volatile: Data is loaded and read; not deleted or modified by operational processes

Architecture Layers

LayerRole
Source systems (OLTP)Operational databases, APIs, flat files
ETL pipelineExtract, transform, load
Staging / IntegrationCleansed, reconciled intermediate layer
Enterprise DWHIntegrated, historical data (3NF in Inmon; dimensional in Kimball)
Data martsSubject-specific, user-facing star schemas
BI / ReportingTools querying the marts (Tableau, Power BI, Looker)

DWH vs. Data Lake vs. Data Lakehouse

Data WarehouseData LakeData Lakehouse
FormatStructured (tables)Raw (any format)Both
SchemaSchema-on-writeSchema-on-readBoth
UsersBI analystsData scientistsBoth
ExamplesSnowflake, RedshiftS3, ADLSDatabricks, Iceberg

Design Approaches

  • Kimball (bottom-up): Build dimensional data marts first; conformed dimensions provide integration. See dimensional-modeling.
  • Inmon (top-down): Build normalized enterprise DWH first; derive data marts from it.
  • Data Vault: Hub-Satellite-Link model for auditability and flexibility. Developed by Dan Linstedt.
  • Medallion: Bronze/Silver/Gold layers of progressive refinement. Maps neatly to Kimball (Silver) + semantic layer (Gold). See medallion-architecture.

Data Vault (brief)

An alternative to Kimball/Inmon for the integration layer:

  • Hubs: Business keys (customer_id, product_id)
  • Satellites: Descriptive attributes with full history (attached to hubs)
  • Links: Relationships between hubs (order contains products)

Data vault is insert-only (full audit trail), highly parallelizable for loading, and resilient to schema change. Typically sits in the integration layer before a Kimball-style presentation layer.

Connections

Sources