Context Engineering

The systematic process of designing and optimizing context collection, storage, management, and usage to enhance machine understanding and task performance — a discipline with 20+ years of history, now at its most consequential phase.

Last updated: 2026-04-16

Overview

Context engineering is often treated as a recent LLM-era concept (prompt engineering, RAG, memory systems). Hua et al. (2025) argue it’s a 20+ year discipline that has evolved through four distinct phases aligned with machine intelligence levels. The core challenge is unchanged across all eras: bridging the cognitive gap between human intention and machine understanding.

Formal definition:

CE: (C, 𝒯) → f_context

Where C = raw contextual information, 𝒯 = target task, and f_context = the resulting context-processing function. Practically: “the systematic process of designing and optimizing context collection, storage, management, and usage to enhance machine understanding and task performance.”

Core framing: context engineering is fundamentally entropy reduction — transforming high-entropy human intentions and situations into low-entropy representations that machines can process. As machines get smarter, the entropy gap narrows and the engineering effort required decreases.


The Four Eras

  • Era 1.0 (1990s–2020), primitive / rule-based intelligence: sensor fusion, rule triggers, structured inputs (GPS, time, device state)
  • Era 2.0 (2020–present), agent-centric / LLM intelligence: prompting, RAG, tool-calling, chain-of-thought, memory agents
  • Era 3.0 (near future), human-level intelligence: multimodal perception (touch, taste, smell), emotional understanding, seamless collaboration
  • Era 4.0 (speculative), superhuman intelligence: machines construct context proactively, reveal unarticulated needs, “god’s eye view”

Key principle: “Each qualitative leap in machine intelligence triggers a fundamental revolution in human-machine interfaces.”

Era 1.0 systems (Context Toolkit, Cooltown, ContextPhone): passive sensing, context-aware triggers. The system senses your situation (e.g., location) and adapts accordingly.

Era 2.0 systems (ChatGPT, LangChain, Claude Code, Letta): active, context-cooperative. The system interprets and collaborates on context — it doesn’t just receive it.

Era 4.0 (speculative): machines construct your context for you, inferring what you need before you articulate it — “digital presence” as a computational representation of an individual.


Three Design Dimensions

1. Collection & Storage

Two foundational principles:

  • Minimal Sufficiency — collect and store only what’s necessary for the task
  • Semantic Continuity — maintain continuity of meaning, not just continuity of data

Era 1.0: single-device, simple logs, local databases
Era 2.0: distributed endpoints, layered architecture (edge cache → local DB like SQLite/LevelDB → cloud persistence)
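The Era 2.0 layered architecture can be sketched as a read-through store: check the edge cache first, fall back to a local SQLite database, and finally to cloud persistence. This is a minimal illustration, not a prescribed design; the `LayeredContextStore` name is invented, and the cloud tier is stubbed with a dict.

```python
import sqlite3

class LayeredContextStore:
    """Illustrative read-through store: edge cache -> local SQLite -> cloud.
    The cloud tier is a dict stand-in; a real system would call a remote API."""

    def __init__(self, db_path=":memory:"):
        self.cache = {}                      # edge cache (fast, volatile)
        self.db = sqlite3.connect(db_path)   # local persistence
        self.db.execute("CREATE TABLE IF NOT EXISTS ctx (k TEXT PRIMARY KEY, v TEXT)")
        self.cloud = {}                      # stand-in for cloud persistence

    def put(self, key, value):
        # Write-through to every tier so reads can be served from any layer.
        self.cache[key] = value
        self.db.execute("INSERT OR REPLACE INTO ctx VALUES (?, ?)", (key, value))
        self.cloud[key] = value

    def get(self, key):
        if key in self.cache:                # 1) edge cache
            return self.cache[key]
        row = self.db.execute("SELECT v FROM ctx WHERE k = ?", (key,)).fetchone()
        if row:                              # 2) local DB; warm the cache
            self.cache[key] = row[0]
            return row[0]
        return self.cloud.get(key)           # 3) cloud fallback
```

Each tier trades latency for durability: a cache miss falls through to slower but more persistent storage, and successful reads warm the faster layers.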

2. Management

Five textual compression approaches:

  1. Timestamp marking — preserves order, lacks semantic structure
  2. Functional tagging — labels by role (goal, decision, action)
  3. QA-pair compression — reformulates as question-answer pairs
  4. Hierarchical notes — tree-like concept organization
  5. Vector compression — progressive embedding into semantic vectors
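A few of these approaches are simple enough to sketch directly. The snippet below combines timestamp marking and functional tagging for a single conversation turn, then shows QA-pair compression in miniature by pairing each tagged goal with the decision that follows it. Function names and tag values are illustrative, not from the paper.

```python
from datetime import datetime, timezone

def tag_turn(role, text, now=None):
    """Approaches 1 + 2: stamp a turn with time and a functional tag
    (e.g., 'goal', 'decision', 'action')."""
    ts = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return {"ts": ts, "tag": role, "text": text}

def compress_to_qa(turns):
    """Approach 3 in miniature: reformulate a tagged transcript as
    question-answer pairs by matching each goal to its next decision."""
    pairs, goal = [], None
    for t in turns:
        if t["tag"] == "goal":
            goal = t["text"]
        elif t["tag"] == "decision" and goal is not None:
            pairs.append({"q": goal, "a": t["text"]})
            goal = None
    return pairs
```

Timestamps preserve order cheaply, while the tags supply the semantic structure that timestamps alone lack, which is exactly the trade-off the list above describes.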

Multimodal fusion: map inputs to shared vector space; joint processing via unified Transformer; cross-attention between modalities.

Self-baking: converting raw context into compact representations (summaries, schemas, embeddings) for efficient future retrieval.
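As a toy example of self-baking, the function below reduces a raw event log to a compact per-kind schema (a count plus the most recent detail), so later retrieval touches a small baked summary instead of the full log. The event shape is an assumption for illustration.

```python
def self_bake(raw_events):
    """Illustrative self-baking: collapse a raw event log into a compact
    schema keyed by event kind, keeping a count and the latest detail."""
    baked = {}
    for event in raw_events:
        entry = baked.setdefault(event["kind"], {"count": 0, "latest": None})
        entry["count"] += 1               # frequency survives the compression
        entry["latest"] = event["detail"] # recency survives too
    return baked
```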

Layered memory: short-term (high temporal relevance) ↔ long-term (high importance). Subagents with isolated context windows and restricted permissions.
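The short-term/long-term split can be sketched as a bounded buffer with importance-based promotion: recent items live in short-term memory, and when they age out, only items above an importance threshold are promoted to long-term storage. Capacity and threshold values here are invented defaults.

```python
class LayeredMemory:
    """Sketch of layered memory: a bounded short-term buffer (temporal
    relevance) feeding a long-term store gated by importance."""

    def __init__(self, capacity=3, threshold=0.7):
        self.short_term = []        # (importance, item), newest last
        self.long_term = []
        self.capacity = capacity
        self.threshold = threshold  # promotion cutoff (assumed value)

    def add(self, item, importance):
        self.short_term.append((importance, item))
        while len(self.short_term) > self.capacity:
            imp, old = self.short_term.pop(0)   # evict the oldest entry
            if imp >= self.threshold:
                self.long_term.append(old)      # promote what matters
```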

3. Usage

Intra-system sharing: embedding context into prompts, structured messages between agents, shared memory (blackboards, task graphs, semantic graphs).

Cross-system sharing: adapters, shared data formats (JSON schemas, APIs), human-readable summaries, semantic vectors.
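A cross-system adapter can be as simple as projecting an internal agent state onto a minimal shared JSON contract. The field names and the `context-share/v1` schema identifier below are invented for illustration; the point is the shape (a small, stable envelope) rather than any particular standard.

```python
import json

def to_shared_format(agent_state):
    """Sketch of a cross-system adapter: map internal state onto a
    minimal shared JSON envelope that other systems can parse."""
    envelope = {
        "schema": "context-share/v1",           # hypothetical schema id
        "task": agent_state["task"],
        "facts": sorted(agent_state["facts"]),  # stable ordering for diffing
        "summary": agent_state.get("summary", ""),
    }
    return json.dumps(envelope, ensure_ascii=False)
```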

Context selection criteria: semantic relevance, logical dependency (task prerequisites), recency and frequency, deduplication, user preferences.
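These criteria can be combined into a single ranking function. The sketch below scores items by term overlap (standing in for semantic relevance, where a real system would use embeddings), exponential recency decay, and a log bonus for access frequency, then deduplicates and keeps the top k. The weights and decay constant are invented.

```python
import math
import time

def score_context(item, query_terms, now=None):
    """Hypothetical scoring: relevance (term overlap), recency (exponential
    decay over one hour), and frequency (log of use count). Weights assumed."""
    now = now or time.time()
    terms = set(item["text"].lower().split())
    relevance = len(terms & set(query_terms)) / max(len(query_terms), 1)
    recency = math.exp(-(now - item["last_used"]) / 3600)
    frequency = math.log1p(item["uses"])
    return 0.6 * relevance + 0.3 * recency + 0.1 * frequency

def select_context(items, query_terms, k=3):
    """Rank by score, deduplicate by text, keep the top k."""
    seen, ranked = set(), []
    for it in sorted(items, key=lambda i: score_context(i, query_terms), reverse=True):
        if it["text"] not in seen:
            seen.add(it["text"])
            ranked.append(it)
    return ranked[:k]
```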

Proactive inference: learning from interaction patterns, inferring hidden goals from query sequences, detecting user struggles.


Four Challenges for Lifelong Context

  1. Storage bottlenecks at scale — context accumulation has no natural ceiling
  2. Processing degradation — attention collapse at long context lengths; O(n²) complexity
  3. System instability — accumulated errors compound over time
  4. Evaluation difficulty — no clear correctness metrics for “good context”

Applications in Practice

  • Claude Code: CLAUDE.md + AGENTS.md files as project context inheritance — a direct Era 2.0 implementation of context management
  • Deep research agents (e.g., Tongyi): cyclic search-extract-question-integrate loop with periodic context compression
  • Brain-Computer Interfaces: emerging frontier for richer, implicit context collection

Connections

  • agent-memory — three-layer memory model (MEMORY.md, episodic vectors, QMD) is a concrete Era 2.0 context management implementation
  • agentic-engineering — the Configuration and Capability layers are context engineering in practice
  • coding-agent — context management is one of the six core components (compaction, memory)
  • thin-harness-fat-skills — CLAUDE.md as resolver is context engineering at the harness level; skill files are prepackaged context
  • auto-research — program.MD is a context engineering artifact: structured intent for autonomous agent loops
  • dark-code — context gaps (no audit trail, no decision path reconstruction) are the failure mode of bad context engineering
  • ai-agents — context engineering is the discipline underlying agent effectiveness

Sources