Context Engineering

The systematic process of designing and optimizing context collection, storage, management, and usage to enhance machine understanding and task performance — a discipline with 20+ years of history, now at its most consequential phase.

Last updated: 2026-04-16

Overview

Context engineering is often treated as a recent LLM-era concept (prompt engineering, RAG, memory systems). Hua et al. (2025) argue it’s a 20+ year discipline that has evolved through four distinct phases aligned with machine intelligence levels. The core challenge is unchanged across all eras: bridging the cognitive gap between human intention and machine understanding.

Formal definition:

CE: (C, 𝒯) → f_context

Where C = raw contextual information, 𝒯 = target task, and f_context = the resulting context-processing function. Practically: “the systematic process of designing and optimizing context collection, storage, management, and usage to enhance machine understanding and task performance.”

Core framing: context engineering is fundamentally entropy reduction — transforming high-entropy human intentions and situations into low-entropy representations that machines can process. As machines get smarter, the entropy gap narrows and the engineering effort required decreases.


The Four Eras

  • Era 1.0 (1990s–2020), primitive / rule-based intelligence: sensor fusion, rule triggers, structured inputs (GPS, time, device state)
  • Era 2.0 (2020–present), agent-centric / LLM intelligence: prompting, RAG, tool-calling, chain-of-thought, memory agents
  • Era 3.0 (near future), human-level intelligence: multimodal perception (touch, taste, smell), emotional understanding, seamless collaboration
  • Era 4.0 (speculative), superhuman intelligence: machines construct context proactively, reveal unarticulated needs, “god’s eye view”

Key principle: “Each qualitative leap in machine intelligence triggers a fundamental revolution in human-machine interfaces.”

Era 1.0 systems (Context Toolkit, Cooltown, ContextPhone): passive sensing, context-aware triggers. The system senses your situation (e.g., location) and adapts accordingly.

Era 2.0 systems (ChatGPT, LangChain, Claude Code, Letta): active, context-cooperative. The system interprets and collaborates on context — it doesn’t just receive it.

Era 4.0 (speculative): machines construct your context for you, inferring what you need before you articulate it — “digital presence” as a computational representation of an individual.


Three Design Dimensions

1. Collection & Storage

Two foundational principles:

  • Minimal Sufficiency — collect and store only what’s necessary for the task
  • Semantic Continuity — maintain continuity of meaning, not just continuity of data

Era 1.0: single-device, simple logs, local databases
Era 2.0: distributed endpoints, layered architecture (edge cache → local DB like SQLite/LevelDB → cloud persistence)
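The Era 2.0 layered architecture can be sketched as a read-through store: check the edge cache first, fall back to a local SQLite database, and finally to cloud persistence. This is a minimal illustration, not a prescribed design; the `LayeredContextStore` name is invented, and the cloud tier is stubbed with a dict.

```python
import sqlite3

class LayeredContextStore:
    """Illustrative read-through store: edge cache -> local SQLite -> cloud.
    The cloud tier is a dict stand-in; a real system would call a remote API."""

    def __init__(self, db_path=":memory:"):
        self.cache = {}                      # edge cache (fast, volatile)
        self.db = sqlite3.connect(db_path)   # local persistence
        self.db.execute("CREATE TABLE IF NOT EXISTS ctx (k TEXT PRIMARY KEY, v TEXT)")
        self.cloud = {}                      # stand-in for cloud persistence

    def put(self, key, value):
        # Write-through to every tier so reads can be served from any layer.
        self.cache[key] = value
        self.db.execute("INSERT OR REPLACE INTO ctx VALUES (?, ?)", (key, value))
        self.cloud[key] = value

    def get(self, key):
        if key in self.cache:                # 1) edge cache
            return self.cache[key]
        row = self.db.execute("SELECT v FROM ctx WHERE k = ?", (key,)).fetchone()
        if row:                              # 2) local DB; warm the cache
            self.cache[key] = row[0]
            return row[0]
        return self.cloud.get(key)           # 3) cloud fallback
```

Each tier trades latency for durability: a cache miss falls through to slower but more persistent storage, and successful reads warm the faster layers.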

2. Management

Five textual compression approaches:

  1. Timestamp marking — preserves order, lacks semantic structure
  2. Functional tagging — labels by role (goal, decision, action)
  3. QA-pair compression — reformulates as question-answer pairs
  4. Hierarchical notes — tree-like concept organization
  5. Vector compression — progressive embedding into semantic vectors
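A few of these approaches are simple enough to sketch directly. The snippet below combines timestamp marking and functional tagging for a single conversation turn, then shows QA-pair compression in miniature by pairing each tagged goal with the decision that follows it. Function names and tag values are illustrative, not from the paper.

```python
from datetime import datetime, timezone

def tag_turn(role, text, now=None):
    """Approaches 1 + 2: stamp a turn with time and a functional tag
    (e.g., 'goal', 'decision', 'action')."""
    ts = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return {"ts": ts, "tag": role, "text": text}

def compress_to_qa(turns):
    """Approach 3 in miniature: reformulate a tagged transcript as
    question-answer pairs by matching each goal to its next decision."""
    pairs, goal = [], None
    for t in turns:
        if t["tag"] == "goal":
            goal = t["text"]
        elif t["tag"] == "decision" and goal is not None:
            pairs.append({"q": goal, "a": t["text"]})
            goal = None
    return pairs
```

Timestamps preserve order cheaply, while the tags supply the semantic structure that timestamps alone lack, which is exactly the trade-off the list above describes.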

Multimodal fusion: map inputs to shared vector space; joint processing via unified Transformer; cross-attention between modalities.

Self-baking: converting raw context into compact representations (summaries, schemas, embeddings) for efficient future retrieval.
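As a toy example of self-baking, the function below reduces a raw event log to a compact per-kind schema (a count plus the most recent detail), so later retrieval touches a small baked summary instead of the full log. The event shape is an assumption for illustration.

```python
def self_bake(raw_events):
    """Illustrative self-baking: collapse a raw event log into a compact
    schema keyed by event kind, keeping a count and the latest detail."""
    baked = {}
    for event in raw_events:
        entry = baked.setdefault(event["kind"], {"count": 0, "latest": None})
        entry["count"] += 1               # frequency survives the compression
        entry["latest"] = event["detail"] # recency survives too
    return baked
```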

Layered memory: short-term (high temporal relevance) ↔ long-term (high importance). Subagents with isolated context windows and restricted permissions.
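The short-term/long-term split can be sketched as a bounded buffer with importance-based promotion: recent items live in short-term memory, and when they age out, only items above an importance threshold are promoted to long-term storage. Capacity and threshold values here are invented defaults.

```python
class LayeredMemory:
    """Sketch of layered memory: a bounded short-term buffer (temporal
    relevance) feeding a long-term store gated by importance."""

    def __init__(self, capacity=3, threshold=0.7):
        self.short_term = []        # (importance, item), newest last
        self.long_term = []
        self.capacity = capacity
        self.threshold = threshold  # promotion cutoff (assumed value)

    def add(self, item, importance):
        self.short_term.append((importance, item))
        while len(self.short_term) > self.capacity:
            imp, old = self.short_term.pop(0)   # evict the oldest entry
            if imp >= self.threshold:
                self.long_term.append(old)      # promote what matters
```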

3. Usage

Intra-system sharing: embedding context into prompts, structured messages between agents, shared memory (blackboards, task graphs, semantic graphs).

Cross-system sharing: adapters, shared data formats (JSON schemas, APIs), human-readable summaries, semantic vectors.
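A cross-system adapter can be as simple as projecting an internal agent state onto a minimal shared JSON contract. The field names and the `context-share/v1` schema identifier below are invented for illustration; the point is the shape (a small, stable envelope) rather than any particular standard.

```python
import json

def to_shared_format(agent_state):
    """Sketch of a cross-system adapter: map internal state onto a
    minimal shared JSON envelope that other systems can parse."""
    envelope = {
        "schema": "context-share/v1",           # hypothetical schema id
        "task": agent_state["task"],
        "facts": sorted(agent_state["facts"]),  # stable ordering for diffing
        "summary": agent_state.get("summary", ""),
    }
    return json.dumps(envelope, ensure_ascii=False)
```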

Context selection criteria: semantic relevance, logical dependency (task prerequisites), recency and frequency, deduplication, user preferences.
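These criteria can be combined into a single ranking function. The sketch below scores items by term overlap (standing in for semantic relevance, where a real system would use embeddings), exponential recency decay, and a log bonus for access frequency, then deduplicates and keeps the top k. The weights and decay constant are invented.

```python
import math
import time

def score_context(item, query_terms, now=None):
    """Hypothetical scoring: relevance (term overlap), recency (exponential
    decay over one hour), and frequency (log of use count). Weights assumed."""
    now = now or time.time()
    terms = set(item["text"].lower().split())
    relevance = len(terms & set(query_terms)) / max(len(query_terms), 1)
    recency = math.exp(-(now - item["last_used"]) / 3600)
    frequency = math.log1p(item["uses"])
    return 0.6 * relevance + 0.3 * recency + 0.1 * frequency

def select_context(items, query_terms, k=3):
    """Rank by score, deduplicate by text, keep the top k."""
    seen, ranked = set(), []
    for it in sorted(items, key=lambda i: score_context(i, query_terms), reverse=True):
        if it["text"] not in seen:
            seen.add(it["text"])
            ranked.append(it)
    return ranked[:k]
```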

Proactive inference: learning from interaction patterns, inferring hidden goals from query sequences, detecting user struggles.


Four Challenges for Lifelong Context

  1. Storage bottlenecks at scale — context accumulation has no natural ceiling
  2. Processing degradation — attention collapse at long context lengths; O(n²) complexity
  3. System instability — accumulated errors compound over time
  4. Evaluation difficulty — no clear correctness metrics for “good context”

Applications in Practice

  • Claude Code: CLAUDE.md + AGENTS.md files as project context inheritance — a direct Era 2.0 implementation of context management
  • Deep research agents (e.g., Tongyi): cyclic search-extract-question-integrate loop with periodic context compression
  • Brain-Computer Interfaces: emerging frontier for richer, implicit context collection

Connections

  • agent-memory — three-layer memory model (MEMORY.md, episodic vectors, QMD) is a concrete Era 2.0 context management implementation
  • agentic-engineering — the Configuration and Capability layers are context engineering in practice
  • coding-agent — context management is one of the six core components (compaction, memory)
  • thin-harness-fat-skills — CLAUDE.md as resolver is context engineering at the harness level; skill files are prepackaged context
  • auto-research — program.MD is a context engineering artifact: structured intent for autonomous agent loops
  • dark-code — context gaps (no audit trail, no decision path reconstruction) are the failure mode of bad context engineering
  • ai-agents — context engineering is the discipline underlying agent effectiveness

Sources