Agent Memory

Three-layer memory model for coding agents: active context (MEMORY.md), session history (episodic-memory), and broader knowledge base (QMD).

Last updated: 2026-04-12

Overview

Every AI coding session starts fresh. The architectural decisions you explained yesterday, the rationale for that odd module pattern, the debugging context you built up: gone. You start every morning re-explaining your own project from scratch.

Memory across sessions doesn’t happen by default because LLMs are stateless — they only know what’s in the current context window. Claude Code stores each session as a JSONL file on disk, but the next session doesn’t read it.

Three tools address this at different layers, each suited to a different kind of knowledge:

| Layer | Tool | What it covers | Scales? |
| --- | --- | --- | --- |
| Active context | MEMORY.md | Conventions, decisions, current state | No — flat file, token pressure |
| Session history | episodic-memory | Past conversations, reasoning traces | Yes — vector search |
| Broader knowledge | QMD | Docs, specs, meeting notes | Yes — on-device full-text |

Layer 1: MEMORY.md

Claude Code’s built-in solution. A flat Markdown file the agent reads automatically at the start of every session and can write back to during a session. Think of it as a scratchpad that survives between conversations: conventions, architectural decisions, session summaries, bits of working context you don’t want to retype.
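A sketch of what such a file might hold. The structure and every name in it (the sections, the dates, `auth/legacy.ts`) are illustrative, not a required format:

```markdown
# MEMORY.md

## Conventions
- All API handlers return a Result type; never throw across module boundaries.

## Decisions
- 2026-03-02: chose SQLite over Postgres for local-first storage.

## Current state
- Migrating auth to session tokens; `auth/legacy.ts` still uses cookies.
```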

The limitation is scale. MEMORY.md is injected into the system prompt — same token cost and context pressure as CLAUDE.md. Because it’s a flat file with no retrieval layer, there’s no good way to pull one useful detail out of weeks of accumulated notes. It works well for a while, then gradually turns into a wall of text the model skims past.

Layer 2: episodic-memory

The plugin episodic-memory indexes those JSONL conversation files, embeds them as vectors, and stores everything locally in SQLite. The agent gets MCP tools to semantically search previous sessions.
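A minimal sketch of that pipeline, not episodic-memory's actual schema or code: a toy bag-of-words embedding stands in for a real embedding model, and the table layout is invented for illustration.

```python
import json
import math
import sqlite3

def build_vocab(texts):
    # Vocabulary from the indexed corpus; query tokens outside it are ignored.
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def embed(text, vocab):
    # Toy bag-of-words vector, L2-normalized. A stand-in for a real
    # embedding model, just enough to make the pipeline runnable.
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index_sessions(db, jsonl_lines, vocab):
    # Each JSONL line is one message; store its text and its embedding.
    db.execute("CREATE TABLE IF NOT EXISTS messages (text TEXT, vec TEXT)")
    for line in jsonl_lines:
        msg = json.loads(line)
        db.execute("INSERT INTO messages VALUES (?, ?)",
                   (msg["text"], json.dumps(embed(msg["text"], vocab))))

def search(db, query, vocab, k=3):
    # Cosine similarity reduces to a dot product on unit vectors.
    qv = embed(query, vocab)
    scored = [
        (sum(a * b for a, b in zip(qv, json.loads(vec_json))), text)
        for text, vec_json in db.execute("SELECT text, vec FROM messages")
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

lines = [
    json.dumps({"text": "We chose SQLite over Postgres for local storage"}),
    json.dumps({"text": "Fixed the flaky test by pinning the clock"}),
]
texts = [json.loads(line)["text"] for line in lines]
vocab = build_vocab(texts)

db = sqlite3.connect(":memory:")
index_sessions(db, lines, vocab)
print(search(db, "why did we pick sqlite", vocab, k=1))
```

The point of the shape: the query shares almost no exact wording with the stored message ("pick" vs "chose"), yet the overlap that does exist is enough to rank the right session first, which is the kind of recovery a real embedding model does far more robustly.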

This solves the scaling problem. Instead of cramming everything into one file, the agent can query a growing collection of past conversations and pull back just the relevant parts.

Documentation usually captures what you decided. Session history often captures why: the options you considered, the dead ends you ruled out, the trade-offs that led to the final call. So instead of manually digging through old chats, the agent can find the actual discussion.

When to use it: Not every retrieval problem needs vectors. For plenty of projects, grepping Markdown is faster, simpler, and easier to trust. Semantic search starts to help when the collection gets big, the wording gets inconsistent, or the thing you’re trying to recover is an idea rather than an exact phrase. Start with grep; reach for vectors when grep stops being enough.

Layer 3: QMD

Plenty of useful context never lived in your AI coding session to begin with: meeting notes, design docs, specs, technical writeups.

QMD, built by Shopify CEO Tobi Lütke, is an on-device search engine for your broader knowledge base. It exposes an MCP server so the agent can query those materials during a session. Where episodic-memory searches your conversations, QMD searches your documents.
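The core idea behind on-device full-text search can be sketched as an inverted index. This is an illustration of the general technique, not QMD's implementation, which does far more (ranking, tokenization, and so on):

```python
from collections import defaultdict

class DocIndex:
    # Minimal inverted index: token -> set of document ids.
    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Record the document and post each token to the index.
        self.docs[doc_id] = text
        for tok in text.lower().split():
            self.postings[tok].add(doc_id)

    def query(self, terms):
        # AND query: return ids of documents containing every term.
        sets = [self.postings.get(t.lower(), set()) for t in terms.split()]
        if not sets:
            return []
        return sorted(set.intersection(*sets))

index = DocIndex()
index.add("design-doc.md", "Retry policy uses exponential backoff with jitter")
index.add("meeting-2026-03-02.md", "Decided to drop the retry queue entirely")
print(index.query("retry backoff"))
```

Both documents mention "retry", but only the design doc also mentions "backoff", so the AND query narrows to it. Everything stays local: the index is just a dictionary on disk or in memory, no network calls.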

The Rule

Start with grep. Reach for vectors when grep stops being enough.

The three layers form a progressive complexity curve. Most projects start at Layer 1. Layer 2 becomes worthwhile when sessions accumulate and you need to recover decisions. Layer 3 becomes worthwhile when the relevant context lives outside your coding sessions entirely.

Connections

Sources