Agent Memory

Three-layer memory model for coding agents: active context (MEMORY.md), session history (episodic-memory), and broader knowledge base (QMD).

Last updated: 2026-04-12

Overview

Every AI coding session starts fresh. The architectural decisions you explained yesterday, the rationale for that odd module pattern, the debugging context you built up: gone. You start every morning re-explaining your own project from scratch.

Memory across sessions doesn’t happen by default because LLMs are stateless — they only know what’s in the current context window. Claude Code stores each session as a JSONL file on disk, but the next session doesn’t read it.

Three tools address this at different layers, each suited to a different kind of knowledge:

| Layer | Tool | What it covers | Scales? |
| --- | --- | --- | --- |
| Active context | MEMORY.md | Conventions, decisions, current state | No — flat file, token pressure |
| Session history | episodic-memory | Past conversations, reasoning traces | Yes — vector search |
| Broader knowledge | QMD | Docs, specs, meeting notes | Yes — on-device full-text |

Layer 1: MEMORY.md

Claude Code’s built-in solution. A flat Markdown file the agent reads automatically at the start of every session and can write back to during a session. Think of it as a scratchpad that survives between conversations: conventions, architectural decisions, session summaries, bits of working context you don’t want to retype.
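A sketch of what such a file might hold. The structure and every name in it (the sections, the dates, `auth/legacy.ts`) are illustrative, not a required format:

```markdown
# MEMORY.md

## Conventions
- All API handlers return a Result type; never throw across module boundaries.

## Decisions
- 2026-03-02: chose SQLite over Postgres for local-first storage.

## Current state
- Migrating auth to session tokens; `auth/legacy.ts` still uses cookies.
```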

The limitation is scale. MEMORY.md is injected into the system prompt — same token cost and context pressure as CLAUDE.md. Because it’s a flat file with no retrieval layer, there’s no good way to pull one useful detail out of weeks of accumulated notes. It works well for a while, then gradually turns into a wall of text the model skims past.

Layer 2: episodic-memory

The plugin episodic-memory indexes those JSONL conversation files, embeds them as vectors, and stores everything locally in SQLite. The agent gets MCP tools to semantically search previous sessions.
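A minimal sketch of that pipeline, not episodic-memory's actual schema or code: a toy bag-of-words embedding stands in for a real embedding model, and the table layout is invented for illustration.

```python
import json
import math
import sqlite3

def build_vocab(texts):
    # Vocabulary from the indexed corpus; query tokens outside it are ignored.
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def embed(text, vocab):
    # Toy bag-of-words vector, L2-normalized. A stand-in for a real
    # embedding model, just enough to make the pipeline runnable.
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index_sessions(db, jsonl_lines, vocab):
    # Each JSONL line is one message; store its text and its embedding.
    db.execute("CREATE TABLE IF NOT EXISTS messages (text TEXT, vec TEXT)")
    for line in jsonl_lines:
        msg = json.loads(line)
        db.execute("INSERT INTO messages VALUES (?, ?)",
                   (msg["text"], json.dumps(embed(msg["text"], vocab))))

def search(db, query, vocab, k=3):
    # Cosine similarity reduces to a dot product on unit vectors.
    qv = embed(query, vocab)
    scored = [
        (sum(a * b for a, b in zip(qv, json.loads(vec_json))), text)
        for text, vec_json in db.execute("SELECT text, vec FROM messages")
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

lines = [
    json.dumps({"text": "We chose SQLite over Postgres for local storage"}),
    json.dumps({"text": "Fixed the flaky test by pinning the clock"}),
]
texts = [json.loads(line)["text"] for line in lines]
vocab = build_vocab(texts)

db = sqlite3.connect(":memory:")
index_sessions(db, lines, vocab)
print(search(db, "why did we pick sqlite", vocab, k=1))
```

The point of the shape: the query shares almost no exact wording with the stored message ("pick" vs "chose"), yet the overlap that does exist is enough to rank the right session first, which is the kind of recovery a real embedding model does far more robustly.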

This solves the scaling problem. Instead of cramming everything into one file, the agent can query a growing collection of past conversations and pull back just the relevant parts.

Documentation usually captures what you decided. Session history often captures why: the options you considered, the dead ends you ruled out, the trade-offs that led to the final call. So instead of manually digging through old chats, the agent can find the actual discussion.

When to use it: Not every retrieval problem needs vectors. For plenty of projects, grepping Markdown is faster, simpler, and easier to trust. Semantic search starts to help when the collection gets big, the wording gets inconsistent, or the thing you’re trying to recover is an idea rather than an exact phrase. Start with grep; reach for vectors when grep stops being enough.

Layer 3: QMD

Plenty of useful context never lived in your AI coding session to begin with: meeting notes, design docs, specs, technical writeups.

QMD, built by Shopify CEO Tobi Lütke, is an on-device search engine for your broader knowledge base. It exposes an MCP server so the agent can query those materials during a session. Where episodic-memory searches your conversations, QMD searches your documents.
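The core idea behind on-device full-text search can be sketched as an inverted index. This is an illustration of the general technique, not QMD's implementation, which does far more (ranking, tokenization, and so on):

```python
from collections import defaultdict

class DocIndex:
    # Minimal inverted index: token -> set of document ids.
    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Record the document and post each token to the index.
        self.docs[doc_id] = text
        for tok in text.lower().split():
            self.postings[tok].add(doc_id)

    def query(self, terms):
        # AND query: return ids of documents containing every term.
        sets = [self.postings.get(t.lower(), set()) for t in terms.split()]
        if not sets:
            return []
        return sorted(set.intersection(*sets))

index = DocIndex()
index.add("design-doc.md", "Retry policy uses exponential backoff with jitter")
index.add("meeting-2026-03-02.md", "Decided to drop the retry queue entirely")
print(index.query("retry backoff"))
```

Both documents mention "retry", but only the design doc also mentions "backoff", so the AND query narrows to it. Everything stays local: the index is just a dictionary on disk or in memory, no network calls.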

The Rule

Start with grep. Reach for vectors when grep stops being enough.

The three layers form a progressive complexity curve. Most projects start at Layer 1. Layer 2 becomes worthwhile when sessions accumulate and you need to recover decisions. Layer 3 becomes worthwhile when the relevant context lives outside your coding sessions entirely.

Connections

Sources