Coding Agent
An LLM wrapped in a software harness that manages repo context, tools, memory, and session state to perform software engineering tasks autonomously.
Last updated: 2026-04-12
Overview
A coding agent is not just a model — it is a model plus a surrounding system called a coding harness (or agent harness). The harness handles all the plumbing: assembling prompts, exposing tools, managing file state, applying edits, running commands, compacting context, and storing session memory.
This distinction matters because the harness often explains why one system feels more capable than another, even when the underlying models are similar in benchmark quality. Much of the apparent “model quality” in tools like Claude Code or Codex is really context quality — how well the harness packages information for the model.
LLM vs. Reasoning Model vs. Agent
| Term | What it is |
|---|---|
| LLM | Core next-token model |
| Reasoning model | LLM trained/prompted to spend more inference-time compute on intermediate reasoning |
| Agent | A control loop around a model — decides what to inspect, which tools to call, when to stop |
| Agent harness | Software scaffold managing context, tools, prompts, and state |
| Coding harness | Task-specific harness for software engineering (repo nav, tool use, execution, feedback) |
Six Core Components
1. Live Repo Context
Before any work begins, the harness collects stable workspace facts: repo root, branch, git status, project docs (README, AGENTS.md, CLAUDE.md). This workspace summary is passed with every prompt so the model isn’t starting from zero context on each turn.
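A minimal sketch of this collection step, assuming a Python harness; the helper names (`collect_workspace_summary`, `_git`) are illustrative, not taken from any particular tool:

```python
# Gather stable workspace facts once, then pass them with every prompt.
import subprocess
from pathlib import Path

def _git(repo: Path, *args: str) -> str:
    """Run a git command in `repo`; return '' if git or the repo is absent."""
    try:
        out = subprocess.run(
            ["git", *args], cwd=repo, capture_output=True, text=True, timeout=10
        )
        return out.stdout.strip() if out.returncode == 0 else ""
    except (OSError, subprocess.TimeoutExpired):
        return ""

def collect_workspace_summary(repo: Path) -> dict:
    """Stable facts: repo root, branch, git status, project docs."""
    docs = {}
    for name in ("README.md", "AGENTS.md", "CLAUDE.md"):
        path = repo / name
        if path.is_file():
            docs[name] = path.read_text(encoding="utf-8")[:4000]  # clip long docs
    return {
        "root": str(repo),
        "branch": _git(repo, "rev-parse", "--abbrev-ref", "HEAD"),
        "status": _git(repo, "status", "--short"),
        "project_docs": docs,
    }
```

Because these facts rarely change mid-session, the summary naturally belongs in the stable prompt prefix described in component 2.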
2. Prompt Shape and Cache Reuse
Coding sessions are repetitive. Rules, tool descriptions, and the workspace summary rarely change. A smart harness splits the prompt into:
- Stable prefix — instructions + tool descriptions + workspace summary (cached, rarely rebuilt)
- Dynamic suffix — short-term memory + recent transcript + newest user request (updated each turn)
Prompt caching on the stable prefix avoids re-processing the same tokens on every model call.
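The split can be sketched in a few lines; `build_prompt` and `cache_hit` are illustrative names, and real providers key caches on exact token prefixes rather than string comparison:

```python
# Split the prompt into a cacheable stable prefix and a per-turn dynamic suffix.
def build_prompt(rules: str, tools: str, workspace: str,
                 memory: str, transcript: str, request: str) -> tuple[str, str]:
    stable_prefix = "\n\n".join([rules, tools, workspace])       # rarely rebuilt
    dynamic_suffix = "\n\n".join([memory, transcript, request])  # every turn
    return stable_prefix, dynamic_suffix

def cache_hit(prev_prompt: str, new_prefix: str) -> bool:
    """Prefix caching only helps if the new prompt starts with the old prefix."""
    return prev_prompt.startswith(new_prefix)
```

The design consequence: anything that changes every turn must live in the suffix, because a single edited byte in the prefix invalidates the whole cached run of tokens.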
3. Tool Access and Use
The harness exposes a pre-defined list of named tools (read file, list files, search, run shell command, write file, etc.) with structured inputs. When the model emits a tool call, the harness:
- Validates the tool name and arguments
- Checks permissions (does the path stay inside the workspace?)
- Optionally gates on user approval
- Executes and feeds the bounded result back into the loop
This is narrower than arbitrary shell access but more reliable — the model can’t accidentally execute malformed commands.
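The validate–gate–execute–bound loop can be sketched as follows, assuming a Python harness with a tool registry; the workspace path and tool names are hypothetical:

```python
# Dispatch a model-issued tool call through validation, permission checks,
# execution, and output bounding.
from pathlib import Path

WORKSPACE = Path("/work").resolve()  # hypothetical workspace root

def read_file(path: str) -> str:
    return (WORKSPACE / path).read_text(encoding="utf-8")

TOOLS = {"read_file": read_file}  # pre-defined, named tools only

def inside_workspace(path: str) -> bool:
    """Reject paths that escape the workspace (e.g. via ../)."""
    resolved = (WORKSPACE / path).resolve()
    return resolved == WORKSPACE or WORKSPACE in resolved.parents

def dispatch(call: dict, max_output: int = 2000) -> str:
    name, args = call.get("name"), call.get("args", {})
    if name not in TOOLS:                                      # validate name
        return f"error: unknown tool {name!r}"
    if "path" in args and not inside_workspace(args["path"]):  # permissions
        return "error: path escapes workspace"
    try:
        result = TOOLS[name](**args)                           # execute
    except Exception as exc:
        return f"error: {exc}"
    return result[:max_output]                                 # bound result
```

Note that errors come back as strings rather than exceptions: the model sees the failure in-context and can retry, which is what "feeds the bounded result back into the loop" means in practice.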
4. Context Reduction and Output Management
Coding agents accumulate context fast: repeated file reads, lengthy tool outputs, long transcripts. Without compaction, the context window fills quickly. A good harness applies:
- Clipping: truncates verbose tool outputs and document snippets so no single item dominates the budget
- Transcript summarization: compresses older turns while keeping recent ones in richer detail, since they are more likely to be relevant
- Deduplication: collapses repeated reads of the same file into one copy
Most of what feels like “model quality” is actually context quality — how well the harness curates what the model sees.
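The three passes can be sketched together; the item shape `(kind, key, text)` and the summary placeholder are assumptions, and a real harness would summarize old turns with an LLM call rather than a stub:

```python
# Three reduction passes over accumulated context items.
def clip(text: str, budget: int = 500) -> str:
    """Truncate verbose output so no single item dominates the window."""
    return text if len(text) <= budget else text[:budget] + "\n...[clipped]"

def dedupe_reads(items: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Keep only the most recent read of each file."""
    latest = {}
    for i, (kind, key, _) in enumerate(items):
        if kind == "file_read":
            latest[key] = i
    return [it for i, it in enumerate(items)
            if it[0] != "file_read" or latest[it[1]] == i]

def compact(items, keep_recent: int = 5):
    """Summarize old items, keep recent ones rich, clip everything kept."""
    items = dedupe_reads(items)
    old, recent = items[:-keep_recent], items[-keep_recent:]
    summary = ("summary", "old", f"[{len(old)} earlier items summarized]")
    return ([summary] if old else []) + [(k, key, clip(t)) for k, key, t in recent]
```

The ordering matters: deduplicating before summarizing keeps redundant file copies out of the summary budget.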
5. Structured Session Memory
The harness maintains two separate state layers:
| Layer | Purpose | Structure |
|---|---|---|
| Full transcript | Durable record of everything — user requests, tool outputs, LLM responses | Append-only JSONL |
| Working memory | Small, explicitly maintained summary of current task, key files, important notes | Distilled, gets rewritten |
The compact transcript (from component 4) is for prompt reconstruction — give the model a compressed recent history. Working memory is for task continuity — track what matters across turns explicitly.
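The two layers can be sketched side by side; the file names and memory schema are illustrative, with only the structural contrast (append-only JSONL vs. rewritten-in-place summary) taken from the table above:

```python
# Two state layers: an append-only JSONL transcript and a small,
# explicitly rewritten working-memory file.
import json
from pathlib import Path

class SessionMemory:
    def __init__(self, session_dir: Path):
        session_dir.mkdir(parents=True, exist_ok=True)
        self.transcript = session_dir / "transcript.jsonl"  # append-only
        self.memory_file = session_dir / "memory.json"      # rewritten in place

    def log(self, event: dict) -> None:
        """Durable record: every request, tool output, and model response."""
        with self.transcript.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def rewrite_memory(self, memory: dict) -> None:
        """Distilled state: current task, key files, important notes."""
        self.memory_file.write_text(json.dumps(memory, indent=2))

    def load_memory(self) -> dict:
        if not self.memory_file.exists():
            return {"task": "", "key_files": [], "notes": []}
        return json.loads(self.memory_file.read_text())
```

The transcript only ever grows; the working memory is deliberately small and overwritten whole, which is what makes it cheap to include in every prompt.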
6. Delegation with Bounded Subagents
The main agent can spawn subagents to parallelize subtasks (e.g., “which file defines this symbol?”, “why is this test failing?”). Design tension: subagents need enough context to do real work, but must be constrained to avoid duplicate work, recursive spawning, or file conflicts.
Typical constraints: read-only mode, restricted recursion depth, scoped file access. Claude Code has supported subagents for a long time; Codex added them more recently.
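A minimal sketch of those constraints, assuming a Python harness; the `Agent` dataclass and its field names are hypothetical, not from Claude Code or Codex:

```python
# Bounded subagents: read-only, depth-limited, with scoped file access.
from dataclasses import dataclass

@dataclass
class Agent:
    depth: int = 0
    read_only: bool = False
    allowed_paths: tuple = ("",)   # "" prefix = whole workspace
    MAX_DEPTH = 2                  # class-level cap on recursive spawning

    def spawn_subagent(self, scope: tuple) -> "Agent":
        if self.depth >= self.MAX_DEPTH:       # no runaway recursion
            raise RuntimeError("recursion depth exceeded")
        return Agent(depth=self.depth + 1,
                     read_only=True,           # subagents cannot edit files
                     allowed_paths=scope)      # scoped file access

    def can_write(self, path: str) -> bool:
        return (not self.read_only
                and any(path.startswith(p) for p in self.allowed_paths))
```

Making subagents read-only sidesteps the file-conflict problem entirely: parallel lookups like "which file defines this symbol?" never race on edits.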
The Unix Agent Architecture
Andreessen’s framing: an agent is LLM + bash shell + file system + markdown + cron job. Every component except the LLM had already been battle-tested for decades.
The key implication — the agent is just its files:
- Model-agnostic: swap out the LLM, keep all state. The agent’s memories and capabilities persist across model changes.
- Self-extending: the agent has full introspection into its own files and can rewrite them. Tell it to add a new capability and it will — find the API, write the code, wire it up.
- Migratable: tell your agent to move to a different runtime; it will orchestrate the migration.
The shell unlocks enormous latent capability: every Unix command, every CLI tool, the full power of the computer is already available without any new protocols. Andreessen argues MCP and “fancy protocols” are unnecessary — CLIs already exist for everything. The Unix shell is the natural agent interface because it’s already the interface to everything.
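The point compresses into one tool definition; this is a bare sketch of the shell-as-universal-interface idea, not any particular agent's implementation:

```python
# One generic tool makes every installed CLI reachable; no per-tool
# protocol or adapter is needed.
import subprocess

def shell(command: str, timeout: int = 30) -> str:
    """Run an arbitrary shell command and return combined output."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return proc.stdout + proc.stderr

# The agent can discover capability instead of being handed it, e.g.:
# shell("which ffmpeg jq curl")
```

Contrast this with component 3's pre-defined tool list: the shell tool trades the harness's validation and permission gating for maximal reach, which is exactly the tension between the two designs.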
This parallels the Unix mindset vs IBM’s monolithic OS/360: modular, composable, hackable wins over the castle-in-the-sky approach.
Macro Actions and Token Throughput
Karpathy dates the paradigm shift to December 2024: coding moves from micro actions (write this function, fix this bug) to macro actions — delegate an entire new functionality to an agent, let it run to completion, review the result.
The corollary: token throughput is the new GPU utilization. Just as idle GPUs signaled waste to a researcher, unused subscription tokens mean an agent user is leaving capability on the table. The human becomes the bottleneck — not in a disempowering way, but because it means improvement is always available.
In practice: multiple agents run in parallel on non-overlapping features, with the human moving between them — giving work, reviewing results, assigning more. “You can move in much larger macro actions over your repository.”
The Claw: Beyond Interactive Sessions
A claw (Karpathy’s term) is a persistent autonomous agent loop that runs on your behalf even when you’re not actively prompting it. Distinct from an interactive coding session:
- Has its own sandbox and environment
- Runs loops independently (keeps going without you)
- Has more sophisticated memory than default context compaction
- Interacts through a persistent channel (WhatsApp, etc.) rather than a terminal session
The claw is what makes home automation (Dobby example: LAN scan → Sonos API discovery → lights/HVAC/spa control via WhatsApp) and auto research viable — tasks that run for minutes or hours without human checkpoints.
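The structural difference from an interactive session is the loop itself; a bare sketch, where `poll_channel` and `handle` are hypothetical stand-ins for the persistent channel and the agent step:

```python
# A claw-style loop: polls a channel, acts, and keeps going without a human
# checkpoint. max_ticks exists only to bound this sketch, not a real claw.
import time

def claw_loop(poll_channel, handle, memory: dict,
              interval: float = 0.0, max_ticks: int = 100) -> dict:
    for _ in range(max_ticks):
        for message in poll_channel():        # e.g. WhatsApp, not a terminal
            memory = handle(message, memory)  # act + update persistent memory
        time.sleep(interval)                  # idle between polls
    return memory
```

An interactive session ends when the user stops typing; a claw's loop is the process, which is why it needs its own sandbox and a memory scheme richer than default context compaction.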
Key Insights
A good coding harness can make a weaker model feel much stronger than a better model in a plain chat UI, because it manages context, tools, and state on the model’s behalf.
When agents don’t work, it feels like a skill issue — bad instructions, a missing memory tool, wrong scope. This is empowering: improvement is always available.
Connections
- thin-harness-fat-skills — complementary framing: same architecture described as “thin harness, fat skills” with skill files, resolvers, latent vs. deterministic
- latent-vs-deterministic — draws the judgment/trust line through each of the six components; three-column ownership model (harness/skills/model)
- llm-wiki-pattern — this wiki itself is a kind of agent harness for knowledge management
- sebastian-raschka — author of source article; also wrote Build a Large Language Model (From Scratch)
- garry-tan — YC president; wrote the thin-harness-fat-skills companion piece
- andrej-karpathy — macro actions, token throughput as bottleneck, claw framing
- marc-andreessen — Unix agent architecture: LLM + shell + files + cron; agent = its files
- auto-research — claws are the execution environment for autonomous research loops
Sources
- Components of A Coding Agent — Sebastian Raschka — added 2026-04-12
- Local clip: Components of A Coding Agent
- Skill Issue: Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI — added 2026-04-13
- Marc Andreessen introspects on Death of the Browser, Pi, OpenClaw, and Why “This Time Is Different” — added 2026-04-13