Coding Agent
An LLM wrapped in a software harness that manages repo context, tools, memory, and session state to perform software engineering tasks autonomously.
Last updated: 2026-04-12
Overview
A coding agent is not just a model — it is a model plus a surrounding system called a coding harness (or agent harness). The harness handles all the plumbing: assembling prompts, exposing tools, managing file state, applying edits, running commands, compacting context, and storing session memory.
This distinction matters because the harness often explains why one system feels more capable than another, even when the underlying models are similar in benchmark quality. Much of the apparent “model quality” in tools like Claude Code or Codex is really context quality — how well the harness packages information for the model.
LLM vs. Reasoning Model vs. Agent
| Term | What it is |
|---|---|
| LLM | Core next-token model |
| Reasoning model | LLM trained/prompted to spend more inference-time compute on intermediate reasoning |
| Agent | A control loop around a model — decides what to inspect, which tools to call, when to stop |
| Agent harness | Software scaffold managing context, tools, prompts, and state |
| Coding harness | Task-specific harness for software engineering (repo nav, tool use, execution, feedback) |
Six Core Components
1. Live Repo Context
Before any work begins, the harness collects stable workspace facts: repo root, branch, git status, project docs (README, AGENTS.md, CLAUDE.md). This workspace summary is passed with every prompt so the model isn’t starting from zero context on each turn.
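A minimal sketch of this collection step, assuming a Python harness; the helper names (`collect_workspace_summary`, `_git`) are illustrative, not taken from any particular tool:

```python
# Gather stable workspace facts once, then pass them with every prompt.
import subprocess
from pathlib import Path

def _git(repo: Path, *args: str) -> str:
    """Run a git command in `repo`; return '' if git or the repo is absent."""
    try:
        out = subprocess.run(
            ["git", *args], cwd=repo, capture_output=True, text=True, timeout=10
        )
        return out.stdout.strip() if out.returncode == 0 else ""
    except (OSError, subprocess.TimeoutExpired):
        return ""

def collect_workspace_summary(repo: Path) -> dict:
    """Stable facts: repo root, branch, git status, project docs."""
    docs = {}
    for name in ("README.md", "AGENTS.md", "CLAUDE.md"):
        path = repo / name
        if path.is_file():
            docs[name] = path.read_text(encoding="utf-8")[:4000]  # clip long docs
    return {
        "root": str(repo),
        "branch": _git(repo, "rev-parse", "--abbrev-ref", "HEAD"),
        "status": _git(repo, "status", "--short"),
        "project_docs": docs,
    }
```

Because these facts rarely change mid-session, the summary naturally belongs in the stable prompt prefix described in component 2.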
2. Prompt Shape and Cache Reuse
Coding sessions are repetitive. Rules, tool descriptions, and the workspace summary rarely change. A smart harness splits the prompt into:
- Stable prefix — instructions + tool descriptions + workspace summary (cached, rarely rebuilt)
- Dynamic suffix — short-term memory + recent transcript + newest user request (updated each turn)
Prompt caching on the stable prefix avoids re-processing the same tokens on every model call.
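The split can be sketched in a few lines; `build_prompt` and `cache_hit` are illustrative names, and real providers key caches on exact token prefixes rather than string comparison:

```python
# Split the prompt into a cacheable stable prefix and a per-turn dynamic suffix.
def build_prompt(rules: str, tools: str, workspace: str,
                 memory: str, transcript: str, request: str) -> tuple[str, str]:
    stable_prefix = "\n\n".join([rules, tools, workspace])       # rarely rebuilt
    dynamic_suffix = "\n\n".join([memory, transcript, request])  # every turn
    return stable_prefix, dynamic_suffix

def cache_hit(prev_prompt: str, new_prefix: str) -> bool:
    """Prefix caching only helps if the new prompt starts with the old prefix."""
    return prev_prompt.startswith(new_prefix)
```

The design consequence: anything that changes every turn must live in the suffix, because a single edited byte in the prefix invalidates the whole cached run of tokens.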
3. Tool Access and Use
The harness exposes a pre-defined list of named tools (read file, list files, search, run shell command, write file, etc.) with structured inputs. When the model emits a tool call, the harness:
- Validates the tool name and arguments
- Checks permissions (does the path stay inside the workspace?)
- Optionally gates on user approval
- Executes and feeds the bounded result back into the loop
This is narrower than arbitrary shell access but more reliable — the model can’t accidentally execute malformed commands.
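The validate–gate–execute–bound loop can be sketched as follows, assuming a Python harness with a tool registry; the workspace path and tool names are hypothetical:

```python
# Dispatch a model-issued tool call through validation, permission checks,
# execution, and output bounding.
from pathlib import Path

WORKSPACE = Path("/work").resolve()  # hypothetical workspace root

def read_file(path: str) -> str:
    return (WORKSPACE / path).read_text(encoding="utf-8")

TOOLS = {"read_file": read_file}  # pre-defined, named tools only

def inside_workspace(path: str) -> bool:
    """Reject paths that escape the workspace (e.g. via ../)."""
    resolved = (WORKSPACE / path).resolve()
    return resolved == WORKSPACE or WORKSPACE in resolved.parents

def dispatch(call: dict, max_output: int = 2000) -> str:
    name, args = call.get("name"), call.get("args", {})
    if name not in TOOLS:                                      # validate name
        return f"error: unknown tool {name!r}"
    if "path" in args and not inside_workspace(args["path"]):  # permissions
        return "error: path escapes workspace"
    try:
        result = TOOLS[name](**args)                           # execute
    except Exception as exc:
        return f"error: {exc}"
    return result[:max_output]                                 # bound result
```

Note that errors come back as strings rather than exceptions: the model sees the failure in-context and can retry, which is what "feeds the bounded result back into the loop" means in practice.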
4. Context Reduction and Output Management
Coding agents accumulate context fast: repeated file reads, lengthy tool outputs, long transcripts. Without compaction, the context window fills quickly. A good harness applies:
- Clipping: truncates verbose tool outputs and document snippets so no single item dominates the budget
- Transcript summarization: compresses older turns while keeping recent ones in richer detail, since they are more likely to be relevant
- Deduplication: collapses repeated reads of the same file into one copy
Most of what feels like “model quality” is actually context quality — how well the harness curates what the model sees.
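The three passes can be sketched together; the item shape `(kind, key, text)` and the summary placeholder are assumptions, and a real harness would summarize old turns with an LLM call rather than a stub:

```python
# Three reduction passes over accumulated context items.
def clip(text: str, budget: int = 500) -> str:
    """Truncate verbose output so no single item dominates the window."""
    return text if len(text) <= budget else text[:budget] + "\n...[clipped]"

def dedupe_reads(items: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Keep only the most recent read of each file."""
    latest = {}
    for i, (kind, key, _) in enumerate(items):
        if kind == "file_read":
            latest[key] = i
    return [it for i, it in enumerate(items)
            if it[0] != "file_read" or latest[it[1]] == i]

def compact(items, keep_recent: int = 5):
    """Summarize old items, keep recent ones rich, clip everything kept."""
    items = dedupe_reads(items)
    old, recent = items[:-keep_recent], items[-keep_recent:]
    summary = ("summary", "old", f"[{len(old)} earlier items summarized]")
    return ([summary] if old else []) + [(k, key, clip(t)) for k, key, t in recent]
```

The ordering matters: deduplicating before summarizing keeps redundant file copies out of the summary budget.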
5. Structured Session Memory
The harness maintains two separate state layers:
| Layer | Purpose | Structure |
|---|---|---|
| Full transcript | Durable record of everything — user requests, tool outputs, LLM responses | Append-only JSONL |
| Working memory | Small, explicitly maintained summary of current task, key files, important notes | Distilled, gets rewritten |
The compact transcript (from component 4) is for prompt reconstruction — give the model a compressed recent history. Working memory is for task continuity — track what matters across turns explicitly.
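The two layers can be sketched side by side; the file names and memory schema are illustrative, with only the structural contrast (append-only JSONL vs. rewritten-in-place summary) taken from the table above:

```python
# Two state layers: an append-only JSONL transcript and a small,
# explicitly rewritten working-memory file.
import json
from pathlib import Path

class SessionMemory:
    def __init__(self, session_dir: Path):
        session_dir.mkdir(parents=True, exist_ok=True)
        self.transcript = session_dir / "transcript.jsonl"  # append-only
        self.memory_file = session_dir / "memory.json"      # rewritten in place

    def log(self, event: dict) -> None:
        """Durable record: every request, tool output, and model response."""
        with self.transcript.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def rewrite_memory(self, memory: dict) -> None:
        """Distilled state: current task, key files, important notes."""
        self.memory_file.write_text(json.dumps(memory, indent=2))

    def load_memory(self) -> dict:
        if not self.memory_file.exists():
            return {"task": "", "key_files": [], "notes": []}
        return json.loads(self.memory_file.read_text())
```

The transcript only ever grows; the working memory is deliberately small and overwritten whole, which is what makes it cheap to include in every prompt.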
6. Delegation with Bounded Subagents
The main agent can spawn subagents to parallelize subtasks (e.g., “which file defines this symbol?”, “why is this test failing?”). Design tension: subagents need enough context to do real work, but must be constrained to avoid duplicate work, recursive spawning, or file conflicts.
Typical constraints: read-only mode, restricted recursion depth, scoped file access. Claude Code has supported subagents for a long time; Codex added them more recently.
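A minimal sketch of those constraints, assuming a Python harness; the `Agent` dataclass and its field names are hypothetical, not from Claude Code or Codex:

```python
# Bounded subagents: read-only, depth-limited, with scoped file access.
from dataclasses import dataclass

@dataclass
class Agent:
    depth: int = 0
    read_only: bool = False
    allowed_paths: tuple = ("",)   # "" prefix = whole workspace
    MAX_DEPTH = 2                  # class-level cap on recursive spawning

    def spawn_subagent(self, scope: tuple) -> "Agent":
        if self.depth >= self.MAX_DEPTH:       # no runaway recursion
            raise RuntimeError("recursion depth exceeded")
        return Agent(depth=self.depth + 1,
                     read_only=True,           # subagents cannot edit files
                     allowed_paths=scope)      # scoped file access

    def can_write(self, path: str) -> bool:
        return (not self.read_only
                and any(path.startswith(p) for p in self.allowed_paths))
```

Making subagents read-only sidesteps the file-conflict problem entirely: parallel lookups like "which file defines this symbol?" never race on edits.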
The Unix Agent Architecture
Andreessen’s framing: an agent is LLM + bash shell + file system + markdown + cron job. Every component except the LLM had already been battle-tested for decades.
The key implication — the agent is just its files:
- Model-agnostic: swap out the LLM, keep all state. The agent’s memories and capabilities persist across model changes.
- Self-extending: the agent has full introspection into its own files and can rewrite them. Tell it to add a new capability and it will — find the API, write the code, wire it up.
- Migratable: tell your agent to move to a different runtime; it will orchestrate the migration.
The shell unlocks enormous latent capability: every Unix command, every CLI tool, the full power of the computer is already available without any new protocols. Andreessen argues MCP and “fancy protocols” are unnecessary — CLIs already exist for everything. The Unix shell is the natural agent interface because it’s already the interface to everything.
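The point compresses into one tool definition; this is a bare sketch of the shell-as-universal-interface idea, not any particular agent's implementation:

```python
# One generic tool makes every installed CLI reachable; no per-tool
# protocol or adapter is needed.
import subprocess

def shell(command: str, timeout: int = 30) -> str:
    """Run an arbitrary shell command and return combined output."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return proc.stdout + proc.stderr

# The agent can discover capability instead of being handed it, e.g.:
# shell("which ffmpeg jq curl")
```

Contrast this with component 3's pre-defined tool list: the shell tool trades the harness's validation and permission gating for maximal reach, which is exactly the tension between the two designs.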
This parallels the Unix mindset vs IBM’s monolithic OS/360: modular, composable, hackable wins over the castle-in-the-sky approach.
Macro Actions and Token Throughput
Karpathy dates the paradigm shift to December 2024: coding moves from micro actions (write this function, fix this bug) to macro actions — delegate an entire new functionality to an agent, let it run to completion, review the result.
The corollary: token throughput is the new GPU utilization. Just as idle GPUs signaled waste to a researcher, unused subscription tokens mean an agent user is leaving capability on the table. The human becomes the bottleneck — not in a disempowering way, but because it means improvement is always available.
In practice: multiple agents run in parallel on non-overlapping features, with the human moving between them — giving work, reviewing results, assigning more. “You can move in much larger macro actions over your repository.”
The Claw: Beyond Interactive Sessions
A claw (Karpathy’s term) is a persistent autonomous agent loop that runs on your behalf even when you’re not actively prompting it. Distinct from an interactive coding session:
- Has its own sandbox and environment
- Runs loops independently (keeps going without you)
- Has more sophisticated memory than default context compaction
- Interacts through a persistent channel (WhatsApp, etc.) rather than a terminal session
The claw is what makes home automation (Dobby example: LAN scan → Sonos API discovery → lights/HVAC/spa control via WhatsApp) and auto research viable — tasks that run for minutes or hours without human checkpoints.
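The structural difference from an interactive session is the loop itself; a bare sketch, where `poll_channel` and `handle` are hypothetical stand-ins for the persistent channel and the agent step:

```python
# A claw-style loop: polls a channel, acts, and keeps going without a human
# checkpoint. max_ticks exists only to bound this sketch, not a real claw.
import time

def claw_loop(poll_channel, handle, memory: dict,
              interval: float = 0.0, max_ticks: int = 100) -> dict:
    for _ in range(max_ticks):
        for message in poll_channel():        # e.g. WhatsApp, not a terminal
            memory = handle(message, memory)  # act + update persistent memory
        time.sleep(interval)                  # idle between polls
    return memory
```

An interactive session ends when the user stops typing; a claw's loop is the process, which is why it needs its own sandbox and a memory scheme richer than default context compaction.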
Key Insights
A good coding harness can make a weaker model feel much stronger than a better model in a plain chat UI, because it manages context, tools, and state on the model’s behalf.
When agents don’t work, it feels like a skill issue — bad instructions, a missing memory tool, wrong scope. This is empowering: improvement is always available.
Connections
- thin-harness-fat-skills — complementary framing: same architecture described as “thin harness, fat skills” with skill files, resolvers, latent vs. deterministic
- latent-vs-deterministic — draws the judgment/trust line through each of the six components; three-column ownership model (harness/skills/model)
- llm-wiki-pattern — this wiki itself is a kind of agent harness for knowledge management
- sebastian-raschka — author of source article; also wrote Build a Large Language Model (From Scratch)
- garry-tan — YC president; wrote the thin-harness-fat-skills companion piece
- andrej-karpathy — macro actions, token throughput as bottleneck, claw framing
- marc-andreessen — Unix agent architecture: LLM + shell + files + cron; agent = its files
- auto-research — claws are the execution environment for autonomous research loops
Sources
- Components of A Coding Agent — Sebastian Raschka — added 2026-04-12
- Local clip: Components of A Coding Agent
- Skill Issue: Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI — added 2026-04-13
- Marc Andreessen introspects on Death of the Browser, Pi, OpenClaw, and Why “This Time Is Different” — added 2026-04-13