Thin Harness, Fat Skills

An agent architecture principle: keep the harness minimal (~200 lines), and encode domain intelligence in reusable markdown skill files.

Last updated: 2026-04-24

Overview

The productivity gap between 2x and 100x AI users isn’t the model — people at both levels use the same models. The difference is architecture. Garry Tan (YC) calls the pattern thin harness, fat skills: push all intelligence upward into portable skill files; keep the harness just a thin loop around the model.

The anti-pattern is a fat harness with thin skills: 40+ tool definitions consuming half the context window, MCP round-trips adding seconds per step, REST wrappers turning every API endpoint into a separate tool. Three times the tokens, three times the latency, three times the failure rate.

Five Definitions

1. Skill Files

A skill file is a reusable markdown document that teaches the model how to do something — the process, not the goal. The user supplies the goal; the skill supplies the steps.

Key insight: a skill file works like a method call. It takes parameters. Same procedure, different arguments → radically different capability.

Example: /investigate with steps (scope, timeline, diarize, synthesize, argue both sides, cite sources) and parameters TARGET, QUESTION, DATASET. Point it at discovery emails → medical research analyst. Point it at FEC filings → forensic investigator. Same skill, same six steps, same markdown file.
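A hypothetical version of that skill file might look like this (the wording is illustrative, reconstructed from the steps and parameters named above, not an actual file):

```markdown
# /investigate

Parameters: TARGET, QUESTION, DATASET

1. Scope: restate QUESTION and define what would count as an answer.
2. Timeline: order every event in DATASET concerning TARGET chronologically.
3. Diarize: write a one-page structured profile of TARGET.
4. Synthesize: connect the timeline and profile back to QUESTION.
5. Argue both sides: state the strongest case for and against the conclusion.
6. Cite sources: every claim links to a specific document in DATASET.
```

The parameters are the only thing the user supplies; swapping DATASET from discovery emails to FEC filings is what changes the capability.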

This is not prompt engineering. This is software design, using markdown as the programming language and human judgment as the runtime.

Skill files are permanent upgrades: they don’t degrade, don’t forget, run at 3 AM, and automatically improve when the underlying model improves.

2. The Harness

The harness does exactly four things:

  1. Runs the model in a loop
  2. Reads and writes files
  3. Manages context
  4. Enforces safety

That’s the “thin.” ~200 lines of code. JSON in, text out. Read-only by default. Purpose-built tooling, fast and narrow — e.g. a Playwright CLI doing each browser operation in 100ms vs. a Chrome MCP taking 15 seconds for screenshot-find-click-wait-read (75x slower).
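The four responsibilities can be sketched as a loop in a few dozen lines of Python. This is a sketch under assumptions: the `call_model` callable, the JSON action format, and the tool names are invented for illustration, not a real protocol.

```python
import json
from pathlib import Path

ALLOWED_WRITE_DIR = Path("workspace")  # 4. safety: writes confined to one dir
MAX_CONTEXT_CHARS = 100_000            # 3. crude context budget

def run_harness(task: str, call_model) -> str:
    """Loop the model until it emits a final answer instead of a tool call."""
    history = [{"role": "user", "content": task}]
    while True:
        # 3. Manage context: drop oldest turns once over budget
        while sum(len(m["content"]) for m in history) > MAX_CONTEXT_CHARS:
            history.pop(1)
        reply = call_model(history)              # 1. run the model in a loop
        history.append({"role": "assistant", "content": reply})
        action = json.loads(reply)               # JSON in, text out
        if action["type"] == "read":             # 2. read files (default mode)
            text = Path(action["path"]).read_text()
            history.append({"role": "tool", "content": text})
        elif action["type"] == "write":          # 2 + 4. writes are sandboxed
            target = ALLOWED_WRITE_DIR / action["path"]
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(action["content"])
            history.append({"role": "tool", "content": "written"})
        else:
            return action["content"]             # final answer ends the loop
```

Everything else (which tools exist, how they are invoked, what the procedure is) lives in skill files, not in this loop.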

See coding-agent for the six components of a coding harness (Raschka’s complementary framing).

3. Resolvers

A resolver is a routing table for context: when task type X appears, load document Y first.

Skills tell the model how. Resolvers tell it what to load and when.

Example: a developer changes a prompt. Without a resolver, they ship it. With a resolver, the model reads docs/EVALS.md first — which says run the eval suite, compare scores, and revert if accuracy drops more than 2%. The developer didn’t know the eval suite existed. The resolver surfaced it at the right moment.

Claude Code has a built-in resolver: every skill has a description field, and the model matches user intent to skill descriptions automatically.

Practical implication: a CLAUDE.md can be ~200 lines of pointers to documents rather than 20,000 lines of content. The resolver loads the right one on demand, without polluting the context window.
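The routing table itself can be trivially simple. A minimal sketch in Python, assuming a substring-matching scheme and invented document names:

```python
# Resolver: when task type X appears, load document Y first.
RESOLVER = {
    "prompt change": "docs/EVALS.md",
    "schema migration": "docs/MIGRATIONS.md",
    "new endpoint": "docs/API_STYLE.md",
}

def resolve(task: str) -> list[str]:
    """Return the documents to load into context before acting on `task`."""
    task_lower = task.lower()
    return [doc for pattern, doc in RESOLVER.items() if pattern in task_lower]
```

The point is not the matching logic (Claude Code matches on skill descriptions instead) but the shape: a small table of pointers stands in for thousands of lines of inlined content.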

4. Latent vs. Deterministic

Every step in a system is one or the other. Confusing them is the most common mistake in agent design.

|                 | Latent space | Deterministic |
| --------------- | ------------ | ------------- |
| What lives here | Intelligence, judgment, synthesis, pattern recognition | Trust — same input, same output, every time |
| Examples        | Seat 8 people accounting for personalities; classify a founder’s real sector from a transcript | SQL queries, compiled code, arithmetic, seat 800 people (combinatorial optimization) |
| Failure mode    | Hallucination when forced to do deterministic work | Brittleness, no flexibility |

The best systems are ruthless about which side of this line each step belongs on. Push intelligence into latent space (skill files). Push execution into deterministic tooling (your application layer).
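The seating example from the table makes the split concrete. In this illustrative sketch (function names and prompt wording are invented), the combinatorial version is a plain function and the judgment version is delegated to the model:

```python
def seat_800(guests: list[str]) -> list[str]:
    """Deterministic: same input, same output. A real system would run a
    combinatorial optimizer here; sorting stands in for that."""
    return sorted(guests)

def seat_8(guests: list[str], call_model) -> str:
    """Latent: judgment about personalities belongs to the model."""
    prompt = (
        f"Seat these {len(guests)} people at one table, accounting for "
        "their personalities: " + ", ".join(guests)
    )
    return call_model(prompt)
```

Forcing the model to do `seat_800` invites hallucination; forcing an optimizer to do `seat_8` produces a brittle answer that ignores everything that matters.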

5. Diarization

Diarization is the step that makes AI useful for real knowledge work. The model reads everything about a subject and writes a single structured profile — a page of judgment distilled from dozens of documents.

No SQL query, no RAG pipeline produces this. The model has to read, hold contradictions in mind, notice what changed and when, and synthesize structured intelligence. It’s the difference between a database lookup and an analyst’s brief.

Example output:

FOUNDER: Maria Santos
COMPANY: Contrail
SAYS: "Datadog for AI agents"
ACTUALLY BUILDING: 80% of commits are billing module. Building a FinOps tool disguised as observability.

The gap between “says” and “actually building” requires reading GitHub commit history, the application, and an advisor transcript simultaneously. No embedding search finds this.
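A diarization step can be sketched as: gather every document about the subject, then make one model call that asks for a single structured profile. The field names follow the example output above; the prompt wording and `call_model` interface are assumptions.

```python
def diarize(subject: str, documents: dict[str, str], call_model) -> str:
    """Read everything about `subject`; return one structured profile."""
    # Concatenate all sources so the model holds them in mind at once
    corpus = "\n\n".join(
        f"=== {name} ===\n{text}" for name, text in documents.items()
    )
    prompt = (
        f"Read all documents about {subject} below. Write a one-page profile "
        "with fields SAYS (their claim) and ACTUALLY BUILDING (what the "
        "evidence shows), noting contradictions and what changed over time.\n\n"
        + corpus
    )
    return call_model(prompt)
```

The deterministic layer only fetches the documents; the synthesis (holding the commit history, the application, and the transcript in mind simultaneously) happens in a single latent step.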

Three-Layer Architecture

┌─────────────────────────────────────┐
│  Fat skills (markdown procedures)  │  ← 90% of value lives here
├─────────────────────────────────────┤
│  Thin CLI harness (~200 lines)     │  ← loop, file I/O, context, safety
├─────────────────────────────────────┤
│  Your application                  │  ← QueryDB, ReadDoc, Search (deterministic)
└─────────────────────────────────────┘

Principle: push intelligence up into skills, push execution down into deterministic tooling, keep the harness thin.

When the next model drops, every skill automatically improves. The deterministic layer stays perfectly reliable.

The Self-Improving Loop

Skills can rewrite themselves:

  1. After an event, an /improve skill reads the feedback (specifically the “OK” responses — almost worked but didn’t)
  2. Diarizes patterns from mediocre outcomes
  3. Proposes new rules
  4. Writes rules back into the skill file

Example: 12% “OK” ratings → extracted patterns → rules written back → next event: 4% “OK”. The skill file learned what “OK” meant without anyone rewriting code.
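The four-step loop above can be sketched as follows. The feedback format, the `call_model` interface, and the “Learned rules” section name are illustrative assumptions.

```python
from pathlib import Path

def improve_skill(skill_path: str, feedback: list[dict], call_model) -> int:
    """Mine 'OK' feedback (almost worked but didn't) and append new rules
    to the skill file. Returns the number of items mined."""
    # 1. Read feedback, keeping only the "OK" responses
    ok_items = [f["comment"] for f in feedback if f["rating"] == "OK"]
    if not ok_items:
        return 0
    # 2-3. Diarize patterns from the mediocre outcomes, propose rules
    rules = call_model(
        "These outputs were rated 'OK' (almost worked but didn't). "
        "Extract the common patterns and propose rules to prevent them:\n"
        + "\n".join(ok_items)
    )
    # 4. Write the rules back into the skill file itself
    path = Path(skill_path)
    path.write_text(path.read_text() + "\n## Learned rules\n" + rules + "\n")
    return len(ok_items)
```

The skill file is both the program and the thing being patched: the next run loads the amended procedure with no code change anywhere else.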

The meta-rule:

You are not allowed to do one-off work. If I ask you to do something that will happen again: do it manually on 3–10 items; show me the output; if approved, codify it into a skill file; if it should run automatically, put it on a cron. If I have to ask you twice, you failed.

Skills > MCP in Practice

An empirical observation from Alex Krantz (UC Berkeley, after a month running OpenClaw in production): MCP server adoption has faded in personal agent setups because agents are now sufficiently adept at using CLI tools directly. Skills are easier to write, require no wrapper infrastructure, and have proven more reliable and more effective than MCP round-trips in practice.

This validates the theoretical argument against fat harnesses with many tool integrations: the skill file approach wins not just in theory but in observed production behavior.

The mechanism: skills encode how to use the tool (CLI commands, flags, error patterns) in markdown. The agent reads the skill body on demand, then runs the CLI. No MCP server, no protocol overhead, no extra process.
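A hypothetical skill of this kind might read as follows. The skill name and steps are invented for illustration; the `git` invocations shown are standard CLI usage.

```markdown
# Skill: release-notes

Generate release notes from git history.

1. Run `git log --oneline v{PREV}..HEAD` to list commits since the last tag.
2. Group the commits by prefix (feat:, fix:, chore:).
3. If the log is empty, verify that tag {PREV} exists with `git tag -l`.
4. Write the grouped notes into CHANGELOG.md under a new heading.
```

The tool knowledge (commands, flags, the error pattern in step 3) lives in the markdown; the agent just runs the CLI.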

See alex-krantz for the source observation.

Connections

  • claude-code-skills — concrete implementation of skill files in Claude Code: 9 types, writing tips, distribution patterns
  • coding-agent — complementary framing of the same harness architecture (Raschka’s 6-component view)
  • latent-vs-deterministic — extends the latent-vs-deterministic section with the artifact-dependency rule and three-column ownership across harness/skills/model
  • llm-wiki-pattern — this wiki applies the thin-harness-fat-skills pattern: CLAUDE.md is a resolver, each wiki operation is a skill
  • openclaw — consumer-facing agent platform that embodies this pattern; 3-level skill fidelity (header/body/linked files) is OpenClaw’s specific implementation
  • garry-tan — author; YC president
  • alex-krantz — empirical validation: skills beat MCP in production; provided the 3-level fidelity breakdown

Sources