Paul Hoekstra
Data engineer and builder; writes Paul’s Pipeline on AI, data, and side projects.
Last updated: 2026-04-12
Overview
Paul Hoekstra writes Paul’s Pipeline on Substack, covering practical AI agent engineering from a data engineering perspective. His writing is grounded in real production experience; in his own words, it is “written by a data engineer who spends too much time tinkering and then writes about it so you don’t have to make the same mistakes.”
His Agentic Engineering series (4 parts) is a practitioner’s framework for getting reliable results from coding agents across four layers: Configuration, Capability, Orchestration, and Guardrails.
Key Ideas
- The configuration gap: the difference between mediocre and amazing agent results comes mostly from configuration, not from the model
- CLAUDE.md as a cost center: every token in CLAUDE.md is paid for on every API call; bloat actively hurts performance
- Skills beat model size: Haiku + human-curated skills (27.7%) beats Opus without skills (22.0%) on SkillsBench
- HARD-GATE directives: XML-tagged checkpoints that exploit the model’s training to enforce process compliance
- Anti-rationalization tables: pre-emptive lists of model excuses paired with corrections, which short-circuit the model’s tendency to justify skipping steps
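To make the "CLAUDE.md as a cost center" idea concrete, here is a minimal back-of-the-envelope sketch. It is not taken from Hoekstra's posts. The four-characters-per-token heuristic, the price of 3.0 USD per million input tokens, the 500-call workload, and the helper names `estimate_tokens` and `recurring_cost` are all assumptions chosen for illustration. It also ignores prompt caching, which may reduce the real bill.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (assumed heuristic); a real tokenizer will differ."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def recurring_cost(config_text: str, calls: int,
                   usd_per_million_input_tokens: float) -> float:
    """Cost of resending the same config text as input on every call."""
    tokens = estimate_tokens(config_text)
    return tokens * calls * usd_per_million_input_tokens / 1_000_000


# Hypothetical config files: a short one, and one 20 times longer.
lean = "Run tests with `make test`.\n" * 10
bloated = lean * 20

for name, text in (("lean", lean), ("bloated", bloated)):
    # The price and call count are placeholders, not real pricing data.
    cost = recurring_cost(text, calls=500, usd_per_million_input_tokens=3.0)
    print(f"{name}: ~{estimate_tokens(text)} tokens per call, "
          f"~${cost:.2f} over 500 calls")
```

Under these assumptions the cost scales linearly with file length and with the number of calls, which is the core of the argument for keeping CLAUDE.md lean. The sketch only covers the billing side; the claim that bloat hurts performance is a separate point it does not model.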
Connections
- agentic-engineering — his four-layer framework
- claude-code-skills — practical extensions: HARD-GATE, anti-rationalization, SkillsBench evidence
- thin-harness-fat-skills — complementary framing; Hoekstra adds the token cost argument for keeping CLAUDE.md lean
Sources
- Agentic Engineering, part 1: The Configuration Layer — added 2026-04-12
- Agentic Engineering, part 2: What the Agent Doesn’t Know — added 2026-04-12
- Agentic Engineering, part 3: The Orchestration Layer — added 2026-04-12
- Agentic Engineering, part 4: Keeping Agents on a Leash — added 2026-04-12