Paul Hoekstra

Data engineer and builder; writes Paul’s Pipeline on AI, data, and side projects.

Last updated: 2026-04-12

Overview

Paul Hoekstra writes Paul’s Pipeline on Substack, covering practical AI agent engineering from a data engineering perspective. His writing is grounded in real production experience: “written by a data engineer who spends too much time tinkering and then writes about it so you don’t have to make the same mistakes.”

His Agentic Engineering series (4 parts) is a practitioner’s framework for getting reliable results from coding agents across four layers: Configuration, Capability, Orchestration, and Guardrails.

Key Ideas

  • The configuration gap: the difference between mediocre and amazing agent results is mostly about configuration, not the model
  • CLAUDE.md as a cost center: every token in CLAUDE.md is paid for on every API call, so bloat raises cost and actively hurts performance
  • Skills beat model size: Haiku + human-curated skills (27.7%) beats Opus without skills (22.0%) on SkillsBench
  • HARD-GATE directives: XML-tagged checkpoints that exploit the model’s training to enforce process compliance
  • Anti-rationalization tables: pre-emptive lists of model excuses paired with corrections, which short-circuit the model’s tendency to justify skipping steps
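
The "CLAUDE.md as a cost center" idea comes down to simple multiplication: file size in tokens times the number of API calls. The sketch below is a rough illustration, not Hoekstra's own tooling. The function names are hypothetical, and the 4-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, so treat the numbers as order-of-magnitude only. It also ignores prompt caching, which can lower the actual bill.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for a piece of text.

    ASSUMPTION: ~4 characters per token is a heuristic, not a tokenizer.
    """
    return math.ceil(len(text) / chars_per_token)


def cumulative_overhead(config_text: str, calls: int) -> int:
    """Tokens spent on the config file alone across `calls` API calls,
    assuming the whole file is sent with every call."""
    return estimate_tokens(config_text) * calls


if __name__ == "__main__":
    # Two hypothetical CLAUDE.md files: a lean one and one 20x larger.
    lean = "Run tests with `make test`.\n" * 10
    bloated = lean * 20

    calls = 50  # hypothetical number of API calls in one agent session
    for label, text in (("lean", lean), ("bloated", bloated)):
        per_call = estimate_tokens(text)
        total = cumulative_overhead(text, calls)
        print(f"{label}: ~{per_call} tokens per call, ~{total} over {calls} calls")
```

The overhead scales linearly with both file size and call count, which is why trimming the file pays off most in long, multi-step agent sessions.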

Connections

Sources