S3-First Architecture

Using S3 as the durable source of truth for user data, with a relational database as a query-optimized read layer.

Last updated: 2026-04-12

Overview

The pattern: all writes go to S3. A Lambda function syncs changes to PostgreSQL. List queries hit the database (fast). Single-item reads go to S3 (freshest data).

Writes → S3 (source of truth)
         ↓
         Lambda trigger
         ↓
         PostgreSQL (fs_files table)
         ↓
Reads  ← Fast queries

This inverts the typical assumption that a database is always the right primary store for user data.
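The read/write split above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the class name and method names are invented, and an in-memory dict stands in for S3 (and a list of rows for the synced fs_files table) so the sketch is runnable.

```python
class S3FirstStore:
    """Illustrative sketch: S3 is the source of truth, the DB is a
    query-optimized read layer populated asynchronously by a sync Lambda."""

    def __init__(self, s3: dict, db_rows: list):
        self.s3 = s3          # stand-in for S3: key -> bytes
        self.db = db_rows     # stand-in for fs_files: list of (key, size)

    def write(self, key: str, body: bytes) -> None:
        # All writes land in S3; the sync Lambda updates the DB later,
        # so the DB may briefly lag behind.
        self.s3[key] = body

    def read(self, key: str) -> bytes:
        # Single-item reads hit S3 directly for the freshest data.
        return self.s3[key]

    def list_under(self, prefix: str) -> list:
        # List queries hit the (possibly slightly stale) DB read layer.
        return [k for k, _ in self.db if k.startswith(prefix)]
```

Note the trade-off this makes explicit: a key written moments ago is readable immediately, but will not appear in list results until the sync Lambda has upserted it.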

Key Points

  • S3 durability: designed for 99.999999999% (11 nines) annual object durability. A typical single-node Postgres instance doesn't come close
  • Versioning for free: S3 versioning gives an automatic audit trail — every write is tracked without any application-layer logging
  • Human-readable debugging: YAML and markdown files can be inspected with cat. No DB client, no query needed
  • Cost: S3 storage is significantly cheaper than database storage at scale
  • Sync architecture: two Lambda functions keep S3 and PostgreSQL in sync
    • fs-sync: triggered by S3 upload/delete events via SNS → real-time upsert/delete in fs_files
    • fs-reconcile: EventBridge every 3 hours → full S3 vs DB scan, fixes any discrepancies from cold starts or network blips
    • Both use upsert with timestamp guards so newer data always wins
  • User memories: each user has /private/memories/UserMemories.md in S3. Plain markdown, editable in the UI. Loaded and injected as context on every conversation — no schema migrations needed
  • Skills and watchlists follow the same pattern: YAML files in S3, queried via PostgreSQL
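The timestamp guard both Lambdas rely on can be sketched as follows. The table and column names (fs_files, s3_key, size, last_modified) are assumptions, and the in-memory function mirrors what the SQL does so the guard's behavior can be seen without a database.

```python
# Hypothetical shape of the guarded upsert (psycopg-style placeholders).
# The WHERE clause on DO UPDATE is what makes newer data always win:
# a stale or out-of-order event simply updates zero rows.
UPSERT_SQL = """
INSERT INTO fs_files (s3_key, size, last_modified)
VALUES (%(s3_key)s, %(size)s, %(last_modified)s)
ON CONFLICT (s3_key) DO UPDATE
SET size = EXCLUDED.size,
    last_modified = EXCLUDED.last_modified
WHERE EXCLUDED.last_modified > fs_files.last_modified;
"""

def upsert(table: dict, key: str, size: int, modified: float) -> bool:
    """Same guard, in memory: returns False when the event is stale."""
    row = table.get(key)
    if row is not None and modified <= row["last_modified"]:
        return False  # older (or duplicate) event: ignore it
    table[key] = {"size": size, "last_modified": modified}
    return True
```

Because the guard is idempotent, it does not matter whether an event arrives once, twice, or out of order via SNS; the row always converges to the newest S3 state.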
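The periodic reconcile pass can be sketched as a pure diff between the two stores. This is an illustrative sketch only: the real fs-reconcile Lambda would page through S3 listings and issue SQL, but the core comparison looks like this (keys map to last-modified timestamps).

```python
def reconcile(s3_objects: dict, db_rows: dict) -> dict:
    """Compare a full S3 listing against the synced table and return
    the corrective actions a reconcile run would take."""
    actions = {"upsert": [], "delete": []}
    for key, modified in s3_objects.items():
        row_ts = db_rows.get(key)
        if row_ts is None or row_ts < modified:
            actions["upsert"].append(key)   # missing or stale in the DB
    for key in db_rows:
        if key not in s3_objects:
            actions["delete"].append(key)   # row outlived its S3 object
    return actions
```

Running this every few hours catches anything the event-driven path dropped (cold starts, network blips), which is why the real-time sync can stay simple.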

Connections

  • agent-sandbox — the sandbox mount system exposes S3 prefixes as filesystem paths to the agent
  • claude-code-skills — skills are stored in S3, discovered via SQL query against the synced fs_files table