Context Normalization

The process of converting heterogeneous source data into a uniform, agent-readable format.

Last updated: 2026-04-12

Overview

An agent is only as good as the context it can reason over. Raw data sources — filings, transcripts, PDFs, databases, news articles — each arrive in different formats, schemas, and quality levels. The normalization layer is the infrastructure that converts all of it into something the model can actually use.

At Fintool, all financial data flows into one of three canonical formats:

  • Markdown — for narrative content (SEC filings, earnings transcripts, news articles)
  • CSV / markdown tables — for structured numerical data (financials, segment metrics, comparisons)
  • JSON metadata — for searchability (ticker, date, document type, fiscal period)
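To make the third format concrete, here is a minimal sketch of what a per-document metadata record could look like. Only ticker, date, document type, and fiscal period are named above; the exact field names and nesting are illustrative assumptions, not Fintool's actual schema.

```python
import json

# Hypothetical meta.json record. Field names and structure are
# illustrative; only the four attributes (ticker, date, document type,
# fiscal period) come from the description above.
meta = {
    "ticker": "AAPL",
    "date": "2023-11-03",          # publication date, ISO 8601
    "document_type": "10-K",
    "fiscal_period": {
        "label": "FY2023",
        "start": "2022-09-25",     # absolute range, not a relative label
        "end": "2023-09-30",
    },
}

print(json.dumps(meta, indent=2))
```

Storing the fiscal period as an absolute start/end range (rather than a label like "Q1") is what makes period-scoped retrieval possible, as discussed under fiscal period normalization below.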

LLMs reason well over markdown tables but struggle with raw HTML <table> markup and unformatted CSV dumps. Normalization converts everything into the formats the model handles best.
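The HTML-to-markdown conversion can be sketched with the standard library alone. This is a deliberately minimal version for well-formed tables; the merged headers and footnote markers discussed below need far more machinery. The parser class and sample table are illustrative, not Fintool's implementation.

```python
from html.parser import HTMLParser

class TableToMarkdown(HTMLParser):
    """Flatten a simple <table> into rows of cell text.

    Minimal sketch: assumes one header row and no merged cells."""
    def __init__(self):
        super().__init__()
        self.rows, self.row, self.cell, self.in_cell = [], [], [], False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell, self.cell = True, []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.row.append("".join(self.cell).strip())
            self.in_cell = False
        elif tag == "tr":
            self.rows.append(self.row)
            self.row = []

    def handle_data(self, data):
        if self.in_cell:
            self.cell.append(data)

def table_to_markdown(html: str) -> str:
    parser = TableToMarkdown()
    parser.feed(html)
    header, *body = parser.rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)

html = ("<table><tr><th>Segment</th><th>Revenue</th></tr>"
        "<tr><td>iPhone</td><td>$43.8B</td></tr></table>")
print(table_to_markdown(html))
```

The output is a pipe-delimited markdown table with a header separator row, which the model can read directly as columnar data.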

Key Points

  • Chunking strategy matters: different documents chunk differently
    • 10-K filings → by regulatory section (Item 1, 1A, 7, 8…)
    • Earnings transcripts → by speaker turn (CEO remarks, CFO, individual analyst Q&A)
    • Press releases → usually one chunk
    • News → paragraph-level
  • Metadata enables retrieval: every document gets a meta.json with ticker, date, document type, and fiscal period. Without it, retrieval degrades to keyword search over a haystack
  • Fiscal period normalization is critical: “Q1 2024” is ambiguous — Apple’s Q1 is Oct–Dec 2023, Microsoft’s is Jul–Sep 2023, calendar Q1 is Jan–Mar 2024. All period references must normalize to absolute date ranges
  • Table extraction is hard: financial tables have merged header cells, footnote markers, parentheses for negatives, mixed units. Fintool scores every extracted table; below 90% confidence, it’s flagged and excluded from agent context
  • SEC filings are adversarial: they are designed for legal compliance, not machine reading. Multi-page tables repeat their headers, footnotes nest inside footnotes, and XBRL tags are often wrong
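The fiscal period normalization above can be sketched as a lookup from ticker to fiscal-year-end month, then simple month arithmetic to an absolute date range. The fiscal-year-end table here is hardcoded for illustration; in practice that data would come from filings. The sketch also assumes whole calendar months, whereas Apple's 52/53-week fiscal calendar shifts real quarter boundaries by a few days.

```python
import calendar
from datetime import date

# Hypothetical lookup table; real fiscal-year-end months come from filings.
# "CAL" stands in for a company on a standard calendar fiscal year.
FISCAL_YEAR_END_MONTH = {"AAPL": 9, "MSFT": 6, "CAL": 12}

def fiscal_quarter_range(ticker: str, fiscal_year: int, quarter: int) -> tuple[date, date]:
    """Resolve 'Q{quarter} {fiscal_year}' to an absolute (start, end) date range."""
    fye = FISCAL_YEAR_END_MONTH[ticker]
    # Absolute month index (year*12 + month-1) of the fiscal year's first
    # month: the month after the previous fiscal year-end.
    start_idx = (fiscal_year - 1) * 12 + fye + (quarter - 1) * 3
    end_idx = start_idx + 2
    start = date(start_idx // 12, start_idx % 12 + 1, 1)
    end_y, end_m = end_idx // 12, end_idx % 12 + 1
    end = date(end_y, end_m, calendar.monthrange(end_y, end_m)[1])
    return start, end

print(fiscal_quarter_range("AAPL", 2024, 1))  # Oct 1 - Dec 31, 2023
print(fiscal_quarter_range("MSFT", 2024, 1))  # Jul 1 - Sep 30, 2023
print(fiscal_quarter_range("CAL", 2024, 1))   # Jan 1 - Mar 31, 2024
```

The three calls reproduce the "Q1 2024" ambiguity from the bullet above: the same label resolves to three different date ranges depending on the company's fiscal calendar, which is why only the absolute range belongs in the agent's context.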

Connections

  • agent-sandbox — clean context is injected into the agent’s sandbox environment
  • agent-evaluation — evaluation catches normalization failures before they reach users
  • data-warehouse — similar principle: raw data must be cleaned and structured before it’s useful for analysis

Sources