highlighter

Glow big or go home.

Status: experiment. Interfaces, scope, and behavior will change without notice.

A researcher reading a dense academic paper — or a student working through a textbook — reaches for a neon ink pen and marks the passages that answer the question they're holding in their head. They don't paraphrase on the fly; they highlight the verbatim text and come back to synthesize once the full picture is on the page.

highlighter is the digital counterpart. It reads a document with a query in mind, pulls out verbatim excerpts as it goes, and only then synthesizes an answer grounded in those highlights. Every claim in the final answer points back to a quoted span in the source — no paraphrasing, no invention.
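The grounding rule above can be pictured as a verbatim check: an excerpt only counts as a highlight if it occurs character-for-character in the source. The following is a minimal sketch of that idea, assuming plain substring search. `locate_excerpts` is a hypothetical helper written for illustration, not part of highlighter's actual interface, and the real verification step may work differently.

```python
from typing import Optional


def locate_excerpts(
    document: str, excerpts: list[str]
) -> list[Optional[tuple[int, int]]]:
    """Return (start, end) character offsets for each excerpt.

    An excerpt that does not appear verbatim in the document maps to None,
    which marks it as ungrounded. Hypothetical helper, for illustration only.
    """
    spans: list[Optional[tuple[int, int]]] = []
    for excerpt in excerpts:
        start = document.find(excerpt)  # exact match, no normalization
        if start == -1:
            spans.append(None)
        else:
            spans.append((start, start + len(excerpt)))
    return spans


doc = "Cats sleep most of the day. They hunt at dusk."
spans = locate_excerpts(doc, ["They hunt at dusk.", "Cats hunt at night."])
for span in spans:
    print(span)
```

The second excerpt is a paraphrase, so it has no span in the source and would be rejected under this scheme.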

For now, highlighter trades latency for accuracy. The sweet spot is deep questions over a single document, where the answer has to be assembled from many passages rather than retrieved as one. Input must be Markdown.

Usage

Ask a question, get cited excerpts:

uv run python -m highlighter <markdown-file> -q "your question"

Add --synthesize to also get a short grounded answer that cites the excerpts by number.

| Arg / flag | Required | Default | Notes |
| --- | --- | --- | --- |
| `<markdown-file>` | yes | — | Path to a markdown document. |
| `-q`, `--question` | yes | — | The question to ask. |
| `--chunk-size` | no | `2000` | Tokens per chunk. |
| `--chunk-overlap` | no | `200` | Token overlap between consecutive chunks. |
| `--synthesize` | no | off | Run the final LLM synthesis step. |
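To illustrate how `--chunk-size` and `--chunk-overlap` interact, here is a rough sketch of sliding-window chunking. It assumes the document has already been turned into a list of tokens; how highlighter actually tokenizes is not stated here, so the whitespace split in the usage lines is purely a stand-in, and `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 2000, chunk_overlap: int = 200
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical sketch: the defaults mirror the CLI defaults in the table,
    but the real chunker may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Stand-in tokenization (assumption): split on whitespace.
tokens = "a b c d e f g h i j".split()
for chunk in chunk_tokens(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

With the defaults, each chunk advances by 1800 tokens, so a passage that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.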

Requires an API key for the configured provider (default: anthropic:claude-haiku-4-5-20251001, so ANTHROPIC_API_KEY must be set).

Evals

Two tracks measure extraction quality. Both report precision / recall / F1 via substring match against fixture-provided expected_excerpts, and both support a baseline + regression gate.

Chunk-level — pins one chunk per case and measures the extractor in isolation, bypassing query expansion and chunk selection.
Pipeline — runs the full pipeline end-to-end against a whole document, so generated sub-questions and rubric are part of the score.
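As a rough picture of the scoring both tracks share, the sketch below computes precision, recall, and F1 from substring matches. The matching rule is an assumption: here a predicted excerpt and an expected excerpt match when either contains the other. The actual evals may use a stricter or looser rule, and `score_excerpts` is a hypothetical name.

```python
def _matches(a: str, b: str) -> bool:
    """Assumed rule: two non-empty excerpts match if one contains the other."""
    return bool(a) and bool(b) and (a in b or b in a)


def score_excerpts(
    predicted: list[str], expected: list[str]
) -> tuple[float, float, float]:
    """Return (precision, recall, f1) for predicted vs. expected excerpts."""
    # Predicted excerpts that match at least one expected excerpt.
    hit_pred = sum(any(_matches(p, e) for e in expected) for p in predicted)
    # Expected excerpts covered by at least one predicted excerpt.
    hit_exp = sum(any(_matches(p, e) for p in predicted) for e in expected)

    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_exp / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


predicted = ["the cat sat", "unrelated text"]
expected = ["the cat sat on the mat", "dogs bark"]
print(score_excerpts(predicted, expected))
```

Precision penalizes excerpts the extractor should not have highlighted, recall penalizes expected excerpts it missed, and F1 balances the two.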

# Chunk-level
uv run python -m evals                    # all cases, one run each
uv run python -m evals --case <name>      # single case by name
uv run python -m evals --runs 3           # run each case 3 times
uv run python -m evals --debug            # also print prompt + raw LLM output
uv run python -m evals --write-baseline   # record current mean scores
uv run python -m evals --check-baseline   # gate: exit non-zero on regression

# Pipeline
uv run python -m evals.pipeline                 # all cases, one run each
uv run python -m evals.pipeline --case <name>
uv run python -m evals.pipeline --runs 3
uv run python -m evals.pipeline --debug         # also print generated sub-Qs + rubric
uv run python -m evals.pipeline --write-baseline
uv run python -m evals.pipeline --check-baseline

Baselines live in evals/baseline.json (chunk-level, tolerance 0.02) and evals/baseline-pipeline.json (pipeline, tolerance 0.05 — wider because generation variance stacks on top of extraction).
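The regression gate can be thought of as a per-metric comparison with slack. The sketch below assumes the baseline is a flat mapping from metric name to mean score; the real layout of the baseline JSON files is not documented here, and `find_regressions` is a hypothetical name.

```python
def find_regressions(
    baseline: dict[str, float], current: dict[str, float], tolerance: float
) -> dict[str, tuple[float, float]]:
    """Return {metric: (baseline_score, current_score)} for every regression.

    A metric regresses when it drops more than `tolerance` below its baseline.
    A metric missing from `current` is treated as 0.0 (an assumption).
    """
    regressions: dict[str, tuple[float, float]] = {}
    for name, base in baseline.items():
        score = current.get(name, 0.0)
        if score < base - tolerance:
            regressions[name] = (base, score)
    return regressions


baseline = {"f1": 0.80, "recall": 0.90}
current = {"f1": 0.79, "recall": 0.80}
# 0.02 is the chunk-level tolerance quoted above.
print(find_regressions(baseline, current, tolerance=0.02))
```

In this example the small F1 dip stays within tolerance while the recall drop does not. A gate in the spirit of `--check-baseline` would exit non-zero whenever the returned mapping is non-empty.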

Testing

uv run pytest                             # full suite
uv run pytest tests/test_consolidate.py   # single file
uv run pytest -k synthesize               # filter by keyword

LLM-driven tests stub the pydantic-ai agent with a FunctionModel via agent.override(...), so the suite runs offline — no API key required. Internal logic (verification, citation mapping, consolidation) is never stubbed.

Development

uv sync
uv run ruff check .
