v0.2.0 · MIT License · Python 3.10+
AI-powered academic paper reader with structured summarization, persistent paper library, and RAG-based Q&A.
Author: Donglin Bai & Claude Code · Email: baidonglin295332@gmail.com · WeChat: bdl332
- What it does
- Installation
- Quick start
- CLI commands
- Chinese language support
- Configuration
- Optional components
- Troubleshooting
- Development
- High-level architecture
- Project structure
- Ingests papers from local PDF paths, arXiv URLs, or arXiv IDs (including legacy IDs)
- Parses PDFs with `marker`, `mineru`, or `pymupdf`
- Optionally enriches metadata via GROBID
- Structures the paper into a normalized representation
- Generates high-level overviews with LangGraph-driven summarization
- Supports interactive `chat` and tool-based `agent` Q&A
- Exports to Markdown, JSON, or BibTeX
- Stores papers in a local library and supports comparison/export workflows
- Uses profile-based LLM configuration via `llm_config.yaml` (LiteLLM-compatible)
- Supports Chinese-language papers and Chinese output (简体中文)
- Pipeline hooks allow injecting custom logic before/after any processing step
- Python 3.10+
- `pip`
- Optional: Docker (for local GROBID)
cd paper_reader
pip install -e .
Alternative:
pip install -r requirements.txt
cp .env.example .env
`.env` controls parser, GROBID, and storage behavior (not model profiles).
llm_config.yaml in the repo root is ready to use and can be customized with your own profiles.
# Parse only (no LLM summarization)
paper-reader parse 2308.13418
# Parse + summarize
paper-reader read 2308.13418
# Start interactive RAG Q&A
paper-reader chat 2308.13418
# Start tool-based agent Q&A with optional long-session memory
paper-reader agent 2308.13418
# Launch Gradio UI
paper-reader serve
# Extract parsed sections without LLM summarization
paper-reader parse 2308.13418 --output parsed.md
# Save markdown summary to file
paper-reader read 2308.13418 --output summary.md
# Force re-processing from scratch
paper-reader read 2308.13418 --force
# Export as JSON or BibTeX
paper-reader export 2308.13418 --format json --output paper.json
paper-reader export 2308.13418 --format bibtex --output paper.bib
# Batch process multiple papers
paper-reader batch 2308.13418 2401.12345 --format markdown --output-dir summaries
# Quiet mode — only warnings and errors
paper-reader read 2308.13418 --quiet
Running `paper-reader read 2308.13418 --output summary.md` produces:
summary.md
# Nougat: Neural Optical Understanding for Academic Documents
**Authors:** Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic
## One-Line Summary
Nougat is an end-to-end visual Transformer that converts scientific PDF pages directly into structured markup, preserving text, tables, and mathematical expressions without relying on external OCR.
## Motivation
Most scientific knowledge is stored as PDFs, a format that discards semantic structure and is particularly inadequate for mathematical expressions and tables. Existing PDF processing tools and OCR pipelines fail to reliably recover this structure, limiting machine accessibility, searchability, and reuse of scientific content. A robust document-to-markup solution is therefore critical for large-scale scientific knowledge extraction.
## Key Observations
PDFs retain visual layout but lose semantic meaning; mathematical expressions are especially poorly handled by classical OCR and PDF parsers. Prior pipelines that stitch together OCR, layout analysis, and formula recognition are brittle and error-prone. Recent Transformer-based visual document understanding models suggest that text recognition and structural understanding can be learned jointly from images alone.
## Core Idea
The paper reframes scientific document conversion as a visual document understanding problem rather than a traditional OCR task. Nougat uses an end-to-end encoder–decoder Transformer that takes only rasterized page images as input and directly generates a structured markup representation, implicitly learning text, layout, and math recognition in a single model.
## Methods
Nougat follows an encoder–decoder Transformer architecture inspired by Donut. Page images rendered at 96 DPI are resized and padded to 896×672 and encoded using a Swin Transformer (base) visual encoder with pre-trained weights. A large autoregressive Transformer decoder based on mBART generates markup tokens with cross-attention to visual embeddings, using a scientific-domain tokenizer and a maximum sequence length of 4096 tokens. The base model has ~350M parameters, with a smaller 250M-parameter variant. Training uses AdamW over 3 epochs with an effective batch size of 192 and a decaying learning rate. Extensive image augmentations (noise, blur, erosion, distortion, compression) and text-level token replacement are applied to improve robustness and prevent repetition collapse. Inference uses greedy decoding with heuristic repetition detection and early stopping.
## Main Results
Evaluation on an arXiv-based test set uses normalized edit distance, BLEU, METEOR, and precision/recall/F1 across plain text, math, tables, and overall content. Embedded PDF text achieves edit distance 0.255 and F1 79.2; GROBID performs worse overall (edit distance 0.312, F1 73.0), particularly on tables and math. A LaTeX-OCR baseline shows extremely poor math performance (BLEU 0.3, F1 9.7) despite strong aggregate scores when combined with other signals. Nougat small (250M) and base (350M) achieve the best overall performance, with edit distance around 0.07 and F1 ≈93 on all content, strong gains in math (F1 ≈77) and plain text (F1 ≈95.7). The smaller model matches the base model’s accuracy. Anti-repetition training and inference heuristics reduce failed page conversions on out-of-domain documents by 32%, with repetition occurring in about 1.5% of test pages.
## Strengths
End-to-end design eliminates dependence on OCR engines or embedded PDF text. Strong performance on mathematical expressions and tables, where prior systems struggle. Large-scale dataset creation from 1.7M arXiv articles enables robust training. Competitive accuracy even with a smaller 250M-parameter model. Released models and code support reproducibility and future research.
## Limitations
Inference is significantly slower than classical systems (e.g., ~19.5s per batch of 6 pages on an NVIDIA A10G versus GROBID’s ~10.6 pages/s). The model can still collapse into repetitive loops, especially out of domain. Training data is overwhelmingly English, with poor handling of non-Latin scripts. Page-wise independent processing causes cross-page inconsistencies in section numbering and bibliographies. Dataset ground truth contains artifacts from LaTeXML preprocessing and page alignment heuristics.
## Key Figures & Tables
- Qualitative example showing a dense mathematical PDF page converted to LaTeX and re-rendered accurately
- Table comparing Nougat (small/base) against PDF text, GROBID, and LaTeX-OCR across edit distance, BLEU, METEOR, and F1 by modality
## Related Work Context
Nougat builds on advances in OCR, mathematical expression recognition, and visual document understanding. It extends Transformer-based encoder–decoder approaches such as Donut by targeting scientific documents with dense math and structure. Unlike tools such as GROBID or pdf2htmlEX, Nougat directly recovers semantic representations of equations and tables. It complements LayoutLM-style models by focusing on full document-to-markup generation rather than token-level understanding.
## Future Work
Reducing repetition collapse remains the primary open challenge. Improving document-level consistency across pages, expanding multilingual and non-Latin script support, and accelerating inference are important directions. More robust and cleaner ground truth generation, as well as better evaluation metrics for mathematically equivalent expressions, are also highlighted as future research opportunities.
All commands support --verbose / -v for debug logging, --quiet / -q for warnings-only output, and --force / -F to re-process from scratch.
<source> in the commands below accepts any of:
| Format | Example |
|---|---|
| arXiv ID | 2308.13418 |
| arXiv URL | https://arxiv.org/abs/2308.13418 |
| Legacy arXiv ID | hep-th/9905111 |
| Local PDF path | ./papers/my_paper.pdf |
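As a rough illustration of how the four formats above can be told apart, here is a minimal sketch. The function name `classify_source` and the regular expressions are assumptions made for this example; they are not taken from Paper Reader's ingestion code, which may apply different rules.

```python
import re

# Modern arXiv IDs look like 2308.13418, optionally with a version suffix (v2).
NEW_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Legacy IDs look like hep-th/9905111: archive name, slash, seven digits.
LEGACY_ID = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")
ARXIV_URL = re.compile(r"^https?://arxiv\.org/(abs|pdf)/")


def classify_source(source: str) -> str:
    """Return a label for the kind of source string (hypothetical helper)."""
    source = source.strip()
    if ARXIV_URL.match(source):
        return "arxiv_url"
    if NEW_ID.match(source):
        return "arxiv_id"
    if LEGACY_ID.match(source):
        return "legacy_arxiv_id"
    if source.lower().endswith(".pdf"):
        return "local_pdf"
    return "unknown"


for example in ["2308.13418", "https://arxiv.org/abs/2308.13418",
                "hep-th/9905111", "./papers/my_paper.pdf"]:
    print(example, "->", classify_source(example))
```

Checking the URL pattern first keeps a URL that embeds an ID from being misread as a bare ID.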
# Parse only (no LLM summarization)
paper-reader parse <source> \
[-b <profile>] [-p marker|mineru|pymupdf] \
[--vision] [--no-grobid] [--use-llm-parsing] \
[-l auto|en|zh] [--output-language auto|en|zh] \
[-o parsed.md] [-v] [-q]
# Parse + summarize
paper-reader read <source> \
[-b <profile>] [-p marker|mineru|pymupdf] \
[--vision] [--no-grobid] [--use-llm-parsing] \
[-l auto|en|zh] [--output-language auto|en|zh] \
[-o summary.md] [--force] [-v] [-q]
# Parse + summarize + interactive RAG Q&A
paper-reader chat <source> \
[-b <profile>] [-p marker|mineru|pymupdf] \
[--vision] [--no-grobid] [--use-llm-parsing] \
[-l auto|en|zh] [--output-language auto|en|zh] \
[--force] [-v] [-q]
# Parse + summarize + tool-based agent Q&A
paper-reader agent <source> \
[-b <profile>] [-p marker|mineru|pymupdf] \
[--vision] [--no-grobid] [--use-llm-parsing] \
[-l auto|en|zh] [--output-language auto|en|zh] \
[--no-memory] [--consolidation-threshold 20] \
[--force] [-v] [-q]
paper-reader serve [--host 127.0.0.1] [--port 7860] [-b <profile>] [-v] [-q]
The web UI provides three actions:
- Analyze Paper — runs the full pipeline (parse + summarize) in one step
- Parse Only — runs ingestion, parsing, metadata extraction, and structuring without LLM summarization
- Summarize — generates the LLM overview from an already-parsed paper
The Chat tab supports switching between the RAG Chat and Agent (tool-based) Q&A pipelines.
# Batch analysis and export
paper-reader batch <source1> <source2> ... \
[-d summaries] [-f markdown|json|bibtex] \
[-b <profile>] [-p marker|mineru|pymupdf] \
[-l auto|en|zh] [--output-language auto|en|zh] \
[--no-grobid] [--force] [-v] [-q]
# Export one paper directly
paper-reader export <source> \
[-o paper.md] [-f markdown|json|bibtex] \
[-b <profile>] [-p marker|mineru|pymupdf] \
[-l auto|en|zh] [--output-language auto|en|zh] \
[--no-grobid] [--force] [-v] [-q]
# Library management
paper-reader library list
paper-reader library search <query>
paper-reader library remove <paper_id>
# Compare papers already in the library (requires ≥ 2 IDs)
paper-reader compare <id1> <id2> [<id3> ...] \
[-o comparison.md] [-b <profile>] \
[--output-language auto|en|zh] [-v] [-q]
Flag reference (short → long)
| Short | Long | Used by |
|---|---|---|
| `-b` | `--backend` | all commands |
| `-p` | `--parser` | parse, read, chat, agent, batch, export |
| `-o` | `--output` | parse, read, export, compare, config migrate |
| `-f` | `--format` | batch, export |
| `-d` | `--output-dir` | batch |
| `-l` | `--language` | parse, read, chat, agent, batch, export |
| | `--output-language` | parse, read, chat, agent, batch, export, compare |
| `-F` | `--force` | read, chat, agent, batch, export |
| `-v` | `--verbose` | all commands |
| `-q` | `--quiet` | all commands |
paper-reader config migrate [--output llm_config.yaml] [--force]
Use this when migrating from older `PAPER_READER_*` LLM env-var setups.
Paper Reader supports Chinese-language papers and Chinese output out of the box.
# Auto-detect language and produce Chinese output
paper-reader read chinese_paper.pdf
# Force Chinese output even for an English paper
paper-reader read 2308.13418 --output-language zh
# Ask questions in Chinese
paper-reader chat chinese_paper.pdf --output-language zh
| Flag | Commands | Description |
|---|---|---|
| `--language`, `-l` | parse, read, chat, agent, batch, export | Paper language (auto/en/zh). Default: auto (detected from content). |
| `--output-language` | parse, read, chat, agent, batch, export, compare | Output language (auto/en/zh). Default: auto (same as detected input). |
The Gradio web UI includes an Output Language dropdown in the sidebar.
- Language detection: uses a CJK character ratio heuristic (with optional `langdetect` fallback) to classify papers as Chinese or English.
- Chinese-aware chunking: smaller chunk sizes (600 chars) and Chinese sentence-boundary separators (。!?;,) for better retrieval.
- Chinese heading detection: recognises Chinese section headings (第一章, 一、, 摘要, 引言, etc.) as a fallback when Markdown headings are absent.
- Localised prompts: all LLM prompts have Chinese variants in `paper_reader/prompts/zh.py`.
- Localised output: overview headers, comparison headers, and CLI progress messages switch to Chinese when appropriate.
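The detection and chunking ideas in the list above can be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the 0.3 ratio threshold, the restriction to the CJK Unified Ideographs block, the full-width separator set, and the greedy packing strategy. Paper Reader's own heuristics may differ.

```python
import re


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def detect_language(text: str, threshold: float = 0.3) -> str:
    """Classify text as "zh" or "en"; the 0.3 threshold is an assumed value."""
    return "zh" if cjk_ratio(text) >= threshold else "en"


# Zero-width split points right after full-width punctuation (assumed set).
BOUNDARY = re.compile(r"(?<=[。!?;,])")


def chunk_chinese(text: str, max_chars: int = 600) -> list[str]:
    """Greedily pack punctuation-delimited pieces into chunks near max_chars.

    A single piece longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for piece in BOUNDARY.split(text):
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


print(detect_language("本文提出了一种新的方法。"))
print(chunk_chinese("第一句。第二句!第三句?", max_chars=4))
```

Splitting only at punctuation keeps each chunk readable on its own, which is the usual motivation for boundary-aware chunking in retrieval.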
Three pre-configured profiles for Chinese content are included in default_llm_config.yaml:
deepseek:
  chat_model: "deepseek/deepseek-chat"
  embedding_model: "BAAI/bge-m3"
  temperature: 0.3
qwen:
  chat_model: "dashscope/qwen-plus"
  embedding_model: "dashscope/text-embedding-v3"
  vision_model: "dashscope/qwen-vl-max"
  temperature: 0.3
local-chinese:
  chat_model: "ollama/qwen2.5:7b"
  embedding_model: "BAAI/bge-m3"
  api_base: "http://localhost:11434"
  temperature: 0.3
Use them with `--backend deepseek`, `--backend qwen`, or `--backend local-chinese`.
pip install langdetect jieba langchain-huggingface sentence-transformers
| Package | Purpose |
|---|---|
| `langdetect` | Fallback language detection for ambiguous text |
| `jieba` | Chinese word segmentation (for future BM25 hybrid retrieval) |
| `langchain-huggingface` | Local HuggingFace embedding models (bge-m3) |
| `sentence-transformers` | Backend for langchain-huggingface |
All Chinese features work without these packages — they are only needed for local embedding models and enhanced detection.
Model and provider configuration is profile-based. Select a profile at runtime with `--backend <profile>`.
Example profile config:
default_profile: openai
profiles:
  local:
    chat_model: "openai/gpt-4.1"
    embedding_model: "openai/text-embedding-3-small"
    vision_model: "openai/gpt-4.1"
    api_base: "http://localhost:8000/v1"
    api_key: "not-needed"
    max_retries: 3
    timeout: 120
  openai:
    chat_model: "openai/gpt-4o"
    embedding_model: "openai/text-embedding-3-small"
    vision_model: "openai/gpt-4o"
    temperature: 0.2
  anthropic:
    chat_model: "anthropic/claude-sonnet-4-20250514"
    embedding_model: "openai/text-embedding-3-small"
    vision_model: "anthropic/claude-sonnet-4-20250514"
    temperature: 0.2
  ollama:
    chat_model: "ollama/llama3.1"
    embedding_model: "ollama/nomic-embed-text"
    api_base: "http://localhost:11434"
Config discovery order:
- Explicit path passed to loader
- `PAPER_READER_LLM_CONFIG` env var
- `./llm_config.yaml` (current working directory)
- `~/.config/paper_reader/llm_config.yaml` (user-level)
- Legacy `PAPER_READER_*` env vars (emits deprecation warning)
- Bundled `default_llm_config.yaml` shipped with the package
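The first four steps of the discovery order above amount to "return the first candidate file that exists". A minimal sketch of that lookup follows. The helper `resolve_config_path` and its parameters are hypothetical, and it assumes that missing candidates are skipped silently; the real loader may instead raise on a bad explicit path. A return value of `None` stands in for the last two steps (legacy env vars, then the bundled default), which are not modelled here.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(explicit: Optional[str] = None,
                        cwd: Optional[Path] = None,
                        home: Optional[Path] = None,
                        env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first existing config file, or None to signal a fallback."""
    cwd = cwd or Path.cwd()
    home = home or Path.home()
    env = os.environ if env is None else env
    candidates = [
        explicit,                                   # explicit path
        env.get("PAPER_READER_LLM_CONFIG"),         # environment variable
        cwd / "llm_config.yaml",                    # current working directory
        home / ".config" / "paper_reader" / "llm_config.yaml",  # user-level
    ]
    for candidate in candidates:
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    return None


# Demo in a throwaway directory so no real config is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "llm_config.yaml").write_text("default_profile: openai\n")
    print(resolve_config_path(cwd=root, home=root, env={}))
```

Passing `cwd`, `home`, and `env` as parameters is only a convenience for testing the order without changing the process environment.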
Adding a new backend (zero code changes):
# 1. Add a profile to llm_config.yaml — e.g. groq:
# groq:
# chat_model: "groq/llama-3.1-70b-versatile"
# embedding_model: "openai/text-embedding-3-small"
# 2. Set the provider's API key
export GROQ_API_KEY="gsk_..."
# 3. Use it
paper-reader read 2308.13418 --backend groq
Any model supported by LiteLLM works out of the box:
| Provider | Model prefix | Example |
|---|---|---|
| OpenAI | `openai/` | `openai/gpt-4o` |
| Anthropic | `anthropic/` | `anthropic/claude-sonnet-4-20250514` |
| Ollama | `ollama/` | `ollama/llama3.1` |
| Groq | `groq/` | `groq/llama-3.1-70b-versatile` |
| Together AI | `together_ai/` | `together_ai/mistralai/Mixtral-8x7B` |
| Local (OpenAI-compat) | `openai/` | `openai/gpt-4.1` (with `api_base`) |
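The table suggests that the provider is whatever precedes the first slash, and that the rest of the string (which may itself contain slashes) is the model name. The sketch below illustrates that reading only; `split_model_id` is a hypothetical helper, not a LiteLLM or Paper Reader function.

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split "provider/model" at the first slash (hypothetical helper)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


for model in ["openai/gpt-4o", "together_ai/mistralai/Mixtral-8x7B"]:
    print(split_model_id(model))
```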
| Variable | Description | Default |
|---|---|---|
| `PAPER_READER_LLM_PROFILE` | Default LLM profile name | `local` |
| `PAPER_READER_PARSER_BACKEND` | Parser backend (marker, mineru, pymupdf) | `marker` |
| `PAPER_READER_MARKER_USE_LLM` | Enable Marker LLM-augmented parsing | `false` |
| `PAPER_READER_GROBID_URL` | GROBID endpoint | `http://localhost:8070` |
| `PAPER_READER_ENABLE_VISION_PARSING` | Enable page-image vision parsing | `false` |
| `PAPER_READER_CHROMA_PERSIST_DIR` | ChromaDB storage directory | `~/.paper_reader/chroma` |
| `PAPER_READER_CACHE_DIR` | Download/cache directory | `~/.paper_reader/cache` |
All commands that process a paper (read, chat, agent, export, batch)
and the web UI automatically save results to the paper library at ~/.paper_reader/:
| Artifact | Location | Purpose |
|---|---|---|
| Library index | `~/.paper_reader/library.json` | Metadata + PaperOverview per paper |
| Structured papers | `~/.paper_reader/papers/{id}.json` | Full StructuredPaper for reuse |
| Vector index | `~/.paper_reader/chroma/` | ChromaDB embeddings for Q&A |
Subsequent commands for the same source skip parsing and summarization.
When the source is a local PDF file, caching is based on the file's content
hash, so renaming or moving a file still hits the cache.
Pass --force / -F to bypass the cache and re-process from scratch
(the fresh results replace the cached ones).
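To see why a content hash survives renames and moves, consider the sketch below. It assumes SHA-256 and a helper named `content_hash`; the document does not say which hash function or key format Paper Reader uses.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file's bytes in chunks; the file name plays no part in the key."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "paper.pdf"
    original.write_bytes(b"%PDF-1.7 example bytes")
    key_before = content_hash(original)
    renamed = original.rename(Path(tmp) / "renamed.pdf")
    key_after = content_hash(renamed)
    print(key_before == key_after)
```

Because the key depends only on the bytes, editing the PDF in any way produces a new key and therefore a cache miss.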
| Flag / Env var | Level | Effect |
|---|---|---|
| (default) | INFO | Normal progress messages |
| `--verbose` / `-v` | DEBUG | Everything including debug traces |
| `--quiet` / `-q` | WARNING | Only warnings and errors |
| `LITELLM_LOG` | (env var) | Controls LiteLLM's own logging; defaults to ERROR to suppress chatter. Set `LITELLM_LOG=DEBUG` to see full LiteLLM traces. |
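The flag-to-level mapping in the table above can be sketched with the standard `logging` module. The function names are hypothetical, and letting `--verbose` win when both flags are given is an assumption; the document does not say how that conflict is resolved.

```python
import logging
import os


def resolve_log_level(verbose: bool = False, quiet: bool = False) -> int:
    """Map the two CLI flags to a logging level (verbose wins, by assumption)."""
    if verbose:
        return logging.DEBUG
    if quiet:
        return logging.WARNING
    return logging.INFO


def configure_logging(verbose: bool = False, quiet: bool = False) -> None:
    """Apply the level to the root logger and quieten LiteLLM unless overridden."""
    logging.basicConfig(level=resolve_log_level(verbose, quiet), force=True)
    # setdefault keeps a value the user exported, e.g. LITELLM_LOG=DEBUG.
    os.environ.setdefault("LITELLM_LOG", "ERROR")


configure_logging(quiet=True)
logging.getLogger(__name__).info("hidden at WARNING level")
logging.getLogger(__name__).warning("still shown")
```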
| | Marker (default) | MinerU | PyMuPDF4LLM |
|---|---|---|---|
| Best for | General-purpose, best accuracy | Scanned documents | Quick text extraction |
| Install | Core (`marker-pdf`) | Optional (`pip install "mineru[all]"`) | Core (`pymupdf4llm`) |
| GPU | Recommended | Supported | Not needed |
| Layout detection | Deep-learning models | Deep-learning models | Rule-based |
| Table extraction | Yes | Yes | Yes |
| Math / equations | Yes | Yes | Limited |
| Figure extraction | Yes | Yes | No |
| LLM-augmented mode | Yes (`--use-llm-parsing`) | No | No |
| Fallback | Falls back to PyMuPDF4LLM on failure | Falls back to PyMuPDF4LLM on failure | Automatic fallback for all parsers |
Select a parser with --parser marker|mineru|pymupdf. Marker is the default.
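The fallback row in the table above describes a simple pattern: try the selected backend, and if it fails, retry with PyMuPDF4LLM. The sketch below illustrates that pattern with stub parsers. `parse_with_fallback` and the stub functions are hypothetical and do not call the real Marker, MinerU, or PyMuPDF4LLM libraries.

```python
from typing import Callable, Dict, Tuple

Parser = Callable[[str], str]  # takes a PDF path, returns Markdown


def parse_with_fallback(pdf_path: str, backend: str,
                        parsers: Dict[str, Parser],
                        fallback: str = "pymupdf") -> Tuple[str, str]:
    """Return (backend_used, markdown), retrying once with the fallback parser."""
    try:
        return backend, parsers[backend](pdf_path)
    except Exception:
        if backend == fallback:
            raise  # nothing left to fall back to
        return fallback, parsers[fallback](pdf_path)


def broken_marker(path: str) -> str:
    # Stub that simulates a backend failing at runtime.
    raise RuntimeError("model weights missing")


def simple_pymupdf(path: str) -> str:
    # Stub that simulates a successful plain-text extraction.
    return f"# text extracted from {path}"


parsers = {"marker": broken_marker, "pymupdf": simple_pymupdf}
print(parse_with_fallback("paper.pdf", "marker", parsers))
```

Returning the name of the backend that actually ran makes it easy to log when a fallback happened.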
docker run --rm -p 8070:8070 lfoppiano/grobid:0.8.2
If you plan to use the `mineru` parser backend, install optional dependencies:
pip install "mineru[all]"
- Symptom: errors mentioning unauthorized access, invalid key, or provider auth failure.
- Fix:
  - For OpenAI profiles, set `OPENAI_API_KEY`.
  - For Anthropic profiles, set `ANTHROPIC_API_KEY`.
  - For local profiles, make sure your LLM server is running and `api_base` is reachable.
- Verify: run `paper-reader read 2308.13418 --backend <profile> --no-grobid` and confirm auth errors are gone.
- Symptom: model/provider mismatch, unexpected endpoint, or profile not found.
- Fix:
  - Check `default_profile` in `llm_config.yaml`.
  - Override per command with `--backend <profile>`.
  - Ensure the profile name exists under `profiles:`.
- Verify: run with an explicit profile and confirm logs/output match the expected provider behavior.
- Symptom: app falls back to defaults or legacy env-var behavior.
- Fix:
  - Run from the project root (where `llm_config.yaml` exists), or
  - Set `PAPER_READER_LLM_CONFIG=/absolute/path/to/llm_config.yaml`.
- Verify: `ls -la llm_config.yaml` (or `echo "$PAPER_READER_LLM_CONFIG"`) points to the intended file.
- Symptom: metadata extraction skipped/unavailable.
- Fix:
  - Start GROBID locally: `docker run --rm -p 8070:8070 lfoppiano/grobid:0.8.2`
  - Or disable it explicitly with `--no-grobid`.
- Verify: `curl -sS http://localhost:8070/api/isalive` returns a healthy response.
- Symptom: parser backend import/runtime errors.
- Fix:
  - Prefer `marker` (default) if optional parser deps are missing.
  - For MinerU, install extras: `pip install "mineru[all]"`.
  - Switch parser explicitly: `--parser marker|mineru|pymupdf`.
- Verify: run `paper-reader read 2308.13418 --parser <backend> --no-grobid` without parser import/runtime errors.
- Symptom: connection refused or timeout errors.
- Fix:
  - Verify the endpoint in your selected profile (`api_base`, `embedding_api_base`).
  - Confirm the server is running and reachable from your shell.
  - For local models, test with a smaller request first (`paper-reader read <id>`).
- Verify: `curl` to the relevant endpoint returns JSON instead of connection/timeout errors.
- Inspect active env vars
  env | grep -E '^PAPER_READER_|^OPENAI_API_KEY|^ANTHROPIC_API_KEY'
  Expected: relevant variables print (or empty output if intentionally unset).
- Confirm config file resolution inputs
  pwd
  ls -la llm_config.yaml
  echo "$PAPER_READER_LLM_CONFIG"
  Expected: `llm_config.yaml` exists in current working directory, or `PAPER_READER_LLM_CONFIG` points to a valid file.
- Check GROBID health (if enabled)
  curl -sS http://localhost:8070/api/isalive
  Expected: a healthy, non-error response.
- Check local/proxy model endpoints (if used)
  curl -sS http://localhost:8000/v1/models
  curl -sS http://localhost:8080/v1/models
  curl -sS http://localhost:11434/api/tags
  Expected: JSON payloads. Connection/timeout errors mean the service is not reachable.
- Run a minimal end-to-end test
  paper-reader read 2308.13418 --backend local --no-grobid
  Expected: progress stages complete and overview output is printed or saved.
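The endpoint checks above can also be scripted. The sketch below uses only the standard library and is not part of Paper Reader. The helper name `endpoint_status` and its three status labels are assumptions, and the URLs are the ones from the checklist, so adjust them to your own setup.

```python
import json
import urllib.error
import urllib.request


def endpoint_status(url: str, timeout: float = 3.0) -> str:
    """Return "ok", "not-json", or "unreachable" for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return "unreachable"
    try:
        json.loads(body)
    except ValueError:
        return "not-json"
    return "ok"


if __name__ == "__main__":
    for url in ["http://localhost:8070/api/isalive",
                "http://localhost:8000/v1/models",
                "http://localhost:11434/api/tags"]:
        print(url, "->", endpoint_status(url, timeout=1.0))
```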
pip install -r requirements-optional.txt
pytest tests/ -v # unit/default test suite
pytest tests/ -v -m integration # tests requiring a running LLM backend
Parsers, exporters, and Q&A backends use decorator-based registries. Add a new component without modifying existing code:
# Custom parser
from paper_reader.parsing import register_parser

@register_parser("my-parser")
class MyParser:
    name = "my-parser"
    def parse(self, pdf_path, settings, profile): ...
    def is_available(self): return True

# Custom exporter
from paper_reader.export import register_exporter

@register_exporter("html")
class HtmlExporter:
    format_name = "html"
    file_suffix = ".html"
    def export(self, paper, overview, output_path, **kw): ...

# Pipeline hooks
from paper_reader.pipeline.hooks import pipeline_hooks

@pipeline_hooks.after("summarize")
def notify_on_summary(result, **kwargs):
    print(f"Summary complete: {result.paper.title}")
Input (PDF / arXiv URL / arXiv ID)
│
▼
Ingestion ──→ Parsing (registry: marker / mineru / pymupdf) ──→ Structuring
│ │ │
│ GROBID metadata (optional) ─────────────────────────────┘
│ │
▼ ▼
ArxivMetadata StructuredPaper
│
PipelineRunner (event-yielding) │
┌───────────────────────────────────────┤
▼ ▼
Cache Summarizer (LangGraph)
(Library + Disk) ├── Map: section summaries
│ ├── Reduce: synthesis
│ └── Evaluate: quality check
│ │
│ ▼
│ PaperOverview ──→ Cache
│ │
┌────────┼───────────────────┬───────────────────┤
▼ │ ▼ ▼
Indexing │ Export (registry: Library
(ChromaDB) │ MD / JSON / BibTeX) (JSON DB)
│ │
▼ │
Chat / Agent Q&A ◄─┘ (Q&A backend registry; reuses cache)
├── Retrieve + Grade
├── Generate answer
├── Hallucination check
└── Answer quality check
Pipeline hooks: before/after any step (extensible)
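For readers curious how before/after hooks of this kind can work, here is a minimal, self-contained sketch of a decorator-based hook registry. The class `HookRegistry` and its `run_step` method are hypothetical and written for this example; they are not the implementation in `paper_reader/pipeline/hooks.py`, whose internals the document does not show.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class HookRegistry:
    """Hypothetical stand-in for a before/after pipeline hook registry."""

    def __init__(self) -> None:
        self._before: DefaultDict[str, List[Callable]] = defaultdict(list)
        self._after: DefaultDict[str, List[Callable]] = defaultdict(list)

    def before(self, step: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            self._before[step].append(func)
            return func  # leave the decorated function usable as-is
        return decorator

    def after(self, step: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            self._after[step].append(func)
            return func
        return decorator

    def run_step(self, step: str, func: Callable, *args: Any, **kwargs: Any) -> Any:
        """Call before-hooks with the inputs, run the step, then after-hooks."""
        for hook in self._before[step]:
            hook(*args, **kwargs)
        result = func(*args, **kwargs)
        for hook in self._after[step]:
            hook(result, **kwargs)
        return result


hooks = HookRegistry()
events: List[str] = []


@hooks.before("summarize")
def log_start(text, **kwargs):
    events.append("before")


@hooks.after("summarize")
def log_done(result, **kwargs):
    events.append(f"after:{result}")


summary = hooks.run_step("summarize", lambda text: text.upper(), "nougat")
print(summary, events)
```

Keeping hooks in per-step lists means new behaviour is added by registering a function, without editing the step itself.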
paper_reader/
├── pyproject.toml # Package metadata & entry point
├── requirements.txt # Core dependencies
├── requirements-optional.txt # Dev / test dependencies
├── llm_config.yaml # LLM provider profiles (user-editable)
├── .env.example # Non-LLM settings template
├── paper_reader/
│ ├── __init__.py # Public API surface & re-exports
│ ├── core/ # Layer 0 — foundation types & utilities
│ │ ├── __init__.py
│ │ ├── exceptions.py # PaperReaderError hierarchy
│ │ ├── protocols.py # Structural Protocol definitions
│ │ ├── models.py # Pydantic data models
│ │ ├── config.py # pydantic-settings (parser, storage, etc.)
│ │ ├── language.py # Language detection & output language resolution
│ │ ├── utils.py # Pure utility functions (zero intra-project imports)
│ │ └── library.py # Persistent paper library (JSON)
│ ├── llm/ # Layer 1 — LLM infrastructure
│ │ ├── __init__.py
│ │ ├── config.py # LLM profile model + YAML loader
│ │ ├── backend.py # Thin LiteLLM facade
│ │ └── default_llm_config.yaml # Bundled fallback config
│ ├── processing/ # Layer 2 — document processing pipeline
│ │ ├── __init__.py
│ │ ├── ingestion.py # PDF download & arXiv resolution
│ │ ├── parsing.py # Parser registry + PDF → Markdown backends
│ │ ├── metadata.py # GROBID academic metadata extraction
│ │ ├── structuring.py # Section classification & merging
│ │ ├── summarizer.py # LangGraph map-reduce summarization
│ │ └── indexing.py # Section-aware chunking → ChromaDB
│ ├── qa/ # Layer 3 — Q&A & interactive features
│ │ ├── __init__.py # Q&A backend registry re-exports
│ │ ├── _registry.py # Q&A backend registry (rag / agent)
│ │ ├── rag.py # Adaptive RAG Q&A pipeline
│ │ ├── agent.py # Tool-based agent Q&A
│ │ ├── memory.py # Persistent session memory & consolidation
│ │ ├── context.py # System prompt builder for Q&A
│ │ └── comparison.py # Multi-paper comparative analysis
│ ├── ui/ # Layer 5 — presentation
│ │ ├── __init__.py
│ │ ├── app.py # Gradio web UI (layout + event wiring)
│ │ └── handlers.py # Gradio callback handlers
│ ├── cli/ # Typer CLI (one file per command)
│ │ ├── __init__.py # App object + sub-app registration
│ │ ├── __main__.py # python -m paper_reader.cli support
│ │ ├── _common.py # Logging, filename helpers, i18n lookup
│ │ ├── parse_cmd.py # `parse` command
│ │ ├── read_cmd.py # `read` command
│ │ ├── chat_cmd.py # `chat` command
│ │ ├── agent_cmd.py # `agent` command
│ │ ├── batch_cmd.py # `batch` command
│ │ ├── serve_cmd.py # `serve` command
│ │ ├── export_cmd.py # `export` command
│ │ ├── compare_cmd.py # `compare` command
│ │ ├── library_cmd.py # `library` sub-commands
│ │ └── config_cmd.py # `config` sub-commands
│ ├── pipeline/ # Pipeline orchestration
│ │ ├── __init__.py # Re-exports public names
│ │ ├── steps.py # step_ingest .. step_summarize
│ │ ├── cache.py # CachedResult, load_cached, persist_to_library
│ │ ├── helpers.py # make_settings, resolve_profile, build_metadata_md
│ │ ├── runner.py # PipelineRunner (event-yielding generator)
│ │ ├── events.py # PipelineEvent dataclass
│ │ └── hooks.py # PipelineHooks registry
│ ├── export/ # Export formats
│ │ ├── __init__.py # export_paper() dispatch + exporter registry
│ │ ├── markdown.py # MarkdownExporter
│ │ ├── json_export.py # JsonExporter
│ │ └── bibtex.py # BibtexExporter
│ ├── i18n/ # User-facing strings (CLI + UI)
│ │ ├── __init__.py # get_text(key, lang, **kwargs)
│ │ ├── en.py # English strings
│ │ └── zh.py # Chinese strings
│ └── prompts/ # LLM-facing prompt templates
│ ├── __init__.py # Prompt dispatcher (language-aware)
│ ├── en.py # English prompts
│ └── zh.py # Chinese prompts
├── tests/ # Mirrors source layout
│ ├── conftest.py
│ ├── core/ # Tests for paper_reader.core
│ ├── llm/ # Tests for paper_reader.llm
│ ├── processing/ # Tests for paper_reader.processing
│ ├── qa/ # Tests for paper_reader.qa
│ ├── ui/ # Tests for paper_reader.ui
│ ├── pipeline/ # Tests for paper_reader.pipeline
│ ├── export/ # Tests for paper_reader.export
│ └── cli/ # Tests for paper_reader.cli
└── notebooks/
└── paper_reader.ipynb # Jupyter notebook interface