📄 AI Paper Reader

v0.2.0 · MIT License · Python 3.10+

AI-powered academic paper reader with structured summarization, persistent paper library, and RAG-based Q&A.

Author: Donglin Bai & Claude Code · Email: baidonglin295332@gmail.com · WeChat: bdl332

What it does

Ingests papers from local PDF paths, arXiv URLs, or arXiv IDs (including legacy IDs)
Parses PDFs with marker, mineru, or pymupdf
Optionally enriches metadata via GROBID
Structures the paper into a normalized representation
Generates high-level overviews with LangGraph-driven summarization
Supports interactive chat and tool-based agent Q&A
Exports to Markdown, JSON, or BibTeX
Stores papers in a local library and supports comparison/export workflows
Uses profile-based LLM configuration via llm_config.yaml (LiteLLM-compatible)
Supports Chinese-language papers and Chinese output (简体中文)
Pipeline hooks allow injecting custom logic before/after any processing step

Installation

Requirements

Python 3.10+
pip
Optional: Docker (for local GROBID)

cd paper_reader
pip install -e .

Alternative:

pip install -r requirements.txt

Quick start

1) Configure non-LLM settings

cp .env.example .env

.env controls parser/GROBID/storage behavior (not model profiles).

2) Configure LLM profiles

llm_config.yaml in the repo root is ready to use and can be customized with your own profiles.

3) Run

# Parse only (no LLM summarization)
paper-reader parse 2308.13418

# Parse + summarize
paper-reader read 2308.13418

# Start interactive RAG Q&A
paper-reader chat 2308.13418

# Start tool-based agent Q&A with optional long-session memory
paper-reader agent 2308.13418

# Launch Gradio UI
paper-reader serve

4) Common workflows

# Extract parsed sections without LLM summarization
paper-reader parse 2308.13418 --output parsed.md

# Save markdown summary to file
paper-reader read 2308.13418 --output summary.md

# Force re-processing from scratch
paper-reader read 2308.13418 --force

# Export as JSON or BibTeX
paper-reader export 2308.13418 --format json --output paper.json
paper-reader export 2308.13418 --format bibtex --output paper.bib

# Batch process multiple papers
paper-reader batch 2308.13418 2401.12345 --format markdown --output-dir summaries

# Quiet mode — only warnings and errors
paper-reader read 2308.13418 --quiet

5) Example output

Running paper-reader read 2308.13418 --output summary.md produces:

# Nougat: Neural Optical Understanding for Academic Documents

**Authors:** Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic

## One-Line Summary
Nougat is an end-to-end visual Transformer that converts scientific PDF pages directly into structured markup, preserving text, tables, and mathematical expressions without relying on external OCR.

## Motivation
Most scientific knowledge is stored as PDFs, a format that discards semantic structure and is particularly inadequate for mathematical expressions and tables. Existing PDF processing tools and OCR pipelines fail to reliably recover this structure, limiting machine accessibility, searchability, and reuse of scientific content. A robust document-to-markup solution is therefore critical for large-scale scientific knowledge extraction.

## Key Observations
PDFs retain visual layout but lose semantic meaning; mathematical expressions are especially poorly handled by classical OCR and PDF parsers. Prior pipelines that stitch together OCR, layout analysis, and formula recognition are brittle and error-prone. Recent Transformer-based visual document understanding models suggest that text recognition and structural understanding can be learned jointly from images alone.

## Core Idea
The paper reframes scientific document conversion as a visual document understanding problem rather than a traditional OCR task. Nougat uses an end-to-end encoder–decoder Transformer that takes only rasterized page images as input and directly generates a structured markup representation, implicitly learning text, layout, and math recognition in a single model.

## Methods
Nougat follows an encoder–decoder Transformer architecture inspired by Donut. Page images rendered at 96 DPI are resized and padded to 896×672 and encoded using a Swin Transformer (base) visual encoder with pre-trained weights. A large autoregressive Transformer decoder based on mBART generates markup tokens with cross-attention to visual embeddings, using a scientific-domain tokenizer and a maximum sequence length of 4096 tokens. The base model has ~350M parameters, with a smaller 250M-parameter variant. Training uses AdamW over 3 epochs with an effective batch size of 192 and a decaying learning rate. Extensive image augmentations (noise, blur, erosion, distortion, compression) and text-level token replacement are applied to improve robustness and prevent repetition collapse. Inference uses greedy decoding with heuristic repetition detection and early stopping.

## Main Results
Evaluation on an arXiv-based test set uses normalized edit distance, BLEU, METEOR, and precision/recall/F1 across plain text, math, tables, and overall content. Embedded PDF text achieves edit distance 0.255 and F1 79.2; GROBID performs worse overall (edit distance 0.312, F1 73.0), particularly on tables and math. A LaTeX-OCR baseline shows extremely poor math performance (BLEU 0.3, F1 9.7) despite strong aggregate scores when combined with other signals. Nougat small (250M) and base (350M) achieve the best overall performance, with edit distance around 0.07 and F1 ≈93 on all content, strong gains in math (F1 ≈77) and plain text (F1 ≈95.7). The smaller model matches the base model’s accuracy. Anti-repetition training and inference heuristics reduce failed page conversions on out-of-domain documents by 32%, with repetition occurring in about 1.5% of test pages.

## Strengths
End-to-end design eliminates dependence on OCR engines or embedded PDF text. Strong performance on mathematical expressions and tables, where prior systems struggle. Large-scale dataset creation from 1.7M arXiv articles enables robust training. Competitive accuracy even with a smaller 250M-parameter model. Released models and code support reproducibility and future research.

## Limitations
Inference is significantly slower than classical systems (e.g., ~19.5s per batch of 6 pages on an NVIDIA A10G versus GROBID’s ~10.6 pages/s). The model can still collapse into repetitive loops, especially out of domain. Training data is overwhelmingly English, with poor handling of non-Latin scripts. Page-wise independent processing causes cross-page inconsistencies in section numbering and bibliographies. Dataset ground truth contains artifacts from LaTeXML preprocessing and page alignment heuristics.

## Key Figures & Tables
- Qualitative example showing a dense mathematical PDF page converted to LaTeX and re-rendered accurately
- Table comparing Nougat (small/base) against PDF text, GROBID, and LaTeX-OCR across edit distance, BLEU, METEOR, and F1 by modality

## Related Work Context
Nougat builds on advances in OCR, mathematical expression recognition, and visual document understanding. It extends Transformer-based encoder–decoder approaches such as Donut by targeting scientific documents with dense math and structure. Unlike tools such as GROBID or pdf2htmlEX, Nougat directly recovers semantic representations of equations and tables. It complements LayoutLM-style models by focusing on full document-to-markup generation rather than token-level understanding.

## Future Work
Reducing repetition collapse remains the primary open challenge. Improving document-level consistency across pages, expanding multilingual and non-Latin script support, and accelerating inference are important directions. More robust and cleaner ground truth generation, as well as better evaluation metrics for mathematically equivalent expressions, are also highlighted as future research opportunities.

CLI commands

All commands support --verbose / -v for debug logging and --quiet / -q for warnings-only output. The commands that process and cache a paper (read, chat, agent, batch, export) also accept --force / -F to re-process from scratch.

Accepted input formats

<source> in the commands below accepts any of:

Format	Example
arXiv ID	`2308.13418`
arXiv URL	`https://arxiv.org/abs/2308.13418`
Legacy arXiv ID	`hep-th/9905111`
Local PDF path	`./papers/my_paper.pdf`
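
As an illustration of how the four input formats in the table can be told apart, here is a minimal sketch. The function name `classify_source` and the regular expressions are assumptions made for this example; they are not the project's actual ingestion code.

```python
import re

# Modern arXiv IDs look like 2308.13418, optionally with a version suffix (v2).
NEW_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Legacy IDs look like hep-th/9905111: archive name, slash, seven digits.
LEGACY_ID = re.compile(r"^[a-z][a-z\-]*(\.[A-Z]{2})?/\d{7}(v\d+)?$")
# arXiv abstract or PDF URLs; the ID is captured in the "id" group.
ARXIV_URL = re.compile(r"^https?://arxiv\.org/(abs|pdf)/(?P<id>.+?)(\.pdf)?$")


def classify_source(source: str) -> tuple[str, str]:
    """Return (kind, value) for a source string (hypothetical helper)."""
    source = source.strip()
    url = ARXIV_URL.match(source)
    if url:
        return "arxiv_url", url.group("id")
    if NEW_ID.match(source):
        return "arxiv_id", source
    if LEGACY_ID.match(source):
        return "legacy_arxiv_id", source
    if source.lower().endswith(".pdf"):
        return "local_pdf", source
    raise ValueError(f"Unrecognized source: {source!r}")
```

URLs are checked first because they contain an ID, and the local-path check comes last so that anything ending in .pdf that is not an arXiv URL is treated as a file.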

Core analysis

# Parse only (no LLM summarization)
paper-reader parse <source> \
  [-b <profile>] [-p marker|mineru|pymupdf] \
  [--vision] [--no-grobid] [--use-llm-parsing] \
  [-l auto|en|zh] [--output-language auto|en|zh] \
  [-o parsed.md] [-v] [-q]

# Parse + summarize
paper-reader read <source> \
  [-b <profile>] [-p marker|mineru|pymupdf] \
  [--vision] [--no-grobid] [--use-llm-parsing] \
  [-l auto|en|zh] [--output-language auto|en|zh] \
  [-o summary.md] [--force] [-v] [-q]

# Parse + summarize + interactive RAG Q&A
paper-reader chat <source> \
  [-b <profile>] [-p marker|mineru|pymupdf] \
  [--vision] [--no-grobid] [--use-llm-parsing] \
  [-l auto|en|zh] [--output-language auto|en|zh] \
  [--force] [-v] [-q]

# Parse + summarize + tool-based agent Q&A
paper-reader agent <source> \
  [-b <profile>] [-p marker|mineru|pymupdf] \
  [--vision] [--no-grobid] [--use-llm-parsing] \
  [-l auto|en|zh] [--output-language auto|en|zh] \
  [--no-memory] [--consolidation-threshold 20] \
  [--force] [-v] [-q]

Web UI

paper-reader serve [--host 127.0.0.1] [--port 7860] [-b <profile>] [-v] [-q]

The web UI provides three actions:

Analyze Paper — runs the full pipeline (parse + summarize) in one step
Parse Only — runs ingestion, parsing, metadata extraction, and structuring without LLM summarization
Summarize — generates the LLM overview from an already-parsed paper

The Chat tab supports switching between the RAG Chat and Agent (tool-based) Q&A pipelines.

Batch / export / library / compare

# Batch analysis and export
paper-reader batch <source1> <source2> ... \
  [-d summaries] [-f markdown|json|bibtex] \
  [-b <profile>] [-p marker|mineru|pymupdf] \
  [-l auto|en|zh] [--output-language auto|en|zh] \
  [--no-grobid] [--force] [-v] [-q]

# Export one paper directly
paper-reader export <source> \
  [-o paper.md] [-f markdown|json|bibtex] \
  [-b <profile>] [-p marker|mineru|pymupdf] \
  [-l auto|en|zh] [--output-language auto|en|zh] \
  [--no-grobid] [--force] [-v] [-q]

# Library management
paper-reader library list
paper-reader library search <query>
paper-reader library remove <paper_id>

# Compare papers already in the library (requires ≥ 2 IDs)
paper-reader compare <id1> <id2> [<id3> ...] \
  [-o comparison.md] [-b <profile>] \
  [--output-language auto|en|zh] [-v] [-q]

Flag reference (short → long)

Short	Long	Used by
`-b`	`--backend`	all commands
`-p`	`--parser`	`parse`, `read`, `chat`, `agent`, `batch`, `export`
`-o`	`--output`	`parse`, `read`, `export`, `compare`, `config migrate`
`-f`	`--format`	`batch`, `export`
`-d`	`--output-dir`	`batch`
`-l`	`--language`	`parse`, `read`, `chat`, `agent`, `batch`, `export`
	`--output-language`	`parse`, `read`, `chat`, `agent`, `batch`, `export`, `compare`
`-F`	`--force`	`read`, `chat`, `agent`, `batch`, `export`
`-v`	`--verbose`	all commands
`-q`	`--quiet`	all commands

Config migration (legacy env vars)

paper-reader config migrate [--output llm_config.yaml] [--force]

Use this when migrating from older PAPER_READER_* LLM env-var setups.

Chinese language support

Paper Reader supports Chinese-language papers and Chinese output out of the box.

Reading a Chinese paper

# Auto-detect language and produce Chinese output
paper-reader read chinese_paper.pdf

# Force Chinese output even for an English paper
paper-reader read 2308.13418 --output-language zh

# Ask questions in Chinese
paper-reader chat chinese_paper.pdf --output-language zh

Language flags

Flag	Commands	Description
`--language`, `-l`	`parse`, `read`, `chat`, `agent`, `batch`, `export`	Paper language (`auto`/`en`/`zh`). Default: `auto` (detected from content).
`--output-language`	`parse`, `read`, `chat`, `agent`, `batch`, `export`, `compare`	Output language (`auto`/`en`/`zh`). Default: `auto` (same as detected input).

The Gradio web UI includes an Output Language dropdown in the sidebar.

How it works

Language detection — Uses a CJK character ratio heuristic (with optional langdetect fallback) to classify papers as Chinese or English.
Chinese-aware chunking — Smaller chunk sizes (600 chars) and Chinese sentence-boundary separators (。！？；，) for better retrieval.
Chinese heading detection — Recognises Chinese section headings (第一章, 一、, 摘要, 引言, etc.) as a fallback when Markdown headings are absent.
Localised prompts — All LLM prompts have Chinese variants in paper_reader/prompts/zh.py.
Localised output — Overview headers, comparison headers, and CLI progress messages switch to Chinese when appropriate.
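
The detection and chunking ideas above can be sketched roughly as follows. This is an illustration only: the 0.3 threshold, the Unicode range used for "CJK", and the function names are assumptions, and the project's real code may differ.

```python
import re

# Basic CJK Unified Ideographs block; an assumption about what counts as CJK.
CJK = re.compile(r"[\u4e00-\u9fff]")


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if CJK.match(c)) / len(chars)


def detect_language(text: str, threshold: float = 0.3) -> str:
    """Classify text as 'zh' or 'en'; the threshold is an illustrative guess."""
    return "zh" if cjk_ratio(text) >= threshold else "en"


def split_chinese(text: str, chunk_size: int = 600) -> list[str]:
    """Greedily pack sentence-like pieces into chunks of about chunk_size chars.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    # Split after each Chinese boundary mark, keeping the mark with its piece.
    pieces = [p for p in re.split(r"(?<=[。！？；，])", text) if p]
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on punctuation before packing keeps each chunk aligned with sentence or clause boundaries, which is the stated reason for using Chinese separators.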

Chinese-optimised LLM profiles

Three pre-configured profiles for Chinese content are included in default_llm_config.yaml:

deepseek:
  chat_model: "deepseek/deepseek-chat"
  embedding_model: "BAAI/bge-m3"
  temperature: 0.3

qwen:
  chat_model: "dashscope/qwen-plus"
  embedding_model: "dashscope/text-embedding-v3"
  vision_model: "dashscope/qwen-vl-max"
  temperature: 0.3

local-chinese:
  chat_model: "ollama/qwen2.5:7b"
  embedding_model: "BAAI/bge-m3"
  api_base: "http://localhost:11434"
  temperature: 0.3

Use them with --backend deepseek, --backend qwen, or --backend local-chinese.

Optional Chinese dependencies

pip install langdetect jieba langchain-huggingface sentence-transformers

Package	Purpose
`langdetect`	Fallback language detection for ambiguous text
`jieba`	Chinese word segmentation (for future BM25 hybrid retrieval)
`langchain-huggingface`	Local HuggingFace embedding models (bge-m3)
`sentence-transformers`	Backend for langchain-huggingface

All Chinese features work without these packages — they are only needed for local embedding models and enhanced detection.

Configuration

LLM profiles (`llm_config.yaml`)

Model and provider configuration is profile-based. Select a profile at runtime with --backend <profile>.

Example profile config:

default_profile: openai

profiles:
  local:
    chat_model: "openai/gpt-4.1"
    embedding_model: "openai/text-embedding-3-small"
    vision_model: "openai/gpt-4.1"
    api_base: "http://localhost:8000/v1"
    api_key: "not-needed"
    max_retries: 3
    timeout: 120

  openai:
    chat_model: "openai/gpt-4o"
    embedding_model: "openai/text-embedding-3-small"
    vision_model: "openai/gpt-4o"
    temperature: 0.2

  anthropic:
    chat_model: "anthropic/claude-sonnet-4-20250514"
    embedding_model: "openai/text-embedding-3-small"
    vision_model: "anthropic/claude-sonnet-4-20250514"
    temperature: 0.2

  ollama:
    chat_model: "ollama/llama3.1"
    embedding_model: "ollama/nomic-embed-text"
    api_base: "http://localhost:11434"

Config discovery order:

Explicit path passed to loader
PAPER_READER_LLM_CONFIG env var
./llm_config.yaml (current working directory)
~/.config/paper_reader/llm_config.yaml (user-level)
Legacy PAPER_READER_* env vars (emits deprecation warning)
Bundled default_llm_config.yaml shipped with the package
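
The discovery order above could be implemented along these lines. This is a sketch under assumptions: the function name, its return labels, and the `legacy_keys` parameter (standing in for whichever legacy variable names the project checks) are all invented for illustration.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional


def resolve_llm_config(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
    legacy_keys: Iterable[str] = (),
) -> tuple[str, Optional[Path]]:
    """Return (origin, path) following the documented discovery order."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    # 1. An explicit path always wins.
    if explicit is not None:
        return "explicit", Path(explicit)
    # 2. Then the PAPER_READER_LLM_CONFIG environment variable.
    env_path = env.get("PAPER_READER_LLM_CONFIG")
    if env_path:
        return "env_var", Path(env_path)
    # 3 and 4. Then the working directory, then the user-level config.
    for origin, candidate in (
        ("cwd", cwd / "llm_config.yaml"),
        ("user", home / ".config" / "paper_reader" / "llm_config.yaml"),
    ):
        if candidate.is_file():
            return origin, candidate
    # 5. Legacy env vars (names supplied by the caller in this sketch).
    if any(key in env for key in legacy_keys):
        return "legacy_env", None
    # 6. Nothing found: use the bundled default.
    return "bundled", None
```

Passing `env`, `cwd`, and `home` as parameters keeps the lookup easy to test without touching the real environment.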

Adding a new backend (zero code changes):

# 1. Add a profile to llm_config.yaml — e.g. groq:
#    groq:
#      chat_model: "groq/llama-3.1-70b-versatile"
#      embedding_model: "openai/text-embedding-3-small"
# 2. Set the provider's API key
export GROQ_API_KEY="gsk_..."
# 3. Use it
paper-reader read 2308.13418 --backend groq

Supported backends

Any model supported by LiteLLM works out of the box:

Provider	Model prefix	Example
OpenAI	`openai/`	`openai/gpt-4o`
Anthropic	`anthropic/`	`anthropic/claude-sonnet-4-20250514`
Ollama	`ollama/`	`ollama/llama3.1`
Groq	`groq/`	`groq/llama-3.1-70b-versatile`
Together AI	`together_ai/`	`together_ai/mistralai/Mixtral-8x7B`
Local (OpenAI-compat)	`openai/`	`openai/gpt-4.1` (with `api_base`)
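
The prefix convention in the table can be read as "everything before the first slash is the provider". A tiny sketch of that reading follows; `split_model` is a hypothetical helper, not part of LiteLLM or this project.

```python
def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' at the first slash (illustrative only)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"Expected 'provider/model', got {model!r}")
    return provider, name
```

Splitting only at the first slash matters for entries such as `together_ai/mistralai/Mixtral-8x7B`, where the model name itself contains a slash.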

Non-LLM settings (`.env`)

Variable	Description	Default
`PAPER_READER_LLM_PROFILE`	Default LLM profile name	`local`
`PAPER_READER_PARSER_BACKEND`	Parser backend (`marker`, `mineru`, `pymupdf`)	`marker`
`PAPER_READER_MARKER_USE_LLM`	Enable Marker LLM-augmented parsing	`false`
`PAPER_READER_GROBID_URL`	GROBID endpoint	`http://localhost:8070`
`PAPER_READER_ENABLE_VISION_PARSING`	Enable page-image vision parsing	`false`
`PAPER_READER_CHROMA_PERSIST_DIR`	ChromaDB storage directory	`~/.paper_reader/chroma`
`PAPER_READER_CACHE_DIR`	Download/cache directory	`~/.paper_reader/cache`

Caching & library

All commands that process a paper (read, chat, agent, export, batch) and the web UI automatically save results to the paper library at ~/.paper_reader/:

Artifact	Location	Purpose
Library index	`~/.paper_reader/library.json`	Metadata + `PaperOverview` per paper
Structured papers	`~/.paper_reader/papers/{id}.json`	Full `StructuredPaper` for reuse
Vector index	`~/.paper_reader/chroma/`	ChromaDB embeddings for Q&A

Subsequent commands for the same source skip parsing and summarization. When the source is a local PDF file, caching is based on the file's content hash, so renaming or moving a file still hits the cache. Pass --force / -F to bypass the cache and re-process from scratch (the fresh results replace the cached ones).
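
A content-based cache key of the kind described above might be computed as in the sketch below. SHA-256 and the function name `content_hash` are assumptions; the README does not say which hash the project uses.

```python
import hashlib


def content_hash(pdf_path: str, block_size: int = 1 << 20) -> str:
    """Hash a file's bytes in blocks so large PDFs are not loaded at once."""
    digest = hashlib.sha256()
    with open(pdf_path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Because the key depends only on the bytes, two copies of the same PDF under different names or directories produce the same key.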

Logging

Flag / Env var	Level	Effect
(default)	`INFO`	Normal progress messages
`--verbose` / `-v`	`DEBUG`	Everything including debug traces
`--quiet` / `-q`	`WARNING`	Only warnings and errors
`LITELLM_LOG`	(env var)	Controls LiteLLM's own logging; defaults to `ERROR` to suppress chatter. Set `LITELLM_LOG=DEBUG` to see full LiteLLM traces.
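
The mapping in the table could be wired up roughly like this. It is a sketch, not the project's code: `configure_logging` is a hypothetical name, and letting --verbose win when both flags are given is an assumption.

```python
import logging
import os


def configure_logging(verbose: bool = False, quiet: bool = False) -> int:
    """Translate the CLI flags into a logging level and apply it."""
    if verbose:  # assumption: verbose takes precedence over quiet
        level = logging.DEBUG
    elif quiet:
        level = logging.WARNING
    else:
        level = logging.INFO
    # force=True replaces any handlers installed by earlier calls.
    logging.basicConfig(level=level, force=True)
    # Keep LiteLLM quiet unless the user has already chosen a level.
    os.environ.setdefault("LITELLM_LOG", "ERROR")
    return level
```

Using `setdefault` means an explicit `LITELLM_LOG=DEBUG` in the environment is left untouched.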

Optional components

PDF parsers

Feature	Marker (default)	MinerU	PyMuPDF4LLM
Best for	General-purpose, best accuracy	Scanned documents	Quick text extraction
Install	Core (`marker-pdf`)	Optional (`pip install "mineru[all]"`)	Core (`pymupdf4llm`)
GPU	Recommended	Supported	Not needed
Layout detection	Deep-learning models	Deep-learning models	Rule-based
Table extraction	Yes	Yes	Yes
Math / equations	Yes	Yes	Limited
Figure extraction	Yes	Yes	No
LLM-augmented mode	Yes (`--use-llm-parsing`)	No	No
Fallback	Falls back to PyMuPDF4LLM on failure	Falls back to PyMuPDF4LLM on failure	Automatic fallback for all parsers

Select a parser with --parser marker|mineru|pymupdf. Marker is the default.
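
The fallback behavior listed in the table (try the selected parser, then fall back to PyMuPDF4LLM on failure) follows a common pattern, sketched here with plain callables. The function `parse_with_fallback` and its return shape are hypothetical.

```python
import logging
from typing import Callable

logger = logging.getLogger("parser_fallback_sketch")


def parse_with_fallback(
    pdf_path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Return (which_parser_ran, markdown), trying primary first."""
    try:
        return "primary", primary(pdf_path)
    except Exception as exc:  # any primary failure triggers the fallback
        logger.warning("Primary parser failed (%s); using fallback", exc)
        return "fallback", fallback(pdf_path)
```

Returning which parser actually ran lets the caller report the downgrade instead of hiding it.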

GROBID

docker run --rm -p 8070:8070 lfoppiano/grobid:0.8.2

MinerU parser extras

If you plan to use the mineru parser backend, install optional dependencies:

pip install "mineru[all]"

Troubleshooting

Missing API key / auth errors

Symptom: errors mentioning unauthorized access, invalid key, or provider auth failure.
Fix:
- For OpenAI profiles, set OPENAI_API_KEY.
- For Anthropic profiles, set ANTHROPIC_API_KEY.
- For local profiles, make sure your LLM server is running and api_base is reachable.
Verify: run paper-reader read 2308.13418 --backend <profile> --no-grobid and confirm auth errors are gone.

Wrong profile selected

Symptom: model/provider mismatch, unexpected endpoint, or profile not found.
Fix:
- Check default_profile in llm_config.yaml.
- Override per command with --backend <profile>.
- Ensure the profile name exists under profiles:.
Verify: run with an explicit profile and confirm logs/output match the expected provider behavior.

`llm_config.yaml` not picked up

Symptom: app falls back to defaults or legacy env-var behavior.
Fix:
- Run from the project root (where llm_config.yaml exists), or
- Set PAPER_READER_LLM_CONFIG=/absolute/path/to/llm_config.yaml.
Verify: ls -la llm_config.yaml (or echo "$PAPER_READER_LLM_CONFIG") points to the intended file.

GROBID unavailable

Symptom: metadata extraction skipped/unavailable.
Fix:
- Start GROBID locally:
```
docker run --rm -p 8070:8070 lfoppiano/grobid:0.8.2
```
- Or disable it explicitly with --no-grobid.
Verify: curl -sS http://localhost:8070/api/isalive returns a healthy response.
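
The same health check can be scripted in Python with the standard library. This is a sketch; the function name `grobid_is_alive` is made up for this example, and it treats any HTTP 200 from /api/isalive as healthy.

```python
import urllib.request


def grobid_is_alive(base_url: str = "http://localhost:8070", timeout: float = 3.0) -> bool:
    """Return True if the GROBID isalive endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive
        # from OSError, so every network failure maps to "not alive".
        return False
```

A False result is a reasonable trigger for either starting the Docker container or passing --no-grobid.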

Parser dependency issues

Symptom: parser backend import/runtime errors.
Fix:
- Prefer marker (default) if optional parser deps are missing.
- For MinerU, install extras: pip install "mineru[all]".
- Switch parser explicitly: --parser marker|mineru|pymupdf.
Verify: run paper-reader read 2308.13418 --parser <backend> --no-grobid without parser import/runtime errors.

Local/hosted backend not reachable

Symptom: connection refused or timeout errors.
Fix:
- Verify the endpoint in your selected profile (api_base, embedding_api_base).
- Confirm the server is running and reachable from your shell.
- For local models, test with a smaller request first (paper-reader read <id>).
Verify: curl to the relevant endpoint returns JSON instead of connection/timeout errors.

Debug checklist

1) Inspect active env vars
```
env | grep -E '^PAPER_READER_|^OPENAI_API_KEY|^ANTHROPIC_API_KEY'
```
Expected: relevant variables print (or empty output if intentionally unset).

2) Confirm config file resolution inputs
```
pwd
ls -la llm_config.yaml
echo "$PAPER_READER_LLM_CONFIG"
```
Expected: llm_config.yaml exists in the current working directory, or PAPER_READER_LLM_CONFIG points to a valid file.

3) Check GROBID health (if enabled)
```
curl -sS http://localhost:8070/api/isalive
```
Expected: a healthy, non-error response.

4) Check local/proxy model endpoints (if used)
```
curl -sS http://localhost:8000/v1/models
curl -sS http://localhost:8080/v1/models
curl -sS http://localhost:11434/api/tags
```
Expected: JSON payloads. Connection/timeout errors mean the service is not reachable.

5) Run a minimal end-to-end test
```
paper-reader read 2308.13418 --backend local --no-grobid
```
Expected: progress stages complete and overview output is printed or saved.

Development

pip install -r requirements-optional.txt
pytest tests/ -v                  # unit/default test suite
pytest tests/ -v -m integration   # tests requiring a running LLM backend

Extending Paper Reader

Parsers, exporters, and Q&A backends use decorator-based registries. Add a new component without modifying existing code:

# Custom parser
from paper_reader.parsing import register_parser

@register_parser("my-parser")
class MyParser:
    name = "my-parser"
    def parse(self, pdf_path, settings, profile): ...
    def is_available(self): return True

# Custom exporter
from paper_reader.export import register_exporter

@register_exporter("html")
class HtmlExporter:
    format_name = "html"
    file_suffix = ".html"
    def export(self, paper, overview, output_path, **kw): ...

# Pipeline hooks
from paper_reader.pipeline.hooks import pipeline_hooks

@pipeline_hooks.after("summarize")
def notify_on_summary(result, **kwargs):
    print(f"Summary complete: {result.paper.title}")
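
For readers unfamiliar with the pattern, a decorator-based registry can be as small as the sketch below. It is a generic stand-in written for illustration; the `Registry` class and `EchoParser` are hypothetical and do not reflect the actual internals of paper_reader.

```python
class Registry:
    """Map string names to classes, filled in by a decorator."""

    def __init__(self, kind: str):
        self.kind = kind
        self._items = {}

    def register(self, name: str):
        def decorator(cls):
            if name in self._items:
                raise ValueError(f"{self.kind} {name!r} is already registered")
            self._items[name] = cls
            return cls  # the class itself is returned unchanged
        return decorator

    def create(self, name: str, *args, **kwargs):
        if name not in self._items:
            raise KeyError(f"Unknown {self.kind} {name!r}; known: {self.names()}")
        return self._items[name](*args, **kwargs)

    def names(self) -> list:
        return sorted(self._items)


parsers = Registry("parser")


@parsers.register("echo")
class EchoParser:
    name = "echo"

    def parse(self, pdf_path, settings=None, profile=None):
        return f"# Parsed {pdf_path}"

    def is_available(self):
        return True
```

Because registration happens at import time, adding a component only requires importing the module that defines it; `parsers.create("echo").parse("a.pdf")` then works without touching the registry code.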

High-level architecture

Input (PDF / arXiv URL / arXiv ID)
    │
    ▼
Ingestion ──→ Parsing (registry: marker / mineru / pymupdf) ──→ Structuring
    │              │                                                  │
    │         GROBID metadata (optional) ─────────────────────────────┘
    │                                                                 │
    ▼                                                                 ▼
ArxivMetadata                                                  StructuredPaper
                                                                      │
                              PipelineRunner (event-yielding)         │
                              ┌───────────────────────────────────────┤
                              ▼                                       ▼
                         Cache                                Summarizer (LangGraph)
                  (Library + Disk)                            ├── Map: section summaries
                              │                               ├── Reduce: synthesis
                              │                               └── Evaluate: quality check
                              │                                       │
                              │                                       ▼
                              │                                 PaperOverview ──→ Cache
                              │                                       │
                     ┌────────┼───────────────────┬───────────────────┤
                     ▼        │                   ▼                   ▼
                 Indexing      │          Export (registry:       Library
               (ChromaDB)      │        MD / JSON / BibTeX)     (JSON DB)
                     │        │
                     ▼        │
            Chat / Agent Q&A ◄─┘  (Q&A backend registry; reuses cache)
            ├── Retrieve + Grade
            ├── Generate answer
            ├── Hallucination check
            └── Answer quality check

            Pipeline hooks: before/after any step (extensible)
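
The "event-yielding" runner and the before/after hooks in the diagram can be pictured with the following sketch. All names here (`StepEvent`, `Hooks`, `run_pipeline`) are invented for illustration and are simpler than whatever PipelineRunner and PipelineHooks actually do.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass
class StepEvent:
    step: str
    status: str  # "started" or "finished"
    result: Any = None


class Hooks:
    """Callbacks keyed by step name, registered with decorators."""

    def __init__(self):
        self._callbacks = {"before": defaultdict(list), "after": defaultdict(list)}

    def before(self, step: str):
        return self._register("before", step)

    def after(self, step: str):
        return self._register("after", step)

    def _register(self, when: str, step: str):
        def decorator(fn: Callable[[Any], None]):
            self._callbacks[when][step].append(fn)
            return fn
        return decorator

    def fire(self, when: str, step: str, value: Any) -> None:
        for fn in self._callbacks[when][step]:
            fn(value)


def run_pipeline(source: Any, steps, hooks: Hooks) -> Iterator[StepEvent]:
    """Run (name, function) steps in order, yielding an event at each edge."""
    value = source
    for name, fn in steps:
        hooks.fire("before", name, value)
        yield StepEvent(name, "started")
        value = fn(value)  # each step consumes the previous step's output
        hooks.fire("after", name, value)
        yield StepEvent(name, "finished", value)
```

Yielding events from a generator lets both the CLI and the web UI show progress from the same loop, which is one plausible reason for this design.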

Project structure

paper_reader/
├── pyproject.toml              # Package metadata & entry point
├── requirements.txt            # Core dependencies
├── requirements-optional.txt   # Dev / test dependencies
├── llm_config.yaml             # LLM provider profiles (user-editable)
├── .env.example                # Non-LLM settings template
├── paper_reader/
│   ├── __init__.py             # Public API surface & re-exports
│   ├── core/                   # Layer 0 — foundation types & utilities
│   │   ├── __init__.py
│   │   ├── exceptions.py       # PaperReaderError hierarchy
│   │   ├── protocols.py        # Structural Protocol definitions
│   │   ├── models.py           # Pydantic data models
│   │   ├── config.py           # pydantic-settings (parser, storage, etc.)
│   │   ├── language.py         # Language detection & output language resolution
│   │   ├── utils.py            # Pure utility functions (zero intra-project imports)
│   │   └── library.py          # Persistent paper library (JSON)
│   ├── llm/                    # Layer 1 — LLM infrastructure
│   │   ├── __init__.py
│   │   ├── config.py           # LLM profile model + YAML loader
│   │   ├── backend.py          # Thin LiteLLM facade
│   │   └── default_llm_config.yaml  # Bundled fallback config
│   ├── processing/             # Layer 2 — document processing pipeline
│   │   ├── __init__.py
│   │   ├── ingestion.py        # PDF download & arXiv resolution
│   │   ├── parsing.py          # Parser registry + PDF → Markdown backends
│   │   ├── metadata.py         # GROBID academic metadata extraction
│   │   ├── structuring.py      # Section classification & merging
│   │   ├── summarizer.py       # LangGraph map-reduce summarization
│   │   └── indexing.py         # Section-aware chunking → ChromaDB
│   ├── qa/                     # Layer 3 — Q&A & interactive features
│   │   ├── __init__.py         # Q&A backend registry re-exports
│   │   ├── _registry.py        # Q&A backend registry (rag / agent)
│   │   ├── rag.py              # Adaptive RAG Q&A pipeline
│   │   ├── agent.py            # Tool-based agent Q&A
│   │   ├── memory.py           # Persistent session memory & consolidation
│   │   ├── context.py          # System prompt builder for Q&A
│   │   └── comparison.py       # Multi-paper comparative analysis
│   ├── ui/                     # Layer 5 — presentation
│   │   ├── __init__.py
│   │   ├── app.py              # Gradio web UI (layout + event wiring)
│   │   └── handlers.py         # Gradio callback handlers
│   ├── cli/                    # Typer CLI (one file per command)
│   │   ├── __init__.py         # App object + sub-app registration
│   │   ├── __main__.py         # python -m paper_reader.cli support
│   │   ├── _common.py          # Logging, filename helpers, i18n lookup
│   │   ├── parse_cmd.py        # `parse` command
│   │   ├── read_cmd.py         # `read` command
│   │   ├── chat_cmd.py         # `chat` command
│   │   ├── agent_cmd.py        # `agent` command
│   │   ├── batch_cmd.py        # `batch` command
│   │   ├── serve_cmd.py        # `serve` command
│   │   ├── export_cmd.py       # `export` command
│   │   ├── compare_cmd.py      # `compare` command
│   │   ├── library_cmd.py      # `library` sub-commands
│   │   └── config_cmd.py       # `config` sub-commands
│   ├── pipeline/               # Pipeline orchestration
│   │   ├── __init__.py         # Re-exports public names
│   │   ├── steps.py            # step_ingest .. step_summarize
│   │   ├── cache.py            # CachedResult, load_cached, persist_to_library
│   │   ├── helpers.py          # make_settings, resolve_profile, build_metadata_md
│   │   ├── runner.py           # PipelineRunner (event-yielding generator)
│   │   ├── events.py           # PipelineEvent dataclass
│   │   └── hooks.py            # PipelineHooks registry
│   ├── export/                 # Export formats
│   │   ├── __init__.py         # export_paper() dispatch + exporter registry
│   │   ├── markdown.py         # MarkdownExporter
│   │   ├── json_export.py      # JsonExporter
│   │   └── bibtex.py           # BibtexExporter
│   ├── i18n/                   # User-facing strings (CLI + UI)
│   │   ├── __init__.py         # get_text(key, lang, **kwargs)
│   │   ├── en.py               # English strings
│   │   └── zh.py               # Chinese strings
│   └── prompts/                # LLM-facing prompt templates
│       ├── __init__.py         # Prompt dispatcher (language-aware)
│       ├── en.py               # English prompts
│       └── zh.py               # Chinese prompts
├── tests/                      # Mirrors source layout
│   ├── conftest.py
│   ├── core/                   # Tests for paper_reader.core
│   ├── llm/                    # Tests for paper_reader.llm
│   ├── processing/             # Tests for paper_reader.processing
│   ├── qa/                     # Tests for paper_reader.qa
│   ├── ui/                     # Tests for paper_reader.ui
│   ├── pipeline/               # Tests for paper_reader.pipeline
│   ├── export/                 # Tests for paper_reader.export
│   └── cli/                    # Tests for paper_reader.cli
└── notebooks/
    └── paper_reader.ipynb      # Jupyter notebook interface
