Paper AI Reader is a Python tool that turns papers saved in a Notion database into structured AI-generated research notes.
It reads pending papers from Notion, extracts text from webpages or PDFs, analyzes the paper with an OpenAI-compatible AI provider, updates the Notion page title with the real paper title, writes research keywords, and replaces the page body with structured notes.
- CLI pipeline and PySide6 desktop GUI
- Chinese, Japanese, and English UI
- Chinese, Japanese, and English prompt output
- XML-only runtime configuration
- OpenAI-compatible provider support through base_url
- Notion data_sources query support
- Webpage and PDF text extraction
- Fallback to existing Notion page text when website fetching fails
- Real paper title detection and Notion title update
- Keyword extraction into the Notion Keywords property
- Existing Notion page blocks are deleted only after fetching and AI analysis succeed
Start the desktop GUI:
python gui.py
The GUI contains:
- Dashboard: start or stop the reading pipeline, inspect logs, and view model request/response text.
- Prompt: choose the note output language and preview prompt XML.
- Setting: configure Notion, AI provider, model, base URL, and text limit.
While a connectivity check, model refresh, or reading pipeline is running, the GUI temporarily locks Prompt, Setting, and language controls to prevent mid-run configuration drift. The Setting page also warns before discarding unsaved changes.
Run the CLI pipeline:
python main.py
The CLI validates Notion and AI connectivity before starting the pipeline.
The Notion database should contain:
| Property | Type | Required | Description |
|---|---|---|---|
| Title | Title | Yes | Page title. The app can replace it with the real paper title. |
| Website | URL | Yes | Paper webpage or PDF URL. |
| Status | Status or Select | Yes | Workflow status. |
| Keywords | Multi-select, Select, or Rich text | Recommended | AI-extracted keywords. |
Processable statuses:
- TBD
- AI Reading
Skipped statuses:
- AI Read Done
- Human Reading
- DONE
The app writes only:
- AI Reading
- AI Read Done
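The status rules above can be expressed as a small filter. This is a minimal sketch, not the project's actual code: the status names come from the lists above, while the helper names and the dictionary shape of a page are hypothetical. It also assumes that any status outside the processable set is skipped.

```python
# Status names are taken from the lists above.
PROCESSABLE_STATUSES = {"TBD", "AI Reading"}
SKIPPED_STATUSES = {"AI Read Done", "Human Reading", "DONE"}


def is_processable(status):
    """Return True when a page with this status should enter the pipeline."""
    # Assumption: unknown or missing statuses are treated as skipped.
    return status in PROCESSABLE_STATUSES


def select_pending(pages):
    """Keep only pages whose status is processable.

    Each page is a hypothetical dict with a "status" key; the real Notion
    payload is more deeply nested.
    """
    return [page for page in pages if is_processable(page.get("status"))]
```

Keeping the two sets explicit makes it easy to see that a page being read by a human is never picked up again by the pipeline.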
Create a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Configuration is XML-based. CLI and GUI read the same settings file. Copy the example:
cp config/settings.example.xml config/settings.xml
Then edit config/settings.xml.
Important config fields:
- notion_token
- notion_database_id
- ai_api_key
- ai_model
- ai_base_url
- paper_text_limit
- ui_language
- theme_mode
- prompt_language
Leave ai_base_url empty for the default OpenAI API. For compatible providers, use their /v1 base URL.
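One way to express the ai_base_url rule is sketched below. The function name is hypothetical and the project may instead pass an empty value straight to its client library; the default shown is the public OpenAI API base URL, and the local URL in the usage line is only an illustrative placeholder.

```python
DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(ai_base_url):
    """Return the base URL to use for an OpenAI-compatible provider.

    An empty or whitespace-only setting falls back to the default OpenAI API.
    A trailing slash is dropped so request paths can be appended consistently.
    """
    value = (ai_base_url or "").strip()
    if not value:
        return DEFAULT_OPENAI_BASE_URL
    return value.rstrip("/")


# Hypothetical compatible provider running locally.
print(resolve_base_url("http://localhost:8000/v1/"))
```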
Prompt XML files live in:
- prompts/zh.xml
- prompts/ja.xml
- prompts/en.xml
Each prompt XML contains both system_prompt and user_prompt_template. The template supports {title}, {website}, and {paper_text}.
The default system_prompt includes example research directions such as LLM, ROS2, and HRI. Edit the prompt XML for your own field before regular use.
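The sketch below shows how a prompt XML file with these two fields might be loaded and how the three placeholders could be filled. It is an illustration under assumptions: the root element name and the sample text are invented, and the function names are hypothetical. Only the field names system_prompt and user_prompt_template and the placeholders {title}, {website}, and {paper_text} come from the description above. Placeholders are replaced in a single regex pass rather than with str.format, so any other braces in the template (for example a JSON sample) are left untouched.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample; the real files in prompts/ may be structured differently.
SAMPLE_PROMPT_XML = """<prompt>
  <system_prompt>You are a research assistant.</system_prompt>
  <user_prompt_template>Title: {title}
URL: {website}

{paper_text}</user_prompt_template>
</prompt>"""

_PLACEHOLDER = re.compile(r"\{(title|website|paper_text)\}")


def load_prompts(xml_text):
    """Return (system_prompt, user_prompt_template) from prompt XML text."""
    root = ET.fromstring(xml_text)
    system_prompt = root.findtext("system_prompt", default="").strip()
    template = root.findtext("user_prompt_template", default="")
    return system_prompt, template


def render_user_prompt(template, title, website, paper_text):
    """Fill the three supported placeholders and leave other braces alone."""
    values = {"title": title, "website": website, "paper_text": paper_text}
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


system_prompt, template = load_prompts(SAMPLE_PROMPT_XML)
print(render_user_prompt(template, "A Paper", "https://example.org/paper", "Body text"))
```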
- Query the Notion database.
- Select pages whose status is TBD or AI Reading.
- Mark the page as AI Reading.
- Fetch text from the Website URL.
- If fetching fails, try existing Notion page text.
- Analyze the paper with the configured AI provider.
- Parse structured JSON from the model response.
- Update the Notion page title with paper_title.
- Update the Keywords property when available.
- Delete existing page blocks.
- Write structured notes.
- Mark the page as AI Read Done.
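The ordering of these steps can be sketched for a single page as follows. Every collaborator here (fetch_url, read_existing_text, analyze, and the methods on notion) is a hypothetical stand-in rather than the project's real API. The point of the sketch is the ordering: the fallback runs only when fetching fails, and nothing on the page is deleted until fetching and analysis have both succeeded.

```python
def process_page(page, fetch_url, read_existing_text, analyze, notion):
    """Run one page through the pipeline steps using hypothetical helpers."""
    notion.set_status(page, "AI Reading")

    # Fetch from the Website URL; fall back to text already on the page.
    try:
        text = fetch_url(page["website"])
    except Exception:
        text = read_existing_text(page)
    if not text:
        raise ValueError("no paper text available")

    # Analysis may raise; in that case the existing blocks stay untouched.
    result = analyze(page["title"], page["website"], text)

    # Writes to the page happen only after fetch and analysis succeeded.
    notion.update_title(page, result["paper_title"])
    if result.get("keywords"):
        # Assumption: "when available" is read here as "keywords were returned".
        notion.update_keywords(page, result["keywords"])
    notion.delete_blocks(page)
    notion.write_notes(page, result)
    notion.set_status(page, "AI Read Done")
    return result
```

If analysis raises, the page keeps its old blocks and remains in AI Reading, which is a processable status, so a later run can pick it up again.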
The AI response is normalized to this shape:
{
  "paper_title": "Real paper title",
  "summary": "...",
  "idea": "...",
  "rating": 5,
  "reason": "...",
  "keywords": ["HRI", "ROS2", "emotion-aware interaction"],
  "code_available": true,
  "code_url": "https://github.com/example/project"
}
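A normalization step that produces this shape might look like the sketch below. It is not the project's analyzer: the function names are hypothetical, and the coercion rules (comma-separated keyword strings, a rating of 0 when the value is not a number, code_available true only for a literal JSON true) are assumptions chosen for illustration.

```python
import json


def extract_json_object(text):
    """Parse the outermost {...} span from a model response.

    Models often wrap JSON in extra prose, so the span between the first "{"
    and the last "}" is parsed instead of the whole string.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    return json.loads(text[start:end + 1])


def normalize_analysis(raw):
    """Coerce a parsed response into the eight-field shape shown above."""
    keywords = raw.get("keywords") or []
    if isinstance(keywords, str):
        # Assumption: a plain string is treated as a comma-separated list.
        keywords = keywords.split(",")
    keywords = [str(k).strip() for k in keywords if str(k).strip()]

    try:
        rating = int(raw.get("rating", 0))
    except (TypeError, ValueError):
        rating = 0  # assumption: non-numeric ratings become 0

    def text_field(name):
        return str(raw.get(name) or "").strip()

    return {
        "paper_title": text_field("paper_title"),
        "summary": text_field("summary"),
        "idea": text_field("idea"),
        "rating": rating,
        "reason": text_field("reason"),
        "keywords": keywords,
        "code_available": raw.get("code_available") is True,
        "code_url": text_field("code_url"),
    }
```

Normalizing into a fixed set of keys means the code that writes Notion blocks never has to check whether a field is missing.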
.
├── main.py
├── gui.py
├── requirements.txt
├── config
│   └── settings.example.xml
├── prompts
│   ├── zh.xml
│   ├── ja.xml
│   └── en.xml
├── paper_ai_reader
│   ├── analyzer.py
│   ├── backend.py
│   ├── config.py
│   ├── connectivity.py
│   ├── fetcher.py
│   ├── notion_service.py
│   ├── pipeline.py
│   ├── prompts.py
│   └── gui
│       ├── app.py
│       ├── style.qss
│       └── i18n.py
├── tests
│   ├── test_analyzer.py
│   ├── test_config.py
│   ├── test_connectivity.py
│   ├── test_fetcher.py
│   └── test_notion_service.py
├── test_notion.py
└── test_blocks.py
Compile-check the project without calling external APIs:
python -m compileall main.py gui.py paper_ai_reader test_blocks.py test_notion.py
Run automated tests:
python -m pip install -r requirements-dev.txt
python -m pytest
Validate example XML files:
python - <<'PY'
from paper_ai_reader.config import validate_runtime_files
print(validate_runtime_files("cli", "config/settings.example.xml"))
print(validate_runtime_files("gui", "config/settings.example.xml"))
PY
An empty list means the XML files are valid.
Build the current platform app and a source zip:
python scripts/build_release.py --version v0.1.0
Artifacts are written to release/. Python desktop packages are built on the host
platform, so use the manual GitHub Actions workflow in .github/workflows/release.yml
or run the script on macOS, Linux, and Windows to produce all three app packages.
The workflow also runs when a GitHub Release is published and uploads the generated
zip files to that release.
Packaged desktop apps copy settings.example.xml and prompt XML files into a user
config directory on first launch. Source runs use config/settings.xml and
prompts/*.xml inside the repository.
- Runtime XML files can contain secrets and should not be committed.
- .env is not used by the current runtime.
- test_notion.py and test_blocks.py are manual debugging scripts and call the Notion API directly.
- PDF extraction uses pypdf; complex multi-column papers, formulas, and figure captions may not extract cleanly. For difficult papers, provide a readable webpage, curate the text manually, or integrate a dedicated academic PDF/OCR tool.