paper2ppt converts academic PDF papers into editable PowerPoint decks and matching speaker scripts.
The current project goal is practical paper presentation generation: reuse the paper's original figures and tables, use only text LLMs for planning and writing, render native editable .pptx slides, and keep a spec-aware QA/repair loop around the output.
This project is built on ideas and code paths from HKUDS/Paper2Slides, and it also borrows presentation-structuring inspiration from gejifeng/Paper2PPT. The main implementation in this repository is still the paper2slides/-based workflow, heavily modified for text-only LLM calls and native PPTX generation.
From HKUDS/Paper2Slides, this project keeps the useful paper-processing foundation:
- PDF parsing and source asset extraction.
- Summary, content planning, and checkpoint-style reruns.
- A command-line workflow for turning a paper into presentation material.
The main change is the generation path. The original image-style slide path has been replaced with a lower-cost, text-only LLM workflow:
- The model plans structured slide specs instead of generating slide images.
- python-pptx renders native editable PowerPoint objects: text boxes, shapes, tables, and inserted source figures.
- All model calls are configured to use gpt-5-mini.
- The workflow generates a matching speaker_script.md.
- Spec and layout QA check for empty components, clipped text, weak metric cards, truncated ellipses, missing numbered-point fields, layout/payload mismatches, and decorative elements that do not carry information.
- Numbered points are represented with structured claim, detail, and evidence fields before rendering.
- The deck now has presentation structure: title page, contents page, section dividers, key-message blocks, numbered claim/detail/evidence points, source figures, and compact metric cards.
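As a rough illustration of the structured numbered points described above, the sketch below models one point with claim, detail, and evidence fields and reports which fields are empty. The class name NumberedPoint and the helper missing_fields are hypothetical; they are not taken from this repository's slide_schema.py.

```python
from dataclasses import dataclass, fields


@dataclass
class NumberedPoint:
    """Hypothetical shape of one numbered point; not the project's real schema."""
    claim: str = ""
    detail: str = ""
    evidence: str = ""


def missing_fields(point: NumberedPoint) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(point) if not getattr(point, f.name).strip()]


# Example: a point whose detail sentence was never filled in.
point = NumberedPoint(claim="Example claim", detail="", evidence="Example evidence")
incomplete = missing_fields(point)
```

A check of this kind is what lets a QA pass flag a slide for repair before rendering, instead of discovering an empty bullet in the finished deck.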
From gejifeng/Paper2PPT, this project mainly borrows product ideas rather than runtime code:
- Stronger section-aware paper storytelling.
- A more detailed companion-material mindset for long technical papers.
- Optional lightweight Beamer/TeX sidecar generation implemented inside this repository.
This repository does not vendor Paper2PPT and does not depend on it at runtime.
The project now supports:
- Editable PowerPoint output: slides.pptx.
- A matching narration draft: speaker_script.md.
- An optional lightweight Beamer/TeX sidecar generated by this repository's own code. This is a reference/backup path, not the main deliverable.
- LangChain/LangGraph-based text LLM orchestration.
- Default text model configuration using gpt-5-mini.
- Optional exact slide count with --slides.
- Section-aware decks with title page, contents page, section dividers, key-message blocks, structured numbered points, compact metric cards, source figures, and tables.
- Spec-aware evaluator plus PPTX layout QA.
- Bounded repair loop that reworks only failed slide specs before rerendering.
- Layout normalization for unsupported LLM layout names and visual/table layouts without matching payload.
- Automatic figure-analysis gating with PPTX_ENABLE_FIGURE_ANALYSIS=auto.
- Deterministic fallback generation with PPTX_FORCE_DETERMINISTIC=1 for cheap reruns from existing checkpoints.
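The layout normalization item in the list above can be pictured with a small sketch. The layout names, the fallback name, and the payload keys below are all assumptions made for illustration; the names used in this repository may differ.

```python
# Hypothetical layout names and payload keys, chosen only for this sketch.
SUPPORTED_LAYOUTS = {"bullets", "visual", "table"}
REQUIRED_PAYLOAD = {"visual": "figure", "table": "table"}
FALLBACK_LAYOUT = "bullets"


def normalize_layout(slide: dict) -> dict:
    """Return a copy of the slide spec with a layout the renderer can handle."""
    fixed = dict(slide)
    layout = fixed.get("layout")
    if layout not in SUPPORTED_LAYOUTS:
        # Unsupported name produced by the LLM: fall back to a text layout.
        fixed["layout"] = FALLBACK_LAYOUT
    elif layout in REQUIRED_PAYLOAD and not fixed.get(REQUIRED_PAYLOAD[layout]):
        # Visual/table layout without its matching payload: also fall back.
        fixed["layout"] = FALLBACK_LAYOUT
    return fixed
```

Falling back to a plain text layout keeps the slide renderable; a stricter variant could instead mark the slide as failed and send it to the repair loop.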
The most recent visual iteration focused on making the generated deck look like a real presentation:
- Added a proper title page with title, authors, context/date, and summary tiles.
- Added a contents page with meaningful section progress lines.
- Added section divider pages.
- Reworked normal slides into title bar + key message + structured numbered points.
- Removed meaningless tiny connector marks beside numbered bullets.
- Restored useful decorative bars and tiles when they carry information.
- Improved bullet rendering so points show a short claim plus a complete detail sentence instead of clipped ellipses.
- Added evaluator-driven repair for missing point fields, low-quality metric labels/values, empty components, unsupported layouts, visual/table payload mismatches, and severe layout defects.
The current main local test paper is:
test_papers/DeepSeek_V4.pdf
The latest checked output during development was generated under:
outputs/DeepSeek_V4/paper/fast/slides_academic_medium_24slides/
Additional cross-paper checks have also been run with:
test_papers/Deep Residual Learning for Image Recognition.pdf
test_papers/Thinking_with_Visual_Primitives.pdf
test_papers/mHC:Manifold-Constrained Hyper-Connections.pdf
The exact timestamped folder changes per run. A successful run should include:
slides.pptx
speaker_script.md
layout_qa.json
Some runs may also include:
detailed_slides.tex
detailed_slides.pdf
Those TeX/PDF files are optional reference artifacts. The primary deliverables are still slides.pptx and speaker_script.md.
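To confirm that a run folder contains the files listed above, a short check along these lines may help. The function name check_run_folder is hypothetical; only the file names come from this README.

```python
from pathlib import Path

REQUIRED = ("slides.pptx", "speaker_script.md", "layout_qa.json")
OPTIONAL = ("detailed_slides.tex", "detailed_slides.pdf")


def check_run_folder(folder: str) -> dict:
    """Report which required files are missing and which optional ones exist."""
    root = Path(folder)
    return {
        "missing": [name for name in REQUIRED if not (root / name).is_file()],
        "optional_present": [name for name in OPTIONAL if (root / name).is_file()],
    }
```

An empty "missing" list means the primary deliverables and the QA report were all written.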
PDF
-> parsing and source asset extraction
-> summary checkpoint
-> content plan checkpoint
-> LangGraph PPTX workflow
-> source packet
-> optional source-figure understanding
-> text LLM deck curation
-> slide spec validation and numbered-point normalization
-> native PPTX rendering
-> spec evaluator + layout QA
-> failed-slide repair loop
-> speaker script generation
-> optional detailed Beamer/TeX sidecar generation
For a diagram and interview-friendly explanation of the evaluator-driven loop, see Agentic PPTX Workflow.
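The evaluator-driven loop can be summarized as a bounded retry over failed slides only. The sketch below is a simplification under stated assumptions: evaluate and repair are hypothetical callables standing in for the spec evaluator/layout QA and the LLM repair step, and max_rounds is an illustrative bound, not a documented setting.

```python
from typing import Callable


def repair_loop(
    specs: list[dict],
    evaluate: Callable[[list[dict]], list[int]],
    repair: Callable[[dict], dict],
    max_rounds: int = 2,
) -> tuple[list[dict], list[int]]:
    """Rework only failed slide specs, stopping after max_rounds attempts."""
    specs = list(specs)
    failed = evaluate(specs)  # indexes of slides that did not pass QA
    for _ in range(max_rounds):
        if not failed:
            break
        for index in failed:
            specs[index] = repair(specs[index])  # passing slides are left untouched
        failed = evaluate(specs)  # re-check after the repaired slides are rebuilt
    return specs, failed
```

The bound matters because a repair step backed by an LLM is not guaranteed to converge; any indexes still in the returned failed list can be surfaced as warnings rather than retried indefinitely.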
The generated PPTX is not a screenshot deck. It uses native PowerPoint text boxes, shapes, tables, and inserted source images, so it remains editable in PowerPoint.
- Windows, macOS, or Linux
- Python 3.10 or newer; Python 3.12 is recommended
- Conda or another Python environment manager
- A text-model API compatible with the OpenAI chat-completions interface
The project has been developed in a local conda environment named paper2slides, but the name is not required.
conda create -n paper2ppt python=3.12
conda activate paper2ppt
pip install -r requirements.txt
If you already have a suitable Python environment:
pip install -r requirements.txt
paper2ppt reads API settings from:
paper2slides/.env
For public GitHub safety, only the template should be committed:
paper2slides/.env.example
If setting up a new clone, create the local env file from the template:
copy paper2slides\.env.example paper2slides\.env
Do not commit the local paper2slides/.env file.
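The copy command shown here is Windows syntax. On macOS or Linux the same step can be done with cp, or with a short cross-platform Python sketch such as the following. The function name create_env_file is hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def create_env_file(package_dir: str = "paper2slides") -> bool:
    """Copy .env.example to .env unless .env already exists; return True if copied."""
    template = Path(package_dir) / ".env.example"
    target = Path(package_dir) / ".env"
    if target.exists():
        return False  # never overwrite an existing local file with real keys
    shutil.copyfile(template, target)
    return True
```

Refusing to overwrite an existing .env avoids wiping real API keys when the setup step is rerun.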
Typical configuration:
RAG_LLM_API_KEY=your_api_key_here
RAG_LLM_BASE_URL=https://api.example.com/v1
LLM_MODEL=gpt-5-mini
Important cost rule:
LLM_MODEL=gpt-5-mini
PPTX_VISION_MODEL=gpt-5-mini
If model calls are not needed and you only want to rerender from existing checkpoints:
PPTX_FORCE_DETERMINISTIC=1
Optional figure understanding:
PPTX_ENABLE_FIGURE_ANALYSIS=auto
PPTX_VISION_MODEL=gpt-5-mini
PPTX_MAX_FIGURE_ANALYSIS=5
This optional step analyzes source paper figures. It does not generate new images. In auto mode, it only runs when figure captions look too weak for reliable slide curation. Use 1 to force it on or 0 to force it off.
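A minimal sketch of the three-way switch described above. Only the values auto, 1, and 0 come from this README; the function name, the default of auto when the variable is unset, and the caption heuristic (a minimum word count) are assumptions for illustration, since the real weak-caption check is not documented here.

```python
import os


def should_analyze_figures(captions: list[str], min_words: int = 6) -> bool:
    """Decide whether to run figure analysis from PPTX_ENABLE_FIGURE_ANALYSIS."""
    # Assumption: an unset variable behaves like "auto".
    mode = os.environ.get("PPTX_ENABLE_FIGURE_ANALYSIS", "auto").strip().lower()
    if mode == "1":
        return True   # forced on
    if mode == "0":
        return False  # forced off
    # auto (or any other value): hypothetical heuristic that treats
    # very short captions as too weak for reliable slide curation.
    return any(len(caption.split()) < min_words for caption in captions)
```

Gating the step this way keeps the extra model calls for papers whose captions actually need them.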
In fast paper mode, redundant paper_info RAG querying is skipped because paper metadata is extracted directly from parsed markdown during summary generation.
Typical run:
python -m paper2slides --input test_papers\DeepSeek_V4.pdf --output slides --style academic --length medium --slides 24 --fast
Cheap rerun from existing checkpoints:
$env:PPTX_FORCE_DETERMINISTIC="1"
python -m paper2slides --input test_papers\DeepSeek_V4.pdf --output slides --style academic --length medium --slides 24 --fast --from-stage generate
Main options:
--input PDF file path
--output slides
--style academic or a custom style description
--length short, medium, or long
--slides exact target content-slide count; overrides --length
--fast use direct parsing/query flow instead of full indexing
--from-stage rag, summary, plan, or generate
--list list previous outputs
--debug print more logs
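One way to read --from-stage is as a cut point in an ordered stage list. The sketch below assumes the stages run in the order the option lists them (rag, summary, plan, generate) and that rerunning from a stage also reruns every later stage; both points are inferred from this README rather than confirmed from the code, and the function name is hypothetical.

```python
STAGES = ["rag", "summary", "plan", "generate"]


def stages_to_run(from_stage: str) -> list[str]:
    """Return the named stage and every stage after it."""
    if from_stage not in STAGES:
        raise ValueError(f"unknown stage: {from_stage!r}")
    return STAGES[STAGES.index(from_stage):]
```

Under these assumptions, --from-stage generate reuses the existing rag, summary, and plan checkpoints and only rebuilds the deck.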
Dynamic slide count:
short roughly 8-12 content slides
medium roughly 14-22 content slides
long roughly 24-36 content slides
Use --slides 24 or a similar explicit value for long papers that need fuller coverage.
Typical timestamped output folder:
outputs/<project_name>/paper/fast/slides_academic_medium_24slides/<timestamp>/
Typical files:
slides.pptx
speaker_script.md
layout_qa.json
checkpoint_slide_spec.json
checkpoint_slide_spec_llm_raw.txt
Meaning:
- slides.pptx: editable PowerPoint deck.
- speaker_script.md: slide-by-slide narration draft.
- detailed_slides.tex / detailed_slides.pdf: optional reference artifacts generated by the local sidecar code when enabled and when pdflatex is available.
- layout_qa.json: spec and layout QA result, including warnings and failed slide indexes.
- checkpoint_slide_spec.json: final structured slide specification, including claim, detail, and evidence for numbered points.
- checkpoint_slide_spec_llm_raw.txt: raw LLM output when a curator call was used.
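For a quick look at a QA result, something like the sketch below may be enough. The JSON key names warnings and failed_slides are assumptions: this README says the file includes warnings and failed slide indexes but does not give its schema, so adjust the keys to match the actual layout_qa.json.

```python
import json
from pathlib import Path


def summarize_qa(path: str) -> str:
    """Summarize a QA report; the key names used here are assumptions."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    warnings = report.get("warnings", [])      # hypothetical key
    failed = report.get("failed_slides", [])   # hypothetical key
    return f"{len(warnings)} warning(s), failed slides: {failed or 'none'}"
```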
paper2slides/generator/text_pptx_workflow.py
paper2slides/generator/pptx_renderer.py
paper2slides/generator/pptx_qa.py
paper2slides/generator/slide_schema.py
paper2slides/generator/spec_builder.py
paper2slides/generator/content_planner.py
paper2slides/generator/detailed_tex.py
paper2slides/core/stages/rag_stage.py
paper2slides/core/stages/generate_stage.py
paper2slides/core/paths.py
python -m unittest test_phase1_pptx.py
Quick syntax check without writing __pycache__:
$env:PYTHONDONTWRITEBYTECODE="1"
python -c "import ast, pathlib; ast.parse(pathlib.Path('paper2slides/generator/pptx_renderer.py').read_text(encoding='utf-8')); print('AST OK')"
If the API call fails:
- Check paper2slides/.env.
- Check RAG_LLM_BASE_URL.
- Check whether the selected model supports the needed context length.
If the deck is too sparse or too dense:
- Try a different --length.
- Use --slides 24 or another explicit count.
- Rerun from --from-stage generate.
If the endpoint is unstable or you want a cheap rerun:
- Set PPTX_FORCE_DETERMINISTIC=1.
If a slide looks crowded or clipped:
- Inspect layout_qa.json.
- Rerender previews from the saved PPTX if possible.
paper2ppt is derived from HKUDS/Paper2Slides and takes presentation-design inspiration from gejifeng/Paper2PPT. Keep the upstream attribution and license terms when redistributing or extending this project.
