paper2ppt

paper2ppt converts academic PDF papers into editable PowerPoint decks and matching speaker scripts.

The current project goal is practical paper presentation generation: reuse the paper's original figures and tables, use only text LLMs for planning and writing, render native editable .pptx slides, and keep a spec-aware QA/repair loop around the output.

This project builds on ideas and code paths from HKUDS/Paper2Slides and borrows presentation-structuring inspiration from gejifeng/Paper2PPT. The main implementation in this repository is still the workflow under paper2slides/, heavily modified for text-only LLM calls and native PPTX generation.

Project Lineage And Changes

From HKUDS/Paper2Slides, this project keeps the useful paper-processing foundation:

PDF parsing and source asset extraction.
Summary, content planning, and checkpoint-style reruns.
A command-line workflow for turning a paper into presentation material.

The main change is the generation path. The original image-style slide path has been replaced with a lower-cost, text-only LLM workflow:

The model plans structured slide specs instead of generating slide images.
python-pptx renders native editable PowerPoint objects: text boxes, shapes, tables, and inserted source figures.
All model calls are configured to use gpt-5-mini.
The workflow generates a matching speaker_script.md.
Spec and layout QA checks for empty components, clipped text, weak metric cards, text truncated with ellipses, missing numbered-point fields, layout/payload mismatches, and decorative elements that do not carry information.
Numbered points are represented with structured claim, detail, and evidence fields before rendering.
The deck now has presentation structure: title page, contents page, section dividers, key-message blocks, numbered claim/detail/evidence points, source figures, and compact metric cards.
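
As a rough illustration of the structured numbered points described above, the sketch below models one point with claim, detail, and evidence fields and reports which fields are empty. The class name NumberedPoint and the helper missing_fields are hypothetical and only mirror the field names mentioned in this README; the project's actual schema in slide_schema.py may differ.

```python
from dataclasses import dataclass, fields


@dataclass
class NumberedPoint:
    """One numbered point on a slide (hypothetical shape, not the project's schema)."""
    claim: str = ""     # short headline statement
    detail: str = ""    # one complete explanatory sentence
    evidence: str = ""  # number, figure, or table reference backing the claim


def missing_fields(point: NumberedPoint) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(point) if not getattr(point, f.name).strip()]


point = NumberedPoint(
    claim="Example claim",
    detail="One complete sentence that explains the claim.",
)
print(missing_fields(point))  # ['evidence']
```

A check like this is the kind of signal an evaluator could use to flag a slide for repair before rendering.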

From gejifeng/Paper2PPT, this project mainly borrows product ideas rather than runtime code:

Stronger section-aware paper storytelling.
A more detailed companion-material mindset for long technical papers.
Optional lightweight Beamer/TeX sidecar generation implemented inside this repository.

This repository does not vendor Paper2PPT and does not depend on it at runtime.

Current Status

The project now supports:

Editable PowerPoint output: slides.pptx.
A matching narration draft: speaker_script.md.
An optional lightweight Beamer/TeX sidecar generated by this repository's own code. This is a reference/backup path, not the main deliverable.
LangChain/LangGraph-based text LLM orchestration.
Default text model configuration using gpt-5-mini.
Optional exact slide count with --slides.
Section-aware decks with title page, contents page, section dividers, key-message blocks, structured numbered points, compact metric cards, source figures, and tables.
Spec-aware evaluator plus PPTX layout QA.
Bounded repair loop that reworks only failed slide specs before rerendering.
Layout normalization for unsupported LLM layout names and visual/table layouts without matching payload.
Automatic figure-analysis gating with PPTX_ENABLE_FIGURE_ANALYSIS=auto.
Deterministic fallback generation with PPTX_FORCE_DETERMINISTIC=1 for cheap reruns from existing checkpoints.
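
The layout normalization item above can be pictured with a small sketch. The layout names, the fallback, and the payload keys (image_path, rows) are assumptions made for illustration; the real names used by this repository may be different.

```python
# Hypothetical layout names and payload keys; the project's actual set may differ.
SUPPORTED_LAYOUTS = {"bullets", "figure", "table", "metrics"}
FALLBACK_LAYOUT = "bullets"
REQUIRED_PAYLOAD = {"figure": "image_path", "table": "rows"}


def normalize_layout(spec: dict) -> dict:
    """Return a copy of spec whose layout is supported and backed by its payload."""
    layout = str(spec.get("layout", "")).strip().lower()
    # Case 1: the LLM invented a layout name the renderer does not know.
    if layout not in SUPPORTED_LAYOUTS:
        layout = FALLBACK_LAYOUT
    # Case 2: a visual/table layout was requested without the data it needs.
    needed = REQUIRED_PAYLOAD.get(layout)
    if needed and not spec.get(needed):
        layout = FALLBACK_LAYOUT
    return {**spec, "layout": layout}


print(normalize_layout({"layout": "Hero-Banner"})["layout"])             # bullets
print(normalize_layout({"layout": "table"})["layout"])                   # bullets
print(normalize_layout({"layout": "table", "rows": [["a"]]})["layout"])  # table
```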

The most recent visual iteration focused on making the generated deck look like a real presentation:

Added a proper title page with title, authors, context/date, and summary tiles.
Added a contents page with meaningful section progress lines.
Added section divider pages.
Reworked normal slides into title bar + key message + structured numbered points.
Removed meaningless tiny connector marks beside numbered bullets.
Restored useful decorative bars and tiles when they carry information.
Improved bullet rendering so points show a short claim plus a complete detail sentence instead of clipped ellipses.
Added evaluator-driven repair for missing point fields, low-quality metric labels/values, empty components, unsupported layouts, visual/table payload mismatches, and severe layout defects.

Recommended Test PDF

The current main local test paper is:

test_papers/DeepSeek_V4.pdf

The latest checked output during development was generated under:

outputs/DeepSeek_V4/paper/fast/slides_academic_medium_24slides/

Additional cross-paper checks have also been run with:

test_papers/Deep Residual Learning for Image Recognition.pdf
test_papers/Thinking_with_Visual_Primitives.pdf
test_papers/mHC：Manifold-Constrained Hyper-Connections.pdf

The exact timestamped folder changes per run. A successful run should include:

slides.pptx
speaker_script.md
layout_qa.json

Some runs may also include:

detailed_slides.tex
detailed_slides.pdf

Those TeX/PDF files are optional reference artifacts. The primary deliverables are still slides.pptx and speaker_script.md.

How It Works

PDF
 -> parsing and source asset extraction
 -> summary checkpoint
 -> content plan checkpoint
 -> LangGraph PPTX workflow
    -> source packet
    -> optional source-figure understanding
    -> text LLM deck curation
    -> slide spec validation and numbered-point normalization
    -> native PPTX rendering
    -> spec evaluator + layout QA
    -> failed-slide repair loop
    -> speaker script generation
    -> optional detailed Beamer/TeX sidecar generation
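
The evaluator and failed-slide repair steps in the pipeline above can be sketched as a bounded loop. This is a minimal sketch under stated assumptions, not the repository's LangGraph code: repair_loop, the evaluate and repair callables, and the max_rounds default are all hypothetical names chosen for illustration.

```python
from typing import Callable

Spec = dict  # stand-in for one slide spec; the real schema lives in the project


def repair_loop(
    specs: list[Spec],
    evaluate: Callable[[list[Spec]], list[int]],
    repair: Callable[[Spec], Spec],
    max_rounds: int = 2,
) -> tuple[list[Spec], list[int]]:
    """Rework only failed slides, at most max_rounds times.

    evaluate returns the indexes of slides that fail QA;
    repair returns a reworked version of a single slide spec.
    """
    failed = evaluate(specs)
    rounds = 0
    while failed and rounds < max_rounds:
        for i in failed:
            specs[i] = repair(specs[i])  # slides that passed are left untouched
        failed = evaluate(specs)
        rounds += 1
    return specs, failed


# Toy evaluator and repair step: a slide fails when it has no key message.
def toy_evaluate(specs: list[Spec]) -> list[int]:
    return [i for i, s in enumerate(specs) if not s.get("key_message")]


def toy_repair(spec: Spec) -> Spec:
    return {**spec, "key_message": "placeholder key message"}


deck = [{"title": "Intro", "key_message": "ok"}, {"title": "Method"}]
deck, still_failed = repair_loop(deck, toy_evaluate, toy_repair)
print(still_failed)  # []
```

Bounding the loop keeps cost predictable: if a slide cannot be fixed within max_rounds, its index is returned so the failure can be reported instead of retried forever.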

For a diagram and interview-friendly explanation of the evaluator-driven loop, see Agentic PPTX Workflow.

The generated PPTX is not a screenshot deck. It uses native PowerPoint text boxes, shapes, tables, and inserted source images, so it remains editable in PowerPoint.

Requirements

Windows, macOS, or Linux
Python 3.10 or newer; Python 3.12 is recommended
Conda or another Python environment manager
A text-model API compatible with the OpenAI chat-completions interface

The project has been developed in a local conda environment named paper2slides. The installation commands below use paper2ppt instead; any environment name works.

Installation

conda create -n paper2ppt python=3.12
conda activate paper2ppt
pip install -r requirements.txt

If you already have a suitable Python environment:

pip install -r requirements.txt

Configure the API

paper2ppt reads API settings from:

paper2slides/.env

For public GitHub safety, only the template should be committed:

paper2slides/.env.example

If setting up a new clone, create the local env file from the template. The command below is for Windows; on macOS or Linux, use cp with forward slashes instead:

copy paper2slides\.env.example paper2slides\.env

Do not commit the local paper2slides/.env file.

Typical configuration:

RAG_LLM_API_KEY=your_api_key_here
RAG_LLM_BASE_URL=https://api.example.com/v1
LLM_MODEL=gpt-5-mini

Important cost rule: keep both the text model and the figure-analysis model set to gpt-5-mini:

LLM_MODEL=gpt-5-mini
PPTX_VISION_MODEL=gpt-5-mini
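
The cost rule above can be checked before a run with a few lines of Python. This is a minimal sketch that assumes the env file uses plain KEY=VALUE lines; parse_env and cost_rule_violations are hypothetical helpers, not functions from this repository, and the project itself may load the file differently.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def cost_rule_violations(settings: dict[str, str],
                         expected: str = "gpt-5-mini") -> list[str]:
    """Return the model variables that are set to something other than expected."""
    keys = ("LLM_MODEL", "PPTX_VISION_MODEL")
    return [k for k in keys if k in settings and settings[k] != expected]


sample = """
RAG_LLM_API_KEY=your_api_key_here
RAG_LLM_BASE_URL=https://api.example.com/v1
LLM_MODEL=gpt-5-mini
PPTX_VISION_MODEL=some-other-model
"""
print(cost_rule_violations(parse_env(sample)))  # ['PPTX_VISION_MODEL']
```

To check a real file, pass the contents of paper2slides/.env to parse_env instead of the sample string.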

If model calls are not needed and you only want to rerender from existing checkpoints:

PPTX_FORCE_DETERMINISTIC=1

Optional figure understanding:

PPTX_ENABLE_FIGURE_ANALYSIS=auto
PPTX_VISION_MODEL=gpt-5-mini
PPTX_MAX_FIGURE_ANALYSIS=5

This optional step analyzes source paper figures. It does not generate new images. In auto mode, it only runs when figure captions look too weak for reliable slide curation. Use 1 to force it on or 0 to force it off.
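
The gating behavior described above could look roughly like the sketch below. The function name, the word-count heuristic for a weak caption, the min_words threshold, and the fallback to auto when the variable is unset are all assumptions for illustration; the repository's actual rule for weak captions is not documented here.

```python
import os
from typing import Optional


def should_analyze_figures(captions: list[str], mode: Optional[str] = None,
                           min_words: int = 6) -> bool:
    """Decide whether to run the optional figure-analysis step.

    mode "1" forces it on, "0" forces it off, anything else is treated as auto.
    min_words is an assumed threshold below which a caption counts as weak.
    """
    if mode is None:
        # Assumption: treat an unset variable as auto.
        mode = os.environ.get("PPTX_ENABLE_FIGURE_ANALYSIS", "auto")
    mode = mode.strip().lower()
    if mode == "1":
        return True
    if mode == "0":
        return False
    # auto: run only if at least one caption is too short to be informative
    return any(len(caption.split()) < min_words for caption in captions)


print(should_analyze_figures(["Figure 1"], mode="auto"))  # True
print(should_analyze_figures(["Figure 1"], mode="0"))     # False
```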

In fast paper mode, redundant paper_info RAG querying is skipped because paper metadata is extracted directly from parsed markdown during summary generation.

Run

Typical run:

python -m paper2slides --input test_papers\DeepSeek_V4.pdf --output slides --style academic --length medium --slides 24 --fast

Cheap rerun from existing checkpoints (PowerShell syntax; in bash, set the variable with export PPTX_FORCE_DETERMINISTIC=1):

$env:PPTX_FORCE_DETERMINISTIC="1"
python -m paper2slides --input test_papers\DeepSeek_V4.pdf --output slides --style academic --length medium --slides 24 --fast --from-stage generate

Main options:

--input       PDF file path
--output      slides
--style       academic or a custom style description
--length      short, medium, or long
--slides      exact target content-slide count; overrides --length
--fast        use direct parsing/query flow instead of full indexing
--from-stage  rag, summary, plan, or generate
--list        list previous outputs
--debug       print more logs

Dynamic slide count:

short   roughly 8-12 content slides
medium  roughly 14-22 content slides
long    roughly 24-36 content slides

Use --slides 24 or a similar explicit value for long papers that need fuller coverage.
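
The interaction between --length and --slides can be summarized in a short sketch. The ranges are the approximate values listed above; target_slide_range is a hypothetical helper, and the real planner may pick a count inside the range in its own way.

```python
from typing import Optional

# Approximate content-slide ranges taken from the list above.
LENGTH_RANGES = {"short": (8, 12), "medium": (14, 22), "long": (24, 36)}


def target_slide_range(length: str = "medium",
                       slides: Optional[int] = None) -> tuple[int, int]:
    """Return (min, max) content slides; an explicit count overrides length."""
    if slides is not None:
        if slides < 1:
            raise ValueError("--slides must be a positive integer")
        return (slides, slides)
    try:
        return LENGTH_RANGES[length]
    except KeyError:
        raise ValueError(f"unknown length: {length!r}") from None


print(target_slide_range("medium"))             # (14, 22)
print(target_slide_range("medium", slides=24))  # (24, 24)
```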

Output Files

Typical timestamped output folder:

outputs/<project_name>/paper/fast/slides_academic_medium_24slides/<timestamp>/

Typical files:

slides.pptx
speaker_script.md
layout_qa.json
checkpoint_slide_spec.json
checkpoint_slide_spec_llm_raw.txt

Meaning:

slides.pptx: editable PowerPoint deck.
speaker_script.md: slide-by-slide narration draft.
detailed_slides.tex / detailed_slides.pdf: optional reference artifacts generated by the local sidecar code when enabled and when pdflatex is available.
layout_qa.json: spec and layout QA result, including warnings and failed slide indexes.
checkpoint_slide_spec.json: final structured slide specification, including claim, detail, and evidence for numbered points.
checkpoint_slide_spec_llm_raw.txt: raw LLM output when a curator call was used.
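
To confirm that a run folder contains the files a successful run should include, a small check like the one below may help. It is a sketch that only tests for file presence; missing_outputs is a hypothetical helper and the demo uses a temporary folder rather than a real output directory.

```python
import tempfile
from pathlib import Path

# Files this README lists for a successful run.
REQUIRED = ("slides.pptx", "speaker_script.md", "layout_qa.json")


def missing_outputs(run_dir: Path) -> list[str]:
    """Return the required output files that are absent from run_dir."""
    return [name for name in REQUIRED if not (run_dir / name).is_file()]


# Demo on a temporary folder that lacks the speaker script.
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp)
    (run / "slides.pptx").write_bytes(b"")
    (run / "layout_qa.json").write_text("{}", encoding="utf-8")
    result = missing_outputs(run)

print(result)  # ['speaker_script.md']
```

For a real run, pass the timestamped output folder as run_dir.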

Important Implementation Files

paper2slides/generator/text_pptx_workflow.py
paper2slides/generator/pptx_renderer.py
paper2slides/generator/pptx_qa.py
paper2slides/generator/slide_schema.py
paper2slides/generator/spec_builder.py
paper2slides/generator/content_planner.py
paper2slides/generator/detailed_tex.py
paper2slides/core/stages/rag_stage.py
paper2slides/core/stages/generate_stage.py
paper2slides/core/paths.py

Test

python -m unittest test_phase1_pptx.py

Quick syntax check without writing __pycache__ (PowerShell syntax for the environment variable):

$env:PYTHONDONTWRITEBYTECODE="1"
python -c "import ast, pathlib; ast.parse(pathlib.Path('paper2slides/generator/pptx_renderer.py').read_text(encoding='utf-8')); print('AST OK')"

Troubleshooting

If the API call fails:

Check paper2slides/.env.
Check RAG_LLM_BASE_URL.
Check whether the selected model supports the needed context length.

If the deck is too sparse or too dense:

Try a different --length.
Use --slides 24 or another explicit count.
Rerun with --from-stage generate.

If the endpoint is unstable or you want a cheap rerun:

Set PPTX_FORCE_DETERMINISTIC=1.

If a slide looks crowded or clipped:

Inspect layout_qa.json.
Rerender previews from the saved PPTX if possible.

Attribution

paper2ppt is derived from HKUDS/Paper2Slides and takes presentation-design inspiration from gejifeng/Paper2PPT. Keep the upstream attribution and license terms when redistributing or extending this project.
