celestial-micha/paper2ppt

paper2ppt

English | 中文

paper2ppt converts academic PDF papers into editable PowerPoint decks and matching speaker scripts.

The current project goal is practical paper presentation generation: reuse the paper's original figures and tables, use only text LLMs for planning and writing, render native editable .pptx slides, and keep a spec-aware QA/repair loop around the output.

This project is built on ideas and code paths from HKUDS/Paper2Slides, and it also borrows presentation-structuring inspiration from gejifeng/Paper2PPT. The main implementation in this repository is still the paper2slides/-based workflow, heavily modified for text-only LLM calls and native PPTX generation.

[Image: paper2ppt preview]

Project Lineage And Changes

From HKUDS/Paper2Slides, this project keeps the useful paper-processing foundation:

  • PDF parsing and source asset extraction.
  • Summary, content planning, and checkpoint-style reruns.
  • A command-line workflow for turning a paper into presentation material.

The main change is the generation path. The original image-style slide path has been replaced with a lower-cost, text-only LLM workflow:

  • The model plans structured slide specs instead of generating slide images.
  • python-pptx renders native editable PowerPoint objects: text boxes, shapes, tables, and inserted source figures.
  • All model calls are configured to use gpt-5-mini.
  • The workflow generates a matching speaker_script.md.
  • Spec and layout QA check for empty components, clipped text, weak metric cards, text truncated with ellipses, missing numbered-point fields, layout/payload mismatches, and decorative elements that do not carry information.
  • Numbered points are represented with structured claim, detail, and evidence fields before rendering.
  • The deck now has presentation structure: title page, contents page, section dividers, key-message blocks, numbered claim/detail/evidence points, source figures, and compact metric cards.
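
As a rough illustration of the structured numbered points described above, the sketch below checks that a point carries non-empty claim, detail, and evidence fields before rendering. The NumberedPoint class and the missing_fields helper are hypothetical names used only for this example; they are not taken from the repository's schema code.

```python
from dataclasses import dataclass, fields


@dataclass
class NumberedPoint:
    """Hypothetical shape of one numbered point; field names follow the README."""
    claim: str = ""
    detail: str = ""
    evidence: str = ""


def missing_fields(point: NumberedPoint) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(point) if not getattr(point, f.name).strip()]


# Placeholder text, for demonstration only.
point = NumberedPoint(claim="Example claim", detail="", evidence="Example evidence")
print(missing_fields(point))
```

A check like this could feed the QA step: any point with a non-empty result would be flagged for repair instead of being rendered with a blank line.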

From gejifeng/Paper2PPT, this project mainly borrows product ideas rather than runtime code:

  • Stronger section-aware paper storytelling.
  • A more detailed companion-material mindset for long technical papers.
  • Optional lightweight Beamer/TeX sidecar generation implemented inside this repository.

This repository does not vendor Paper2PPT and does not depend on it at runtime.

Current Status

The project now supports:

  • Editable PowerPoint output: slides.pptx.
  • A matching narration draft: speaker_script.md.
  • An optional lightweight Beamer/TeX sidecar generated by this repository's own code. This is a reference/backup path, not the main deliverable.
  • LangChain/LangGraph-based text LLM orchestration.
  • Default text model configuration using gpt-5-mini.
  • Optional exact slide count with --slides.
  • Section-aware decks with title page, contents page, section dividers, key-message blocks, structured numbered points, compact metric cards, source figures, and tables.
  • Spec-aware evaluator plus PPTX layout QA.
  • Bounded repair loop that reworks only failed slide specs before rerendering.
  • Layout normalization for unsupported LLM layout names and visual/table layouts without matching payload.
  • Automatic figure-analysis gating with PPTX_ENABLE_FIGURE_ANALYSIS=auto.
  • Deterministic fallback generation with PPTX_FORCE_DETERMINISTIC=1 for cheap reruns from existing checkpoints.
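
The bounded repair loop listed above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the repository's implementation: slide specs are plain dicts, and the evaluate and repair callables, the max_rounds default, and the toy helpers are all hypothetical.

```python
from typing import Callable

Spec = dict  # hypothetical: one slide spec represented as a plain dict


def repair_failed_slides(
    specs: list[Spec],
    evaluate: Callable[[list[Spec]], list[int]],
    repair: Callable[[Spec], Spec],
    max_rounds: int = 2,
) -> tuple[list[Spec], list[int]]:
    """Rework only the slides the evaluator flags, for at most max_rounds rounds."""
    specs = list(specs)
    failed = evaluate(specs)
    for _ in range(max_rounds):
        if not failed:
            break
        for index in failed:
            specs[index] = repair(specs[index])  # slides that passed are left as-is
        failed = evaluate(specs)
    return specs, failed


# Toy evaluator and repair step, for demonstration only.
def find_empty_titles(specs):
    return [i for i, s in enumerate(specs) if not s.get("title")]


def fill_title(spec):
    return {**spec, "title": "Untitled slide"}


deck = [{"title": "Intro"}, {"title": ""}, {"title": "Results"}]
fixed, still_failed = repair_failed_slides(deck, find_empty_titles, fill_title)
print(fixed[1]["title"], still_failed)
```

Capping the number of rounds keeps cost predictable: if a slide still fails after the last round, its index is returned so the caller can report it rather than loop indefinitely.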

The most recent visual iteration focused on making the generated deck look like a real presentation:

  • Added a proper title page with title, authors, context/date, and summary tiles.
  • Added a contents page with meaningful section progress lines.
  • Added section divider pages.
  • Reworked normal slides into title bar + key message + structured numbered points.
  • Removed meaningless tiny connector marks beside numbered bullets.
  • Restored useful decorative bars and tiles when they carry information.
  • Improved bullet rendering so points show a short claim plus a complete detail sentence instead of clipped ellipses.
  • Added evaluator-driven repair for missing point fields, low-quality metric labels/values, empty components, unsupported layouts, visual/table payload mismatches, and severe layout defects.
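
Two of the repair cases above, unsupported layouts and visual/table payload mismatches, lend themselves to a small normalization pass. The sketch below is illustrative only: the layout names, the payload keys (image_path, rows), and the fallback choice are assumptions, not the repository's actual values.

```python
SUPPORTED_LAYOUTS = {"bullets", "figure", "table"}  # hypothetical layout names
FALLBACK_LAYOUT = "bullets"  # hypothetical fallback
REQUIRED_PAYLOAD = {"figure": "image_path", "table": "rows"}  # hypothetical keys


def normalize_layout(spec: dict) -> dict:
    """Return a copy of spec whose layout is supported and backed by its payload."""
    layout = spec.get("layout", FALLBACK_LAYOUT)
    if layout not in SUPPORTED_LAYOUTS:
        # Unknown layout name produced by the model: fall back to plain bullets.
        layout = FALLBACK_LAYOUT
    payload_key = REQUIRED_PAYLOAD.get(layout)
    if payload_key and not spec.get(payload_key):
        # Visual or table layout without matching payload: downgrade it.
        layout = FALLBACK_LAYOUT
    return {**spec, "layout": layout}


print(normalize_layout({"layout": "hero_banner"})["layout"])
print(normalize_layout({"layout": "table"})["layout"])
```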

Recommended Test PDF

The current main local test paper is:

test_papers/DeepSeek_V4.pdf

The latest checked output during development was generated under:

outputs/DeepSeek_V4/paper/fast/slides_academic_medium_24slides/

Additional cross-paper checks have also been run with:

test_papers/Deep Residual Learning for Image Recognition.pdf
test_papers/Thinking_with_Visual_Primitives.pdf
test_papers/mHC:Manifold-Constrained Hyper-Connections.pdf

The timestamped subfolder name changes with each run. A successful run should include:

slides.pptx
speaker_script.md
layout_qa.json

Some runs may also include:

detailed_slides.tex
detailed_slides.pdf

Those TeX/PDF files are optional reference artifacts. The primary deliverables are still slides.pptx and speaker_script.md.
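
To confirm that a run folder holds the expected artifacts, a small check along these lines may help. The file names come from the lists above; the check_run_folder helper itself is a hypothetical name and not part of the project.

```python
from pathlib import Path

REQUIRED_FILES = ("slides.pptx", "speaker_script.md", "layout_qa.json")
OPTIONAL_FILES = ("detailed_slides.tex", "detailed_slides.pdf")


def check_run_folder(run_dir) -> dict:
    """Report which required files are missing and which optional ones exist."""
    run_dir = Path(run_dir)
    return {
        "missing": [n for n in REQUIRED_FILES if not (run_dir / n).is_file()],
        "optional_present": [n for n in OPTIONAL_FILES if (run_dir / n).is_file()],
    }


# Point this at a real timestamped run folder; "." is only a stand-in.
print(check_run_folder("."))
```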

How It Works

PDF
 -> parsing and source asset extraction
 -> summary checkpoint
 -> content plan checkpoint
 -> LangGraph PPTX workflow
    -> source packet
    -> optional source-figure understanding
    -> text LLM deck curation
    -> slide spec validation and numbered-point normalization
    -> native PPTX rendering
    -> spec evaluator + layout QA
    -> failed-slide repair loop
    -> speaker script generation
    -> optional detailed Beamer/TeX sidecar generation

For a diagram and interview-friendly explanation of the evaluator-driven loop, see Agentic PPTX Workflow.

The generated PPTX is not a screenshot deck. It uses native PowerPoint text boxes, shapes, tables, and inserted source images, so it remains editable in PowerPoint.

Requirements

  • Windows, macOS, or Linux
  • Python 3.10 or newer; Python 3.12 is recommended
  • Conda or another Python environment manager
  • A text-model API compatible with the OpenAI chat-completions interface

The project has been developed in a local conda environment named paper2slides, but the name is not required.

Installation

conda create -n paper2ppt python=3.12
conda activate paper2ppt
pip install -r requirements.txt

If you already have a suitable Python environment:

pip install -r requirements.txt

Configure the API

paper2ppt reads API settings from:

paper2slides/.env

For public GitHub safety, only the template should be committed:

paper2slides/.env.example

If setting up a new clone, create the local env file from the template. The command below uses Windows syntax; on macOS or Linux, use cp with forward slashes instead:

copy paper2slides\.env.example paper2slides\.env

Do not commit the local paper2slides/.env file.

Typical configuration:

RAG_LLM_API_KEY=your_api_key_here
RAG_LLM_BASE_URL=https://api.example.com/v1
LLM_MODEL=gpt-5-mini
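
Before a run, it can be useful to verify that the local env file defines the settings shown above. The sketch below parses simple KEY=VALUE lines with the standard library only. Treating these three keys as required is an assumption made for illustration, and parse_env_text and missing_keys are hypothetical helpers rather than project code.

```python
from pathlib import Path

# Assumption: these three keys from the typical configuration are all needed.
REQUIRED_KEYS = ("RAG_LLM_API_KEY", "RAG_LLM_BASE_URL", "LLM_MODEL")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_keys(settings: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not settings.get(k)]


env_path = Path("paper2slides/.env")
if env_path.is_file():
    print(missing_keys(parse_env_text(env_path.read_text(encoding="utf-8"))))
```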

Important cost rule: keep both the text model and the vision model set to gpt-5-mini:

LLM_MODEL=gpt-5-mini
PPTX_VISION_MODEL=gpt-5-mini

If model calls are not needed and you only want to rerender from existing checkpoints:

PPTX_FORCE_DETERMINISTIC=1

Optional figure understanding:

PPTX_ENABLE_FIGURE_ANALYSIS=auto
PPTX_VISION_MODEL=gpt-5-mini
PPTX_MAX_FIGURE_ANALYSIS=5

This optional step analyzes source paper figures. It does not generate new images. In auto mode, it only runs when figure captions look too weak for reliable slide curation. Set PPTX_ENABLE_FIGURE_ANALYSIS=1 to force it on or PPTX_ENABLE_FIGURE_ANALYSIS=0 to force it off.
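
The three gating modes can be sketched as below. The README does not specify what counts as a weak caption, so the word-count heuristic, the "any caption is weak" rule, and the auto default when the variable is unset are all assumptions; caption_is_weak and should_analyze_figures are hypothetical names.

```python
import os


def caption_is_weak(caption: str, min_words: int = 6) -> bool:
    """Hypothetical heuristic: treat very short captions as weak."""
    return len(caption.split()) < min_words


def should_analyze_figures(captions, mode=None) -> bool:
    """Decide whether to run figure analysis for the given captions."""
    if mode is None:
        # Assumption: fall back to auto when the variable is unset.
        mode = os.environ.get("PPTX_ENABLE_FIGURE_ANALYSIS", "auto")
    mode = mode.strip().lower()
    if mode == "1":
        return True  # forced on
    if mode == "0":
        return False  # forced off
    # auto: run only when at least one caption looks weak
    return any(caption_is_weak(c) for c in captions)


print(should_analyze_figures(["Figure 1."], mode="auto"))
```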

In fast paper mode, redundant paper_info RAG querying is skipped because paper metadata is extracted directly from parsed markdown during summary generation.

Run

Typical run:

python -m paper2slides --input test_papers\DeepSeek_V4.pdf --output slides --style academic --length medium --slides 24 --fast

Cheap rerun from existing checkpoints (PowerShell syntax):

$env:PPTX_FORCE_DETERMINISTIC="1"
python -m paper2slides --input test_papers\DeepSeek_V4.pdf --output slides --style academic --length medium --slides 24 --fast --from-stage generate

Main options:

--input       PDF file path
--output      output type; use slides
--style       academic or a custom style description
--length      short, medium, or long
--slides      exact target content-slide count; overrides --length
--fast        use direct parsing/query flow instead of full indexing
--from-stage  rag, summary, plan, or generate
--list        list previous outputs
--debug       print more logs
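
The effect of --from-stage can be pictured as slicing an ordered stage list. The order below follows the option list above; the assumption that earlier stages are loaded from existing checkpoints instead of being rerun, and the stages_to_run helper itself, are illustrative rather than taken from the code.

```python
# Stage names and their order as listed for --from-stage.
STAGES = ("rag", "summary", "plan", "generate")


def stages_to_run(from_stage=None) -> list[str]:
    """Return the stages executed when resuming at from_stage (all if None)."""
    if from_stage is None:
        return list(STAGES)
    if from_stage not in STAGES:
        raise ValueError(f"unknown stage: {from_stage!r}")
    # Assumption: everything before from_stage comes from saved checkpoints.
    return list(STAGES[STAGES.index(from_stage):])


print(stages_to_run("plan"))
```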

Dynamic slide count:

short   roughly 8-12 content slides
medium  roughly 14-22 content slides
long    roughly 24-36 content slides

Use --slides 24 or a similar explicit value for long papers that need fuller coverage.
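
How --slides and --length interact can be summarized in a few lines. The ranges are the ones listed above; resolve_slide_range is a hypothetical helper written for this example, not the project's function.

```python
# Approximate content-slide ranges per length preset, as listed above.
LENGTH_RANGES = {"short": (8, 12), "medium": (14, 22), "long": (24, 36)}


def resolve_slide_range(length: str = "medium", slides=None) -> tuple[int, int]:
    """Return (min, max) content slides; an explicit count overrides length."""
    if slides is not None:
        if slides < 1:
            raise ValueError("--slides must be a positive integer")
        return (slides, slides)
    try:
        return LENGTH_RANGES[length]
    except KeyError:
        raise ValueError(f"unknown length: {length!r}") from None


print(resolve_slide_range("medium"))
print(resolve_slide_range("medium", slides=24))
```

Note that --slides 24 with --length medium yields exactly 24 content slides, which is above the medium range; that is the documented override behavior.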

Output Files

Typical timestamped output folder:

outputs/<project_name>/paper/fast/slides_academic_medium_24slides/<timestamp>/

Typical files:

slides.pptx
speaker_script.md
layout_qa.json
checkpoint_slide_spec.json
checkpoint_slide_spec_llm_raw.txt

Meaning:

  • slides.pptx: editable PowerPoint deck.
  • speaker_script.md: slide-by-slide narration draft.
  • detailed_slides.tex / detailed_slides.pdf: optional reference artifacts generated by the local sidecar code when enabled and when pdflatex is available.
  • layout_qa.json: spec and layout QA result, including warnings and failed slide indexes.
  • checkpoint_slide_spec.json: final structured slide specification, including claim, detail, and evidence for numbered points.
  • checkpoint_slide_spec_llm_raw.txt: raw LLM output when a curator call was used.

Important Implementation Files

paper2slides/generator/text_pptx_workflow.py
paper2slides/generator/pptx_renderer.py
paper2slides/generator/pptx_qa.py
paper2slides/generator/slide_schema.py
paper2slides/generator/spec_builder.py
paper2slides/generator/content_planner.py
paper2slides/generator/detailed_tex.py
paper2slides/core/stages/rag_stage.py
paper2slides/core/stages/generate_stage.py
paper2slides/core/paths.py

Test

python -m unittest test_phase1_pptx.py

Quick syntax check without writing __pycache__ (PowerShell syntax):

$env:PYTHONDONTWRITEBYTECODE="1"
python -c "import ast, pathlib; ast.parse(pathlib.Path('paper2slides/generator/pptx_renderer.py').read_text(encoding='utf-8')); print('AST OK')"

Troubleshooting

If the API call fails:

  • Check paper2slides/.env.
  • Check RAG_LLM_BASE_URL.
  • Check whether the selected model supports the needed context length.

If the deck is too sparse or too dense:

  • Try a different --length.
  • Use --slides 24 or another explicit count.
  • Rerun with --from-stage generate.

If the endpoint is unstable or you want a cheap rerun:

  • Set PPTX_FORCE_DETERMINISTIC=1.

If a slide looks crowded or clipped:

  • Inspect layout_qa.json.
  • Rerender previews from the saved PPTX if possible.

Attribution

paper2ppt is derived from HKUDS/Paper2Slides and takes presentation-design inspiration from gejifeng/Paper2PPT. Keep the upstream attribution and license terms when redistributing or extending this project.
