
xiaohou521/pdf-to-md


pdf-to-md

Zero-dependency Claude Code skill for converting PDF documents to well-structured Markdown.

Most PDF-to-Markdown tools rely on heavy dependencies: Python virtual environments, AI model downloads (~2GB+), or multi-step setup. This skill takes a different approach. It uses Claude's native multimodal Read tool to visually parse PDF pages, and poppler's pdfimages to extract embedded figures losslessly. No venvs, no pip installs, no model downloads. Just copy one file and start converting.

How it works

  1. Claude's Read tool views the PDF visually, seeing the actual rendered pages rather than a lossy text extraction
  2. pdfimages (from poppler) extracts embedded figures at their original resolution, with no cropping and no guesswork
  3. Claude structures the markdown with full awareness of the document layout: headings, tables, lists, references
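For step 2, here is a minimal sketch of the extraction command wrapped in a hypothetical `extract_figures` helper. The helper name, the `images/` target, and the `fig` prefix are illustrative assumptions, not taken from SKILL.md. It uses only standard `pdfimages` options: `-list` to inventory the embedded images, and `-png -p` to write them as PNG files with the page number in each file name.

```bash
# Hypothetical helper (not part of SKILL.md): extract every embedded image
# from a PDF into an output directory using poppler's pdfimages.
extract_figures() {
  local pdf=$1 outdir=${2:-images}
  if [ ! -f "$pdf" ]; then
    echo "extract_figures: no such file: $pdf" >&2
    return 1
  fi
  if ! command -v pdfimages >/dev/null 2>&1; then
    echo "extract_figures: pdfimages not found (install poppler)" >&2
    return 1
  fi
  mkdir -p "$outdir"
  # Inventory first: one row per embedded image (page, number, type, size, ...)
  pdfimages -list "$pdf"
  # -png writes each image as PNG at its stored pixel size;
  # -p puts the page number in the name: <outdir>/fig-<page>-<index>.png
  pdfimages -png -p "$pdf" "$outdir/fig"
}
```

For example, `extract_figures paper.pdf images` prints the inventory and leaves files named like `images/fig-001-000.png`. pdfimages also writes soft masks and small decorative images as separate files, so some outputs will likely need to be discarded or renamed.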

Install

# Copy the skill file
mkdir -p ~/.claude/skills/pdf-to-md
cp SKILL.md ~/.claude/skills/pdf-to-md/

# Install poppler for image extraction (optional, skip if PDFs have no figures)
brew install poppler        # macOS
# apt install poppler-utils # Linux
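To confirm the optional poppler step is in place before converting a PDF with figures, a small pre-flight check can look for `pdfimages` on the `PATH`. The `check_poppler` name is a hypothetical illustration, not something the skill defines.

```bash
# Hypothetical pre-flight check: is poppler's pdfimages available?
# Per the install note above, poppler is only needed for figure extraction.
check_poppler() {
  if command -v pdfimages >/dev/null 2>&1; then
    echo "pdfimages: found at $(command -v pdfimages)"
  else
    echo "pdfimages: missing (figures will not be extracted)"
  fi
}
```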

Usage

Just ask Claude Code to convert a PDF:

把这个 PDF 转成 markdown   (Chinese for "convert this PDF to markdown")
Convert paper.pdf to markdown
PDF to MD: report.pdf

The skill triggers automatically when it detects PDF-to-markdown intent.

Output

source-document.pdf
source-document.md          # Generated markdown
images/
  figure1_architecture.png  # Extracted figures (original resolution)
  figure2_results.png
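The layout above can be reproduced with a few hypothetical helpers. This is a sketch assuming the markdown keeps the PDF's base name, `images/` sits next to the PDF, and figure names follow the `figureN_caption.png` pattern shown in the tree. None of these functions come from SKILL.md; in the skill, Claude chooses the caption words itself.

```bash
# Hypothetical helper: the markdown file sits next to the PDF with the same base name.
md_path_for() {
  printf '%s.md\n' "${1%.[Pp][Dd][Ff]}"
}

# Hypothetical helper: figures go in an images/ folder beside the PDF.
images_dir_for() {
  printf '%s/images\n' "$(dirname "$1")"
}

# Hypothetical helper: figure number + short caption -> figure1_architecture.png.
# Assumes ASCII captions; any other character collapses into an underscore.
figure_filename() {
  local n=$1
  shift
  local slug
  slug=$(printf '%s' "$*" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '_')
  slug=${slug#_}   # trim a leading underscore
  slug=${slug%_}   # trim a trailing underscore
  printf 'figure%s_%s.png\n' "$n" "$slug"
}
```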

Features

  • Heading hierarchy: maps section numbering (1, 1.1, 1.1.1) to markdown heading levels
  • Image extraction: lossless extraction of embedded figures via pdfimages
  • Table conversion: PDF tables become markdown tables, and tables that span pages are merged into one
  • Reference preservation: numbered citations with DOI links
  • Equation support: LaTeX notation where the equation is recognizable
  • Gantt charts / timelines: converted to markdown tables
  • Large PDFs: chunked reading (20 pages at a time)
  • Multilingual: preserves the original language, supports CJK
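Two of these features come down to simple arithmetic. Below is a minimal sketch assuming the document title takes `#`, so section `1` maps to `##`, `1.1` to `###`, and so on (capped at Markdown's six levels), and that large files are read in the 20-page chunks listed above. The function names are hypothetical; `pdfinfo` is the page-count tool that ships with poppler.

```bash
# Hypothetical helper: section number + title -> Markdown heading.
# Assumes the document title uses "#", so "1" -> "##", "1.1" -> "###", ...
md_heading() {
  local num=${1%.}                           # accept "2." as well as "2"
  shift
  local dots
  dots=$(printf '%s' "$num" | tr -cd '.')    # keep only the dots
  local level=$(( ${#dots} + 2 ))
  if [ "$level" -gt 6 ]; then
    level=6                                  # Markdown stops at ######
  fi
  local hashes="" i=0
  while [ "$i" -lt "$level" ]; do
    hashes="${hashes}#"
    i=$(( i + 1 ))
  done
  printf '%s %s %s\n' "$hashes" "$num" "$*"
}

# Hypothetical helper: split TOTAL pages into inclusive ranges of SIZE (default 20).
page_chunks() {
  local total=$1 size=${2:-20} first=1 last
  while [ "$first" -le "$total" ]; do
    last=$(( first + size - 1 ))
    if [ "$last" -gt "$total" ]; then
      last=$total                            # final chunk may be shorter
    fi
    printf '%s-%s\n' "$first" "$last"
    first=$(( last + 1 ))
  done
}

# Page count via poppler's pdfinfo, which prints a line like "Pages:  45".
pdf_page_count() {
  pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}
```

For a 45-page report, `page_chunks "$(pdf_page_count report.pdf)"` would print `1-20`, `21-40` and `41-45`, one range per line.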

Prerequisites

License

MIT

About

Zero-dependency Claude Code skill for converting PDF to Markdown. No venvs, no AI models, works offline.
