PDF Extractor

A small FastAPI service that extracts Markdown-ready text from PDF files using PyMuPDF and PyMuPDF4LLM.

Features

Accepts PDF uploads via POST /extract
Converts PDF content into Markdown text
Returns page count and extracted character count
Includes a health check endpoint at GET /health
Supports Docker and local development

Requirements

Python 3.12+
uv

Environment

Create a .env file at the project root (for example, by copying .env.example) to customize runtime settings.

Supported variables:

PORT — service port (default: 5000)
MAX_FILE_SIZE_MB — maximum upload size in megabytes (default: 50)
CORS_ORIGINS — allowed CORS origins, comma-separated (default: *)
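
As a rough illustration of how these variables and their defaults might be read in Python, here is a minimal sketch. The `Settings` class and `load_settings` helper are hypothetical names and are not taken from the project's main.py. The conversion of one megabyte to 1024 * 1024 bytes is also an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""

    port: int
    max_file_size_mb: int
    cors_origins: list[str]

    @property
    def max_file_size_bytes(self) -> int:
        # Assumption: 1 MB is treated as 1024 * 1024 bytes.
        return self.max_file_size_mb * 1024 * 1024


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    # Split the comma-separated origin list and drop empty entries.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "*").split(",") if o.strip()]
    return Settings(
        port=int(env.get("PORT", "5000")),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "50")),
        cors_origins=origins or ["*"],
    )
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.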

Local development

Create a virtual environment and install the dependencies:

uv venv
source .venv/bin/activate
uv sync

Run the API locally:

uv run uvicorn main:app --host 0.0.0.0 --port ${PORT:-5000} --reload

API Endpoints

Health check

GET /health

Response:

{ "status": "ok" }

Extract PDF

POST /extract
Content-Type: multipart/form-data

file: <PDF file>
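
For readers who want to call this endpoint from Python without extra dependencies, the sketch below builds the multipart request with the standard library only. The helper name `build_extract_request` is hypothetical. The form field name `file` follows the request description above, and the base URL is whatever host and port the service runs on.

```python
import uuid
import urllib.request


def build_extract_request(base_url: str, filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /extract."""
    boundary = uuid.uuid4().hex
    # One form part named "file", declared as application/pdf.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/extract",
        data=head + pdf_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send the request, pass the result to `urllib.request.urlopen()` and decode the JSON body with `json.load()`.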

Successful response:

{
  "text": "...",
  "pages": 10,
  "characters": 12345
}

Errors:

422 if the uploaded file is missing, empty, or not a PDF
413 if the file is larger than MAX_FILE_SIZE_MB
500 for extraction or server errors
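
The validation rules above can be pictured as a small pure function. This is only a sketch under stated assumptions: `upload_error_status` is a hypothetical helper, the order of the checks is a guess, and the actual service may validate uploads differently (FastAPI itself typically answers 422 when a required form field is missing).

```python
from typing import Optional


def upload_error_status(
    content_type: Optional[str],
    data: Optional[bytes],
    max_file_size_mb: int = 50,
) -> Optional[int]:
    """Return the HTTP status for a rejected upload, or None if it is acceptable."""
    # Missing or empty upload.
    if not data:
        return 422
    # Only application/pdf uploads are supported.
    if content_type != "application/pdf":
        return 422
    # Assumption: 1 MB is treated as 1024 * 1024 bytes.
    if len(data) > max_file_size_mb * 1024 * 1024:
        return 413
    return None
```

Extraction failures (the 500 case) are not covered here because they only surface once the PDF is actually parsed.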

Docker

Build and run with Docker Compose:

docker compose up --build

The app will be available at http://localhost:5000 by default.

Notes

Only application/pdf uploads are supported.
Extraction uses pymupdf4llm.to_markdown() to generate Markdown-friendly output.
The service is intentionally small and focused on PDF text extraction.
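
The `pymupdf4llm.to_markdown()` call mentioned in the notes could be wired up roughly as follows. This is a sketch, not the project's actual code: `extract_markdown` and `summarize` are hypothetical names, and it assumes that `characters` in the response is simply the length of the Markdown text. It requires the `pymupdf` and `pymupdf4llm` packages.

```python
def summarize(text: str, pages: int) -> dict:
    """Shape the result like the documented /extract response."""
    # Assumption: "characters" is the length of the extracted Markdown.
    return {"text": text, "pages": pages, "characters": len(text)}


def extract_markdown(pdf_bytes: bytes) -> dict:
    """Convert in-memory PDF bytes to Markdown and report basic counts."""
    # Imported lazily so summarize() stays usable without PyMuPDF installed.
    import pymupdf
    import pymupdf4llm

    # Open the PDF from memory rather than from a path on disk.
    with pymupdf.open(stream=pdf_bytes, filetype="pdf") as doc:
        text = pymupdf4llm.to_markdown(doc)
        return summarize(text, doc.page_count)
```

Opening the document from a byte stream avoids writing the upload to a temporary file, which suits a small service that handles one file per request.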
