Split a PDF e-book into multiple PDF files by the chapters you choose, making them easier to use afterwards (for example, to feed into ChatGPT or NotebookLM).
A desktop application that allows users to select a PDF file, view its table of contents, and chunk the PDF into multiple smaller PDF files based on selected chapters. The application runs locally and prioritizes ease of use for PDF processing.
Features:
- File Selection: Provides a button to open a file dialog for PDF selection
- ToC Extraction: Automatically extracts the Table of Contents (ToC) from the PDF
- Hierarchical Display: Shows the PDF ToC in a tree-like structure for easy selection
- Smart Selection:
  - When a parent chapter is selected, all of its child chapters are automatically included in the same chunk
  - When a parent chapter is not selected, its child chapters can be selected individually
- Automatic Chunking: Automatically determines page ranges and creates new PDF files based on selected chapters
- Friendly Naming: Chunked files are named using the original filename plus the chapter title
- CLI / Headless Mode: Scriptable command-line interface with JSON output, designed for AI agents and automation pipelines
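The Smart Selection rule can be sketched in plain Python. This is a minimal illustration, not the code in pdf_chunker.py: it assumes the ToC is a list of (level, title, page) entries in document order, which is the shape PyMuPDF's Document.get_toc() returns, and the helper names subtree_end and collapse_selection are hypothetical. A checked parent absorbs its whole subtree, so only the top-most checked entries become chunks.

```python
from typing import List, Sequence, Tuple

# Assumed ToC entry shape: (level, title, 1-based start page), as from get_toc()
TocEntry = Tuple[int, str, int]


def subtree_end(toc: Sequence[TocEntry], i: int) -> int:
    """Index just past the last descendant of entry i."""
    j = i + 1
    # Descendants are the consecutive following entries with a deeper level.
    while j < len(toc) and toc[j][0] > toc[i][0]:
        j += 1
    return j


def collapse_selection(toc: Sequence[TocEntry], selected: Sequence[int]) -> List[int]:
    """Keep only selected entries that are not inside an already selected parent."""
    chosen = set(selected)
    result: List[int] = []
    i = 0
    while i < len(toc):
        if i in chosen:
            result.append(i)
            i = subtree_end(toc, i)  # skip the subtree: it belongs to this chunk
        else:
            i += 1  # unchecked entry: its children may still be checked
    return result


toc = [
    (1, "Intro", 1),
    (1, "Ch1", 3),
    (2, "1.1", 4),
    (2, "1.2", 6),
    (1, "Ch2", 9),
    (2, "2.1", 10),
]
print(collapse_selection(toc, [1, 2, 3, 5]))  # → [1, 5]
```

Checking Ch1 together with its sections 1.1 and 1.2 yields a single chunk for Ch1, while 2.1 stays a separate chunk because its parent Ch2 is unchecked.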
Tech stack:
- Programming Language: Python
- PDF Processing: PyMuPDF (fitz)
- GUI Framework: PySide6 (Qt for Python)
Requirements:
- Python 3.8 or higher
- Supported Operating Systems: Windows, macOS, Linux
Installation:
1. Clone or download this project to your local machine
2. Install the dependencies using either method:

With uv (recommended):
uv sync                # Installs CLI/runtime + dev dependencies
uv sync --extra gui    # Also installs PySide6 for the GUI

With pip:
pip install -r requirements.txt

Usage (GUI):
1. Launch the application:
python pdf_chunker_gui.py
2. Click the "Select PDF" button to choose a PDF file to process
3. In the ToC list, check the chapters you want to split out:
   - Checking a parent chapter automatically includes all of its child chapters
   - When a parent chapter is not checked, its child chapters can be checked individually
4. Click the "Start Chunking" button
5. Select an output directory
6. Wait for the process to complete; the application then displays a list of the PDF chunk files it created
For automation, scripting, or letting an AI agent drive the chunking, use pdf_chunker_cli.py. All commands support --json for structured output.
Thanks to PEP 723 inline metadata, you can run it with zero setup via uv:
uv run pdf_chunker_cli.py inspect book.pdf

On the first run, uv automatically creates an isolated environment with the required dependencies.
If you have already run uv sync, you can also use the project environment directly:
uv run python pdf_chunker_cli.py inspect book.pdf

inspect: Lists the Table of Contents with the page range each entry would span if selected on its own (span_pages) and how many descendants it has (children). This is the first thing an AI agent reads to decide how to chunk.
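The span_pages and children values can be derived from the ToC alone. The sketch below is an assumption about how such values could be computed, not the CLI's actual code: it takes the (level, title, 1-based page) shape returned by PyMuPDF's get_toc(), ends each span on the page before the next entry at the same or a shallower level (or on the last page), and does not handle entries whose target page is missing. The dict keys other than span_pages and children are hypothetical and do not describe the real JSON schema.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed ToC entry shape: (level, title, 1-based start page), as from get_toc()
TocEntry = Tuple[int, str, int]


def describe_toc(toc: Sequence[TocEntry], page_count: int) -> List[Dict[str, object]]:
    """Compute each entry's page span and descendant count (illustrative field names)."""
    rows: List[Dict[str, object]] = []
    for i, (level, title, start) in enumerate(toc):
        j = i + 1
        # Descendants are the consecutive following entries with a deeper level.
        while j < len(toc) and toc[j][0] > level:
            j += 1
        # The span ends on the page before the next entry at the same or a
        # shallower level, or on the last page when no such entry exists.
        end = toc[j][2] - 1 if j < len(toc) else page_count
        end = max(start, end)  # entries sharing a start page still get one page
        rows.append({
            "index": i,
            "title": title,
            "span_pages": [start, end],
            "children": j - i - 1,
        })
    return rows


toc = [(1, "Intro", 1), (1, "Ch1", 3), (2, "1.1", 4), (2, "1.2", 6), (1, "Ch2", 9)]
for row in describe_toc(toc, page_count=12):
    print(row["index"], row["title"], row["span_pages"], row["children"])
```

For a 12-page document this gives Ch1 the span [3, 8] with 2 children, because its subtree runs until Ch2 starts on page 9.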
uv run pdf_chunker_cli.py inspect book.pdf # human-readable
uv run pdf_chunker_cli.py inspect book.pdf --json   # JSON for scripts/agents

plan: A pure dry run. Returns the chunks that would be produced, including page ranges, page counts, and output paths.
uv run pdf_chunker_cli.py plan book.pdf --level 1 -o ./out --json
uv run pdf_chunker_cli.py plan book.pdf --select 0,3-5 -o ./out --json
uv run pdf_chunker_cli.py plan book.pdf --match "第.*章" -o ./out --json

Selection modes (mutually exclusive; exactly one is required):
- --select <indices>: comma-separated indices and ranges, e.g. 0,2,5-7. Indices come from inspect.
- --level <N>: select every ToC entry at level N (e.g. --level 1 for top-level chapters).
- --match <regex>: select every ToC entry whose title matches the regex.
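The three selection modes map naturally onto three small functions. This is a hedged sketch, not pdf_chunker_cli.py itself: the function names are hypothetical, the ToC is assumed to be a list of (level, title, page) entries, --select ranges are assumed to be inclusive, and --match is assumed to use re.search semantics (a match anywhere in the title). Whether the real CLI deduplicates and sorts indices is not documented here.

```python
import re
from typing import List, Sequence, Tuple

# Assumed ToC entry shape: (level, title, 1-based start page)
TocEntry = Tuple[int, str, int]


def parse_select(spec: str) -> List[int]:
    """Parse a --select value such as "0,2,5-7" into sorted, unique indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas like "0,,2"
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"invalid range: {part!r}")
            indices.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            indices.add(int(part))
    return sorted(indices)


def select_by_level(toc: Sequence[TocEntry], level: int) -> List[int]:
    """--level N: every entry whose level equals N."""
    return [i for i, (lvl, _title, _page) in enumerate(toc) if lvl == level]


def select_by_match(toc: Sequence[TocEntry], pattern: str) -> List[int]:
    """--match REGEX: every entry whose title contains a regex match."""
    regex = re.compile(pattern)
    return [i for i, (_lvl, title, _page) in enumerate(toc) if regex.search(title)]


toc = [(1, "前言", 1), (1, "第一章 基礎", 3), (2, "1.1 安裝", 4), (1, "第二章 進階", 9)]
print(parse_select("0,2,5-7"))         # → [0, 2, 5, 6, 7]
print(select_by_level(toc, 1))         # → [0, 1, 3]
print(select_by_match(toc, "第.*章"))  # → [1, 3]
```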
Output options:
- --prefix-index: prefix output filenames with a zero-padded index (01_, 02_, ...) so chunks sort in reading order in Finder, other file managers, and tools like NotebookLM. The width is max(2, digits(total_chunks)).
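The padding rule for --prefix-index is easy to check by hand. The sketch below assumes a stem_NN_title.pdf layout inferred from the example output that follows, with 1-based chunk numbers; the helper names are hypothetical and title sanitization is left out.

```python
def prefix_width(total_chunks: int) -> int:
    """Zero-padding width: max(2, number of digits in total_chunks)."""
    return max(2, len(str(total_chunks)))


def prefixed_name(stem: str, position: int, total_chunks: int, title: str) -> str:
    """Build e.g. "book_01_Introduction.pdf"; position is the 1-based chunk number."""
    return f"{stem}_{position:0{prefix_width(total_chunks)}d}_{title}.pdf"


print(prefixed_name("book", 1, 12, "Introduction"))  # → book_01_Introduction.pdf
print(prefixed_name("book", 7, 150, "Appendix"))     # → book_007_Appendix.pdf
```

The minimum width of 2 keeps names stable for small books, while 100 or more chunks widen the prefix so lexicographic order still matches reading order.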
uv run pdf_chunker_cli.py chunk book.pdf --level 1 -o ./out --prefix-index --json
# → book_01_Introduction.pdf, book_02_Chapter One.pdf, ...

chunk: Takes the same arguments as plan, but actually writes the output PDFs.
uv run pdf_chunker_cli.py chunk book.pdf --level 1 -o ./out --json

Installing the prebuilt macOS app:
1. Go to the GitHub Releases page and download the latest .dmg file
2. Open the .dmg file and drag the PDFChunker application to your Applications folder
3. Launch PDFChunker from the Applications folder or Launchpad
4. Follow steps 2-6 of the GUI usage instructions above
You can create a standalone macOS application (.app bundle) using PyInstaller. This allows users to run the application without needing to install Python or any dependencies.
1. Install PyInstaller, if you haven't already:
pip install pyinstaller
2. Navigate to the project directory: open your terminal and change to the project's root directory:
cd path/to/your/chunk_pdf
3. Run PyInstaller: use the following command to build the application. It creates a single executable inside an .app bundle, suitable for GUI applications.
pyinstaller --name "PDFChunker" --onefile --windowed --icon="path/to/your/icon.icns" pdf_chunker_gui.py
   - --name "PDFChunker": Sets the name of your application.
   - --onefile: Bundles everything into a single executable inside the .app.
   - --windowed: Prevents a terminal console window from appearing when the GUI app runs.
   - --icon="path/to/your/icon.icns": (Optional) Specifies the path to your custom application icon (an .icns file). If you don't have one, omit this option or create one.
   - pdf_chunker_gui.py: The main script of the application.
4. Find the application: after PyInstaller finishes, PDFChunker.app (or the name you specified) is in the dist directory inside your project folder.
5. Distribute: you can then distribute this .app file. For wider distribution, consider code signing and notarization for macOS. Do not commit the generated .app file to the Git repository; distribute it through GitHub Releases instead.
Note on .gitignore:
Ensure that PyInstaller's build artifacts are ignored by Git. The .gitignore file in this project should already include:
build/
dist/
*.spec

Project structure:
- pdf_chunker.py: Core logic class for PDF loading, ToC extraction, and chunking
- pdf_chunker_gui.py: GUI implementation that builds the user interface with PySide6
- pdf_chunker_cli.py: Command-line interface (inspect/plan/chunk) with JSON output, suitable for AI agents and automation. Includes PEP 723 inline metadata for zero-setup execution with uv run.
- test_chunker.py: Smoke-test script for the core logic
- test_cli.py: pytest suite covering the CLI (JSON schema, selection modes, error paths)
- create_test_pdf.py: Script for creating test PDF files
- pyproject.toml: uv project configuration (runtime dependencies, an optional gui extra, and a dev group)
- requirements.txt: Legacy dependency list for pip install
Run the pytest suite (uses uv to manage the dev environment):
uv sync # Installs pytest into .venv
uv run pytest   # Runs test_cli.py

The suite covers the CLI's JSON schema contract, all three selection modes, verification of the real PDF output, and error exit codes.
The application handles the following situations:
- No ToC: If the PDF has no table of contents, a warning message is displayed
- Encrypted/Unreadable PDF: If the PDF cannot be opened, an error message is displayed
- File I/O Errors: Handles potential errors when saving chunked files
- Filename Sanitization: Automatically cleans invalid characters in chapter titles to ensure valid filenames
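Filename sanitization can be sketched with a single regular expression. The project's exact character set and replacement are not documented here, so this is an assumption: it replaces the characters Windows forbids in filenames (plus ASCII control characters) with an underscore, collapses whitespace, trims trailing dots and spaces, and caps the length. The function name and the max_length default are hypothetical, and Windows reserved device names such as CON are not handled.

```python
import re

# Assumed rule set: characters Windows forbids in filenames ("/" is also
# invalid on macOS and Linux), plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(title: str, replacement: str = "_", max_length: int = 100) -> str:
    """Turn a chapter title into a string that is safe inside a filename."""
    cleaned = _INVALID_CHARS.sub(replacement, title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()  # collapse runs of whitespace
    # Windows rejects names ending in a dot or a space; trim after truncating too.
    cleaned = cleaned[:max_length].rstrip(". ")
    return cleaned or "untitled"  # fall back when nothing usable is left


print(sanitize_filename("Chapter 1: Intro/Setup?"))  # → Chapter 1_ Intro_Setup_
print(sanitize_filename("第一章 基礎"))               # → 第一章 基礎
```

Non-ASCII titles such as Chinese chapter names pass through unchanged, which keeps the friendly naming readable.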
This project is licensed under the MIT License.
Feel free to submit issue reports, feature requests, or contribute code directly.
