This script is version 1 of the certificate and receipt splitter. It automates the processing of certificate and receipt PDFs by splitting them into individual files named after the participants listed in a CSV file. It first merges the input PDFs, then splits them page by page, and finally packages the results into ZIP archives.
Note: This script was built specifically for my part-time job. The use case is narrow and tied to one specific task, so modify the code to fit your own needs.
- Reads participant names from a CSV file
- Merges multiple certificate and receipt PDFs into single documents
- Splits merged PDFs into individual pages named with participant information
- Organizes output into course-specific folders
- Creates ZIP archives of the output folders
- Optional cleanup of intermediate directories
- Comprehensive logging with rotation (max 10 MB per log file, 5 backups)
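The merge-then-split flow in the list above might look roughly like the sketch below. It assumes PyPDF2 3.x (PdfMerger, PdfReader, PdfWriter). The function names merge_pdfs, split_pdf, and output_name, and the "{kind}_{name}.pdf" file-name pattern, are illustrative assumptions rather than the script's actual code.

```python
from pathlib import Path


def output_name(kind: str, participant: str) -> str:
    """Build a file name such as 'Certificate_John Doe.pdf' (hypothetical pattern)."""
    # Replace characters that are unsafe in file names on common platforms.
    safe = "".join("_" if ch in '\\/:*?"<>|' else ch for ch in participant.strip())
    return f"{kind}_{safe}.pdf"


def merge_pdfs(input_dir: Path, merged_path: Path) -> None:
    """Append every PDF in input_dir (sorted by file name) into one document."""
    from PyPDF2 import PdfMerger  # imported lazily; assumes PyPDF2 3.x

    merger = PdfMerger()
    for pdf in sorted(input_dir.glob("*.pdf")):
        merger.append(str(pdf))
    merger.write(str(merged_path))
    merger.close()


def split_pdf(merged_path: Path, names: list[str], out_dir: Path, kind: str) -> None:
    """Write page i of the merged PDF to a file named after names[i]."""
    from PyPDF2 import PdfReader, PdfWriter  # assumes PyPDF2 3.x

    reader = PdfReader(str(merged_path))
    # Refuse to continue if pages and names do not line up one-to-one.
    if len(reader.pages) != len(names):
        raise ValueError(f"{len(reader.pages)} pages but {len(names)} names")
    out_dir.mkdir(parents=True, exist_ok=True)
    for page, name in zip(reader.pages, names):
        writer = PdfWriter()
        writer.add_page(page)
        with open(out_dir / output_name(kind, name), "wb") as fh:
            writer.write(fh)
```

Sorting the input files by name is also an assumption. The merge order has to match the order of names in the CSV, since pages are paired with names by position.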
Install required packages:
pip install PyPDF2
The script expects the following directory layout:
project/
├── data/
│ ├── participants.csv # Participant names (first column)
│ ├── Certificate/ # Input certificate PDFs
│ └── Receipt/ # Input receipt PDFs
└── main.py # This script
- Place your participant names in data/participants.csv with one name per row (first column only)
- Place certificate PDFs in data/Certificate/
- Place receipt PDFs in data/Receipt/
- Run the script: python main.py
- Enter the course code when prompted
The script generates:
- Certificate_{course_code}/ containing individual certificate PDFs
- Receipt_{course_code}/ containing individual receipt PDFs
- ZIP archives: Certificate_{course_code}.zip and Receipt_{course_code}.zip
By default, the script retains the output folders after creating ZIP files. To automatically remove the folders after zipping, call main(remove=True).
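As a rough illustration of the zip-then-optionally-remove behaviour, the sketch below uses only the standard library. The helper name zip_folder is hypothetical; only the remove flag comes from the description above, and the script's own implementation may differ.

```python
import shutil
from pathlib import Path


def zip_folder(folder: Path, remove: bool = False) -> Path:
    """Create <folder>.zip next to folder; delete the folder afterwards if remove is True."""
    # make_archive appends ".zip" to base_name and returns the archive path.
    archive = shutil.make_archive(str(folder), "zip", root_dir=str(folder))
    if remove:
        shutil.rmtree(folder)
    return Path(archive)
```

Calling zip_folder(Path("Certificate_ABC123"), remove=True) would leave only Certificate_ABC123.zip behind, where ABC123 stands in for a course code.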
The CSV file must contain participant names in the first column. The first row is treated as a header and skipped.
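A minimal sketch of reading the names under these rules follows. The function name read_names, the UTF-8 encoding, and the choice to skip blank rows are assumptions, not details confirmed by the script.

```python
import csv
from pathlib import Path


def read_names(csv_path: Path) -> list[str]:
    """Return first-column values from csv_path, skipping the header row."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row
        # Ignore blank rows and rows whose first cell is empty (an assumption).
        return [row[0].strip() for row in reader if row and row[0].strip()]
```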
Example participants.csv:
Name
John Doe
Jane Smith
Robert Johnson
Logs are written to:
- Console (stdout)
- app.log file with automatic rotation (10 MB max size, 5 backup files)
Log messages include timestamps, logger name, severity level, and message content.
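A logging setup matching this description could be sketched with the standard library's RotatingFileHandler. The function name setup_logging, the logger name "splitter", the exact format string, and the reading of 10 MB as 10 * 1024 * 1024 bytes are all assumptions.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler


def setup_logging(log_file: str = "app.log") -> logging.Logger:
    """Log to stdout and to a rotating file (10 MB per file, 5 backups)."""
    # Timestamp, logger name, severity level, then the message.
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    console = logging.StreamHandler(sys.stdout)
    rotating = RotatingFileHandler(log_file, maxBytes=10 * 1024 * 1024, backupCount=5)
    logger = logging.getLogger("splitter")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    for handler in (console, rotating):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```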
The script includes error handling for:
- Missing CSV or PDF files
- Mismatched page counts between PDFs and participant names
- File system errors during read/write operations
- PDF processing errors
Errors are logged with descriptive messages and the script exits gracefully on critical failures.
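The checks described here might be organised along the lines of the sketch below. All three helper names (check_inputs, check_counts, run_safely) are hypothetical, and the script may structure its error handling differently.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def check_inputs(csv_path: Path, pdf_dirs: list[Path]) -> None:
    """Raise FileNotFoundError if the CSV is missing or a folder holds no PDFs."""
    if not csv_path.is_file():
        raise FileNotFoundError(f"CSV file not found: {csv_path}")
    for pdf_dir in pdf_dirs:
        if not any(pdf_dir.glob("*.pdf")):
            raise FileNotFoundError(f"No PDF files found in: {pdf_dir}")


def check_counts(page_count: int, names: list[str], label: str) -> None:
    """Raise ValueError when the page count and the name count differ."""
    if page_count != len(names):
        raise ValueError(f"{label}: {page_count} pages but {len(names)} participant names")


def run_safely(step) -> int:
    """Run step() and turn expected failures into a logged error and exit code 1."""
    try:
        step()
    except (OSError, ValueError) as exc:  # FileNotFoundError is a subclass of OSError
        logger.error("Processing failed: %s", exc)
        return 1
    return 0
```

A caller could pass the return value of run_safely to sys.exit so that a critical failure ends the run with a non-zero status instead of a traceback.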
Temporary merged PDF files (temp_combined_certificates.pdf and temp_combined_receipts.pdf) are automatically deleted after processing.
Output folders can be automatically removed after ZIP creation by setting remove=True in the main() call.
This project is licensed under the MIT License. See the LICENSE file for details.