Automated EPUB editor — batch checks and fixes for EPUB libraries at scale.
Processes a library of EPUB files (500–5000+), running a configurable pipeline of checks and fixes to normalise and improve EPUB quality.
- Chapter detection: Detects missing or empty chapter tables of contents. Scans XHTML content for heading patterns and injects chapter markers into `toc.ncx`/`nav.xhtml`.
- CSS border/margin normalisation: Scans all CSS for border, margin, and padding properties. Flags values that differ from configurable targets and normalises them.
- Missing metadata fixer: Reads OPF `<metadata>` and infers `dc:title`, `dc:creator`, `dc:language`, and `dc:date` from filenames when absent.
- CSS formatting normalisation: Detects non-standard font-size, line-height, text-align, and margin values on paragraph-level elements.
- Broken link checker: Verifies internal `href` targets exist within the EPUB. Optionally validates external URLs.
- Compression and cleanup: Strips extraneous files (`.DS_Store`, `thumbs.db`) and re-packs with optimal compression.
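
To make the chapter-detection idea concrete, here is a minimal sketch of scanning XHTML for heading elements with the Python standard library. The names `HeadingCollector` and `find_headings`, and the choice of `h1`/`h2` as chapter-level headings, are assumptions for illustration and are not taken from boozarr's code.

```python
from __future__ import annotations

from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the visible text of heading elements (hypothetical helper)."""

    HEADING_TAGS = {"h1", "h2"}  # assumption: which levels count as chapters

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[str] = []
        self._current: list[str] | None = None  # text parts of the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._current is not None:
            # Collapse runs of whitespace so wrapped headings compare cleanly.
            text = " ".join("".join(self._current).split())
            if text:
                self.headings.append(text)
            self._current = None


def find_headings(xhtml: str) -> list[str]:
    """Return heading texts in document order; empty headings are ignored."""
    parser = HeadingCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.headings
```

A file whose `find_headings` result is non-empty while its `toc.ncx`/`nav.xhtml` has no entries would be a candidate for the chapter-detection fix.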
Install:

    git clone https://github.com/binhex/boozarr
    cd boozarr
    uv venv --quiet
    uv sync

Run in dry-run mode (check-only, no modifications):

    boozarr --library-path /path/to/epub/library

Apply fixes:

    boozarr --library-path /path/to/epub/library --fix

Backups are created automatically by default. Disable with:

    boozarr --library-path /path/to/epub/library --fix --no-backup

Customise CSS targets:

    boozarr --library-path /path/to/epub/library --border 1px --margin 0 --padding 0

| Flag | Default | Description |
|---|---|---|
| `--library-path` | required | Directory containing EPUB files to process (recursive scan) |
| `--fix` | off | Apply fixes; default is dry-run (check-only) |
| `--no-backup` | off | Disable automatic `.bak` backups (on by default) |
| `--db-path` | `<project>/db/boozarr.db` | SQLite database for tracking processed files |
| `--log-path` | `<project>/logs/boozarr.log` | Log file path |
| `--log-level` | `INFO` | Logging level (DEBUG, INFO, SUCCESS, WARNING, ERROR) |
| `--border` | none | Target border value (only applied when specified) |
| `--margin` | none | Target margin value (only applied when specified) |
| `--padding` | none | Target padding value (only applied when specified) |
| `--font-size` | none | Target base font size (only applied when specified) |
| `--line-height` | 1.2 | Target line height (only applied when specified) |
| `--text-indent` | 0 | Target text indent, px (only applied when specified) |
| `--paragraph-spacing` | none | Target paragraph spacing (only applied when specified) |
| `--text-align` | left | Target text-align (left, center, right, justify) |
| `--check-external-links` | off | Validate external URLs via HEAD requests |
| `--compress` | none | Apply EPUB recompression level (0=store, 9=best, only when specified) |

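
As an illustration of how a 0 to 9 recompression level like `--compress` could map onto Python's `zipfile` module, here is a hedged sketch of re-packing an extracted EPUB directory. The function name `repack_epub` and the `JUNK_NAMES` set are hypothetical, and this is not boozarr's actual implementation. The one fixed rule it relies on comes from the EPUB container format: `mimetype` must be the first archive entry and must be stored uncompressed.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Extraneous files to drop while re-packing (compared case-insensitively).
JUNK_NAMES = {".ds_store", "thumbs.db"}


def repack_epub(src_dir: Path, out_path: Path, level: int = 9) -> list[str]:
    """Zip an extracted EPUB directory and return the member names written.

    `out_path` should live outside `src_dir` so it is not picked up by the scan.
    """
    if not 0 <= level <= 9:
        raise ValueError("level must be between 0 and 9")

    written: list[str] = []
    mimetype = src_dir / "mimetype"
    with zipfile.ZipFile(out_path, "w") as zf:
        # EPUB requires `mimetype` first and uncompressed, whatever the level.
        zf.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        written.append("mimetype")

        for path in sorted(src_dir.rglob("*")):
            if not path.is_file() or path == mimetype:
                continue
            if path.name.lower() in JUNK_NAMES:
                continue
            arcname = path.relative_to(src_dir).as_posix()
            if level == 0:
                # Level 0 means "store": no compression at all.
                zf.write(path, arcname, compress_type=zipfile.ZIP_STORED)
            else:
                zf.write(
                    path,
                    arcname,
                    compress_type=zipfile.ZIP_DEFLATED,
                    compresslevel=level,
                )
            written.append(arcname)
    return written
```
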
For each EPUB file in the library:
- EpubWrapper validates the ZIP structure and extracts to a temp directory.
- Each enabled processor runs `check()` against the extracted EPUB, reporting issues.
- If `--fix` is set, enabled processors apply their fixes.
- The modified EPUB is re-packed with compression.
- A result is logged per-file: `[OK]`, `[WARN]`, `[ERR]`, or `[SKIP]`.
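
The check-then-fix flow above can be sketched roughly as follows. Only `check()` is named in this README; the `Processor` protocol, the `fix()` method, `FileResult`, `run_pipeline`, and the mapping of outcomes onto log tags are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


class Processor(Protocol):
    """Hypothetical processor interface: check() reports, fix() modifies."""

    name: str

    def check(self, epub_dir: Path) -> list[str]: ...

    def fix(self, epub_dir: Path) -> None: ...


@dataclass
class FileResult:
    issues: list[str] = field(default_factory=list)
    fixed: bool = False
    error: str | None = None

    @property
    def status(self) -> str:
        # Assumed mapping of outcomes onto the per-file log tags.
        if self.error is not None:
            return "[ERR]"
        if self.issues and not self.fixed:
            return "[WARN]"
        return "[OK]"


def run_pipeline(
    epub_dir: Path, processors: list[Processor], apply_fixes: bool
) -> FileResult:
    """Run every check first, then apply fixes only when requested."""
    result = FileResult()
    try:
        for proc in processors:
            found = proc.check(epub_dir)
            result.issues.extend(f"{proc.name}: {msg}" for msg in found)
        if apply_fixes and result.issues:
            for proc in processors:
                proc.fix(epub_dir)
            result.fixed = True
    except Exception as exc:  # one bad file should not abort a whole library run
        result.error = str(exc)
    return result
```

Keeping checks and fixes as separate passes means a dry run exercises exactly the same `check()` code that a `--fix` run does.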
Unchanged files are skipped on re-run (tracked by SHA-256 hash and CLI config hash in SQLite).
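
A minimal sketch of this kind of skip logic, assuming a single `processed` table keyed by file path: the table layout and the function names (`file_sha256`, `config_hash`, `open_db`, `already_processed`, `mark_processed`) are hypothetical and do not describe boozarr's actual schema.

```python
from __future__ import annotations

import hashlib
import json
import sqlite3
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large EPUBs are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def config_hash(options: dict) -> str:
    """Hash the CLI options in a key-order-independent form."""
    canonical = json.dumps(options, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def open_db(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed ("
        "path TEXT PRIMARY KEY, "
        "file_hash TEXT NOT NULL, "
        "config_hash TEXT NOT NULL)"
    )
    return conn


def already_processed(conn: sqlite3.Connection, path: Path, options: dict) -> bool:
    """True only if both the file content and the options are unchanged."""
    row = conn.execute(
        "SELECT file_hash, config_hash FROM processed WHERE path = ?",
        (str(path),),
    ).fetchone()
    return row is not None and row == (file_sha256(path), config_hash(options))


def mark_processed(conn: sqlite3.Connection, path: Path, options: dict) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO processed (path, file_hash, config_hash) "
        "VALUES (?, ?, ?)",
        (str(path), file_sha256(path), config_hash(options)),
    )
    conn.commit()
```

Including the config hash means a file is reprocessed when the flags change (for example a new `--margin` target), even if the file itself is byte-identical.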

For development, install with the dev extras:

    git clone https://github.com/binhex/boozarr
    cd boozarr
    uv venv --quiet
    uv sync --extra dev

Run tests:

    uv run pytest -v

Lint and type-check:

    uv run ruff check src/boozarr/ tests/
    uv run mypy src/boozarr/

Pre-commit (run before committing):

    uv run pre-commit run --all-files

GNU General Public License v3.0 or later. See LICENSE.