BlackFeatherPDF is a forensic PDF fingerprinting toolkit. It issues uniquely fingerprinted PDFs, stores local issuance history, and inspects leaked documents or scan images against that local history.
v1.0 is a CLI-first release:
- Track A is the stable workflow for direct leaked-PDF attribution.
- Track B is an experimental rendered-image crop workflow.
- PrintScan is an experimental full-page flatbed/ADF scanner workflow.
- FastAPI is available as a local automation wrapper, not as a public production service.
This project is a forensic attribution tool, not a claim of edit-proof, sanitize-proof, or print-proof PDF security.
Use Python 3.12 or newer.
python -m pip install -e ".[dev]"
After installation, either command form works:
blackfeatherpdf --version
python .\run_blackfeatherpdf.py --version
Run the test suite:
pytest -q
Check runtime dependencies and local paths:
blackfeatherpdf doctor
blackfeatherpdf config
The first runtime command creates the local state files:
- data/master_key.bin
- data/blackfeather.sqlite3
Set these environment variables when you need isolated state for tests or release smoke runs:
- BLACKFEATHER_DATA_DIR
- BLACKFEATHER_DB_PATH
- BLACKFEATHER_KEY_PATH
Generated runtime data belongs in data/, dist/, or outputs/; those directories are intentionally ignored by Git.
Track A is the v1.0 stable path. It embeds an encrypted fingerprint token as invisible PDF text on each page by using PDF text render mode 3. The inspector extracts candidate hidden tokens from the PDF structure, decrypts the token with the local master key, and maps the payload back to the SQLite issue record.
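To make the render mode 3 mechanism concrete, here is a rough Python sketch of what an invisible text run looks like in an uncompressed PDF content stream and how a scanner might pick it out. It is an illustration only, not BlackFeatherPDF's actual embedder or inspector: the function names, the font name F1, and the token string are hypothetical, and encryption and decryption of the token are left out. Real PDFs usually compress content streams, so a real inspector would have to decode them first.

```python
import re


def hidden_text_ops(token: str, font: str = "F1", x: int = 10, y: int = 10) -> str:
    """Build a text object that draws `token` with render mode 3 (invisible)."""
    # Escape characters that are special inside a PDF literal string.
    escaped = token.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"BT /{font} 1 Tf 3 Tr {x} {y} Td ({escaped}) Tj ET"


TEXT_OBJECT = re.compile(r"BT\b(?P<body>.*?)\bET\b", re.S)
INVISIBLE_MODE = re.compile(r"(?<![\d.])3\s+Tr\b")
LITERAL_SHOWN = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*Tj")


def find_hidden_tokens(stream: str) -> list[str]:
    """Return literal strings shown inside text objects that set `3 Tr`."""
    tokens: list[str] = []
    for match in TEXT_OBJECT.finditer(stream):
        body = match.group("body")
        if INVISIBLE_MODE.search(body):
            # Strings are returned still PDF-escaped.
            tokens.extend(LITERAL_SHOWN.findall(body))
    return tokens


# Hypothetical page content: one visible line plus one invisible token.
page = "BT /F1 12 Tf 72 720 Td (Visible) Tj ET\n" + hidden_text_ops("bfp1.abc123")
candidates = find_hidden_tokens(page)
```

In this toy stream only the second text object sets `3 Tr`, so only its string is reported as a candidate token.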
Issue a PDF:
blackfeatherpdf issue `
'.\dist\sample_pdf.pdf' `
--recipient-id 'user-001' `
--output '.\dist\user-001.pdf'
Inspect a leaked PDF:
blackfeatherpdf inspect '.\dist\user-001.pdf'
List issue records:
blackfeatherpdf records --track official --limit 5
Track A limitations:
- It is meant for PDFs that are leaked in digital form.
- Acrobat sanitize, object rewriting, rasterization, or document regeneration may remove or damage the fingerprint.
- It does not make a PDF tamper-proof.
Track B embeds a subtle repeated raster tile into each PDF page. It is meant for fast iteration on rendered screenshots and cropped page images, not for production guarantees.
Issue a Track B PDF:
blackfeatherpdf trackb-issue `
'.\dist\sample_pdf.pdf' `
--recipient-id 'user-101' `
--output '.\dist\user-101-trackb.pdf'
Render, crop, and inspect a page image:
blackfeatherpdf trackb-render `
'.\dist\user-101-trackb.pdf' `
--page 1 `
--dpi 144 `
--output '.\dist\trackb-page-1.png'
blackfeatherpdf trackb-crop `
'.\dist\trackb-page-1.png' `
--left 220 --top 220 --width 540 --height 420 `
--output '.\dist\trackb-crop.png'
blackfeatherpdf trackb-inspect-image `
'.\dist\trackb-crop.png' `
--dpi 144 `
--top-k 3
Run the synthetic crop benchmark:
blackfeatherpdf trackb-benchmark `
'.\dist\sample_pdf.pdf' `
--candidate-count 4 `
--page 1 `
--top-k 3
Track B limitations:
- Matching is candidate-based against the local SQLite issue database.
- Scale and rotation search are bounded, not general geometric recovery.
- Larger crops currently separate candidates better than small crops.
- Arbitrary camera perspective and hostile image transformations are outside the v1.0 guarantee.
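The candidate-based matching described in these limitations can be pictured with a toy model. The sketch below is not the Track B detector: it swaps in plain correlation against pseudo-random +1/-1 patterns, and the pattern generator, function names, and recipient IDs are all hypothetical. It only shows the general shape of scoring every locally known candidate and keeping the top k.

```python
import random


def tile_pattern(recipient_id: str, size: int) -> list[int]:
    """Hypothetical stand-in for a recipient's tile: pseudo-random +1/-1 values."""
    rng = random.Random(recipient_id)  # string seeds give a repeatable sequence
    return [rng.choice((-1, 1)) for _ in range(size)]


def correlation(observed: list[float], pattern: list[int]) -> float:
    """Mean product of the observed samples and a candidate pattern."""
    return sum(o * p for o, p in zip(observed, pattern)) / len(pattern)


def rank_candidates(
    observed: list[float], candidate_ids: list[str], top_k: int = 3
) -> list[tuple[str, float]]:
    """Score every known candidate and return the best `top_k`, highest first."""
    scored = [
        (cid, correlation(observed, tile_pattern(cid, len(observed))))
        for cid in candidate_ids
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Noise-free toy "crop" that carries the pattern issued to user-101.
known_ids = ["user-101", "user-102", "user-103", "user-104"]
crop = [float(v) for v in tile_pattern("user-101", 1024)]
ranking = rank_candidates(crop, known_ids, top_k=3)
score_gap = ranking[0][1] - ranking[1][1]
```

In this model a smaller crop means fewer samples, so the scores of unrelated candidates fluctuate more and the gap to the true candidate narrows, which is consistent with the note that larger crops separate candidates better.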
PrintScan targets full-page flatbed or ADF scanner images. It embeds a low-strength tiled raster carrier, stores each issued candidate locally, and inspects a scan by scoring local candidates plus a repeated header.
Issue a PrintScan PDF:
blackfeatherpdf printscan-issue `
'.\dist\sample_pdf.pdf' `
--recipient-id 'scan-user-001' `
--output '.\dist\scan-user-001-printscan.pdf'
Inspect a 300 DPI full-page scan:
blackfeatherpdf printscan-inspect `
'.\outputs\scan-user-001-page-1.png' `
--dpi 300 `
--top-k 5 `
--json
Run the synthetic full-page benchmark:
blackfeatherpdf printscan-benchmark `
'.\dist\sample_pdf.pdf' `
--candidate-count 4 `
--top-k 5
Run a manifest-based dataset benchmark:
blackfeatherpdf printscan-dataset-benchmark '.\outputs\printscan-manifest.json' --top-k 5
The printscan-inspect --json response includes:
- accepted
- failure_reason
- top_score
- score_gap
- confidence
- page_rectified
- decoded_header
- matches
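When the JSON output is consumed from a script, a small helper can turn it into a one-line verdict. This is a minimal sketch that uses only the field names listed above. The value types (for example, accepted being a boolean) and the sample payload are assumptions, not documented output.

```python
import json


def summarize_inspection(raw_json: str) -> str:
    """Condense a printscan-inspect --json payload into a short verdict line."""
    report = json.loads(raw_json)
    if report.get("accepted"):
        return (
            f"accepted: top_score={report.get('top_score')} "
            f"score_gap={report.get('score_gap')} "
            f"confidence={report.get('confidence')}"
        )
    return f"rejected: {report.get('failure_reason')}"


# Hypothetical payload for illustration; real values and types may differ.
sample = json.dumps({"accepted": False, "failure_reason": "low_score_gap"})
verdict = summarize_inspection(sample)
```

Using .get() keeps the helper tolerant of fields that are missing from a given response.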
PrintScan limitations:
- Full-page flatbed/ADF scans are in scope.
- Phone photos, partial scan crops, and strong perspective camera shots are outside the v1.0 guarantee.
- The repeated header validates version/page/sync information; it does not recover the full payload by itself.
- Default strength prioritizes near-invisibility over robustness. Use --strength during calibration.
Launch the local API:
blackfeatherpdf api --host 127.0.0.1 --port 8000
The API exposes local-path based endpoints for the same workflows as the CLI:
- GET /health
- POST /issue
- POST /inspect
- POST /trackb/issue
- POST /trackb/render
- POST /trackb/crop
- POST /trackb/inspect-image
- POST /trackb/print-scan/simulate
- POST /trackb/print-scan/inspect
- POST /trackb/benchmark
- POST /printscan/issue
- POST /printscan/inspect
- POST /printscan/benchmark
The v1.0 API is for local and internal automation only. It has no authentication, authorization, rate limiting, upload handling, or deployment hardening.
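Local automation scripts may want to confirm that the wrapper is up before sending work to it. The sketch below polls GET /health with the standard library. The helper names are hypothetical, and treating any HTTP 200 as healthy is an assumption, since the response body format is not documented here. Request bodies for the POST endpoints are not shown because their schemas are not described in this README.

```python
import urllib.request


def health_request(host: str = "127.0.0.1", port: int = 8000) -> urllib.request.Request:
    """Build the GET /health request for a locally launched API."""
    return urllib.request.Request(f"http://{host}:{port}/health", method="GET")


def is_healthy(host: str = "127.0.0.1", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if the local API answers /health with HTTP 200 (assumed meaning)."""
    try:
        with urllib.request.urlopen(health_request(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

The defaults match the host and port used in the launch command above.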
Run these before tagging a v1.0 release:
pytest -q
python -m build
blackfeatherpdf doctor
blackfeatherpdf --version
For release smoke runs, prefer isolated output state:
$env:BLACKFEATHER_DATA_DIR='.\outputs\release\v1.0\data'
blackfeatherpdf issue '.\dist\sample_pdf.pdf' --recipient-id 'release-smoke' --output '.\outputs\release\v1.0\release-smoke.pdf'
blackfeatherpdf inspect '.\outputs\release\v1.0\release-smoke.pdf'
BlackFeatherPDF is released under the MIT License. See LICENSE.