AR0NICA/BlackFeatherPDF

BlackFeatherPDF v1.0

BlackFeatherPDF is a forensic PDF fingerprinting toolkit. It issues uniquely fingerprinted PDFs, stores local issuance history, and inspects leaked documents or scan images against that local history.

v1.0 is a CLI-first release:

  • Track A is the stable workflow for direct leaked-PDF attribution.
  • Track B is an experimental rendered-image crop workflow.
  • PrintScan is an experimental full-page flatbed/ADF scanner workflow.
  • FastAPI is available as a local automation wrapper, not as a public production service.

This project is a forensic attribution tool. It does not claim to make PDFs edit-proof, sanitize-proof, or print-proof.

Install

Use Python 3.12 or newer.

python -m pip install -e ".[dev]"

After installation, either command form works:

blackfeatherpdf --version
python .\run_blackfeatherpdf.py --version

Run the test suite:

pytest -q

Check runtime dependencies and local paths:

blackfeatherpdf doctor
blackfeatherpdf config

The first runtime command creates the local state files:

  • data/master_key.bin
  • data/blackfeather.sqlite3

Set these environment variables when you need isolated state for tests or release smoke runs:

  • BLACKFEATHER_DATA_DIR
  • BLACKFEATHER_DB_PATH
  • BLACKFEATHER_KEY_PATH

Generated runtime data belongs in data/, dist/, or outputs/; those directories are intentionally ignored by Git.
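
The isolation variables above can also be set per process from a Python test helper. The sketch below is illustrative only: `isolated_env` and `run_isolated` are hypothetical helper names, and it assumes that BLACKFEATHER_DB_PATH and BLACKFEATHER_KEY_PATH take file paths that mirror the default file names under data/.

```python
import os
import subprocess
from pathlib import Path


def isolated_env(state_dir: Path) -> dict[str, str]:
    """Copy the current environment and point BlackFeatherPDF state at state_dir."""
    env = dict(os.environ)
    env["BLACKFEATHER_DATA_DIR"] = str(state_dir)
    # Assumption: these two variables accept file paths that mirror the
    # default names data/blackfeather.sqlite3 and data/master_key.bin.
    env["BLACKFEATHER_DB_PATH"] = str(state_dir / "blackfeather.sqlite3")
    env["BLACKFEATHER_KEY_PATH"] = str(state_dir / "master_key.bin")
    return env


def run_isolated(args: list[str], state_dir: Path) -> subprocess.CompletedProcess:
    """Run the blackfeatherpdf CLI with state redirected to state_dir."""
    state_dir.mkdir(parents=True, exist_ok=True)
    return subprocess.run(
        ["blackfeatherpdf", *args],
        env=isolated_env(state_dir),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect returncode and stderr
    )


# Example (requires the CLI to be installed):
# result = run_isolated(["doctor"], Path("outputs/smoke/data"))
# print(result.returncode, result.stdout)
```

Passing the environment per call, rather than mutating `os.environ`, keeps parallel test runs from sharing a key file or database by accident.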

Track A: Direct PDF Attribution

Track A is the v1.0 stable path. It embeds an encrypted fingerprint token as invisible PDF text on each page by using PDF text render mode 3. The inspector extracts candidate hidden tokens from the PDF structure, decrypts the token with the local master key, and maps the payload back to the SQLite issue record.
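
To make the render-mode idea concrete, the sketch below builds a PDF content-stream fragment that draws a token with text render mode 3 (the `3 Tr` operator, which paints neither fill nor stroke) and then finds such strings again with a regular expression. It is a simplified illustration, not BlackFeatherPDF's actual embedder or inspector: the function names, the font resource name `F1`, and the plain-text token are assumptions, and real content streams are usually compressed and need decoding first.

```python
import re


def invisible_text_ops(token: str, x: float = 10, y: float = 10, font: str = "F1") -> str:
    """Return content-stream operators that draw `token` with render mode 3."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = token.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # BT/ET delimit a text object; Tf selects the font, Tr sets the render
    # mode, Td positions the text, and Tj shows the string.
    return f"BT /{font} 1 Tf 3 Tr {x} {y} Td ({escaped}) Tj ET"


# Matches a literal string shown with Tj after a "3 Tr" operator.
HIDDEN_TEXT = re.compile(r"\b3 Tr\b.*?\((?P<text>(?:\\.|[^\\()])*)\)\s*Tj", re.S)


def find_hidden_candidates(content_stream: str) -> list[str]:
    """Return literal strings (still PDF-escaped) drawn in render mode 3."""
    return [match.group("text") for match in HIDDEN_TEXT.finditer(content_stream)]


ops = invisible_text_ops("abc123")
print(ops)
print(find_hidden_candidates(ops))
```

In the workflow described above, each extracted candidate would then be decrypted with the local master key and looked up in the SQLite issue records; that step is omitted here because the token format is not documented in this README.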

Issue a PDF:

blackfeatherpdf issue `
  '.\dist\sample_pdf.pdf' `
  --recipient-id 'user-001' `
  --output '.\dist\user-001.pdf'

Inspect a leaked PDF:

blackfeatherpdf inspect '.\dist\user-001.pdf'

List issue records:

blackfeatherpdf records --track official --limit 5

Track A limitations:

  • It is meant for PDFs that are leaked in digital form.
  • Acrobat sanitize, object rewriting, rasterization, or document regeneration may remove or damage the fingerprint.
  • It does not make a PDF tamper-proof.

Track B: Experimental Crop Workflow

Track B embeds a subtle repeated raster tile into each PDF page. It is meant for fast iteration on rendered screenshots and cropped page images, not for production guarantees.

Issue a Track B PDF:

blackfeatherpdf trackb-issue `
  '.\dist\sample_pdf.pdf' `
  --recipient-id 'user-101' `
  --output '.\dist\user-101-trackb.pdf'

Render, crop, and inspect a page image:

blackfeatherpdf trackb-render `
  '.\dist\user-101-trackb.pdf' `
  --page 1 `
  --dpi 144 `
  --output '.\dist\trackb-page-1.png'

blackfeatherpdf trackb-crop `
  '.\dist\trackb-page-1.png' `
  --left 220 --top 220 --width 540 --height 420 `
  --output '.\dist\trackb-crop.png'

blackfeatherpdf trackb-inspect-image `
  '.\dist\trackb-crop.png' `
  --dpi 144 `
  --top-k 3
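
When choosing crop coordinates, it can help to relate pixels in the rendered image to PDF points. A PDF point is 1/72 inch, so a page rendered at a given DPI has dpi/72 pixels per point. The helper below is a small sketch of that arithmetic; the function names are hypothetical, and it assumes the crop origin is the top-left corner of the rendered image.

```python
def px_to_pt(pixels: float, dpi: float) -> float:
    """Convert a pixel distance in a rendered page image to PDF points."""
    return pixels * 72.0 / dpi


def crop_in_points(left: int, top: int, width: int, height: int, dpi: float) -> dict[str, float]:
    """Express a pixel crop rectangle in PDF points (origin assumed top-left)."""
    return {
        "left": px_to_pt(left, dpi),
        "top": px_to_pt(top, dpi),
        "width": px_to_pt(width, dpi),
        "height": px_to_pt(height, dpi),
    }


# The crop used in the commands above, rendered at 144 DPI.
print(crop_in_points(220, 220, 540, 420, dpi=144))
# → {'left': 110.0, 'top': 110.0, 'width': 270.0, 'height': 210.0}
```

At 144 DPI each point covers two pixels, so the 540 x 420 pixel crop above corresponds to a 270 x 210 point region of the page.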

Run the synthetic crop benchmark:

blackfeatherpdf trackb-benchmark `
  '.\dist\sample_pdf.pdf' `
  --candidate-count 4 `
  --page 1 `
  --top-k 3

Track B limitations:

  • Matching is candidate-based against the local SQLite issue database.
  • Scale and rotation search are bounded, not general geometric recovery.
  • Larger crops currently separate candidates better than small crops.
  • Arbitrary camera perspective and hostile image transformations are outside the v1.0 guarantee.
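
The first limitation, candidate-based matching, can be pictured as scoring an observed signal against every locally stored candidate pattern and keeping the best few. The sketch below uses normalized correlation over flat lists of numbers as a stand-in scoring function. That choice, the function names, and the toy patterns are all assumptions for illustration; the README does not describe Track B's real scoring.

```python
import math


def normalized_correlation(a: list[float], b: list[float]) -> float:
    """Pearson-style correlation between two equal-length signals."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [value - mean_a for value in a]
    db = [value - mean_b for value in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # a flat signal carries no evidence either way
    return sum(x * y for x, y in zip(da, db)) / denom


def rank_candidates(
    observed: list[float],
    candidates: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every candidate against the observation and return the top_k."""
    scored = [
        (recipient_id, normalized_correlation(observed, pattern))
        for recipient_id, pattern in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy patterns standing in for per-recipient tiles (made-up values).
candidates = {
    "user-101": [1, -1, 1, -1],
    "user-102": [1, 1, -1, -1],
}
observed = [0.9, -1.1, 1.2, -0.8]  # noisy copy of user-101's pattern
print(rank_candidates(observed, candidates, top_k=2))
```

Because the result is only a ranking over known candidates, a recipient who is missing from the local database cannot be identified, and a short observation gives the scores less room to separate, which is consistent with the note above that larger crops work better.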

PrintScan: Experimental Full-Page Scanner Workflow

PrintScan targets full-page flatbed or ADF scanner images. It embeds a low-strength tiled raster carrier, stores each issued candidate locally, and inspects a scan by scoring it against the local candidates and checking a repeated header.

Issue a PrintScan PDF:

blackfeatherpdf printscan-issue `
  '.\dist\sample_pdf.pdf' `
  --recipient-id 'scan-user-001' `
  --output '.\dist\scan-user-001-printscan.pdf'

Inspect a 300 DPI full-page scan:

blackfeatherpdf printscan-inspect `
  '.\outputs\scan-user-001-page-1.png' `
  --dpi 300 `
  --top-k 5 `
  --json

Run the synthetic full-page benchmark:

blackfeatherpdf printscan-benchmark `
  '.\dist\sample_pdf.pdf' `
  --candidate-count 4 `
  --top-k 5

Run a manifest-based dataset benchmark:

blackfeatherpdf printscan-dataset-benchmark '.\outputs\printscan-manifest.json' --top-k 5

The printscan-inspect --json response includes:

  • accepted
  • failure_reason
  • top_score
  • score_gap
  • confidence
  • page_rectified
  • decoded_header
  • matches
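
A script that consumes this response might branch on `accepted` and report either the score fields or the failure reason. The sketch below only uses field names from the list above; it assumes `accepted` is a boolean and that the other fields can be printed as-is, and the sample payloads (including the reason string `low_score_gap`) are made up for illustration.

```python
import json


def summarize_printscan(raw_json: str) -> str:
    """Turn a printscan-inspect --json response into a one-line summary."""
    result = json.loads(raw_json)
    # Assumption: "accepted" is a boolean; .get() keeps the sketch tolerant
    # of fields that are missing or null.
    if result.get("accepted"):
        return (
            f"accepted: top_score={result.get('top_score')} "
            f"score_gap={result.get('score_gap')}"
        )
    return f"rejected: {result.get('failure_reason')}"


# Hypothetical payloads; real values and reason strings may differ.
print(summarize_printscan('{"accepted": true, "top_score": 0.91, "score_gap": 0.4}'))
print(summarize_printscan('{"accepted": false, "failure_reason": "low_score_gap"}'))
```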

PrintScan limitations:

  • Full-page flatbed/ADF scans are in scope.
  • Phone photos, partial scan crops, and strong perspective camera shots are outside the v1.0 guarantee.
  • The repeated header validates version/page/sync information; it does not recover the full payload by itself.
  • Default strength prioritizes near-invisibility over robustness. Use --strength during calibration.

Local FastAPI Wrapper

Launch the local API:

blackfeatherpdf api --host 127.0.0.1 --port 8000

The API exposes local-path-based endpoints for the same workflows as the CLI:

  • GET /health
  • POST /issue
  • POST /inspect
  • POST /trackb/issue
  • POST /trackb/render
  • POST /trackb/crop
  • POST /trackb/inspect-image
  • POST /trackb/print-scan/simulate
  • POST /trackb/print-scan/inspect
  • POST /trackb/benchmark
  • POST /printscan/issue
  • POST /printscan/inspect
  • POST /printscan/benchmark

The v1.0 API is for local and internal automation only. It has no authentication, authorization, rate limiting, upload handling, or deployment hardening.
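
For local automation, a script can confirm the wrapper is up before sending work to it. The sketch below polls GET /health with the standard library only. The helper names are hypothetical, it assumes a healthy server answers with HTTP 200, and it deliberately does not call the POST endpoints because their request bodies are not documented in this README.

```python
from urllib.request import urlopen


def health_url(host: str = "127.0.0.1", port: int = 8000) -> str:
    """Build the URL of the local health endpoint."""
    return f"http://{host}:{port}/health"


def is_healthy(host: str = "127.0.0.1", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 within the timeout."""
    try:
        with urlopen(health_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses,
        # since URLError and HTTPError both derive from OSError.
        return False


# Example (requires `blackfeatherpdf api` to be running):
# print(is_healthy())
```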

Release Checks

Run these before tagging a v1.0 release:

pytest -q
python -m build
blackfeatherpdf doctor
blackfeatherpdf --version

For release smoke runs, prefer isolated output state:

$env:BLACKFEATHER_DATA_DIR='.\outputs\release\v1.0\data'
blackfeatherpdf issue '.\dist\sample_pdf.pdf' --recipient-id 'release-smoke' --output '.\outputs\release\v1.0\release-smoke.pdf'
blackfeatherpdf inspect '.\outputs\release\v1.0\release-smoke.pdf'

License

BlackFeatherPDF is released under the MIT License. See LICENSE.
