📄 paperflow

Automatically fetch invoices from online providers and import them into Paperless-NGX.


paperflow runs as a Docker container, periodically logs into your provider accounts, downloads invoices as PDFs, and uploads them to your Paperless-NGX instance, fully automatically. A SQLite database tracks which invoices have already been processed to avoid duplicates.

A built-in web interface (port 8085) lets you configure everything, manage providers, view the invoice history, and watch live logs, with no terminal needed.


✨ Features

  • Automatic invoice download from Amazon.de / Amazon.com, IKEA, and Klarna
  • Paperless-NGX upload via REST API: sets tags, correspondent, date, and title automatically
  • Product title extraction: the Paperless title shows the actual product name, not just the order number
  • Duplicate prevention: a SQLite database tracks every processed invoice
  • Year-skip optimization: past years that were fully scanned are skipped on subsequent runs
  • Incremental scan mode: optionally scan only the last 30 days for fast daily runs
  • Parallel uploads: multiple PDFs are uploaded simultaneously (configurable workers)
  • Correspondent dropdown: select the correct Paperless-NGX correspondent from a live list
  • Year tags: each invoice is automatically tagged with its year (e.g. 2024)
  • Progress bar: real-time upload progress shown in the web UI
  • Error categories: the history shows whether a failure was no PDF, Download ✗, or Upload ✗
  • Plugin architecture: add new providers by dropping in a single .py file
  • CDP browser mode: connects to a persistent Chrome instance via Remote Debugging (no repeated logins, supports 2FA)
  • Cookie import: log in via the Cookie Editor extension instead of VNC (useful for IKEA, Amazon)
  • Docker-first: two containers, paperflow (FastAPI + Python) and paperflow-chrome (Chrome + noVNC)
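To illustrate how duplicate prevention and the year-skip optimization could work together, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and function names are hypothetical and chosen for illustration; paperflow's actual schema in database.py may differ.

```python
import sqlite3
from datetime import date

# Hypothetical schema: names are illustrative, not taken from paperflow.
SCHEMA = """
CREATE TABLE IF NOT EXISTS invoices (
    provider   TEXT NOT NULL,
    invoice_id TEXT NOT NULL,
    PRIMARY KEY (provider, invoice_id)
);
CREATE TABLE IF NOT EXISTS scanned_years (
    provider TEXT NOT NULL,
    year     INTEGER NOT NULL,
    PRIMARY KEY (provider, year)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the tracking database and ensure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def mark_processed(conn, provider, invoice_id):
    """Record an invoice. Returns False if it was already recorded."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO invoices (provider, invoice_id) VALUES (?, ?)",
        (provider, invoice_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows inserted means the primary key already existed


def mark_year_scanned(conn, provider, year):
    """Remember that every invoice of this year has been seen."""
    conn.execute(
        "INSERT OR IGNORE INTO scanned_years (provider, year) VALUES (?, ?)",
        (provider, year),
    )
    conn.commit()


def should_scan_year(conn, provider, year, today=None):
    """Skip only past years that were fully scanned; the current year can still grow."""
    today = today or date.today()
    if year >= today.year:
        return True
    row = conn.execute(
        "SELECT 1 FROM scanned_years WHERE provider = ? AND year = ?",
        (provider, year),
    ).fetchone()
    return row is None
```

The composite primary key lets the database itself reject duplicates, so the caller only needs to check the return value of mark_processed before uploading.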

🚀 Quick Start (Docker Compose)

1. Clone the repo

git clone https://github.com/Caps3n/paperflow.git
cd paperflow

2. Configure

cp .env.example .env

Edit .env:

| Variable           | Description                         | Default               |
|--------------------|-------------------------------------|-----------------------|
| PAPERLESS_URL      | Your Paperless-NGX URL              | http://paperless:8000 |
| PAPERLESS_TOKEN    | API token from Paperless-NGX admin  | (none)                |
| AMAZON_EMAIL       | Amazon account email                | (none)                |
| AMAZON_PASSWORD    | Amazon account password             | (none)                |
| AMAZON_DOMAIN      | amazon.de or amazon.com             | amazon.de             |
| AMAZON_MONTHS_BACK | How many months back to scan        | 12                    |
| IKEA_EMAIL         | IKEA account email                  | (none)                |
| IKEA_PASSWORD      | IKEA account password               | (none)                |
| UPLOAD_WORKERS     | Parallel upload threads             | 3                     |
| RUN_INTERVAL_HOURS | How often to run (hours)            | 24                    |
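As a rough illustration of how these variables might be consumed, the sketch below reads a few of them with the documented defaults and uses UPLOAD_WORKERS to size a thread pool. The function names load_settings and upload_all are hypothetical, and upload_one stands in for whatever callable performs a single upload; this is not paperflow's actual code.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def load_settings(env=None):
    """Read selected settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "paperless_url": env.get("PAPERLESS_URL", "http://paperless:8000").rstrip("/"),
        "amazon_domain": env.get("AMAZON_DOMAIN", "amazon.de"),
        "amazon_months_back": int(env.get("AMAZON_MONTHS_BACK", "12")),
        "upload_workers": int(env.get("UPLOAD_WORKERS", "3")),
        "run_interval_hours": int(env.get("RUN_INTERVAL_HOURS", "24")),
    }


def upload_all(paths, upload_one, workers):
    """Call upload_one(path) for every path on `workers` threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_one, paths))
```

Threads suit this workload because uploads spend most of their time waiting on the network, so a small pool such as the default of 3 already overlaps most of that waiting.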

3. Start

docker compose up -d

Open the web interface at http://localhost:8085

On first run, open the browser at http://localhost:6080 (noVNC) and log into Amazon or IKEA manually once. The session is then reused automatically.


🚀 Portainer Deployment

  1. In Portainer → Stacks → Add Stack → Repository
  2. Set:
    • Repository URL: https://github.com/Caps3n/paperflow
    • Compose path: docker-compose.portainer.yml
  3. Add environment variables in the Environment variables tab (see table above)
  4. Click Deploy

Portainer builds the paperflow-chrome browser container from source and pulls paperflow from ghcr.io automatically.


🖥️ Web Interface

| Page      | Description                                                                      |
|-----------|----------------------------------------------------------------------------------|
| Dashboard | Stats, progress bar, last run status, manual trigger                             |
| Settings  | Edit all credentials and intervals in-browser                                    |
| Providers | Enable/disable providers, edit tags & correspondent, upload custom .py scripts   |
| History   | Invoice history with status, error category, and link to the Paperless document  |
| Logs      | Live log output with auto-refresh                                                |

🔒 Security

By default the web UI is accessible without authentication. To enable login protection:

UI_USER=admin
UI_PASSWORD=yourpassword

Or set it in Settings → Security in the web UI.

Note: paperflow runs HTTP only. For external access, place it behind a reverse proxy with TLS (e.g. Caddy for automatic HTTPS).


🏗️ Architecture

┌─────────────────────────┐    CDP     ┌──────────────────────┐
│   paperflow             │ ─────────► │   paperflow-chrome   │
│   FastAPI + Python      │            │   Chrome + noVNC     │
│   port 8085 (Web UI)    │            │   port 6080 (VNC)    │
└──────────┬──────────────┘            └──────────────────────┘
           │ REST API
           ▼
┌─────────────────────────┐
│   Paperless-NGX         │
└─────────────────────────┘

paperflow connects to Chrome over CDP (Chrome DevTools Protocol), uses the live browser session to download invoice PDFs, then uploads them to Paperless-NGX via REST API.
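For the final hop, Paperless-NGX accepts new documents as a multipart form POST to /api/documents/post_document/ with token authentication. The sketch below only assembles the URL, headers, and form fields so the shape of the request is visible. The function name build_upload_request and its parameters are hypothetical; paperflow's own client in paperless_client.py may be structured differently.

```python
def build_upload_request(base_url, token, title, created, tag_ids=(), correspondent_id=None):
    """Assemble URL, headers, and form fields for a Paperless-NGX document upload.

    The PDF itself is sent separately as the multipart file field "document".
    """
    url = base_url.rstrip("/") + "/api/documents/post_document/"
    headers = {"Authorization": f"Token {token}"}
    fields = [("title", title), ("created", created)]
    if correspondent_id is not None:
        fields.append(("correspondent", str(correspondent_id)))
    # Paperless-NGX expects the "tags" field once per tag ID.
    fields.extend(("tags", str(tag_id)) for tag_id in tag_ids)
    return url, headers, fields
```

With the third-party requests library, the result could then be sent as `requests.post(url, headers=headers, data=fields, files={"document": open(pdf_path, "rb")})`, where pdf_path is the downloaded invoice. Tags and correspondents are referenced by their numeric IDs, so names would need to be resolved to IDs first.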


🔐 Browser Login (Amazon, IKEA & Klarna)

paperflow uses a persistent Chrome browser (paperflow-chrome) so you only log in once:

  1. Open http://<server>:6080 in your browser (noVNC web UI)
  2. Log into Amazon, IKEA, or Klarna, including any 2FA prompts
  3. Start a scan from the web UI. Your session is reused automatically
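Session reuse works because a Chrome instance started with remote debugging stays running and can be attached to later. Such an instance exposes an HTTP endpoint, /json/version, whose JSON response includes a webSocketDebuggerUrl that CDP clients connect to. The following is a minimal sketch of that discovery step; the function names are hypothetical, the host and port depend on how the Chrome container is configured, and paperflow itself may attach through a browser automation library instead.

```python
import json
from urllib.request import urlopen


def version_url(host, port):
    """URL of Chrome's DevTools version endpoint for a given debugging host and port."""
    return f"http://{host}:{port}/json/version"


def websocket_url_from(payload):
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    info = json.loads(payload)
    return info["webSocketDebuggerUrl"]


def discover(host, port, timeout=5.0):
    """Ask a running Chrome for its CDP WebSocket URL (requires network access)."""
    with urlopen(version_url(host, port), timeout=timeout) as resp:
        return websocket_url_from(resp.read().decode("utf-8"))
```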

Alternative: cookie import (no VNC needed)

  1. Install the Cookie Editor browser extension
  2. Log into the provider in your regular browser
  3. Export cookies as JSON via Cookie Editor
  4. Paste the JSON in Settings → Amazon / IKEA → Import Cookies

🔌 Adding Custom Providers

paperflow has a plugin system. To add a new provider:

  1. Create a file myprovider.py following this template:
from app.providers import BaseProvider, Invoice
from pathlib import Path

class MyproviderProvider(BaseProvider):
    provider_name = "myprovider"

    def fetch_invoices(self) -> list[Invoice]:
        # Your download logic here
        return [
            Invoice(
                invoice_id="2024-001",
                file_path=Path("/app/downloads/myprovider/invoice.pdf"),
                title="My Provider Invoice 2024-001",
                date="2024-01-15",
                extra_tags=["2024"],
            )
        ]
  2. Upload via the Providers page in the web UI, or place the file in providers_custom/
  3. Enable the provider in the web UI. Done!

Convention: class name must be <Providername>Provider (capitalized), file name must be <providername>.py (lowercase).
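The naming convention can be checked mechanically before uploading a plugin. The helper below is a hypothetical sketch that derives the expected class name from a file name; it assumes "capitalized" means only the first letter is upper case, as in the MyproviderProvider template, and it is not part of paperflow.

```python
from pathlib import Path


def expected_class_name(file_name):
    """Map a provider file name such as 'myprovider.py' to 'MyproviderProvider'."""
    stem = Path(file_name).stem
    if not stem.isidentifier() or stem != stem.lower():
        raise ValueError(f"provider file name must be a lowercase identifier: {file_name!r}")
    # str.capitalize() upper-cases the first character and lower-cases the rest.
    return stem.capitalize() + "Provider"
```

For example, a file named ebay.py would be expected to define a class called EbayProvider.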


🗂️ Project Structure

paperflow/
├── app/
│   ├── main.py              # Entry point: scheduler + parallel uploads
│   ├── web.py               # FastAPI web interface + API endpoints
│   ├── ui.html              # Single-page web UI
│   ├── database.py          # SQLite tracking (invoices + scanned years)
│   ├── paperless_client.py  # Paperless-NGX API client
│   ├── state.py             # Shared scan progress state
│   └── providers/
│       ├── __init__.py      # BaseProvider + Invoice dataclass
│       ├── amazon.py        # Amazon provider (CDP mode + fallback)
│       ├── ikea.py          # IKEA provider (CDP mode + cookie import)
│       └── klarna.py        # Klarna provider (CDP mode, Kaufbelege / purchase receipts)
├── chrome-desktop/          # Chrome + noVNC Docker image (paperflow-chrome)
│   ├── Dockerfile
│   └── start.sh
├── providers_custom/        # Drop custom provider .py files here
├── data/                    # SQLite DB + logs + settings (persisted volume)
├── downloads/               # Temporary PDF storage
├── Dockerfile
├── docker-compose.yml       # Local development
├── docker-compose.portainer.yml  # Portainer / production deployment
└── .env.example

🛣️ Roadmap

  • eBay provider
  • Email/IMAP provider (catch invoices sent by email)
  • Notification on completion (Telegram / ntfy)
  • Dark/light mode toggle in web UI

🀝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md first.

The easiest way to contribute is to write a provider for a service you use and open a pull request.


📜 License

MIT (see LICENSE)
