AI-powered tool to extract financial transactions from PDF bank/credit card statements and populate Excel templates for tax purposes.
- Intelligent Parsing: Uses local Llama 3.1 8B model via Ollama to handle varied statement formats
- Auto-categorization: Automatically separates money in vs money out
- Batch Processing: Processes multiple PDFs in one run
- Template Preservation: Maintains your Excel template formatting
- Privacy: All processing happens locally - your financial data never leaves your machine
- Ollama installed (download from ollama.ai)
- Python 3.8+
- Llama 3.1 8B model (automatically downloaded during setup)
- Excel template with columns for date, description, money in, money out
-
Clone or download this repository
-
Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate- Install dependencies:
pip install -r requirements.txt- Ensure Ollama is running and Llama 3.1 8B is available:
ollama pull llama3.1:8bEdit config.json to match your Excel template structure:
{
"pdf_folder": "./statements",
"output_folder": "./output",
"excel_template": "./template.xlsx",
"model_name": "llama3.1:8b",
"column_mapping": {
"date": "A",
"description": "B",
"money_in": "C",
"money_out": "D"
},
"start_row": 2
}pdf_folder: Folder containing your PDF statementsoutput_folder: Where processed Excel files will be savedexcel_template: Path to your Excel templatecolumn_mapping: Map transaction fields to Excel columnsstart_row: Row number where data should start (skip header row)
- Start the web server:
./run_web.sh
# Or manually:
source venv/bin/activate
python -m uvicorn app:app --host 0.0.0.0 --port 8000 --reload- Open your browser and navigate to:
http://localhost:8000
- Use the web interface to:
- Drag and drop PDF files or click to select
- Monitor real-time processing progress
- Preview extracted transactions
- Download the Excel file
- Place your PDF statements in the
statements/folder - Place your Excel template in the project root (or update path in config.json)
- Run the script:
python main.py- Review the extraction summary
- Confirm to write transactions to Excel
Your Excel template should have columns for:
- Date: Transaction date
- Description: Transaction description/merchant
- Money In: Deposits, credits, refunds
- Money Out: Purchases, withdrawals, fees
The script will preserve all existing formatting, formulas, and styles in your template.
- Processing speed: ~2-5 seconds per PDF page
- Accuracy: Excellent for structured financial documents
- Cost: $0 - completely free, unlimited usage
- Privacy: All data stays on your local machine
If you get an error about the model not being found:
ollama pull llama3.1:8b- Ensure PDFs are text-based (not scanned images)
- Check that statements have clear transaction tables
- Review extracted data and manually correct if needed
- Ensure template file exists and is not open in Excel
- Check that column mappings in config.json match your template
- Verify start_row doesn't overwrite important data
.
├── main.py # Main orchestration script
├── pdf_extractor.py # PDF text extraction
├── ai_parser.py # AI transaction parsing
├── excel_writer.py # Excel template writer
├── config.json # Configuration file
├── requirements.txt # Python dependencies
├── statements/ # Place PDFs here
├── output/ # Processed Excel files appear here
└── template.xlsx # Your Excel template
Free to use for personal tax preparation purposes.