OZint takes a list of email addresses and builds enriched contact profiles using the ZAI GLM-4.6 AI model combined with free public data sources. It runs a 13-wave intelligence pipeline per contact, including AI parsing, DNS analysis, certificate transparency, Gravatar, LinkedIn inference, Wayback Machine, PGP keyserver lookup, source conflict detection, and verification tier assignment.
Output is a CSV or JSON file with 25+ fields per contact. The tool is CLI-only, runs on Python 3.8+, and requires no server.
You need Python 3.8+ and a ZAI API key. Without the key, Waves 2-13 still run using free public APIs, but results will be much less complete.
git clone <repository-url>
cd oh-sint
# Run directly (auto-creates venv on first run)
./run-source-linux.sh emails.txt results.csv
# Or manually
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export ZAI_API_KEY="your_key_here"
python src/zai_email_processor.py emails.txt results.csv

See docs/QUICK_START.md for more detail.
# Basic - CSV output (default)
python src/zai_email_processor.py emails.txt results.csv
# JSON output
python src/zai_email_processor.py emails.txt results.json --format json
# Process only the first N contacts
python src/zai_email_processor.py emails.txt results.csv --limit 50

OZint auto-detects the input format:
- Plain text: one email per line; lines starting with `#` are ignored
- Simple CSV: columns named `email` (or `emails`) and optionally `phone` (or `phones`)
- Master data CSV: columns named `First Name`, `Last Name`, `Companies`, `Emails`, `Phones`; passing pre-known data as context improves AI accuracy noticeably
Multiple emails on one line use | as delimiter.
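
The detection rules above can be sketched roughly as follows. This is a minimal illustration, not OZint's actual parser: the function names `split_emails`, `detect_format`, and `read_emails` are hypothetical, and the sketch assumes the format can be decided from the first non-empty, non-comment line.

```python
import csv
import io


def split_emails(cell):
    """Split a field that may hold several addresses separated by '|'."""
    return [part.strip() for part in cell.split("|") if part.strip()]


def detect_format(text):
    """Guess the input format from the first non-empty, non-comment line."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        header = {h.strip().lower() for h in next(csv.reader([stripped]))}
        if {"first name", "last name"} & header and "emails" in header:
            return "master_csv"
        if header & {"email", "emails"}:
            return "simple_csv"
        return "plain_text"
    return "plain_text"


def read_emails(text):
    """Return every email address found in the input, in order."""
    emails = []
    if detect_format(text) == "plain_text":
        for line in text.splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                emails.extend(split_emails(line))
        return emails
    # Both CSV variants keep addresses in an 'email' or 'emails' column.
    for row in csv.DictReader(io.StringIO(text)):
        for key, value in row.items():
            if key and key.strip().lower() in ("email", "emails") and value:
                emails.extend(split_emails(value))
    return emails
```

For example, `read_emails("# team\na@example.com | b@example.com\n")` treats the input as plain text, skips the comment line, and splits the second line on `|`.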
Set these environment variables before running:
| Variable | Required | Description |
|---|---|---|
| `ZAI_API_KEY` | For AI | ZAI API key for GLM-4.6 (Wave 1) |
| `HIBP_API_KEY` | No | Have I Been Pwned key (Wave 8 currently disabled) |
| `OZINT_CACHE_DIR` | No | Cache directory, defaults to `./cache` |
| `OZINT_LOG_LEVEL` | No | Logging verbosity (DEBUG/INFO/WARNING/ERROR) |
| `OZINT_RATE_LIMIT` | No | Advisory requests-per-hour limit |
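
As an illustration of how these variables might be read, here is a minimal sketch. The `Settings` dataclass and `load_settings` function are hypothetical and not part of OZint; the `./cache` default comes from the table above, while the `INFO` default log level is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


@dataclass
class Settings:
    """Hypothetical container for the environment variables listed above."""
    zai_api_key: Optional[str]
    hibp_api_key: Optional[str]
    cache_dir: str
    log_level: str
    rate_limit: Optional[int]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the given mapping, or from os.environ by default."""
    env = os.environ if env is None else env
    # Assumption: INFO is used when OZINT_LOG_LEVEL is unset.
    level = env.get("OZINT_LOG_LEVEL", "INFO").upper()
    if level not in VALID_LOG_LEVELS:
        raise ValueError(f"unsupported OZINT_LOG_LEVEL: {level}")
    raw_limit = env.get("OZINT_RATE_LIMIT")
    return Settings(
        zai_api_key=env.get("ZAI_API_KEY") or None,
        hibp_api_key=env.get("HIBP_API_KEY") or None,
        cache_dir=env.get("OZINT_CACHE_DIR", "./cache"),
        log_level=level,
        rate_limit=int(raw_limit) if raw_limit else None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests, for example `load_settings({"ZAI_API_KEY": "your_key_here"})`.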
oh-sint/
├── src/
│ ├── zai_email_processor.py # CLI entry point and batch orchestrator
│ ├── zai_intelligence_agent.py # Core AI engine + ZAIContactProfile dataclass
│ ├── thread_safe_agent_factory.py # Per-thread agent management
│ ├── sort_contacts.py # Standalone CSV sort utility
│ ├── cuda_integration.py # CUDA stub (placeholder, not implemented)
│ ├── cuda_batch_processor.py # CUDA batch stub (placeholder)
│ ├── cuda_phone_intelligence.py # CUDA phone stub (placeholder)
│ └── cuda_safe_integration.py # CUDA safe import wrapper (placeholder)
├── tests/
│ └── test_v2_features.py # v2.0 dataclass and method tests
├── docs/ # Documentation
├── examples/ # Sample input files
├── data/
│ ├── input/ # Drop input files here
│ └── output/ # Results land here
├── run-source-linux.sh # Linux launch script
├── run-source-mac.sh # macOS launch script
├── run-source-windows.bat # Windows launch script
└── requirements.txt
Each contact gets a confidence_score from 0.0 to 1.0 and a verification_tier label:
- 0.7-1.0 / CONFIRMED: Multiple independent sources agree
- 0.4-0.7 / LIKELY: Strong signal from one authoritative source
- 0.1-0.4 / INFERRED: Derived from email pattern or domain heuristics
- 0.0-0.1 / UNCERTAIN: Insufficient data
The score drops 10% when a catch-all domain is detected (meaning email existence can't be confirmed).
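
The mapping from score to tier can be sketched as below. Two points are assumptions rather than documented behavior: the listed ranges share their boundary values, so this sketch treats each lower bound as inclusive, and it reads "drops 10%" as a multiplicative reduction (score × 0.9). The function names are hypothetical.

```python
def verification_tier(score):
    """Map a confidence score in [0.0, 1.0] to a tier label.

    Assumption: lower bounds are inclusive, so 0.7 counts as CONFIRMED.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.7:
        return "CONFIRMED"
    if score >= 0.4:
        return "LIKELY"
    if score >= 0.1:
        return "INFERRED"
    return "UNCERTAIN"


def apply_catch_all_penalty(score, is_catch_all):
    """Reduce the score when the domain accepts mail for any address.

    Assumption: the 10% drop is relative (score * 0.9), not an absolute 0.1.
    """
    return round(score * 0.9, 4) if is_catch_all else score
```

Under these assumptions, a contact scored 0.75 on a catch-all domain falls to 0.675 and moves from CONFIRMED to LIKELY.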
With default settings (5 workers, 1.5s base delay), expect roughly 3-8 seconds per contact. A batch of 100 contacts takes about 5-10 minutes of wall-clock time.
The CUDA modules in src/ are placeholder stubs. They don't do GPU work; all processing runs on CPU.
python -m pytest tests/ -v
python -m pytest --cov=src tests/

The existing tests validate the v2.0 data model fields and methods. See docs/TESTING.md for what's covered and what's missing.
- Quick Start - Get running in minutes
- Architecture - System design and 13-wave pipeline
- API Reference - Internal Python API and external services used
- Configuration - All config options
- Development - Dev setup, input formats, adding waves
- Deployment - Production setup and cron jobs
- Troubleshooting - Common issues
- Testing - Test guide
- Performance - Tuning guide
- FAQ - Common questions
MIT. See LICENSE.