OZint - Contact Intelligence System

OZint takes a list of email addresses and builds enriched contact profiles by combining the ZAI GLM-4.6 AI model with free public data sources. For each contact it runs a 13-wave intelligence pipeline whose stages include AI parsing, DNS analysis, certificate transparency, Gravatar, LinkedIn inference, Wayback Machine, PGP keyserver lookup, source conflict detection, and verification tier assignment.

Output is a CSV or JSON file with 25+ fields per contact. The tool is CLI-only, requires Python 3.8+, and needs no server.

Quick Start

You need Python 3.8+ and a ZAI API key. Without the key, Wave 1 (AI parsing) is skipped and Waves 2-13 run on free public APIs only (Wave 8 is currently disabled), so results are much less complete.

git clone <repository-url>
cd oh-sint

# Run directly (auto-creates venv on first run)
./run-source-linux.sh emails.txt results.csv

# Or manually
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export ZAI_API_KEY="your_key_here"
python src/zai_email_processor.py emails.txt results.csv

See docs/QUICK_START.md for more detail.

Usage

# Basic - CSV output (default)
python src/zai_email_processor.py emails.txt results.csv

# JSON output
python src/zai_email_processor.py emails.txt results.json --format json

# Process only the first N contacts
python src/zai_email_processor.py emails.txt results.csv --limit 50
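The documented CLI interface (input path, output path, `--format`, `--limit`) can be mirrored with `argparse` roughly as below. This is a minimal sketch, not the code in `zai_email_processor.py`: `parse_args`, `render_results`, and `write_results` are hypothetical names, and the CSV writer assumes every profile dict shares the same keys.

```python
import argparse
import csv
import io
import json
from typing import Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring the documented positional args and flags."""
    parser = argparse.ArgumentParser(description="OZint contact enrichment (sketch)")
    parser.add_argument("input", help="emails.txt or a CSV file")
    parser.add_argument("output", help="results.csv or results.json")
    parser.add_argument("--format", choices=["csv", "json"], default="csv")
    parser.add_argument("--limit", type=int, default=None,
                        help="process only the first N contacts")
    return parser.parse_args(argv)


def render_results(profiles: List[Dict], fmt: str) -> str:
    """Serialize profile dicts to CSV or JSON text."""
    if fmt == "json":
        return json.dumps(profiles, indent=2)
    buf = io.StringIO()
    # Column order comes from the first profile (assumes all profiles share keys).
    fieldnames = list(profiles[0].keys()) if profiles else []
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(profiles)
    return buf.getvalue()


def write_results(profiles: List[Dict], path: str, fmt: str) -> None:
    # newline="" stops the csv module's line endings from being translated again
    with open(path, "w", newline="", encoding="utf-8") as fh:
        fh.write(render_results(profiles, fmt))
```

With this shape, `--limit` would simply slice the parsed email list (`emails[:args.limit]`) before any processing starts.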

Input Formats

OZint auto-detects the input format:

Plain text: one email per line, lines starting with # are ignored
Simple CSV: columns named email (or emails) and optionally phone (or phones)
Master data CSV: columns named First Name, Last Name, Companies, Emails, Phones. Passing this pre-known data to the AI as context noticeably improves its accuracy.

Separate multiple emails on one line with | as the delimiter.
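Auto-detection along these lines can be done by inspecting the header row. The sketch below is hypothetical (`detect_format` and `extract_emails` are illustrative names, not OZint's API) and assumes the CSV header is the first non-blank line and that `#` comments appear only in plain-text files.

```python
import csv
import io
from typing import List

MASTER_COLUMNS = {"first name", "last name", "emails"}
EMAIL_COLUMNS = {"email", "emails"}


def _split_emails(cell: str) -> List[str]:
    # Multiple emails in one line or cell are separated by "|".
    return [e.strip() for e in cell.split("|") if e.strip()]


def detect_format(text: str) -> str:
    """Classify input as 'master', 'simple_csv', or 'plain' from its first line."""
    first = next((ln for ln in text.splitlines() if ln.strip()), "")
    row = next(csv.reader([first]), []) if first else []
    cols = {c.strip().lower() for c in row}
    if MASTER_COLUMNS <= cols:
        return "master"
    if cols & EMAIL_COLUMNS:
        return "simple_csv"
    return "plain"


def extract_emails(text: str) -> List[str]:
    """Return a flat list of email addresses from any of the three formats."""
    if detect_format(text) == "plain":
        emails: List[str] = []
        for line in text.splitlines():
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comments
            emails.extend(_split_emails(stripped))
        return emails

    reader = csv.DictReader(io.StringIO(text.lstrip()))
    # Find the email/emails column regardless of its capitalization.
    key = next(k for k in (reader.fieldnames or [])
               if k.strip().lower() in EMAIL_COLUMNS)
    emails = []
    for row in reader:
        emails.extend(_split_emails(row.get(key) or ""))
    return emails
```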

Configuration

Set these environment variables before running:

| Variable | Required | Description |
|---|---|---|
| `ZAI_API_KEY` | For AI | ZAI API key for GLM-4.6 (Wave 1) |
| `HIBP_API_KEY` | No | Have I Been Pwned API key (Wave 8, currently disabled) |
| `OZINT_CACHE_DIR` | No | Cache directory; defaults to `./cache` |
| `OZINT_LOG_LEVEL` | No | Logging verbosity: DEBUG, INFO, WARNING, or ERROR |
| `OZINT_RATE_LIMIT` | No | Advisory limit on requests per hour |
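Reading these variables into a single config object might look like this. `OzintConfig` and `load_config` are hypothetical names; the `./cache` default comes from the table above, while the `INFO` log-level default and integer parsing of `OZINT_RATE_LIMIT` are assumptions.

```python
import logging
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class OzintConfig:
    zai_api_key: Optional[str]
    hibp_api_key: Optional[str]
    cache_dir: str
    log_level: int
    rate_limit_per_hour: Optional[int]


def load_config(env: Mapping[str, str] = os.environ) -> OzintConfig:
    """Build a config from environment variables (hypothetical helper)."""
    # Assumption: INFO is used when OZINT_LOG_LEVEL is unset.
    level_name = env.get("OZINT_LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        raise ValueError(f"Unsupported OZINT_LOG_LEVEL: {level_name}")
    rate = env.get("OZINT_RATE_LIMIT")
    return OzintConfig(
        zai_api_key=env.get("ZAI_API_KEY") or None,   # None means Wave 1 is skipped
        hibp_api_key=env.get("HIBP_API_KEY") or None,
        cache_dir=env.get("OZINT_CACHE_DIR", "./cache"),
        log_level=level,
        # Assumption: the advisory limit is a plain integer of requests per hour.
        rate_limit_per_hour=int(rate) if rate else None,
    )
```

A caller could then skip Wave 1 when `cfg.zai_api_key` is `None`, matching the keyless behavior described in Quick Start.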

Project Structure

oh-sint/
├── src/
│   ├── zai_email_processor.py       # CLI entry point and batch orchestrator
│   ├── zai_intelligence_agent.py    # Core AI engine + ZAIContactProfile dataclass
│   ├── thread_safe_agent_factory.py # Per-thread agent management
│   ├── sort_contacts.py             # Standalone CSV sort utility
│   ├── cuda_integration.py          # CUDA stub (placeholder, not implemented)
│   ├── cuda_batch_processor.py      # CUDA batch stub (placeholder)
│   ├── cuda_phone_intelligence.py   # CUDA phone stub (placeholder)
│   └── cuda_safe_integration.py     # CUDA safe import wrapper (placeholder)
├── tests/
│   └── test_v2_features.py          # v2.0 dataclass and method tests
├── docs/                            # Documentation
├── examples/                        # Sample input files
├── data/
│   ├── input/                       # Drop input files here
│   └── output/                      # Results land here
├── run-source-linux.sh              # Linux launch script
├── run-source-mac.sh                # macOS launch script
├── run-source-windows.bat           # Windows launch script
└── requirements.txt

Confidence Scoring

Each contact gets a confidence_score from 0.0 to 1.0 and a verification_tier label:

0.7-1.0 / CONFIRMED: Multiple independent sources agree
0.4-0.7 / LIKELY: Strong signal from one authoritative source
0.1-0.4 / INFERRED: Derived from email pattern or domain heuristics
0.0-0.1 / UNCERTAIN: Insufficient data

The score is reduced by 10% when the domain is detected as catch-all. A catch-all domain accepts mail for any address, so the existence of the specific email can't be confirmed.
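The tier thresholds and the catch-all adjustment can be expressed as a small function. This sketch makes two assumptions the text leaves open: a score exactly on a boundary (such as 0.7) goes to the higher tier, and the 10% drop is relative (the score is multiplied by 0.9) rather than an absolute subtraction of 0.1. `score_to_tier` is a hypothetical name.

```python
from typing import Tuple

# Lower bound of each tier, checked from highest to lowest.
# Assumption: a score exactly on a boundary belongs to the higher tier.
TIERS = [
    (0.7, "CONFIRMED"),
    (0.4, "LIKELY"),
    (0.1, "INFERRED"),
    (0.0, "UNCERTAIN"),
]

# Assumption: "drops 10%" is read as a relative penalty (score * 0.9).
CATCH_ALL_PENALTY = 0.10


def score_to_tier(score: float, catch_all: bool = False) -> Tuple[float, str]:
    """Apply the catch-all penalty, clamp to [0, 1], and return (score, tier)."""
    if catch_all:
        score *= 1.0 - CATCH_ALL_PENALTY
    score = max(0.0, min(1.0, score))
    for lower_bound, label in TIERS:
        if score >= lower_bound:
            return score, label
    return score, "UNCERTAIN"
```

Under these assumptions a contact scored 0.75 is CONFIRMED, but the same contact on a catch-all domain falls to about 0.675 and is labeled LIKELY.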

Performance

With default settings (5 workers, 1.5 s base delay), expect roughly 3-8 seconds per contact; a batch of 100 contacts takes about 5-10 minutes of wall-clock time.
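The default concurrency settings suggest a worker-pool design. The following is a minimal sketch of that idea using `concurrent.futures.ThreadPoolExecutor`, with one agent created lazily per worker thread (in the spirit of `thread_safe_agent_factory.py`) and a fixed pause after each contact. `process_batch`, the `factory` and `enrich` callables, and the fixed (non-backoff) delay are assumptions, not OZint's actual implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

MAX_WORKERS = 5      # default worker count stated in the README
BASE_DELAY_S = 1.5   # default base delay stated in the README

_local = threading.local()


def _get_agent(factory: Callable[[], object]) -> object:
    """Create one agent per worker thread on first use, then reuse it."""
    if not hasattr(_local, "agent"):
        _local.agent = factory()
    return _local.agent


def process_batch(
    emails: List[str],
    factory: Callable[[], object],
    enrich: Callable[[object, str], Dict],
    base_delay: float = BASE_DELAY_S,
    workers: int = MAX_WORKERS,
) -> List[Dict]:
    """Enrich emails concurrently; results come back in input order."""

    def run(email: str) -> Dict:
        agent = _get_agent(factory)
        result = enrich(agent, email)
        time.sleep(base_delay)  # pace requests to the external services
        return result

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input list
        return list(pool.map(run, emails))
```

Because each thread keeps its own agent, no agent state is shared between threads, which avoids locking inside the enrichment code.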

The CUDA modules in src/ are placeholder stubs. They don't do GPU work; all processing runs on CPU.

Testing

python -m pytest tests/ -v
python -m pytest --cov=src tests/

The existing tests validate the v2.0 data model fields and methods. See docs/TESTING.md for what's covered and what's missing.

Documentation

Quick Start - Get running in minutes
Architecture - System design and 13-wave pipeline
API Reference - Internal Python API and external services used
Configuration - All config options
Development - Dev setup, input formats, adding waves
Deployment - Production setup and cron jobs
Troubleshooting - Common issues
Testing - Test guide
Performance - Tuning guide
FAQ - Common questions

License

MIT. See LICENSE.
