sanchez314c/oh-sint

OZint - Contact Intelligence System

OZint takes a list of email addresses and builds enriched contact profiles using the ZAI GLM-4.6 AI model combined with free public data sources. It runs a 13-wave intelligence pipeline per contact, including AI parsing, DNS analysis, certificate transparency, Gravatar, LinkedIn inference, Wayback Machine, PGP keyserver lookup, source conflict detection, and verification tier assignment.

Output is a CSV or JSON file with 25+ fields per contact. The tool is CLI-only, requires Python 3.8+, and needs no server.

Quick Start

You need Python 3.8+ and a ZAI API key. Without the key, Waves 2-13 still run using free public APIs, but results will be much less complete.

git clone <repository-url>
cd oh-sint

# Run directly (auto-creates venv on first run)
./run-source-linux.sh emails.txt results.csv

# Or manually
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
export ZAI_API_KEY="your_key_here"
python src/zai_email_processor.py emails.txt results.csv

See docs/QUICK_START.md for more detail.

Usage

# Basic - CSV output (default)
python src/zai_email_processor.py emails.txt results.csv

# JSON output
python src/zai_email_processor.py emails.txt results.json --format json

# Process only the first N contacts
python src/zai_email_processor.py emails.txt results.csv --limit 50

Input Formats

OZint auto-detects the input format:

  • Plain text: one email per line, lines starting with # are ignored
  • Simple CSV: columns named email (or emails) and optionally phone (or phones)
  • Master data CSV: columns named First Name, Last Name, Companies, Emails, Phones - passing pre-known data as context improves AI accuracy noticeably

Multiple emails on one line use | as delimiter.
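
As an illustration of the plain-text rules above, here is a minimal sketch of a parser. The function name parse_plain_text is hypothetical and not taken from the OZint source; skipping blank lines and stripping surrounding whitespace are assumptions, and the real processor may behave differently.

```python
from typing import List


def parse_plain_text(text: str) -> List[str]:
    """Return email addresses from plain-text input, in input order.

    Hypothetical helper for illustration; not part of the OZint codebase.
    """
    emails: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines (assumption) and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        # A single line may carry several addresses separated by '|'.
        for part in line.split("|"):
            email = part.strip()
            if email:
                emails.append(email)
    return emails


sample = "# team list\nalice@example.com\nbob@example.com | bob@example.org\n"
print(parse_plain_text(sample))
# ['alice@example.com', 'bob@example.com', 'bob@example.org']
```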

Configuration

Set these environment variables before running:

Variable          Required  Description
ZAI_API_KEY       For AI    ZAI API key for GLM-4.6 (Wave 1)
HIBP_API_KEY      No        Have I Been Pwned key (Wave 8 currently disabled)
OZINT_CACHE_DIR   No        Cache directory, defaults to ./cache
OZINT_LOG_LEVEL   No        Logging verbosity (DEBUG/INFO/WARNING/ERROR)
OZINT_RATE_LIMIT  No        Advisory requests-per-hour limit
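
For reference, a sketch of how a script could read these variables. The Settings dataclass and load_settings function are hypothetical names, not OZint's actual configuration code. Only the ./cache default comes from the table above; the INFO default for the log level and treating an unset rate limit as None are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    """Hypothetical container for the variables in the table above."""
    zai_api_key: Optional[str]
    hibp_api_key: Optional[str]
    cache_dir: str
    log_level: str
    rate_limit: Optional[int]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    raw_limit = env.get("OZINT_RATE_LIMIT")
    return Settings(
        zai_api_key=env.get("ZAI_API_KEY"),      # needed for Wave 1 only
        hibp_api_key=env.get("HIBP_API_KEY"),    # optional
        cache_dir=env.get("OZINT_CACHE_DIR", "./cache"),
        # Default level is an assumption; the README does not state one.
        log_level=env.get("OZINT_LOG_LEVEL", "INFO").upper(),
        rate_limit=int(raw_limit) if raw_limit else None,
    )


settings = load_settings()
if settings.zai_api_key is None:
    print("ZAI_API_KEY not set: AI parsing (Wave 1) would be unavailable")
```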

Project Structure

oh-sint/
├── src/
│   ├── zai_email_processor.py       # CLI entry point and batch orchestrator
│   ├── zai_intelligence_agent.py    # Core AI engine + ZAIContactProfile dataclass
│   ├── thread_safe_agent_factory.py # Per-thread agent management
│   ├── sort_contacts.py             # Standalone CSV sort utility
│   ├── cuda_integration.py          # CUDA stub (placeholder, not implemented)
│   ├── cuda_batch_processor.py      # CUDA batch stub (placeholder)
│   ├── cuda_phone_intelligence.py   # CUDA phone stub (placeholder)
│   └── cuda_safe_integration.py     # CUDA safe import wrapper (placeholder)
├── tests/
│   └── test_v2_features.py          # v2.0 dataclass and method tests
├── docs/                            # Documentation
├── examples/                        # Sample input files
├── data/
│   ├── input/                       # Drop input files here
│   └── output/                      # Results land here
├── run-source-linux.sh              # Linux launch script
├── run-source-mac.sh                # macOS launch script
├── run-source-windows.bat           # Windows launch script
└── requirements.txt

Confidence Scoring

Each contact gets a confidence_score from 0.0 to 1.0 and a verification_tier label:

  • 0.7-1.0 / CONFIRMED: Multiple independent sources agree
  • 0.4-0.7 / LIKELY: Strong signal from one authoritative source
  • 0.1-0.4 / INFERRED: Derived from email pattern or domain heuristics
  • 0.0-0.1 / UNCERTAIN: Insufficient data

The score drops 10% when a catch-all domain is detected (meaning email existence can't be confirmed).
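
The bands above can be sketched as a small lookup. This is illustrative only, with hypothetical function names: it assumes the tier follows directly from the score, that each lower bound is inclusive (the listed ranges share their boundary values), and that "drops 10%" means multiplying the score by 0.9. OZint's actual logic may differ on all three points.

```python
def apply_catch_all_penalty(score: float, is_catch_all: bool) -> float:
    """Reduce the score by 10% (multiplicative, an assumption) on catch-all domains."""
    return score * 0.9 if is_catch_all else score


def verification_tier(score: float) -> str:
    """Map a 0.0-1.0 score to a tier label, treating lower bounds as inclusive."""
    if score >= 0.7:
        return "CONFIRMED"
    if score >= 0.4:
        return "LIKELY"
    if score >= 0.1:
        return "INFERRED"
    return "UNCERTAIN"


# Under these assumptions, a 0.75 score on a catch-all domain falls
# below the 0.7 threshold and lands in the LIKELY band.
adjusted = apply_catch_all_penalty(0.75, is_catch_all=True)
print(verification_tier(adjusted))
```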

Performance

With default settings (5 workers, 1.5 s base delay), expect roughly 3-8 seconds per contact. 100 contacts take 5-10 minutes of wall-clock time.

The CUDA modules in src/ are placeholder stubs. They don't do GPU work; all processing runs on CPU.

Testing

python -m pytest tests/ -v
python -m pytest --cov=src tests/

The existing tests validate the v2.0 data model fields and methods. See docs/TESTING.md for what's covered and what's missing.

Documentation

License

MIT. See LICENSE.
