Real-time URL threat analysis. Wire API extracts technical signals, Google Generative AI recognizes behavioral patterns, deterministic scoring outputs risk (0-100). Async architecture handles 120s Wire API + 45s AI calls without blocking.
Static URL threat detection is broken:
- Blacklist-based tools miss 60%+ of new phishing campaigns
- SaaS solutions cost $500/month and have 5-minute latencies
- Open-source tools use only regex patterns (too many false positives)
I needed hybrid intelligence: raw technical signals (domain age, SSL validity, redirect chains) + AI pattern recognition (behavioral clustering, social engineering vectors). And it had to be fast.
The Issue: Wire API can take up to 120s and Google Generative AI up to 45s. If I block the request thread waiting for both, the user sits staring at a loading spinner for as long as 165 seconds.
The Solution: Async worker architecture.
POST /api/investigations/start → returns investigation ID immediately (300ms)
Background: Wire API (up to 120s) → AI Analysis (up to 45s) → Threat Scoring (up to 10s)
Frontend polls GET /api/investigations/:id every 2s with 3-min graceful timeout
Status persisted: processing → completed
User gets results without waiting (8-15s typical, 180s max)
This pattern scales. Investigate 50 URLs and come back later. No blocked request threads, no WebSocket complexity.
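The start-then-poll flow above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the in-memory Map stands in for the MongoDB collection, the `stages` array stands in for the Wire API, AI, and scoring calls, and the "failed" status is an assumption (the text only names processing and completed).

```javascript
const { randomUUID } = require("node:crypto");

// In-memory stand-in for the MongoDB collection (assumption for this sketch).
const investigations = new Map();

// Runs each hypothetical stage in order, feeding the output of one into the next.
async function runPipeline(targetValue, stages) {
  let data = { targetValue };
  for (const stage of stages) {
    data = await stage(data);
  }
  return data;
}

// Creates the record, starts the work without awaiting it, and returns right away.
function startInvestigation(targetValue, stages) {
  const id = randomUUID();
  investigations.set(id, { status: "processing", result: null, error: null });

  runPipeline(targetValue, stages)
    .then((result) => investigations.set(id, { status: "completed", result, error: null }))
    .catch((err) => investigations.set(id, { status: "failed", result: null, error: err.message }));

  return { investigationId: id, status: "processing" };
}

// What a GET /api/investigations/:id handler would read.
function getInvestigation(id) {
  return investigations.get(id) ?? null;
}
```

The key design choice is that `startInvestigation` never awaits the pipeline, so the HTTP handler that calls it can respond as soon as the record exists.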
Frontend: React 18 + Vite (3-4x faster than Webpack) + Tailwind + Framer Motion
Backend: Node.js/Express + MongoDB + Mongoose + Helmet.js + express-rate-limit
Intelligence: Wire API (technical metadata) + Google Generative AI (pattern analysis)
Auth: JWT (30-day expiry) + bcryptjs (salt rounds: 10) + input validation
Deployment: Vercel (frontend) + Render (backend) + MongoDB Atlas (database)
Node.js v18+, npm/yarn, MongoDB (Atlas free tier works)
git clone https://github.com/anasahhm/specter.git
cd specter
# Backend
cd backend
npm install
cp .env.example .env # Add your API keys
# Frontend
cd ../frontend
npm install
cp .env.example .env
Backend (.env):
NODE_ENV=development
PORT=5000
MONGODB_URI=mongodb+srv://user:password@cluster.mongodb.net/specter
JWT_SECRET=your-super-secret-key-minimum-32-characters
WIRE_API_KEY=your-wire-api-key-here
GOOGLE_GENERATIVE_AI_KEY=your-key-here
FRONTEND_URL=http://localhost:5173
Frontend (.env):
VITE_API_URL=http://localhost:5000
VITE_APP_NAME=SPECTER
Terminal 1 (Backend):
cd backend && npm run dev
# Should output:
# ╔══════════════════════════════════════════╗
# ║ SPECTER - SERVER STARTED ║
# ║ Port: 5000 | Database: Connected ║
# ║ Wire API: ✓ | Google AI: ✓ ║
# ╚══════════════════════════════════════════╝
Terminal 2 (Frontend):
cd frontend && npm run dev
# http://localhost:5173
Terminal 3 (Test API):
curl http://localhost:5000/api/health
# {"status":"operational","timestamp":"2024-05-31T12:00:00.000Z"}
Step 1: Wire API (120s timeout)
- Domain metadata, SSL certificates, age, MX records
- Redirect chains, technology stack detection
- Embedded links, forms, scripts
- Output: Raw technical signals
Step 2: AI Analysis (45s timeout)
- Google Generative AI pattern recognition
- Behavioral clustering against known threats
- Phishing vector identification
- Confidence scoring and summary generation
- Fallback: Rule-based analysis if AI unavailable
Step 3: Threat Scoring (10s timeout)
- Risk score (0-100)
- Threat classification (Critical/High/Medium/Low/Safe)
- Scam probability, toxicity rating, confidence
- Output: Final verdict
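One way to enforce the per-stage budgets above is a small timeout wrapper built on Promise.race. The sketch below is an assumption about how this could look, not the project's implementation; the `stages` object holds three hypothetical async functions (`wire`, `ai`, `scoring`).

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stage budgets taken from the text above, in milliseconds.
const STAGE_TIMEOUTS = { wire: 120_000, ai: 45_000, scoring: 10_000 };

// `stages` holds hypothetical functions: wire(url), ai(signals), scoring(signals, analysis).
async function runStages(url, stages, timeouts = STAGE_TIMEOUTS) {
  const signals = await withTimeout(stages.wire(url), timeouts.wire, "Wire API");
  const analysis = await withTimeout(stages.ai(signals), timeouts.ai, "AI analysis");
  return withTimeout(stages.scoring(signals, analysis), timeouts.scoring, "Threat scoring");
}
```

Note that Promise.race only stops waiting; it does not cancel the underlying request. Cancelling the HTTP call itself would need something like an AbortController passed to the client.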
1. User submits URL
2. POST /api/investigations/start
3. Backend returns investigationId (status: processing)
4. Frontend polls GET /api/investigations/:id every 2s
5. Background: Step 1 → Step 2 → Step 3
6. Status changes to completed
7. Frontend renders results
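The polling half of the flow above (steps 4 to 7) might look roughly like this on the client. It is a sketch under assumptions: `fetchStatus` is a hypothetical function wrapping GET /api/investigations/:id, and the "timeout" status returned at the deadline is made up for illustration.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `fetchStatus(id)` until the investigation leaves "processing"
// or the deadline passes. Defaults match the 2s interval and 3-min timeout above.
async function pollInvestigation(id, fetchStatus, { intervalMs = 2000, timeoutMs = 180_000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const investigation = await fetchStatus(id);
    if (investigation.status !== "processing") {
      return investigation;
    }
    await sleep(intervalMs);
  }
  // Graceful timeout: report it instead of throwing so the UI can show a message.
  return { status: "timeout", investigationId: id };
}
```

Passing `fetchStatus` in as a parameter keeps the loop independent of the HTTP client, which also makes it easy to test with a fake.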
- Problem: Wire API + AI = 165s. Blocking the request thread kills UX.
- Solution: Async workers + polling. POST returns instantly with ID, frontend polls every 2s.
- Lesson: For external APIs >10s, always use async + polling or WebSockets.
- Problem: Users hammer the API. Bots scrape URL intelligence.
- Solution: Dual-axis rate limiting:
  - Global: 100 requests/15min (catches distributed attacks)
  - Per-user: 5 investigations/min (prevents individual abuse)
  - Sliding window (not fixed buckets)
- Lesson: Single rate limit is insufficient. Attack from one user looks different than botnet traffic.
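To make the dual-axis idea concrete, here is a minimal in-memory sliding-window limiter. It is an illustrative sketch, not the express-rate-limit configuration the backend uses; keying the global axis by client IP is an assumption, and a real deployment would need a shared store rather than a per-process Map.

```javascript
// Sliding-window log: keeps the timestamps of recent hits per key and
// counts only those inside the trailing window.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> array of timestamps (ms)
  }

  // Returns true and records the hit if `key` is under its limit at time `now`.
  allow(key, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Limits from the text above: 100 requests / 15 min and 5 investigations / min per user.
const globalLimiter = new SlidingWindowLimiter(100, 15 * 60 * 1000);
const userLimiter = new SlidingWindowLimiter(5, 60 * 1000);

// A request must pass both axes. The global hit is counted even when the
// per-user check then rejects the request.
function allowInvestigation(ip, userId, now = Date.now()) {
  return globalLimiter.allow(ip, now) && userLimiter.allow(userId, now);
}
```

Unlike a fixed bucket, the window here trails the current request, so a burst straddling a bucket boundary cannot double the effective limit.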
- Problem: What if Wire API is down? What if Google AI returns an error?
- Solution: Graceful degradation:
  - Wire API failure → Use cached domain reputation data
  - Google AI timeout → Fall back to rule-based threat scoring
  - Both failures → Return partial results with explicit warnings
- Lesson: Single point of failure cascades. Build fallbacks at every layer.
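The fallback chain above can be expressed as two try/catch layers that collect warnings. This is a sketch only: the four injected functions (`fetchWireSignals`, `getCachedReputation`, `analyzeWithAI`, `ruleBasedAnalysis`) are hypothetical names, not the project's actual service API.

```javascript
// Each dependency is passed in so the sketch stays self-contained.
// All four function names are hypothetical stand-ins for the real services.
async function analyzeWithFallbacks(url, deps) {
  const warnings = [];

  let signals;
  try {
    signals = await deps.fetchWireSignals(url);
  } catch (err) {
    warnings.push("Wire API unavailable, using cached domain reputation");
    signals = await deps.getCachedReputation(url); // may be null on a cache miss
  }

  let analysis;
  try {
    analysis = await deps.analyzeWithAI(signals);
  } catch (err) {
    warnings.push("AI analysis unavailable, using rule-based scoring");
    analysis = deps.ruleBasedAnalysis(signals);
  }

  // `partial` tells the frontend to render the explicit warnings.
  return { signals, analysis, partial: warnings.length > 0, warnings };
}
```

Returning warnings alongside the result, rather than throwing, lets the caller show a degraded verdict instead of an error page.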
- Problem: Users keep sessions open for long stretches, and a token that expires (30-day expiry) mid-session forces a fresh login.
- Solution: Token refresh pattern:
  - 30-day access tokens + refresh tokens
  - Frontend axios interceptor refreshes automatically
  - No sensitive data in error messages
- Lesson: Never leak token details in error responses.
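The refresh-and-retry logic behind such an interceptor can be shown without axios. The sketch below is a library-free approximation, not the project's interceptor: `send(token)` and `refresh()` are hypothetical injected functions, and treating HTTP 401 as the expiry signal is an assumption.

```javascript
// `send(token)` performs the request and resolves to { status, body }.
// `refresh()` resolves to a new access token. Both are hypothetical, injected functions.
function createAuthClient(initialToken, refresh) {
  let token = initialToken;
  let refreshing = null; // shared promise so concurrent 401s trigger one refresh

  return async function request(send) {
    const response = await send(token);
    if (response.status !== 401) return response;

    refreshing ??= refresh().finally(() => { refreshing = null; });
    try {
      token = await refreshing;
    } catch {
      // Generic message: no token contents or server details reach the caller.
      throw new Error("Session expired. Please log in again.");
    }
    return send(token); // retry exactly once
  };
}
```

Sharing one in-flight refresh promise avoids a stampede of refresh calls when several requests hit an expired token at the same moment.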
- Problem: Wire API sees static HTML. Dynamic forms, obfuscated links, JS-rendered content are invisible.
- Solution: Hybrid approach:
  - Wire API for structural/technical analysis
  - Google AI for behavioral/pattern analysis
  - Triangulation catches what either misses
- Lesson: No single tool is complete. Combine strengths.
- Problem: The connection pool at its default size of 5 (the Mongoose 5.x poolSize default; Mongoose 6+ defaults maxPoolSize to 100) was too small under concurrent load.
- Solution: Tuned pool settings in connection URI, added connection monitoring.
- Lesson: Database bottlenecks surface under load, not in dev.
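Pool tuning in the connection URI comes down to appending query options. The helper below is a small sketch: `withPoolOptions` is a hypothetical name, and the values are illustrative, not the project's actual settings. `maxPoolSize`, `minPoolSize`, and `waitQueueTimeoutMS` are standard MongoDB connection string options.

```javascript
// Appends pool options to a MongoDB connection string.
// The option values used below are illustrative assumptions.
function withPoolOptions(uri, options) {
  const separator = uri.includes("?") ? "&" : "?";
  return uri + separator + new URLSearchParams(options).toString();
}

const uri = withPoolOptions("mongodb+srv://user:password@cluster.mongodb.net/specter", {
  maxPoolSize: "20",
  minPoolSize: "2",
  waitQueueTimeoutMS: "5000",
});

console.log(uri);
// mongodb+srv://user:password@cluster.mongodb.net/specter?maxPoolSize=20&minPoolSize=2&waitQueueTimeoutMS=5000
```

The resulting string would go into MONGODB_URI. The right pool size depends on the Atlas tier's connection limit and the number of backend instances, so treat the numbers as a starting point.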
POST /api/auth/register
{ email, password, displayName? }
POST /api/auth/login
{ email, password }
GET /api/auth/profile
Headers: Authorization: Bearer {token}
POST /api/investigations/start
{ targetType: "url", targetValue: "https://..." }
Returns: { investigationId, status: "processing" }
GET /api/investigations/:investigationId
Returns: Complete threat analysis
GET /api/investigations?page=1&limit=10
Returns: User's investigation history
PUT /api/investigations/:investigationId/bookmark
{ isBookmarked: boolean }
GET /api/reports/:investigationId
GET /api/reports/:investigationId/export?format=pdf|json
GET /api/analytics/user-stats
GET /api/analytics/threat-distribution
| Metric | Range | Meaning |
|---|---|---|
| Risk Score | 0-100 | Overall threat severity |
| Threat Level | Critical/High/Medium/Low/Safe | Classification |
| Phishing Detected | Yes/No | Known phishing patterns |
| Scam Probability | 0-100% | Fraudulent intent likelihood |
| Toxicity Score | 0-100 | Content toxicity |
| Confidence Score | 0-100% | Analysis certainty |
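Mapping the 0-100 risk score onto the five threat levels is a simple band lookup. The thresholds below are illustrative assumptions; the table only names the levels, not where the project draws the lines.

```javascript
// Threshold values are assumptions for this sketch, ordered from highest to lowest.
const THREAT_LEVELS = [
  { min: 80, level: "Critical" },
  { min: 60, level: "High" },
  { min: 40, level: "Medium" },
  { min: 20, level: "Low" },
  { min: 0, level: "Safe" },
];

// Returns the first band whose lower bound the score reaches.
function classifyRisk(score) {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError("Risk score must be a number between 0 and 100");
  }
  return THREAT_LEVELS.find((band) => score >= band.min).level;
}
```

Because the lookup is a pure function of the score, the same input always yields the same level, which keeps the final verdict deterministic even when the AI stage is not.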
| Metric | Target | Actual |
|---|---|---|
| Page Load | <2s | 1.2s |
| Investigation Start (API) | <500ms | 300ms |
| Results Available | <30s | 8-15s |
| API Response Time | <1s | 200-400ms |
| Database Query | <100ms | 50-80ms |
JWT Auth - 30-day token expiry + refresh rotation
Password Hashing - bcryptjs (salt rounds: 10)
Rate Limiting - 100 req/15min global + 5/min per user
Helmet.js - CSP, X-Frame-Options, HSTS, etc.
CORS - Whitelist frontend origin only
Input Validation - Email format, password entropy, URL structure
Error Handling - No sensitive data leakage
Environment Isolation - Secrets in .env, never in code
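The "URL structure" check in the list above could be approximated with the built-in URL parser. This is a sketch, not the project's validator: restricting to http/https and capping the length at 2048 characters are assumptions, and `validateInvestigationRequest` is a hypothetical name.

```javascript
// Validates the body of POST /api/investigations/start.
// The scheme allowlist and the 2048-character cap are assumptions for this sketch.
function validateInvestigationRequest(body) {
  if (!body || body.targetType !== "url") {
    return { ok: false, error: 'targetType must be "url"' };
  }
  if (typeof body.targetValue !== "string" || body.targetValue.length > 2048) {
    return { ok: false, error: "targetValue must be a string of at most 2048 characters" };
  }
  let parsed;
  try {
    parsed = new URL(body.targetValue);
  } catch {
    return { ok: false, error: "targetValue is not a valid URL" };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, error: "Only http and https URLs are supported" };
  }
  // Return the normalized form so downstream stages see one canonical string.
  return { ok: true, url: parsed.href };
}
```

Rejecting non-HTTP schemes up front keeps inputs such as javascript: or file: URLs from ever reaching the pipeline.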
Frontend (Vercel):
- Push to GitHub
- vercel.com/new → Import repo
- Set VITE_API_URL env var
- Deploy
Backend (Render):
- render.com → Create Web Service
- Connect GitHub repo
- Set all env vars (MONGODB_URI, WIRE_API_KEY, etc.)
- Deploy
Database (MongoDB Atlas):
- cloud.mongodb.com → Create cluster (free tier)
- Get connection string
- Whitelist your IP
- Set as MONGODB_URI
specter/
├── frontend/
│ ├── src/
│ │ ├── pages/ # Route components
│ │ ├── components/ # Reusable UI components
│ │ ├── hooks/ # Custom React hooks
│ │ ├── api/ # API client + interceptors
│ │ ├── context/ # Auth context
│ │ ├── utils/ # Helpers
│ │ └── styles/ # Global CSS
│ ├── vite.config.js
│ ├── tailwind.config.js
│ └── package.json
│
├── backend/
│ ├── src/
│ │ ├── routes/ # Express route handlers
│ │ ├── services/ # Business logic (Wire, AI, Scoring)
│ │ ├── models/ # Mongoose schemas
│ │ ├── config/ # Validation, constants
│ │ ├── scripts/ # Database seeders
│ │ ├── server.js # Express app setup
│ │ └── index.js # Entry point
│ └── package.json
│
└── docs/
├── ARCHITECTURE.md # System design
├── API.md # Endpoint reference
└── DEPLOYMENT.md # Production setup
# Register
curl -X POST http://localhost:5000/api/auth/register \
-H "Content-Type: application/json" \
-d '{"email":"test@example.com","password":"Test123!"}'
# Login
curl -X POST http://localhost:5000/api/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"test@example.com","password":"Test123!"}'
# Start investigation
curl -X POST http://localhost:5000/api/investigations/start \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"targetType":"url","targetValue":"https://example.com"}'
# Get results
curl http://localhost:5000/api/investigations/INVESTIGATION_ID \
-H "Authorization: Bearer YOUR_TOKEN"
- Register account
- Test URLs: example.com (safe), malicious-url.com (suspicious)
- Verify threat scores, phishing detection, report generation
- Check investigation history and bookmarks
# Check backend is running
curl http://localhost:5000/api/health
# Check VITE_API_URL in frontend/.env matches backend
# Check FRONTEND_URL in backend/.env matches frontend origin (http://localhost:5173)
# Check browser console for CORS errors
# Verify connection string
MONGODB_URI=mongodb+srv://user:password@cluster.mongodb.net/specter
# Check IP whitelist in MongoDB Atlas (add 0.0.0.0/0 for development)
# Verify database user has correct credentials
- Check API key is valid and quota isn't exceeded
- Review Wire API docs for rate limits
- Enable debug logging: DEBUG=* npm run dev
- Default timeout is 180 seconds
- Check backend logs for step-specific errors
- Test with simple URL first (e.g., example.com)
- 48 hour build (hackathon sprint)
- 2100+ LOC (1200 frontend, 900 backend)
- 12 API endpoints (Auth, Investigations, Reports, Analytics)
- 3-stage pipeline (Wire API → AI → Scoring)
- 8-15s typical latency (180s max with timeouts)
- 4 database collections (users, investigations, reports, analytics)
- 18 React components (modular, reusable)
- 3 backend services (Wire client, AI analyzer, threat scorer)
- Async beats blocking. External APIs >10s? Don't wait. Async + polling scales better.
- Graceful degradation saves systems. When Wire API fails, use cached data. When AI times out, use rules.
- Rate limiting is multidimensional. Global limits catch botnets. Per-user limits catch individual abuse.
- Hybrid intelligence works. One data source has blind spots. Wire API + AI catch what each misses.
- Security is layering. JWT + bcryptjs + Helmet + CORS + input validation = defense in depth.
MIT — see LICENSE
Questions? Open a GitHub issue