A production-oriented job aggregation system that scrapes offline and serves the stored results through a fast API.
No real-time scraping, so users never wait on a scrape. Designed for reliability and low cost.
Get access here: https://job-scraper-puce.vercel.app
Scheduler → Scraper → PostgreSQL → FastAPI → Client
- Scraping runs periodically (not user-triggered)
- Data stored in PostgreSQL
- API is read-only → instant responses
- Dockerized for easy deployment
- Backend: FastAPI
- Scraping: Playwright
- Database: PostgreSQL (SQLAlchemy)
- Queue/Scheduler: Celery + Redis (optional)
- Containerization: Docker + Docker Compose
Traditional approach:
User request → scrape → wait 30–60s → return data ❌
This system:
Background scraping → store → instant API response ✅
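The contrast above can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the `jobs` table layout, the `store_jobs` and `search_jobs` helpers, and the example links are all hypothetical. The point is only that the request path reads rows a background run has already stored.

```python
import sqlite3

FIELDS = ("title", "company", "location", "link")


def init_db(conn):
    # Assumed table layout, mirroring the fields in the API response.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs "
        "(title TEXT, company TEXT, location TEXT, link TEXT)"
    )


def store_jobs(conn, jobs):
    """Background side: persist a batch of already-scraped jobs."""
    conn.executemany(
        "INSERT INTO jobs (title, company, location, link) "
        "VALUES (:title, :company, :location, :link)",
        jobs,
    )
    conn.commit()


def search_jobs(conn, query):
    """Request side: read-only, case-insensitive match on the title."""
    rows = conn.execute(
        "SELECT title, company, location, link FROM jobs "
        "WHERE lower(title) LIKE ?",
        (f"%{query.lower()}%",),
    ).fetchall()
    return [dict(zip(FIELDS, row)) for row in rows]


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
init_db(conn)
store_jobs(conn, [
    {"title": "Backend Developer", "company": "XYZ",
     "location": "India", "link": "https://example.com/jobs/1"},
    {"title": "Data Analyst", "company": "ABC",
     "location": "Remote", "link": "https://example.com/jobs/2"},
])
results = search_jobs(conn, "backend")
```

Because `search_jobs` only runs a `SELECT`, its cost depends on the database, not on how long the source sites take to respond.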
- ⚡ Instant API (<100ms)
- 💸 Little or no proxy dependency
- 🔒 Reduced blocking risk
- 📈 Scales easily
- Batch scraping pipeline
- Persistent job storage
- Fast search API
- Docker-based deployment
- Ready for multi-source scraping
- Low-cost VPS compatible (2GB RAM)
app/
├── api/ # FastAPI routes
├── scraper/ # Playwright scrapers
├── tasks/ # Scheduler / background jobs
├── db/ # Models & DB config
└── main.py # App entrypoint
git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git
cd YOUR_REPO
Create .env:
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=jobs_db
DATABASE_URL=postgresql://postgres:postgres@postgres:5432/jobs_db
REDIS_URL=redis://redis:6379/0
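One way the application might read these values at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical names, not taken from the repository. `REDIS_URL` is treated as optional here on the assumption that it is only needed when the Celery + Redis scheduler is enabled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: Optional[str]  # assumed optional: only used with Celery + Redis


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from environment variables, failing fast if the DB URL is missing."""
    env = os.environ if env is None else env
    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise RuntimeError("DATABASE_URL is not set")
    # An empty REDIS_URL is treated the same as an absent one.
    return Settings(database_url=database_url, redis_url=env.get("REDIS_URL") or None)


settings = load_settings({
    "DATABASE_URL": "postgresql://postgres:postgres@postgres:5432/jobs_db",
    "REDIS_URL": "redis://redis:6379/0",
})
```

Passing a plain dictionary, as in the last lines, keeps the function easy to test. Calling `load_settings()` with no argument reads the real process environment.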
docker compose up -d --build
docker ps
curl http://localhost:8000/health
GET /health
GET /jobs?query=backend
Response:
[
  {
    "title": "Backend Developer",
    "company": "XYZ",
    "location": "India",
    "link": "..."
  }
]
- Runs periodically (cron / Celery beat)
- Stores jobs in DB
- Avoids duplicate entries
- Can be extended to multiple sources
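The text above does not say how duplicates are avoided, so the following is only one possible approach. It derives a stable key per job, using the link when present and otherwise the normalized title, company, and location, and skips any job whose key is already stored. `job_key` and `merge_batch` are hypothetical helpers, and the in-memory dictionary stands in for the database table.

```python
import hashlib

FALLBACK_FIELDS = ("title", "company", "location")


def job_key(job):
    """Return a stable identifier for a job record (assumed dedup rule)."""
    link = (job.get("link") or "").strip()
    if link:
        basis = link
    else:
        # Lowercase and collapse whitespace so cosmetic differences do not matter.
        basis = "|".join(
            " ".join((job.get(field) or "").lower().split())
            for field in FALLBACK_FIELDS
        )
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()


def merge_batch(stored, batch):
    """Add unseen jobs to `stored` (key -> job) and return how many were added."""
    added = 0
    for job in batch:
        key = job_key(job)
        if key not in stored:
            stored[key] = job
            added += 1
    return added


stored = {}
batch = [
    {"title": "Backend Developer", "company": "XYZ",
     "location": "India", "link": "https://example.com/jobs/1"},
    {"title": "Backend Developer", "company": "XYZ",
     "location": "India", "link": "https://example.com/jobs/1"},
    {"title": "Data  Analyst", "company": "ABC", "location": "Remote"},
]
first_run = merge_batch(stored, batch)
second_run = merge_batch(stored, batch)
```

In a real database the same idea is usually enforced with a unique constraint on the key column, so a repeated scraping run cannot insert the same job twice.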
- Some sites (e.g., Indeed) use heavy bot protection
- System is designed to:
  - minimize scraping frequency
  - avoid real-time scraping
- Proxy support can be added if needed
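Keeping scraping frequency low can be expressed as a simple guard that the scheduler consults before each run. This is a sketch under stated assumptions: the six-hour `MIN_INTERVAL`, the `due_sources` helper, and the source names are illustrative and do not come from the project.

```python
from datetime import datetime, timedelta, timezone

MIN_INTERVAL = timedelta(hours=6)  # assumed value; tune per source


def due_sources(last_runs, now, min_interval=MIN_INTERVAL):
    """Return sources never scraped, or last scraped at least `min_interval` ago."""
    due = []
    for source, last_run in last_runs.items():
        if last_run is None or now - last_run >= min_interval:
            due.append(source)
    return due


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_runs = {
    "source_a": now - timedelta(hours=7),  # old enough to scrape again
    "source_b": now - timedelta(hours=1),  # scraped recently, skipped
    "source_c": None,                      # never scraped
}
to_scrape = due_sources(last_runs, now)
```

A periodic task (cron or Celery beat) could call `due_sources` on each tick and scrape only the returned sources, which keeps request volume per site predictable.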
Run locally:
uvicorn app.main:app --reload
- Multi-source aggregation
- Deduplication logic (unique jobs)
- Ranking / scoring system
- Proxy fallback system
- Caching layer (Redis)
- Frontend dashboard
“Scrape less. Store more. Serve instantly.”
MIT