Automatically search global job listings based on your CV, score matches with LLM, and deduplicate across multiple sources.
# Install uv (skip if already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh # macOS / Linux
# Windows (PowerShell): irm https://astral.sh/uv/install.ps1 | iex
git clone https://github.com/sangowu/JobRadar.git
cd JobRadar
uv sync
uv run jobradar serve # Launch Web UI (http://127.0.0.1:8765)
# Open your browser and configure API keys in the "API Config" page
# Or configure manually via .env:
cp .env.example .env # Fill in your API keys
uv run jobradar find cv.docx # CLI mode| Command | Description |
|---|---|
uv run jobradar serve |
Launch Web UI |
uv run jobradar serve --mock |
Test mode (isolated DB, won't affect real cache) |
uv run jobradar find cv.docx |
CLI: parse CV → discover titles → scrape → assess |
uv run jobradar find cv.docx --refresh |
Force re-search, ignore all caches |
uv run jobradar results |
Browse cached results from the last search |
uv run jobradar assess |
Re-run LLM assessment on cached JDs |
uv run jobradar model |
Interactively choose LLM provider and model |
uv run jobradar cache clear |
Clear all caches |
uv run jobradar --version |
Show current version |
CV file
│
▼ ① CV parsing (LLM → CVProfile) ← permanent SHA-256 cache
structured seniority bands + explicit language extraction
▼ ② User reviews & confirms title list
▼ ③ Scraping (Indeed + LinkedIn, JobSpy, no browser)
rate-limited serial (Indeed 2s / LinkedIn 3s) → URL dedup
▼ pre-JD LLM title relevance gate
conservative title-only semantic filter; default keep=true and reject only clearly different career paths
▼ batched LLM coarse filter
card-level keep/reject using title + location + snippet
▼ dynamic title seniority gate
blocks obvious level mismatch (e.g. new grad → lead / manager)
▼ experience-gap gate
directly skips roles whose explicit years-required exceeds candidate experience by more than 3 years
▼ ④ JD profile extraction
structured required/preferred skills, must-haves, years, seniority conflict, work mode, language requirements
▼ ⑤ Explainable CV↔JD matching
rubric-based dimension scores → programmatic weighted score → recommendation
city-to-city relocation / office attendance count as risk, not as location-score penalty
▼ ⑥ Artifact generation
interview prep / cover letter / CV optimization
▼ ⑦ Search stats + cache
history metrics, reports, filter events, Web UI / terminal display
Real-world funnel (actual data):
Indeed 741 + LinkedIn 255 = 996 scraped
→ LLM title filter 996 → 689 (30.8% removed)
→ Pre-filter funnel 689 → 76 (seniority / dedup / skills etc.)
→ LLM assessment 76 → 54 saved (71.1% pass rate)
→ Overall filter rate: 94.6% (only 54 of 996 require human review)
# LLM Provider (configure at least one)
ANTHROPIC_API_KEY=
GEMINI_API_KEY=
OPENAI_API_KEY=
DEEPSEEK_API_KEY=
DASHSCOPE_API_KEY=
# Local models
LLAMACPP_BASE_URL=http://localhost:8080/v1
LOCAL_LLM_BASE_URL=http://localhost:1234/v1
# Default model (auto-written by `jobradar model`)
DEFAULT_PROVIDER=gemini
DEFAULT_MODEL=gemini-2.0-flash- Live progress: jobs streamed card-by-card via SSE during search
- Pipeline funnel stats: per-stage breakdown after each search (scraped → LLM title filter → pre-filter funnel → LLM assessment → saved / filter rate)
- Three-column layout: job list + detail + CV upload/search panel
- Multi-source dedup: jobs appearing on both Indeed and LinkedIn are merged; source badges are clickable links; Apply button becomes a dropdown when multiple source URLs exist
- Search history: each record has a 📊 button to expand the full pipeline funnel, with per-source breakdown (Indeed / LinkedIn)
- Normalized search history metrics: each record stores total scraped, deduped, filtered, newly saved jobs, and token consumption
- Module-level telemetry: history records and search completion events now include
module_metricswith per-modulecalls / input_tokens / output_tokens / elapsed, plusprocessed / rejected / keptwhere applicable in the search pipeline - Funnel benchmark summary: history now tracks pipeline/prompt versions and shows derived efficiency metrics such as post-filter rate, new-job yield, and tokens per new job
- Pre-JD title semantic filter: before JD assessment, job titles are batch-scored by an LLM using a conservative relevance gate; history funnel shows
skip_irrelevant - Title gate behavior: the gate only rejects titles that are clearly outside the candidate's career path from the title alone; broad, adjacent, or ambiguous technical titles are kept for later JD assessment
- Persisted filter events: each run now writes
filter_eventswithrun_id / stage / title / reason / details, so you can inspect exactly where a title was filtered out - Forced refresh in Web UI: search panel includes a force-refresh option so you can rerun without reusing the current search-session cache
- Dynamic title seniority gate: uses CV eligible/stretch/blocked levels to remove obvious title-level mismatches before JD matching
- Experience-gap hard reject: prefilter records
skip_exp, andjd_assessmentdirectly rejects roles whose explicit years-required exceeds the candidate's experience by more than 3 years - Explainable matching: job details include score breakdown, risks, skill matches, and recommendation tiers
- Risk-only relocation / office attendance handling: same-country relocation and office-attendance requirements are recorded in
risks / risk_penaltybut are not treated aslocation_scorepenalties - Artifact hub: generate and reuse Interview Prep, Cover Letter, and CV Optimization from the job detail panel
- Log panel: level filtering, keyword highlight, auto-refresh
- Config page: manage LLM API keys, select default model, clear cache — new users can complete all setup without editing
.env - Multilingual: UI supports Chinese / English / Español
After every search, stats are automatically written to the reports/ directory:
| File | Description |
|---|---|
pipeline_stats.jsonl |
Append-only log — one JSON line per search, full history preserved |
pipeline_stats_latest.json |
Always overwritten with the most recent search report |
- Every history row stores version metadata. The backend writes
app_version,cv_prompt_version,jd_summary_prompt_version,match_prompt_version,title_gate_version, andcoarse_filter_versionintosearch_stats. - Every history row also stores
run_idso you can join a history record to persistedfilter_events. - The UI Funnel Benchmark compares grouped version signatures. A signature is built from the version fields above, so “current version” vs “previous version” in the benchmark is really a comparison between two different history groups.
- The history table does not currently render every version field per row, but the benchmark card shows the active signature, and
/api/statsreturnsversionsfor each record.
- API:
GET /api/statsreturns history rows includingrun_idGET /api/filter-events?run_id=<run_id>returns persisted per-title filter events for that search
- Storage:
search_statsstores run-level summaryfilter_eventsstoresstage / title / reason / details
- Mock cache clearing:
- In mock mode, clearing cache now also clears
search_statsandfilter_eventsfrom the mock database
- In mock mode, clearing cache now also clears
Use scripts/show_filter_events.py to inspect persisted filter events from data/jobradar_test_cache.db without rerunning a search.
Examples:
python scripts/show_filter_events.py
python scripts/show_filter_events.py --stage jd_assessment --out reports/filter_report.md
python scripts/show_filter_events.py --run-id <run_id> --json --out reports/filter_report.jsonUseful options:
--run-id: inspect a specific history run--stage: only show one stage such astitle_relevance,coarse_filter,experience_gap,jd_assessment, orfinal_match--md: print Markdown to the terminal--out: save.mdor.json
What you can already inspect:
- Effect:
skip_irrelevantin the funnel history shows how many titles the pre-JD title relevance gate rejected.new_job_yield,tokens_per_filtered_job,tokens_per_new_job, andassessment_efficiencyin the benchmark help compare before/after behavior.
- Total cost:
- search history already stores per-run
tokens_inandtokens_out, so you can compare total token usage across version signatures.
- search history already stores per-run
Current limitation:
- There is no isolated token counter for the title relevance gate yet. The system records total search tokens, not per-module token spend. So you can compare overall before/after cost, but not the exact token cost of the title gate alone from the UI today.
Recommended evaluation workflow:
- Run several searches with similar
roles + location + provider/modelbefore and after the version change. - Compare:
- whether
skip_irrelevantincreased - whether
llm_assessedwent down - whether
tokens_per_filtered_job/tokens_per_new_jobimproved - whether
new_job_yieldstayed acceptable
- whether
- If you need exact module-level cost attribution, add dedicated telemetry tags or a separate token counter for the title gate.
Use scripts/compare_title_gate.py to run a controlled A/B comparison between:
baseline_gate_offtitle_gate_on
The script uses a fixed experiment setup:
- Titles:
AI Engineer,Machine Learning Engineer,LLM Engineer,Software Engineer,Backend Engineer 30jobs per title168hours (7days)Indeedonly- Location fixed to
Ireland
Example:
python scripts/compare_title_gate.py --cv-path "path/to/test_cv.md" --keep-db --out reports/compare_report.jsonUseful options:
--cv-path: test CV file (.md,.txt,.docx)--out: save the full JSON comparison report--keep-db: persist baseline/improved sqlite files underreports/compare_runs/<timestamp>/--db-dir: explicitly choose where those sqlite artifacts are stored
The generated reports/compare_report.json includes:
summary: top-line counts and token costdiff.baseline_only_jobs: jobs kept only by baselinediff.improved_only_jobs: jobs kept only by the title-gate versiondiff.title_gate_rejections: titles explicitly rejected by the title gate
location_scorenow represents broad geographic compatibility only.- Same-country city relocation and office-attendance requirements such as
hybrid,onsite, or3 days in officeare treated as practical risks. - Those factors contribute to
risks / risk_penalty, but are not supposed to lowerlocation_score.
- CV content is sent to your configured LLM API (Anthropic / Google / OpenAI, etc.) for parsing and assessment. Please ensure you trust your chosen provider's data policy.
- All data is stored locally: parsed CV profiles and job listings are stored in a local SQLite database (
data/jobradar_cache.db) and are never uploaded to any third-party server. - Log file (
logs/jobradar.log) records search terms and timestamps only — it does not contain CV personal data or API keys, and is excluded from git via.gitignore. - PII in CV: If you are concerned about sending personal information to an external LLM provider, remove it from your CV before uploading (name, email, phone, address). LinkedIn / GitHub links carry no additional risk.
- Prompt injection protection: Job description content scraped from external sources is wrapped in
<jd_content>boundary tags, and the system prompt explicitly instructs the LLM to treat tag contents as data only, ignoring any embedded instructions.
This is a solo side project maintained in spare time. Some features — particularly location-based filtering — may produce inconsistent results depending on the job source.
LLM provider support: 17 providers are integrated, but not all have been fully tested end-to-end. If you encounter a bug with a specific provider or model, please open an issue and include the provider name, model, and error message.
- Smarter LLM Assessment — Stabilise strengths/weaknesses output: ensure experience-gap mismatches consistently surface as weaknesses, and reduce variance across repeated evaluations of the same JD
- CV Tailoring — One-click CV rewrite optimised for a specific JD, highlighting the most relevant experience and keywords
- Cover Letter Generator — Auto-generate a tailored cover letter per JD, ready to copy-paste or export
This tool scrapes publicly available job data from Indeed and other platforms via python-jobspy.
Please note: Web scraping may violate the Terms of Service of the targeted websites. This tool is intended for personal job searching, learning, and research only. Users are solely responsible for ensuring compliance with applicable terms. The author accepts no liability for any misuse. Please scrape responsibly and avoid high-frequency or commercial use.