JobRadar

中文 · English · Español

Automatically search global job listings based on your CV, score matches with LLM, and deduplicate across multiple sources.

Quick Start

# Install uv (skip if already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh   # macOS / Linux
# Windows (PowerShell): irm https://astral.sh/uv/install.ps1 | iex

git clone https://github.com/sangowu/JobRadar.git
cd JobRadar
uv sync
uv run jobradar serve       # Launch Web UI (http://127.0.0.1:8765)
# Open your browser and configure API keys in the "API Config" page
# Or configure manually via .env:
cp .env.example .env         # Fill in your API keys
uv run jobradar find cv.docx  # CLI mode

Commands

Command	Description
`uv run jobradar serve`	Launch Web UI
`uv run jobradar serve --mock`	Test mode (isolated DB, won't affect real cache)
`uv run jobradar find cv.docx`	CLI: parse CV → discover titles → scrape → assess
`uv run jobradar find cv.docx --refresh`	Force re-search, ignore all caches
`uv run jobradar results`	Browse cached results from the last search
`uv run jobradar assess`	Re-run LLM assessment on cached JDs
`uv run jobradar model`	Interactively choose LLM provider and model
`uv run jobradar cache clear`	Clear all caches
`uv run jobradar --version`	Show current version

Pipeline Overview

CV file
  │
  ▼ ① CV parsing (LLM → CVProfile)  ← permanent SHA-256 cache
         structured seniority bands + explicit language extraction
  ▼ ② User reviews & confirms title list
  ▼ ③ Scraping (Indeed + LinkedIn, JobSpy, no browser)
         rate-limited serial (Indeed 2s / LinkedIn 3s) → URL dedup
  ▼    pre-JD LLM title relevance gate
         conservative title-only semantic filter; default keep=true and reject only clearly different career paths
  ▼    batched LLM coarse filter
         card-level keep/reject using title + location + snippet
  ▼    dynamic title seniority gate
         blocks obvious level mismatch (e.g. new grad → lead / manager)
  ▼    experience-gap gate
         directly skips roles whose explicit years-required exceeds candidate experience by more than 3 years
  ▼ ④ JD profile extraction
         structured required/preferred skills, must-haves, years, seniority conflict, work mode, language requirements
  ▼ ⑤ Explainable CV↔JD matching
         rubric-based dimension scores → programmatic weighted score → recommendation
         city-to-city relocation / office attendance count as risk, not as location-score penalty
  ▼ ⑥ Artifact generation
          interview prep / cover letter / CV optimization
  ▼ ⑦ Search stats + cache
         history metrics, reports, filter events, Web UI / terminal display

Real-world funnel (actual data):

Indeed 741 + LinkedIn 255 = 996 scraped
  → LLM title filter   996 → 689  (30.8% removed)
  → Pre-filter funnel  689 → 76   (seniority / dedup / skills etc.)
  → LLM assessment      76 → 54 saved  (71.1% pass rate)
  → Overall filter rate: 94.6%  (only 54 of 996 require human review)

Environment Variables

# LLM Provider (configure at least one)
ANTHROPIC_API_KEY=
GEMINI_API_KEY=
OPENAI_API_KEY=
DEEPSEEK_API_KEY=
DASHSCOPE_API_KEY=

# Local models
LLAMACPP_BASE_URL=http://localhost:8080/v1
LOCAL_LLM_BASE_URL=http://localhost:1234/v1

# Default model (auto-written by `jobradar model`)
DEFAULT_PROVIDER=gemini
DEFAULT_MODEL=gemini-2.0-flash

Web UI Features

Live progress: jobs streamed card-by-card via SSE during search
Pipeline funnel stats: per-stage breakdown after each search (scraped → LLM title filter → pre-filter funnel → LLM assessment → saved / filter rate)
Three-column layout: job list + detail + CV upload/search panel
Multi-source dedup: jobs appearing on both Indeed and LinkedIn are merged; source badges are clickable links; Apply button becomes a dropdown when multiple source URLs exist
Search history: each record has a 📊 button to expand the full pipeline funnel, with per-source breakdown (Indeed / LinkedIn)
Normalized search history metrics: each record stores total scraped, deduped, filtered, newly saved jobs, and token consumption
Module-level telemetry: history records and search completion events now include module_metrics with per-module calls / input_tokens / output_tokens / elapsed, plus processed / rejected / kept where applicable in the search pipeline
Funnel benchmark summary: history now tracks pipeline/prompt versions and shows derived efficiency metrics such as post-filter rate, new-job yield, and tokens per new job
Pre-JD title semantic filter: before JD assessment, job titles are batch-scored by an LLM using a conservative relevance gate; history funnel shows skip_irrelevant
Title gate behavior: the gate only rejects titles that are clearly outside the candidate's career path from the title alone; broad, adjacent, or ambiguous technical titles are kept for later JD assessment
Persisted filter events: each run now writes filter_events with run_id / stage / title / reason / details, so you can inspect exactly where a title was filtered out
Forced refresh in Web UI: search panel includes a force-refresh option so you can rerun without reusing the current search-session cache
Dynamic title seniority gate: uses CV eligible/stretch/blocked levels to remove obvious title-level mismatches before JD matching
Experience-gap hard reject: prefilter records skip_exp, and jd_assessment directly rejects roles whose explicit years-required exceeds the candidate's experience by more than 3 years
Explainable matching: job details include score breakdown, risks, skill matches, and recommendation tiers
Risk-only relocation / office attendance handling: same-country relocation and office-attendance requirements are recorded in risks / risk_penalty but are not treated as location_score penalties
Artifact hub: generate and reuse Interview Prep, Cover Letter, and CV Optimization from the job detail panel
Log panel: level filtering, keyword highlight, auto-refresh
Config page: manage LLM API keys, select default model, clear cache — new users can complete all setup without editing .env
Multilingual: UI supports Chinese / English / Español

Pipeline Stats Reports

After every search, stats are automatically written to the reports/ directory:

File	Description
`pipeline_stats.jsonl`	Append-only log — one JSON line per search, full history preserved
`pipeline_stats_latest.json`	Always overwritten with the most recent search report

Benchmark And Version Tracking

Every history row stores version metadata. The backend writes app_version, cv_prompt_version, jd_summary_prompt_version, match_prompt_version, title_gate_version, and coarse_filter_version into search_stats.
Every history row also stores run_id so you can join a history record to persisted filter_events.
The UI Funnel Benchmark compares grouped version signatures. A signature is built from the version fields above, so “current version” vs “previous version” in the benchmark is really a comparison between two different history groups.
The history table does not currently render every version field per row, but the benchmark card shows the active signature, and /api/stats returns versions for each record.

Filter Event Inspection

API:
- GET /api/stats returns history rows including run_id
- GET /api/filter-events?run_id=<run_id> returns persisted per-title filter events for that search
Storage:
- search_stats stores run-level summary
- filter_events stores stage / title / reason / details
Mock cache clearing:
- In mock mode, clearing cache now also clears search_stats and filter_events from the mock database

Inspection Script

Use scripts/show_filter_events.py to inspect persisted filter events from data/jobradar_test_cache.db without rerunning a search.

Examples:

python scripts/show_filter_events.py
python scripts/show_filter_events.py --stage jd_assessment --out reports/filter_report.md
python scripts/show_filter_events.py --run-id <run_id> --json --out reports/filter_report.json

Useful options:

--run-id: inspect a specific history run
--stage: only show one stage such as title_relevance, coarse_filter, experience_gap, jd_assessment, or final_match
--md: print Markdown to the terminal
--out: save .md or .json

How To Measure The New Title-Gate Module

What you can already inspect:

Effect:
- skip_irrelevant in the funnel history shows how many titles the pre-JD title relevance gate rejected.
- new_job_yield, tokens_per_filtered_job, tokens_per_new_job, and assessment_efficiency in the benchmark help compare before/after behavior.
Total cost:
- search history already stores per-run tokens_in and tokens_out, so you can compare total token usage across version signatures.

Current limitation:

There is no isolated token counter for the title relevance gate yet. The system records total search tokens, not per-module token spend. So you can compare overall before/after cost, but not the exact token cost of the title gate alone from the UI today.

Recommended evaluation workflow:

Run several searches with similar roles + location + provider/model before and after the version change.
Compare:
- whether skip_irrelevant increased
- whether llm_assessed went down
- whether tokens_per_filtered_job / tokens_per_new_job improved
- whether new_job_yield stayed acceptable
If you need exact module-level cost attribution, add dedicated telemetry tags or a separate token counter for the title gate.

Compare Script

Use scripts/compare_title_gate.py to run a controlled A/B comparison between:

baseline_gate_off
title_gate_on

The script uses a fixed experiment setup:

Titles: AI Engineer, Machine Learning Engineer, LLM Engineer, Software Engineer, Backend Engineer
30 jobs per title
168 hours (7 days)
Indeed only
Location fixed to Ireland

Example:

python scripts/compare_title_gate.py --cv-path "path/to/test_cv.md" --keep-db --out reports/compare_report.json

Useful options:

--cv-path: test CV file (.md, .txt, .docx)
--out: save the full JSON comparison report
--keep-db: persist baseline/improved sqlite files under reports/compare_runs/<timestamp>/
--db-dir: explicitly choose where those sqlite artifacts are stored

The generated reports/compare_report.json includes:

summary: top-line counts and token cost
diff.baseline_only_jobs: jobs kept only by baseline
diff.improved_only_jobs: jobs kept only by the title-gate version
diff.title_gate_rejections: titles explicitly rejected by the title gate

Matching Semantics

location_score now represents broad geographic compatibility only.
Same-country city relocation and office-attendance requirements such as hybrid, onsite, or 3 days in office are treated as practical risks.
Those factors contribute to risks / risk_penalty, but are not supposed to lower location_score.

Privacy & Security

CV content is sent to your configured LLM API (Anthropic / Google / OpenAI, etc.) for parsing and assessment. Please ensure you trust your chosen provider's data policy.
All data is stored locally: parsed CV profiles and job listings are stored in a local SQLite database (data/jobradar_cache.db) and are never uploaded to any third-party server.
Log file (logs/jobradar.log) records search terms and timestamps only — it does not contain CV personal data or API keys, and is excluded from git via .gitignore.
PII in CV: If you are concerned about sending personal information to an external LLM provider, remove it from your CV before uploading (name, email, phone, address). LinkedIn / GitHub links carry no additional risk.
Prompt injection protection: Job description content scraped from external sources is wrapped in <jd_content> boundary tags, and the system prompt explicitly instructs the LLM to treat tag contents as data only, ignoring any embedded instructions.

Known Limitations

This is a solo side project maintained in spare time. Some features — particularly location-based filtering — may produce inconsistent results depending on the job source.

LLM provider support: 17 providers are integrated, but not all have been fully tested end-to-end. If you encounter a bug with a specific provider or model, please open an issue and include the provider name, model, and error message.

Roadmap

Smarter LLM Assessment — Stabilise strengths/weaknesses output: ensure experience-gap mismatches consistently surface as weaknesses, and reduce variance across repeated evaluations of the same JD
CV Tailoring — One-click CV rewrite optimised for a specific JD, highlighting the most relevant experience and keywords
Cover Letter Generator — Auto-generate a tailored cover letter per JD, ready to copy-paste or export

Legal Disclaimer

This tool scrapes publicly available job data from Indeed and other platforms via python-jobspy.

Please note: Web scraping may violate the Terms of Service of the targeted websites. This tool is intended for personal job searching, learning, and research only. Users are solely responsible for ensuring compliance with applicable terms. The author accepts no liability for any misuse. Please scrape responsibly and avoid high-frequency or commercial use.

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
jobradar		jobradar
scripts		scripts
tests		tests
.gitignore		.gitignore
.python-version		.python-version
BUGFIX_LOG.md		BUGFIX_LOG.md
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.es.md		README.es.md
README.md		README.md
README.zh.md		README.zh.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock
升级建议.md		升级建议.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

JobRadar

Quick Start

Commands

Pipeline Overview

Environment Variables

Web UI Features

Pipeline Stats Reports

Benchmark And Version Tracking

Filter Event Inspection

Inspection Script

How To Measure The New Title-Gate Module

Compare Script

Matching Semantics

Privacy & Security

Known Limitations

Roadmap

Legal Disclaimer

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

JobRadar

Quick Start

Commands

Pipeline Overview

Environment Variables

Web UI Features

Pipeline Stats Reports

Benchmark And Version Tracking

Filter Event Inspection

Inspection Script

How To Measure The New Title-Gate Module

Compare Script

Matching Semantics

Privacy & Security

Known Limitations

Roadmap

Legal Disclaimer

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages