feat(linkedin): add recommended jobs adapter with GraphQL pagination support by RickSanchez88E · Pull Request #51 · nashsu/AutoCLI

RickSanchez88E · 2026-04-29T01:26:36Z

Description

Adds a new linkedin recommended command that crawls LinkedIn's personalized job recommendation feed (JYMBII algorithm, at /jobs/collections/recommended/). Unlike the existing linkedin search adapter (REST Voyager API), this endpoint uses GraphQL (/voyager/api/graphql) and requires a browser session.

File: adapters/linkedin/recommended.yaml

Technical Details

API: LinkedIn uses GraphQL with queryId voyagerJobsDashJobCards.* (version-hashed, discovered dynamically via Performance API)
Auth: strategy: header, with the CSRF token extracted from the JSESSIONID cookie
Pagination: Batches of 24 items; further pages are crawled automatically by advancing the start offset
Unlimited mode: --limit 0 keeps requesting full batches until the server returns no more items; with --limit N > 0, each request asks for at most the remaining N - fetched items (limit > 0 ? limit - fetched : BATCH)
Easy Apply detection: Checks footerItems[].type === "EASY_APPLY_TEXT" (not easyApplyUrl, which this API does not return)
Workplace type: Parsed from secondaryDescription.text parentheses, e.g. "London (Hybrid)" → workplace_type: "Hybrid"
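A minimal Python sketch of the two card-level parsing rules above: Easy Apply from footerItems, and workplace type from the parenthesised suffix of secondaryDescription.text. The card dict shape beyond those two field paths, and the helper names parse_location, is_easy_apply and parse_card, are assumptions for illustration; the adapter itself lives in adapters/linkedin/recommended.yaml and is not reproduced here.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Assumption: the workplace type is a trailing "(On-site|Hybrid|Remote)" suffix.
WORKPLACE_RE = re.compile(r"\s*\((On-site|Hybrid|Remote)\)\s*$")


def parse_location(text: str) -> Tuple[str, Optional[str]]:
    """Split 'London (Hybrid)' into ('London', 'Hybrid'); no suffix -> (text, None)."""
    match = WORKPLACE_RE.search(text)
    if match is None:
        return text.strip(), None
    return text[: match.start()].strip(), match.group(1)


def is_easy_apply(card: Dict[str, Any]) -> bool:
    """Easy Apply is signalled by any footer item whose type is EASY_APPLY_TEXT."""
    return any(item.get("type") == "EASY_APPLY_TEXT" for item in card.get("footerItems") or [])


def parse_card(card: Dict[str, Any]) -> Dict[str, Any]:
    """Build the location / workplace_type / easy_apply columns from one job card."""
    text = (card.get("secondaryDescription") or {}).get("text") or ""
    location, workplace = parse_location(text)
    return {"location": location, "workplace_type": workplace, "easy_apply": is_easy_apply(card)}
```

For example, parse_location("London (Hybrid)") returns ("London", "Hybrid"), while a location without a recognised suffix comes back unchanged with a workplace type of None.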

Output Columns

rank, title, company, location, workplace_type, salary, posted_time, applicant_count, easy_apply, url

Usage

# Default 200 results
autocli linkedin recommended -f json

# Specify count
autocli linkedin recommended --limit 50 -f json

# Unlimited (crawls all available)
autocli linkedin recommended --limit 0 -f json

# Table format
autocli linkedin recommended --limit 20

# CSV
autocli linkedin recommended --limit 100 -f csv

How to Test

Prerequisites: Chrome must be open with LinkedIn signed in, and the AutoCLI Chrome extension must be installed.

# Quick smoke test (5 results)
autocli linkedin recommended --limit 5

# Verify Easy Apply detection (should see "true" values)
autocli linkedin recommended --limit 24 -f json | grep easy_apply

# Verify pagination (should return exactly 50)
autocli linkedin recommended --limit 50 -f json | python3 -c "import json,sys; d=json.load(sys.stdin); print(f'{len(d)} results')"

# Diagnostic (verify auth & API discovery)
autocli linkedin recommended --limit 3 -f json

Known Quirks / Pitfalls

GraphQL variable encoding: LinkedIn requires colons (:) and parentheses to remain raw (not URL-encoded) in GraphQL variables. Applying encodeURIComponent to the whole string (which escapes ':' as %3A) causes HTTP 400, so the adapter percent-encodes the variables and then decodes those characters back (partial-encode-then-decode).
No total count: The API doesn't return a totalCount field. --limit 0 fetches incrementally until the server returns an empty batch.
No applicant_count: Unlike the REST search API, this GraphQL endpoint's jobPostingCard doesn't include an applicant count. The column is kept in the output but always contains "N/A".
No easyApplyUrl field: Easy Apply detection uses footerItems type — verified via 200-job crawl with ~30% Easy Apply rate.
Dynamic queryId: The GraphQL queryId includes a version hash that may change. The adapter discovers it dynamically via performance.getEntriesByType('resource'), so no hardcoded ID to maintain.
Workplace type parsing: Workplace type is embedded in the location string in parentheses. Regex extracts On-site/Hybrid/Remote and strips it from the location field.
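The encoding and pagination quirks can be modelled together in Python. This is an illustrative sketch, not the adapter's code: fetch_page(start, count) is a hypothetical stand-in for the GraphQL request, the variables string shape is assumed, and capping each request at one 24-item batch is an assumption consistent with the batch size described above.

```python
from typing import Any, Callable, List
from urllib.parse import quote

BATCH = 24  # page size of the recommended feed


def encode_variables(raw: str) -> str:
    """Percent-encode everything, then restore the raw ':', '(' and ')' LinkedIn expects."""
    encoded = quote(raw, safe="")
    for escaped, char in (("%3A", ":"), ("%28", "("), ("%29", ")")):
        encoded = encoded.replace(escaped, char)
    return encoded


def crawl(fetch_page: Callable[[int, int], List[Any]], limit: int) -> List[Any]:
    """Page by start offset; limit == 0 means keep going until an empty batch."""
    results: List[Any] = []
    while limit == 0 or len(results) < limit:
        # Assumption: never ask for more than one batch per request.
        count = BATCH if limit == 0 else min(BATCH, limit - len(results))
        batch = fetch_page(len(results), count)
        if not batch:  # no totalCount in the API, so an empty batch is the only stop signal
            break
        results.extend(batch[:count])
    return results
```

With a 60-item fake feed, crawl(fetch, 0) issues requests at offsets 0, 24, 48 and 60, stopping when the last one comes back empty.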

Adds the `linkedin recommended` adapter for crawling jobs recommended by LinkedIn's JYMBII algorithm via the GraphQL API. Supports automatic pagination, Easy Apply detection via footerItems EASY_APPLY_TEXT, workplace type parsing, and unlimited mode (--limit 0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Local LLM (qwen3) → structured JSON → Supabase pipeline:
- 5-module Python pipeline: config, preprocess, LLM, db, orchestrator
- Grammar-constrained generation via llama.cpp json_schema
- 3-attempt retry at temp=0: standard → repair → minimal
- Atomic claim/upsert via Supabase RPC functions
- Stale processing reaper, dead-letter queue, extraction_runs tracking
- Per-run report: console summary + failed-jobs detail + JSON report

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
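A rough sketch of the three-attempt cascade at temperature 0. Here generate(prompt, temperature) is a hypothetical wrapper around the local llama.cpp server and the prompt strings are placeholders; only the attempt order (standard, repair, minimal) and temp=0 come from the commit message.

```python
import json
from typing import Callable, Optional

STRATEGIES = ("standard", "repair", "minimal")  # attempt order from the commit message


def extract_with_retry(generate: Callable[[str, float], str], job_text: str) -> Optional[dict]:
    """Up to three attempts at temperature 0; return the first output that parses to a JSON object."""
    previous = ""
    for strategy in STRATEGIES:
        if strategy == "repair":
            # Placeholder prompt: ask the model to fix its previous (invalid) output.
            prompt = f"Repair this JSON for the job below:\n{previous}\n\n{job_text}"
        else:
            prompt = f"[{strategy} extraction prompt]\n{job_text}"  # placeholder wording
        previous = generate(prompt, 0.0)
        try:
            parsed = json.loads(previous)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None  # the caller would route the job to the dead-letter queue
```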

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

linkedin recommended --limit 0 --with_jd triggers long-running commands that scroll the full job list and fetch descriptions for each, which can exceed the previous 30-second HTTP timeout. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Add the clean_linkedin_jobs.py pipeline, which extracts URLs from multiple fields, normalizes URLs, validates LinkedIn records (requiring easy_apply or external_url), and maps apply_url/source_channel/apply_type correctly. Includes:
- clean_linkedin_jobs.py: HTML cleaning, URL extraction cascade, salary parsing, batch dedup, dead-letter queue
- sync_autocli_jobs.py: Supabase RPC upsert with source_channel/apply_type
- 23 unit tests written TDD-style (clean + sync + validation + URL mapping)
- 5 migrations: schema, url_hash, source_channel/apply_type, drop url_hash unique constraint, old data cleanup
- daemon health check wait in main.rs

bad_count invariant: 776 -> 0 (after cleanup + pipeline fix)

Chrome debugger can detach mid-command on SPA pages (e.g. LinkedIn), returning "Detached while handling command". This error was not in the retry list, causing the extension to give up immediately instead of re-attaching and retrying. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Extension `WINDOW_IDLE_TIMEOUT` (30s) could fire during evaluate steps that run longer than the timeout (e.g. --limit 0 fetching all LinkedIn recommended jobs). Added an activeCommands counter per workspace so the idle timer only starts when no commands are in flight.

Added `scripts/autocli-baseline.sh` with 8 pre-flight checks (autocli binary, Chrome process, daemon, extension, LinkedIn reachability, DNS, output dir, disk space), structured timestamped logging, and --json output. Includes a 13-test suite at `scripts/test_baseline.sh`.
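The in-flight counter can be modelled in Python as below. This is a model of the idea only (the extension itself is JavaScript, and the class and method names are hypothetical): the idle timer is armed only when a workspace's count drops back to zero.

```python
from collections import defaultdict
from typing import DefaultDict, Dict


class IdleTracker:
    """Counts in-flight commands per workspace; the idle timer is armed only at zero."""

    def __init__(self) -> None:
        self.active_commands: DefaultDict[str, int] = defaultdict(int)
        self.idle_timer_armed: Dict[str, bool] = {}

    def command_started(self, workspace: str) -> None:
        self.active_commands[workspace] += 1
        self.idle_timer_armed[workspace] = False  # cancel any pending idle close

    def command_finished(self, workspace: str) -> None:
        self.active_commands[workspace] = max(0, self.active_commands[workspace] - 1)
        # Start the WINDOW_IDLE_TIMEOUT countdown only once nothing is in flight.
        self.idle_timer_armed[workspace] = self.active_commands[workspace] == 0
```

A long evaluate step simply keeps its workspace's counter above zero, so the 30s window-close countdown never starts underneath it.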

`check_extension_freshness` compares the mtime of dist/background.js against a refresh marker file (.baseline-last-refresh). On first run (no marker) it warns; when dist is newer than the last refresh it fails with a clear hint to use --refresh-extension. `--refresh-extension` uses browser-harness CDP to navigate to chrome://extensions, find the AutoCLI card, click its reload button, and then update the marker. The test suite now has 15 tests covering all freshness scenarios.
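A Python sketch of the freshness rule as described. The real check is part of a shell script; the function signature and the 'warn'/'fail'/'ok' return values here are illustrative.

```python
import os


def check_extension_freshness(dist_js: str, marker: str) -> str:
    """Return 'warn', 'fail' or 'ok' for the built extension vs. the last refresh marker."""
    if not os.path.exists(marker):
        return "warn"  # first run: no refresh has been recorded yet
    if os.path.getmtime(dist_js) > os.path.getmtime(marker):
        return "fail"  # dist was rebuilt after the last refresh: rerun with --refresh-extension
    return "ok"
```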

sync_autocli_jobs.py looked for an "apply_type" key in raw records, but LinkedIn raw data uses "easy_apply", so records from this pipeline were silently defaulted to apply_type='unknown'. Added a fallback check on the "easy_apply" field so LinkedIn Easy Apply jobs are classified correctly, and ran a SQL migration to fix the 271 existing rows that were affected.
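A sketch of the fallback classification. Only the 'unknown' default and the easy_apply fallback are described in the commit; the "easy_apply" and "external" labels below are hypothetical.

```python
from typing import Any, Dict


def classify_apply_type(record: Dict[str, Any]) -> str:
    """Prefer an explicit apply_type; otherwise fall back to LinkedIn's easy_apply field."""
    explicit = record.get("apply_type")
    if explicit:
        return str(explicit)
    if str(record.get("easy_apply")).lower() == "true":  # accepts True or "true"
        return "easy_apply"  # hypothetical label
    if record.get("external_url"):
        return "external"  # hypothetical label
    return "unknown"
```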

…pply_url

When the same Workday (ATS) job arrives with different LinkedIn apply_url shapes, the identity_hash now uses a canonical ATS URL rather than the raw apply_url. The new _extract_canonical_job_url() prefers ATS external_urls over LinkedIn referrer URLs, and _canonicalize_url() normalizes scheme/host case, strips trailing slashes, and removes tracking params (utm_*, source, share_id, gh_src, lever-source, etc.). LinkedIn URLs are preserved as metadata on the apply_url field without affecting identity.

The --dry-run report now includes a canonical_distinct_jobs count and duplicate groups grouped by identity_hash. 22 tests cover the URL helpers, canonicalization, and the Ameresco regression case, where the same Workday URL produces the same identity_hash regardless of whether a LinkedIn apply_url is present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
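A minimal canonicalization sketch in the spirit of _canonicalize_url(), using only urllib.parse. It is not the repository's function: the tracking-key set lists only the keys named above (the original list ends in "etc."), dropping the fragment is an added assumption, and the example host is made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Only the keys named in the commit; the real list is longer ("etc.").
TRACKING_KEYS = {"source", "share_id", "gh_src", "lever-source"}


def canonicalize_url(url: str) -> str:
    """Lower-case scheme/host, strip trailing slashes, drop utm_* and tracking params."""
    parts = urlsplit(url.strip())
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in TRACKING_KEYS
    ]
    path = parts.path.rstrip("/") or "/"
    # Assumption: fragments never identify a job, so they are dropped.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), ""))
```

Two apply links for the same posting, one carrying utm_source/source parameters and a trailing slash, collapse to the same string and therefore to the same identity hash.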

Create scripts/job_priority_config.py with all configuration constants, regex patterns, and keyword sets for the deterministic job priority scoring system. Contains no scoring logic -- only configuration to be imported by the scorer, sync pipeline, backfill scripts, and tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Pure, deterministic scoring engine for AutoCLI jobs with 8 components: compensation, role fit, seniority, work arrangement, application path, freshness, data completeness, and source quality. Includes penalty system, hard-reject guard, and tier mapping (high/medium/low/reject). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

REPEATED_PUNCT_RE used a {2,} quantifier, which only matched runs of 3 or more consecutive punctuation characters (e.g. "!!!" -> "!", while "!!" was left unchanged). Changed to {1,} so runs of 2 or more are collapsed (e.g. "!!" -> "!", "!!!" -> "!"). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
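A worked illustration of the quantifier change. The exact pattern behind REPEATED_PUNCT_RE is not shown, so this assumes the common shape: capture one punctuation character, then match its repeats with a back-reference. The quantifier counts repeats after the first character, which is why {2,} means "3 or more in total".

```python
import re

# Assumed pattern shape: one punctuation char, then back-referenced repeats of it.
OLD_REPEATED_PUNCT_RE = re.compile(r"([!?.])\1{2,}")  # char + 2 more repeats: 3+ in a run
NEW_REPEATED_PUNCT_RE = re.compile(r"([!?.])\1{1,}")  # char + 1 more repeat: 2+ in a run


def collapse(pattern: "re.Pattern[str]", text: str) -> str:
    """Replace each run of repeated punctuation with a single character."""
    return pattern.sub(r"\1", text)
```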

- Import score_job in sync_autocli_jobs.py and call it per-record
- Pass ScoreResult fields (priority_score, priority_tier, priority_version, priority_signals) to upsert_job RPC
- Add --disable-scoring flag for testing
- Report priority score distribution in dry-run mode
- Add comprehensive test suite (104 tests across 14 classes) covering all 8 scoring components, penalties, hard-reject guard, edge cases, and integration scenarios

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Migration 20260509182000: add priority scoring columns to jobs.jobs table (priority_score, priority_tier, priority_version, priority_signals, priority_scored_at)
- Migration 20260509184000: add update_job_priority_score RPC that only touches scoring fields (not the full row), with schema-scoped and public wrappers
- scripts/backfill_priority_scores.py: batch backfill script with --force, --limit, --dry-run, --env-file options; reconstructs job_data from raw_record or DB columns; reports per-row scores, tiers, and errors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-authored-by: Codex <noreply@openai.com>

- Rename priority_version column to priority_scorer_version in both migrations
- Add 'unknown' to priority_tier check constraint
- Fix indices to include last_seen_at desc and priority_score desc per spec
- Add --min-priority-score and --priority-tier CLI flags for optional filtering
- Enhance dry-run with top_priority_jobs, low_priority_count, priority_tiers
- Add source-quality summary (recruiter/aggregator/raw-jd-fallback counts)
- Update backfill RPC param name to match column rename

Design covers:
- 6-container stack: chrome (Stagehand), daily (cron+FastAPI), cloudflared, prometheus, grafana
- Cloudflare Tunnel + Access for public exposure of /vnc /cdp /api /jobs /grafana
- GHCR + Watchtower pull-based deploy
- Phased acceptance criteria with verification commands

Worktree: feat/daily-microservice (branched from main).

Critical fixes:
- Add prereq section for autocli BrowserBridge CDP-wiring patch
- Fix cargo build to use package name 'autocli' (was '-cli')
- Switch /jobs to client.schema('jobs').table('jobs') API
- Use /json/list + page target (was /json/version, browser-level)
- Rewrite ws host localhost->autocli-chrome:9222
- Standardize on SUPABASE_SERVICE_ROLE_KEY
- Make API_RUN_TOKEN actually enforced + tested in Phase 4
- Add machine-verifiable Cloudflare Access gate before cdp ingress

High-severity fixes:
- Feature branches publish :branch-*+:sha-* only; :main from main
- Pin cloudflared/prometheus/grafana to specific semver
- Switch Cloudflare Tunnel to --token mode (no config.yml mix)
- Replace path routes with 5 subdomains (avoids prefix-strip)
- Split Access into two policies per Application (Token OR Email)
- Drop Grafana Infinity plugin dependency
- VNC password randomly generated in prod (no 'stagehand' default)
- shred only temp copy of operator secrets, never the source
- Unify retry to 3-attempts/15-60-240s across code+runbook+metrics
- Add explicit restart: unless-stopped to autocli-daily
- Specify Prometheus metrics_path: /api/metrics
- Unified CI build context = repo root for both Dockerfiles
- Note GHCR creds already configured on target host

Bugs:
- L103: component table referenced stale /metrics path -> /api/metrics
- L209/L236: github.ref_name with '/' produces invalid Docker tags; switch to docker/metadata-action's type=ref,event=branch which slugifies
- L321: /json/new requires PUT, not POST (Chrome >= M86)
- L354: jobs.autocli/ routed to backend root but /jobs is the actual route; drop the jobs subdomain entirely, serve via api.autocli/jobs (4 subdomains)
- L473: Phase 0 build context disagreed with CI; unify on repo root
- L522: Phase 4 step 2 implied Service Token works on vnc/grafana where no machine policy exists; split per-subdomain expectations
- L526: Phase 4 probed cdp.autocli before the spec said cdp ingress was added; split Phase 4 into 4a (pre-CDP gate) / 4b (add cdp ingress) / 4c (cdp probes)
- L549: Phase 5 status call missing Bearer

Risks:
- L486: Phase 1 status call missing Bearer; added

Also:
- Fix '6 services' / '6 new containers' counts; actual count is 5
- Update §2.2 boundaries note from /json/version to /json/list + PUT /json/new

Find or create a CDP page target on autocli-chrome:9222.
- GET /json/list, pick first type:page
- if list is empty, PUT /json/new?about:blank (Chrome >= M86)
- rewrite host (localhost:9223 -> autocli-chrome:9222) so the WS URL is reachable from the daily container's network namespace
- write to /run/cdp-endpoint.env (sourced by run-daily.sh)
- 60s retry budget; exit 1 on timeout (entrypoint exits non-zero, restart: unless-stopped recreates container until chrome ready)
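The target-selection and host-rewrite steps can be sketched in Python (the real logic is in cdp-discover.sh, and the helper name here is hypothetical). type and webSocketDebuggerUrl are standard fields of Chrome's /json/list response. Per the later Host-header fix, the host passed in should be the container IP rather than the service name.

```python
import json
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def pick_page_ws_url(json_list_body: str, host: str, port: int) -> Optional[str]:
    """Return the first page target's webSocketDebuggerUrl, re-pointed at host:port."""
    for target in json.loads(json_list_body):
        if target.get("type") == "page" and target.get("webSocketDebuggerUrl"):
            parts = urlsplit(target["webSocketDebuggerUrl"])
            return urlunsplit((parts.scheme, f"{host}:{port}", parts.path, parts.query, parts.fragment))
    return None  # no page yet: the script would PUT /json/new?about:blank and retry
```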

- flock -n to prevent cron + /api/run from colliding
- per-attempt cdp-discover refresh (page id may have rotated)
- runs autocli linkedin recommended -> JSON -> sync_autocli_jobs.py
- unified retry: 3 attempts at 15s/60s/240s (SPEC §5.2)
- writes /data/output/last_run.json consumed by /api/status

Boot-time cdp-discover gate, then runs supercronic + uvicorn in parallel under tini. wait -n exits as soon as either child dies, so compose's restart policy can pick up failure modes (e.g. uvicorn panic, supercronic crash).

03:00 daily LinkedIn pull + 04:00 30-day output retention sweep (SPEC §5.2). TZ resolved by the container's TZ=Europe/London.

After rebase onto local main, scripts/job_priority_scorer.py and scripts/job_priority_config.py are present. sync_autocli_jobs.py imports them at runtime, so the daily image must ship all three.

uv-managed; pins fastapi/uvicorn/supabase/prometheus-client/httpx to compatible ranges. Lockfile checked in so the Dockerfile's 'uv sync --frozen' is reproducible.

Used by POST /api/run to spawn run-daily.sh non-blockingly. is_running() is a non-destructive flock probe so /api/status can report in_progress without affecting the actual run.
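A sketch of a non-destructive flock probe using Python's fcntl module (Unix only). It mirrors the described behaviour but is not the trigger.py source: taking the lock with LOCK_NB and releasing it immediately means the probe never blocks and never holds the lock past the check.

```python
import fcntl
import os


def is_running(lock_path: str) -> bool:
    """True if another open file description currently holds the flock on lock_path."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True  # run-daily.sh (or a child that inherited its FD) holds the lock
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)  # we got it, so nothing was running; release at once
        return False
    finally:
        os.close(fd)
```

The probe holds the lock only momentarily, so a run that starts at exactly that instant would see it as busy and, with flock -n, skip that attempt.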

Routes per SPEC §5.1:
- GET /api/health [open]: chrome reachability + cdp file probe
- GET /api/metrics [open]: Prometheus exposition (delta-aware counters)
- GET /api/status [Bearer]: last_run.json + in_progress
- POST /api/run [Bearer]: spawn run-daily.sh, 409 if already running
- GET /api/logs [Bearer]: tail of latest log (default 200 lines)
- GET /jobs [Bearer]: Supabase 'jobs.jobs' read proxy via client.schema('jobs').table('jobs')

Import style B: 'import trigger' (flat), because entrypoint.sh does 'cd /app/api && uvicorn main:app'; with no package context, a flat import works.
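A framework-independent sketch of the Bearer gate used by the protected routes. The function and parameter names are hypothetical; hmac.compare_digest gives a constant-time comparison, and an empty configured token (for example an unset API_RUN_TOKEN) fails closed.

```python
import hmac
from typing import Optional


def bearer_ok(authorization: Optional[str], expected_token: str) -> bool:
    """Accept only 'Bearer <token>' equal to expected_token; fail closed if unset."""
    if not expected_token or not authorization:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```

A route handler would return 401 whenever this is False, which covers both the missing-Bearer and wrong-Bearer cases the test suite checks.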

9 tests covering:
- /api/status, /api/run, /api/logs, /jobs all return 401 without Bearer and 401 with wrong Bearer
- /api/status default-shape + reflects last_run.json
- /api/metrics is open and contains the autocli_daily_ family
- /api/health returns 503 when chrome:9222 unreachable

conftest.py adds deploy/daily/api to sys.path (flat import, matching entrypoint.sh's 'cd /app/api && uvicorn main:app' invocation). Prometheus registry is cleared before each fresh module import to avoid duplicate-timeseries errors across test fixtures.

Single job scraping autocli-daily:8080/api/metrics every 15s. metrics_path is required because FastAPI mounts under /api/*.

- Datasource: Prometheus at prometheus:9090 (uid prom-autocli)
- Dashboard provider points at /etc/grafana/provisioning/dashboards
- autocli.json: time-since-last-run, last exit code, rows-upserted-today, CDP-up %, daily scraped/upserted/skipped time series, duration
- No plugin dependencies (Infinity dropped per L313 review)

5 services on shared autocli-net bridge:
- autocli-chrome (Stagehand, watchtower-tracked, healthcheck on 9222)
- autocli-daily (cron+FastAPI, watchtower-tracked, depends_on chrome healthy, env scoped to Supabase creds only)
- cloudflared (Tunnel token mode, depends_on daily healthy)
- prometheus (pinned, 90-day retention)
- grafana (pinned, anon disabled, signup disabled, admin from env)

Named volumes for profile / output / tsdb / grafana state.

Binds host ports under non-conflicting numbers (6081/5902/9223/8081/9091/3001) so the operator can keep their existing local Chrome and Grafana running alongside. cloudflared moved to a 'disabled' profile.

All required environment variables with empty values + inline generator hints. Real .env never committed (.gitignore already covers it under '.env').

Quickstart, Cloudflare dashboard checklist, forced-run snippet, common-failure table. Points back at SPEC + PLAN for the why.

3 jobs:
1. build-autocli-binary: cargo build --release -p autocli on ubuntu-latest (linux/amd64) with Swatinem cache; uploads artifact
2. build-chrome-image: builds deploy/chrome from repo-root context; docker/metadata-action generates :main on main, :branch-<slug> on feature branches, :sha-<short> always
3. build-daily-image: downloads the autocli artifact, builds deploy/daily from repo-root context, same tag policy

Path filters include rust-toolchain.toml so a toolchain bump triggers a rebuild.

The placeholder value was wrong (build failed with 'computed checksum did NOT match'). Verified by downloading the GitHub release asset and computing sha1sum from the operator's laptop.

CI builds the binary as a separate job and uploads as artifact; Phase 0 locally rebuilds inside a Docker rust container and writes to deploy/daily/bin/. Never commit this file (it's ~8MB).

rick-ubuntu-ssh tunnel's running replica is 2026.3.0 (per Zero Trust dashboard). Our container joins as a 2nd HA replica; matching the connector version avoids mixed-version edge cases.

Prod host (100.108.80.9) already has a process bound to :5900, so the 5900:5900 mapping failed container networking. Native VNC is only a local convenience and is NOT part of the Cloudflare ingress; noVNC on 6080 (+ vnc.autocli route) is the real access path. Container still listens on 5900 internally for websockify -> noVNC.

Chrome DevTools rejects /json* and /devtools requests whose Host header isn't an IP address or localhost. Reaching autocli-chrome by docker service name failed with 'Host header is specified and is not an IP address or localhost'.
- cdp-discover.sh: resolve CHROME_HOST -> container IP (getent, python fallback); use the IP for the /json probe AND the rewritten ws:// URL so every Host header Chrome sees is an IP. Re-resolved each run.
- main.py /api/health: send Host: localhost on the liveness probe (yes/no check, body unused).

Found during Phase 3 server bring-up; the daily container was crash-looping on 'chrome unreachable after 60s' despite DNS + same-network OK.

Free Cloudflare zones get Universal SSL covering only <zone> + one-level *.<zone>. Two-level subdomains like vnc.autocli.<zone> fail the TLS handshake ('Unauthorized' / sslv3 alert) until the operator upgrades to Pro, Total TLS, or ACM. Rename across SPEC / PLAN / README:
- vnc.autocli.<zone> -> autocli-vnc.<zone>
- cdp.autocli.<zone> -> autocli-cdp.<zone>
- api.autocli.<zone> -> autocli-api.<zone>
- grafana.autocli.<zone> -> autocli-grafana.<zone>

§9 risk #4 now documents the Free-plan SSL constraint as the reason for the flat naming.
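To make the certificate constraint concrete, here is a small helper that checks whether a hostname is covered by a certificate for <zone> plus *.<zone>. It assumes standard single-label wildcard matching; example.com stands in for the real zone.

```python
def covered_by_universal_ssl(hostname: str, zone: str) -> bool:
    """True if hostname is <zone> itself or exactly one label under it (*.<zone>)."""
    hostname, zone = hostname.lower().rstrip("."), zone.lower().rstrip(".")
    if hostname == zone:
        return True
    suffix = "." + zone
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    return bool(label) and "." not in label  # a wildcard matches a single label only
```

autocli-vnc.example.com passes because "autocli-vnc" is one label, while vnc.autocli.example.com fails because "vnc.autocli" spans two.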

Host ubuntu-latest gives GLIBC 2.39 binaries that fail to load in the daily runtime image (Debian Bookworm = GLIBC 2.36) with 'GLIBC_2.39 not found'. Pin build container to rust:1.94-slim-bookworm so binary GLIBC requirements match runtime. Also adds a readelf-based check that fails the build if the binary's max GLIBC requirement exceeds 2.36.
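A Python sketch of the GLIBC ceiling check. It assumes the input is the text printed by `readelf -V` for the binary and compares the highest GLIBC_x.y[.z] version found against 2.36 (Debian Bookworm); the CI step itself may implement this differently.

```python
import re
from typing import Tuple

GLIBC_RE = re.compile(r"GLIBC_(\d+)\.(\d+)(?:\.(\d+))?")
RUNTIME_GLIBC = (2, 36)  # Debian Bookworm


def max_glibc(readelf_output: str) -> Tuple[int, ...]:
    """Highest GLIBC_x.y[.z] version referenced in `readelf -V` output, or () if none."""
    versions = [
        tuple(int(part) for part in match.groups() if part is not None)
        for match in GLIBC_RE.finditer(readelf_output)
    ]
    return max(versions, default=())


def glibc_compatible(readelf_output: str, runtime: Tuple[int, ...] = RUNTIME_GLIBC) -> bool:
    """Tuple comparison orders versions numerically: 2.39 > 2.36 but 2.2.5 < 2.36."""
    return max_glibc(readelf_output) <= runtime
```

Comparing integer tuples rather than strings matters here: as strings, "2.4" would sort above "2.36".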

`source /run/cdp-endpoint.env` only sets a shell variable; without export, the autocli child process never sees AUTOCLI_CDP_ENDPOINT and falls through to BrowserBridge's daemon path ("Chrome is not running"). Wrap source with `set -a`/`set +a` so the assignment auto-exports as an env var that survives across fork/exec.

sync_autocli_jobs.py pretty-prints its summary with indent=2: { "input_rows": 573, "upserted": 573, ... }. The old run-daily.sh did 'grep "^{" log | tail -1', which matched only the opening '{' line and so yielded invalid JSON. Subsequent jq parses failed silently, --argjson got empty values, and the final jq -n -> dev/null overwrote LAST_RUN_JSON with an empty file.

Fix: redirect sync stdout to /tmp/sync-DATE-N.json, also append it to the log, then let jq parse the captured JSON directly. Status now correctly reflects rows_scraped/upserted/skipped from each run.
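The failure mode and the fix can be reproduced in a few lines of Python. The grep | tail pipeline is modelled with a list comprehension, and parsing the captured stdout stands in for the jq step; the function names are illustrative.

```python
import json


def old_extract(log_text: str) -> str:
    """Model of `grep '^{' log | tail -1`: the last line that starts with '{'."""
    return [line for line in log_text.splitlines() if line.startswith("{")][-1]


def parse_captured_summary(captured_stdout: str) -> dict:
    """New approach: parse the whole captured stdout file as one JSON document."""
    return json.loads(captured_stdout)
```

With json.dumps(summary, indent=2), the first line is a bare "{", so old_extract returns just "{" and any JSON parser rejects it, while parsing the full captured output recovers the summary intact.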

When run-daily.sh did 'exec 9>LOCK; flock 9' and then invoked autocli, FD 9 was inherited by the autocli process by default. If autocli took the daemon-path fallback (pre-env-export fix, or any future code path that spawns a daemon), the detached 'autocli --daemon' child inherited FD 9 too and held the lock for its lifetime. is_running() then returned True forever, breaking /api/status. Add '9>&-' to the autocli and uv invocations so children can't see or hold the lock. Verified by /proc/<pid>/fd inspection in production.

cdp.rs (item 1): IPage::close was sending Browser.close, which kills the SHARED Chrome in CDP-direct mode (and every other consumer attached to it). Made it a no-op with an explanation. Callers that need per-page cleanup should send Target.closeTarget directly.

entrypoint-vnc.sh (item 2): -nopw was overriding -rfbauth and leaving VNC open with no password. Anyone reaching :5900/6080 (via Tailscale or any leaked path) could drive the logged-in browser. Removed the flag; password auth from /root/.vnc/passwd is now enforced.

docker-compose.yml (item 3 + defense-in-depth on 6080): bound both the 6080 and 9222 host ports to 127.0.0.1 only. The public path is Cloudflare Tunnel + Access; direct host-port access would bypass every auth layer. Backup: 'ssh -L 6080:localhost:6080' from a Tailscale-connected box.

backfill_priority_scores.py (items 5 + 6): client.table('jobs.jobs') queried a table literally named 'jobs.jobs' in the public schema (always 0 rows); fixed to client.schema('jobs').table('jobs'). The filter also moved from priority_score.is.null (the column is NOT NULL DEFAULT 0 after the migration, so this matches nothing) to priority_scored_at.is.null (the only honest 'never scored' signal).

crontab + Dockerfile + .env.example (items 8 + 9): the CRON_SCHEDULE and OUTPUT_RETENTION_DAYS env vars were placebos, because supercronic reads /etc/cron.d/autocli verbatim and does not substitute env vars. Dropped the misleading env knobs from compose / Dockerfile / .env.example and added a comment in crontab explaining the contract.

NOT addressed in this commit:
- Item 4 (migration upsert priority overwrite): needs a follow-up migration; pre-existing in main.
- Item 7 (/jobs schema): empirically returns 500 rows with a loose filter; PostgREST DOES expose the jobs schema in this project. The reviewer's hypothesis was incorrect for this Supabase config. Pushing back on this one with evidence.
- Items 10, 11: pre-existing sync_autocli_jobs.py issues from main; worth a separate cleanup PR.

Items 1, 2, 3 from PR review #4466756456:

1) New migration 20260516120000_fix_priority_upsert_data_loss.sql: recreates jobs.upsert_job so the ON CONFLICT DO UPDATE branches on the function PARAMETER (p_priority_score IS NOT NULL) instead of excluded.priority_score (which the INSERT body had already coerced from NULL to 0, making the CASE WHEN always true and silently zeroing prior scores). Same correction for priority_tier / scorer_version / signals / scored_at. Applied to production via Supabase MCP; verified success: True.

2) New migration 20260516120100_enable_jobs_jobs_rls.sql + GRANT migration: turns on RLS on jobs.jobs with a select-only policy for anon/authenticated, and grants USAGE on the jobs schema and SELECT on the table to those roles. The server .env now uses the real anon JWT for SUPABASE_ANON_KEY (sync writes still use SUPABASE_SERVICE_ROLE_KEY, which bypasses RLS). Combined with Cloudflare Access + Bearer, this gives defence in depth.

3) The /jobs endpoint now filters on created_at (database insert time) instead of post_time (LinkedIn's original posting date, which is almost always older than today for fresh scrapes). Docstring updated; created_at added to the SELECT projection so clients can see it. Verified by direct REST against PostgREST and by a python-in-container test (3 rows returned for since=today).
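A Python model of why branching on excluded.priority_score lost data while branching on the parameter does not. It reduces the SQL to its NULL handling only; the function names are hypothetical and this is not the migration itself.

```python
from typing import Optional


def upsert_score_buggy(existing: Optional[int], p_priority_score: Optional[int]) -> Optional[int]:
    """Old behaviour: the INSERT body coerced NULL to 0 before ON CONFLICT saw it."""
    excluded_score = p_priority_score if p_priority_score is not None else 0
    # CASE WHEN excluded.priority_score IS NOT NULL ... is therefore always true.
    return excluded_score if excluded_score is not None else existing


def upsert_score_fixed(existing: Optional[int], p_priority_score: Optional[int]) -> Optional[int]:
    """New behaviour: branch on the function parameter, keeping the prior score on NULL."""
    return p_priority_score if p_priority_score is not None else existing
```

A re-sync that carries no score (for example with --disable-scoring) turns an existing score of 72 into 0 under the old logic but leaves it at 72 under the fixed one.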

Companion to 20260516120100. RLS policies don't grant SELECT; PostgREST also needs the role to have USAGE on the schema and SELECT on the table. Already applied to production via Supabase MCP but the file was missing from the PR — without it a fresh project provisioning from these migrations would have count=0 on /jobs until the GRANT was applied manually.

feat: daily LinkedIn microservice + autocli CDP wiring + supporting fixes

Rick Sanchez and others added 30 commits April 29, 2026 02:18

docs: add LinkedIn native recommended implementation plan

9d35915

docs: rework LinkedIn plan - fix capture timing, detail responses, request signatures, pagination, and test commands

e338d64

linkedin recommended: add --with_jd and fetch descriptions

9f98ebd

docs: add JD pipeline changelog

dd9f1eb

fix(linkedin): add external_url for offsite apply

7a0bfda

docs: design ATS form intelligence worker

aa1f540

fix(linkedin): retry detail fetches to fill external_url

bf0f474

Merge branch 'codex/ats-form-intelligence'

08ee442

Normalize job fields before scoring

0c25ca8

fix: priority scoring backfill + JD fallback

51b9a5c

Merge branch 'feat/job-priority-scoring-v1'

7450fe9

chore: remove tracked local artifacts

fbb21f9

RickSanchez88E_a8cc and others added 30 commits May 16, 2026 02:14

feat(deploy): daily entrypoint.sh

e19ff41

feat(deploy): supercronic crontab

6cf906a

fix(deploy): copy job_priority_scorer + config into daily image

2b9a1ab

feat(deploy): FastAPI project metadata + lockfile

52c7277

feat(deploy): trigger.py — shared run-daily executor

05ad14f

feat(deploy): prometheus scrape config

f64ffb3

feat(deploy): local-only override

1e9f37b

feat(deploy): .env.example template

fac1f4d

docs(deploy): operator-facing README + runbook

5619a6d

fix(deploy): correct supercronic v0.2.30 sha1sum

a5f55f5

chore: gitignore Phase 0 local autocli binary output

e8a9063

fix(deploy): pin cloudflared to 2026.3.0 (match live tunnel replica)

e692a40

Merge pull request #2 from RickSanchez88E/feat/daily-microservice

d074028

RickSanchez88E wants to merge 71 commits into nashsu:main from RickSanchez88E:main
