Add Cursor agent transcript adapter by MoMitwalyIgniteTech · Pull Request #121 · firstbatchxyz/watchmen

MoMitwalyIgniteTech · 2026-06-05T13:36:57Z

Why

Cursor sessions were invisible to watchmen — the wishlist pencilled a Cursor adapter in as SQLite/state.vscdb reverse-engineering, but Cursor's agent now writes plain JSONL transcripts under ~/.cursor/projects/<slug>/agent-transcripts/, which makes coverage much cheaper than expected.

What changed

- New src/watchmen/adapters/cursor.py following the NAME/discover()/scan() contract:
  - prompts extracted from <user_query> tags; editor-injected context tags (<attached_files>, <plugin_info>, …) excluded from prompt mining
  - human-format <timestamp> tags (Tuesday, Apr 28, 2026, 10:15 AM (UTC+3)) parsed to UTC ISO; older transcripts carry no tags at all, so scan() falls back to file mtime to keep sessions datable
  - <manually_attached_skills> entries recorded as pseudo-Skill tool calls: Cursor inlines the skill content, so the SKILL.md-read detection in _shared alone would go blind; regular tool_use paths still go through extract_skill_from_args
  - project-dir slugs decode via the existing decode_project_dir (same dash-flattened encoding as Claude Code, minus the leading dash)
  - the format records no tokens, model names, or tool results, so those columns honestly stay at defaults
- Registered in ADAPTERS; cr / Cursor / violet added to the display touchpoints (util.py, metrics.py, insights.py, pipeline.py)
- Fixture-driven tests in tests/test_adapter_cursor.py + tests/fixtures/cursor_session.jsonl (12 tests: timestamp parsing incl. noon/midnight/negative/fractional offsets, prompt extraction, skill attribution, mtime fallback, discovery walk, corrupt-file tolerance)
- test_unknown_agent_falls_through_to_raw_slug now uses windsurf as its unknown slug, since cursor stopped being one
- CONTRIBUTING refreshed: the stale "Cursor stores sessions in state.vscdb, no hooks" entry replaced with the two real remaining frontiers (IDE state.vscdb chat history, Cursor 1.7+ hooks host in hooks_setup.py)
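
For readers who want a feel for the timestamp handling described in the list above, here is a minimal sketch of converting the human-format tag to UTC ISO. It is not the adapter's actual code: the function name `parse_human_timestamp`, the regex, and the `UTC+5:30` spelling of fractional offsets are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed tag shape: "Tuesday, Apr 28, 2026, 10:15 AM (UTC+3)".
# The ":MM" suffix for fractional offsets (e.g. "UTC+5:30") is an assumption.
_TIMESTAMP_RE = re.compile(
    r"^\w+, (?P<date>[A-Za-z]{3} \d{1,2}, \d{4}), "
    r"(?P<time>\d{1,2}:\d{2} [AP]M) "
    r"\(UTC(?P<sign>[+-])(?P<hours>\d{1,2})(?::(?P<minutes>\d{2}))?\)$"
)


def parse_human_timestamp(text: str) -> Optional[str]:
    """Return the timestamp as a UTC ISO-8601 string, or None if unparseable."""
    match = _TIMESTAMP_RE.match(text.strip())
    if match is None:
        return None
    try:
        # %I/%p handle the 12-hour clock, including 12 AM (midnight) and 12 PM (noon).
        local = datetime.strptime(
            f"{match['date']} {match['time']}", "%b %d, %Y %I:%M %p"
        )
    except ValueError:
        return None
    offset = timedelta(
        hours=int(match["hours"]), minutes=int(match["minutes"] or 0)
    )
    if match["sign"] == "-":
        offset = -offset
    # Local wall time minus its UTC offset gives UTC.
    return (local - offset).replace(tzinfo=timezone.utc).isoformat()
```

A caller that gets `None` back would then be the natural place to apply the file-mtime fallback mentioned above.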

Testing

- uv run pytest tests/: 300 passed
- uv tool run ruff check src/watchmen tests: clean
- uv build: wheel + sdist build
- Validated against a real corpus: 126 transcripts across 5 projects (WSL2), 0 parse errors → 979 prompts, 9,310 tool calls, all project dirs decoded to real paths, skill attribution populated (coderabbit-respond ×34, search-company-knowledge ×14, …); watchmen ingest + mission control render the new agent slice end-to-end

Notes

- The transcript schema is reverse-engineered and undocumented, so Cursor may change it between releases. The adapter degrades gracefully (unparseable lines skipped, missing install silent).
- No privacy implications beyond the existing adapters: reads local transcripts only, same as claude_code/codex/pi.
- No release/migration/plugin-install implications; the corpus schema is untouched (agent column already generic).
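
The "degrades gracefully" note can be illustrated with a small tolerant JSONL reader. This is a sketch under assumptions, not the adapter's code: `parse_jsonl_lines` and `iter_transcript_records` are hypothetical names, and treating non-object JSON values as skippable is a choice made here for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def parse_jsonl_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per well-formed JSON object line; skip everything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # corrupt or truncated line: skip it, keep scanning
        if isinstance(record, dict):
            yield record


def iter_transcript_records(path: Path) -> Iterator[Dict[str, Any]]:
    """Read a transcript file; a missing or unreadable file yields nothing."""
    try:
        handle = path.open("r", encoding="utf-8", errors="replace")
    except OSError:
        return  # no install or no such file: stay silent
    with handle:
        yield from parse_jsonl_lines(handle)
```

Skipping per line rather than per file means one bad line does not discard an otherwise usable session.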

🤖 Generated with Claude Code

The cursor adapter so far reads the IDE chat DB (state.vscdb). Cursor's agent also writes per-session JSONL transcripts to a second, disjoint store (~/.cursor/projects/<slug>/agent-transcripts/<sid>/<sid>.jsonl) that the IDE DB never sees. This adds that source behind the same NAME="cursor": discover() yields both, scan() dispatches on the entry shape (composer entries carry a composer_id).

Transcript-source specifics:
- prompts extracted from <user_query> tags; editor-injected context (<attached_files>, <plugin_info>, ...) excluded
- human-format <timestamp> tags parsed to UTC ISO; older transcripts carry no tags at all, so scan falls back to file mtime
- <manually_attached_skills> entries recorded as pseudo-Skill tool calls (Cursor inlines skill content, so SKILL.md path detection alone goes blind); regular tool_use paths still go through extract_skill_from_args
- no tokens/model/tool results exist in this format, so those columns stay at defaults
- project-dir slugs decode via decode_project_dir (same dash-flattened encoding as Claude Code, minus the leading dash)

Also adds cr/Cursor/violet to the display touchpoints (util.py, metrics.py, insights.py, pipeline.py), which the IDE-store adapter landed without, and refreshes the CONTRIBUTING wishlist (remaining gap is the cursor-agent CLI store at ~/.cursor/chats/, plus hooks).

Validated against 126 real transcripts across 5 projects: 0 parse errors, 979 prompts, 9310 tool calls, all project dirs decoded.

test_unknown_agent_falls_through_to_raw_slug now uses "windsurf" as its unknown slug, since "cursor" stopped being one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
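
As a rough illustration of the prompt-mining rule in the commit message, the sketch below keeps only text inside <user_query> tags, which leaves editor-injected blocks out by construction. The function name `extract_prompts` and the assumption that a message arrives as one flat string are hypothetical; the real transcript structure may differ.

```python
import re
from typing import List

# Non-greedy match so several <user_query> blocks in one message stay separate.
_USER_QUERY_RE = re.compile(r"<user_query>(.*?)</user_query>", re.DOTALL)


def extract_prompts(message_text: str) -> List[str]:
    """Return the stripped contents of every non-empty <user_query> block."""
    prompts = []
    for body in _USER_QUERY_RE.findall(message_text):
        body = body.strip()
        if body:
            prompts.append(body)
    return prompts
```

Because the extractor is an allow-list on one tag, newly introduced context tags would be ignored without any code change.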

aktasbatuhan · 2026-06-08T11:59:04Z

Thanks for this, genuinely clean work. I pulled it down and ran it locally and it holds up well:

- ruff clean, full suite green (529 passing on my machine, the "300" in the description is just stale)
- the two-source design is the right call: one adapter, discover() fans out to both the IDE chat DB and the agent-transcript JSONLs, and scan() dispatches on entry shape. Composer ids stay namespaced cursor/<id> and transcripts use bare UUID stems, so they can't collide in corpus.db. I checked that against the session_id PRIMARY KEY + INSERT OR REPLACE upsert and it's solid.
- the schema notes in the header are excellent, and I appreciate the honest defaults: no invented tokens/models/cost, mtime fallback so undateable transcripts still show up, half-hour UTC offsets handled even though they don't appear in real data yet.
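
To make the collision argument concrete, here is a small in-memory sketch of the session_id PRIMARY KEY plus INSERT OR REPLACE behaviour. Only `session_id` comes from the discussion above; the table name `sessions`, the `prompt_count` column, and the sample ids are hypothetical stand-ins for the real corpus.db schema.

```python
import sqlite3
from typing import List, Tuple


def demo_upsert() -> List[Tuple[str, int]]:
    """Insert a composer session, a transcript session, then re-scan the composer one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (session_id TEXT PRIMARY KEY, prompt_count INTEGER)"
    )
    rows = [
        ("cursor/composer-1", 3),                     # IDE-store id, namespaced
        ("2f1c9a7e-5b3d-4e8a-9c61-7d0e4b2a6f13", 5),  # transcript id, bare UUID stem
        ("cursor/composer-1", 4),                     # re-scan of the same session
    ]
    for row in rows:
        # Same primary key replaces the old row instead of adding a duplicate.
        conn.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?)", row)
    result = conn.execute(
        "SELECT session_id, prompt_count FROM sessions ORDER BY session_id"
    ).fetchall()
    conn.close()
    return result
```

The two id shapes never share a key, so the only replacement that happens is the intended one: a re-scan overwriting its own earlier row.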

I just approved the workflow run so CI can go. This was your first PR here so GitHub held it for manual approval, nothing on your side.

A few small, non-blocking things if you feel like it. None of these gate the merge:

- The description still reads like a brand new file, but the rebase landed it as an extension of the existing SQLite adapter. A one-line tweak would make the history read cleaner.
- Cross-store question: if a single Cursor agent run ever gets persisted in both the IDE chat DB and the agent-transcript JSONL, we'd surface it twice under two ids. You document the stores as disjoint surfaces, which matches what I'd expect, just flagging it in case you've seen any overlap in your own corpus.
- _scan_transcript builds its session dict inline rather than going through _empty(). Minor drift risk if the schema columns change later, totally optional.
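
The drift concern in the last point could be addressed along these lines. This is only a sketch: the real `_empty()` signature and column set are not shown in this thread, so `empty_session`, `build_transcript_session`, and every key below are hypothetical.

```python
from typing import Any, Dict, List


def empty_session(session_id: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a shared defaults factory such as _empty()."""
    return {
        "session_id": session_id,
        "agent": "cursor",
        "prompts": [],
        "tool_calls": [],
        "tokens": 0,    # not recorded by the transcript format
        "model": None,  # not recorded by the transcript format
    }


def build_transcript_session(
    session_id: str, prompts: List[str], tool_calls: List[str]
) -> Dict[str, Any]:
    """Start from the shared defaults and override only what the transcript provides."""
    session = empty_session(session_id)
    session["prompts"] = list(prompts)
    session["tool_calls"] = list(tool_calls)
    return session
```

With this shape, a column added to the defaults factory later shows up in transcript sessions automatically instead of having to be mirrored by hand.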

Once CI is green I'll merge. Thanks again for picking this up.

aktasbatuhan

Verified locally and CI is green across all platforms including Windows. Clean two-source design, solid docs. Merging. The optional follow-ups above can be a future PR if you want them.

MoMitwalyIgniteTech force-pushed the adapters/cursor branch from fc4c10c to 9dddc1e Compare June 5, 2026 13:46

aktasbatuhan approved these changes Jun 8, 2026

aktasbatuhan merged commit 9bdc9c6 into firstbatchxyz:main Jun 8, 2026
7 checks passed

aktasbatuhan merged 1 commit into firstbatchxyz:main from MoMitwalyIgniteTech:adapters/cursor
