feat(kb-enrich): pluggable collector architecture #31
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| --- | ||
| name: gh | ||
| priority: 4 | ||
|
Owner
not sure we're getting a ton from priority?
Contributor
Author
The main place it matters is openspec (0) needing to run before opencode (5) so the session exclusion set is built first. The middle-ground ones (slack, linear, gh) don't have hard ordering requirements. Happy to simplify: could drop priority and just document the openspec-before-opencode dependency explicitly in the orchestrator instead. |
||
| authoritative_for: [shipped-code, reviews] | ||
|
Owner
not sure i understand how this is used
Contributor
Author
It's metadata the orchestrator doesn't act on, mainly documentation of what each collector is the source of truth for (to help a reader understand dedup decisions, like why openspec sessions take precedence over opencode session transcripts). Happy to drop it if it feels like overhead without benefit. |
||
| description: GitHub PRs you authored or reviewed in the enrichment window | ||
| # skip_bots: commit authors / PR actors to ignore | ||
| skip_bots: [dependabot] | ||
| --- | ||
|
Comment on lines +6 to +8
|
||
|
|
||
| ## Enabled check | ||
|
|
||
| Read GitHub orgs from chezmoi data: | ||
|
|
||
| ```bash | ||
| ORGS=$(chezmoi data --format json | jq -r '(.orgs // {}) | keys | join(" ")') | ||
| ``` | ||
|
|
||
| If `ORGS` is empty, search across all repos you have access to (no org filter). If non-empty, scope searches with an `org:NAME` filter per org. | ||
|
|
||
| ## How to query | ||
|
|
||
| ```bash | ||
| # PRs you authored that were updated in the window (omitting --state returns open and closed) | ||
| gh search prs --author "@me" --updated ">=YYYY-MM-DD" \ | ||
| --json number,title,repository,state,updatedAt,body | ||
|
|
||
| # PRs you were asked to review | ||
| gh search prs --review-requested "@me" --updated ">=YYYY-MM-DD" \ | ||
| --json number,title,repository,state,updatedAt | ||
| ``` | ||
|
|
||
| Add `org:NAME` to each query for every org in `ORGS` (run one query per org, or combine with multiple `org:` terms in a single search string). | ||
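
The per-org scoping described above could be sketched with a small helper. This is a minimal sketch under assumptions, not the PR's implementation: `org_terms` is a hypothetical function name, and it only builds the `org:NAME` qualifiers from a space-separated list. How the qualifiers are passed to `gh search prs` is left to the caller.

```bash
# Hypothetical helper: turn a space-separated org list into search qualifiers.
# An empty list yields an empty string, which means "no org filter".
org_terms() {
  terms=""
  for org in $1; do                      # intentional word splitting on spaces
    terms="${terms:+$terms }org:$org"
  done
  printf '%s\n' "$terms"
}

org_terms "acme widgets"                 # prints: org:acme org:widgets
```

Keeping the empty case as an empty string lets the same call site cover both branches of the enabled check.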
|
|
||
| ## What to extract | ||
|
|
||
| - Merged PRs — what shipped | ||
| - Review comments that surfaced decisions | ||
| - Linked issues | ||
|
|
||
| ## What to skip | ||
|
|
||
| - Draft PRs | ||
| - Commits and PRs authored by bots listed in `skip_bots` | ||
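
One way the `skip_bots` rule could be applied is a login check like the one below. It is a sketch only: `is_skipped_bot` is a hypothetical name, and the normalization of an `app/` prefix or `[bot]` suffix is an assumption about how bot logins may appear in query output, not something the collector file specifies.

```bash
# is_skipped_bot LOGIN "bot1 bot2": succeed (exit 0) if LOGIN matches a skip_bots entry.
is_skipped_bot() {
  login=${1#app/}           # assumption: drop an "app/" prefix if present
  login=${login%"[bot]"}    # assumption: drop a literal "[bot]" suffix if present
  for bot in $2; do
    [ "$login" = "$bot" ] && return 0
  done
  return 1
}

# Usage sketch: skip a PR whose author login is in the list.
if is_skipped_bot "dependabot[bot]" "dependabot"; then
  echo "skip"
fi
```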
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| --- | ||
| name: linear | ||
| priority: 3 | ||
| authoritative_for: [tickets, completed-work] | ||
| description: Linear issues you touched in the enrichment window | ||
| --- | ||
|
|
||
| ## Enabled check | ||
|
|
||
| Load the `linear` skill. If no Linear API token is available (the skill cannot authenticate), skip this collector and log "linear: no auth, skipping". | ||
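
The enabled check above could look roughly like this in shell. It is a sketch under a loud assumption: `LINEAR_API_KEY` is a hypothetical variable name, and the `linear` skill may resolve credentials some other way. Only the log message is taken from the collector text.

```bash
# linear_enabled: succeed if a token is present, otherwise log and fail.
# LINEAR_API_KEY is an assumed variable name, not confirmed by the source.
linear_enabled() {
  if [ -z "${LINEAR_API_KEY:-}" ]; then
    echo "linear: no auth, skipping"
    return 1
  fi
  return 0
}

# Usage sketch: guard the collector body.
if linear_enabled; then
  : # run the Linear queries here
fi
```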
|
|
||
| ## How to query | ||
|
|
||
| Use the Linear GraphQL API (endpoint `https://api.linear.app/graphql`) via the `linear` skill. Query issues assigned to or created by you that were updated within the enrichment window: | ||
|
|
||
| ```graphql | ||
| { | ||
| issues( | ||
| filter: { | ||
| updatedAt: { gte: "YYYY-MM-DDT00:00:00Z" } | ||
| or: [ | ||
| { assignee: { isMe: { eq: true } } } | ||
| { creator: { isMe: { eq: true } } } | ||
| ] | ||
| } | ||
| first: 50 | ||
| ) { | ||
| nodes { | ||
| identifier | ||
| title | ||
| state { name } | ||
| updatedAt | ||
| description | ||
| url | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ## What to extract | ||
|
|
||
| - Newly created tickets | ||
| - Status changes (especially to Done/Completed) | ||
| - Decisions captured in descriptions or comments | ||
| - Any ticket closed in the window (signals completed work not otherwise visible in git) | ||
|
|
||
| ## What to skip | ||
|
|
||
| - Bot-generated or auto-updated tickets | ||
| - Tickets you are only a watcher on with no direct activity |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| --- | ||
| name: opencode | ||
| priority: 5 | ||
| authoritative_for: [coding-sessions] | ||
| description: opencode coding sessions from the local session store | ||
| --- | ||
|
|
||
| ## How to query | ||
|
|
||
| Query the session store SQLite database at `~/.local/share/opencode/opencode.db`. | ||
|
|
||
| > **Note:** The session exclusion / dedup step is handled by the orchestrator before this collector runs. By the time this collector is called, it receives a list of directories to exclude (sessions covered by archived OpenSpec changes). Apply the `NOT IN (...)` clause below. | ||
|
|
||
| ```sql | ||
| SELECT id, directory, title, time_updated | ||
| FROM session | ||
| WHERE time_updated BETWEEN :start_ms AND :end_ms | ||
| AND directory NOT IN ('/abs/worktree/a', '/abs/worktree/b'); | ||
| -- The excluded directories are passed in by the orchestrator. | ||
| -- Only sessions NOT in the exclusion set get a transcript read. | ||
| -- For excluded sessions, the orchestrator narrates from the change's design.md/specs instead. | ||
| ``` | ||
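
Since the exclusion directories arrive as a list, the `NOT IN (...)` contents have to be assembled somewhere. The helper below is a hedged sketch of that step: `sql_in_list` is a hypothetical name, and it assumes one directory per line on stdin. It doubles embedded single quotes so a path cannot break out of the SQL string literal. Bound parameters are preferable where the SQLite client supports them.

```bash
# sql_in_list: read one path per line on stdin, print a comma-separated
# list of single-quoted SQL string literals.
sql_in_list() {
  sed "s/'/''/g; s/.*/'&'/" | paste -sd, -
}

printf '%s\n' "/abs/worktree/a" "/abs/worktree/b" | sql_in_list
# prints: '/abs/worktree/a','/abs/worktree/b'
```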
|
|
||
| `time_updated` is epoch-milliseconds. | ||
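
Because `time_updated` is in epoch milliseconds, the `:start_ms` and `:end_ms` bounds have to be derived from the date being enriched. A minimal sketch follows, assuming the window is one UTC day (the source does not state a timezone) and that GNU `date` is available (`-d` is not supported by BSD/macOS `date`). `day_bounds_ms` is a hypothetical name.

```bash
# day_bounds_ms YYYY-MM-DD: print "<start_ms> <end_ms>" for that UTC day.
day_bounds_ms() {
  start_s=$(date -u -d "$1 00:00:00" +%s) || return 1
  start_ms=$((start_s * 1000))
  end_ms=$((start_ms + 86400000 - 1))    # last millisecond of the same day
  echo "$start_ms $end_ms"
}

day_bounds_ms 2024-01-01                 # prints: 1704067200000 1704153599999
```

The end bound is inclusive because the query uses `BETWEEN`.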
|
|
||
| ## What to extract | ||
|
|
||
| - Work done in sessions not covered by an OpenSpec archived change | ||
| - Project context, technical decisions made interactively, approaches tried | ||
|
|
||
| ## What to skip | ||
|
|
||
| - Sessions whose `directory` is in the exclusion set (covered by the durable OpenSpec change artifacts — see the orchestrator's dedup step) | ||
| - Sessions with no substantive content (e.g. very short duration, no meaningful tool calls) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| --- | ||
| name: openspec | ||
| priority: 0 | ||
| authoritative_for: [implement-work, design-decisions, rejected-alternatives] | ||
| description: OpenSpec durable store — authoritative source for /implement work; read BEFORE other collectors to build the session exclusion set | ||
| --- | ||
|
|
||
| ## Why priority 0 | ||
|
|
||
| This collector runs first. Its primary job is building the **session exclusion set** used by the `opencode` collector — a list of worktree paths whose sessions are already covered by an archived OpenSpec change and should not get a redundant (token-expensive) transcript read. | ||
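
If `priority` is kept, the run order could be derived from the front matter rather than hard-coded. The sketch below assumes the collectors live as `*.md` files in a single directory and that each has a `priority:` line. The directory layout, the `ordered_collectors` name, and the fallback value of 99 for a missing priority are all hypothetical.

```bash
# ordered_collectors DIR: print collector names sorted by ascending priority.
ordered_collectors() {
  for f in "$1"/*.md; do
    [ -f "$f" ] || continue              # unmatched glob stays literal; skip it
    prio=$(sed -n 's/^priority:[[:space:]]*//p' "$f" | head -n 1)
    printf '%s %s\n' "${prio:-99}" "$(basename "$f" .md)"
  done | sort -n | awk '{ print $2 }'
}

# Usage sketch: ordered_collectors ./collectors
# With the priorities in this PR, openspec (0) comes out before opencode (5).
```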
|
|
||
| ## How to query | ||
|
|
||
| For each date being enriched, read every `$KB_ROOT/openspec/*/changes/archive/<date>-*/kb-meta.yaml`. Collect the `worktree:` value from each. That set is the exclusion list passed to the `opencode` collector. | ||
|
|
||
| ```bash | ||
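| # Collect worktrees for a given date (DATE holds the <date> being enriched, e.g. YYYY-MM-DD) | ||
| for meta in "$KB_ROOT"/openspec/*/changes/archive/"$DATE"-*/kb-meta.yaml; do | ||
| [ -f "$meta" ] || continue # no archived changes matched the glob | ||
| awk '$1 == "worktree:" {print $2}' "$meta" | ||
| done | ||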
|
Comment on lines +18 to +20
|
||
| ``` | ||
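
Since the result is described as a set, the collection step could also deduplicate. Below is a minimal sketch wrapped as a function. `collect_worktrees` is a hypothetical name, and it assumes each `kb-meta.yaml` has a flat `worktree: /path` line with no quoting or spaces, as the loop above also does.

```bash
# collect_worktrees KB_ROOT DATE: print unique worktree paths, one per line.
collect_worktrees() {
  for meta in "$1"/openspec/*/changes/archive/"$2"-*/kb-meta.yaml; do
    [ -f "$meta" ] || continue           # unmatched glob stays literal; skip it
    awk '$1 == "worktree:" { print $2 }' "$meta"
  done | sort -u
}

# Usage sketch: collect_worktrees "$KB_ROOT" 2024-05-01
```

An empty result means no sessions are excluded for that date.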
|
|
||
| Then read each archived change's `design.md`: | ||
|
|
||
| ``` | ||
| $KB_ROOT/openspec/*/changes/archive/<date>-*/design.md | ||
| ``` | ||
|
|
||
| READ these (do not copy them — the artifacts are already in the KB via the symlink). Extract decisions, the "why", and rejected alternatives for the decisions log. | ||
|
|
||
| Also read the durable `specs/` for standing requirements: | ||
|
|
||
| ``` | ||
| $KB_ROOT/openspec/*/specs/ | ||
| ``` | ||
|
|
||
| ## What to extract | ||
|
|
||
| - Decisions, rationale, and rejected alternatives from `design.md` files | ||
| - The set of worktree paths (→ exclusion list for the `opencode` collector) | ||
|
|
||
| ## What to skip | ||
|
|
||
| - Re-narrating or duplicating the full design content in the journal — reference the durable store artifacts instead | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| --- | ||
| name: slack | ||
| priority: 2 | ||
| authoritative_for: [informal-decisions, action-items, contact-info] | ||
| description: Slack messages — your sent messages and mentions, in the enrichment window | ||
| --- | ||
|
|
||
| ## How to query | ||
|
|
||
| Token is in `~/.config/team-context-mcp/.env` as `SLACK_USER_TOKEN`. Set `SLACK_USER_ID` to your Slack user ID (find it in your Slack profile → "Copy member ID"). | ||
|
|
||
| ```bash | ||
| SLACK_TOKEN=$(grep SLACK_USER_TOKEN ~/.config/team-context-mcp/.env | cut -d= -f2) | ||
| SLACK_USER_ID="<your-slack-user-id>" # e.g. U01ABC23DEF | ||
|
Comment on lines +10 to +14
|
||
|
|
||
| # Your messages in the date window | ||
| curl -s "https://slack.com/api/search.messages?query=from:me+after:YYYY-MM-DD+before:YYYY-MM-DD&count=20&sort=timestamp" \ | ||
| -H "Authorization: Bearer $SLACK_TOKEN" | ||
|
|
||
| # Mentions of you in the date window | ||
| curl -s "https://slack.com/api/search.messages?query=%3C${SLACK_USER_ID}%3E+after:YYYY-MM-DD+before:YYYY-MM-DD&count=20&sort=timestamp" \ | ||
| -H "Authorization: Bearer $SLACK_TOKEN" | ||
|
|
||
| # Read a thread (get replies) | ||
| curl -s "https://slack.com/api/conversations.replies?channel=CHANNEL_ID&ts=THREAD_TS" \ | ||
| -H "Authorization: Bearer $SLACK_TOKEN" | ||
| ``` | ||
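
The `grep | cut` token lookup above matches the key anywhere on a line and cuts at the first later `=`. A slightly stricter reader is sketched below, assuming the `.env` file uses plain `KEY=value` lines, optionally quoted. `env_value` is a hypothetical helper, not part of the PR.

```bash
# env_value FILE KEY: print the value of KEY from a dotenv-style file.
# Only lines that start with "KEY=" match; surrounding quotes are stripped.
env_value() {
  sed -n "s/^$2=//p" "$1" | head -n 1 | sed "s/^[\"']//; s/[\"']\$//"
}

# Usage sketch:
# SLACK_TOKEN=$(env_value ~/.config/team-context-mcp/.env SLACK_USER_TOKEN)
```

Anchoring at the start of the line means commented-out entries and keys that merely contain the name are ignored.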
|
|
||
| Slack is high-volume; read selectively to keep token cost manageable. | ||
|
|
||
| ## What to extract | ||
|
|
||
| - Decisions announced or confirmed in Slack that didn't appear in a Granola meeting | ||
| - Action items assigned to you or by you that aren't already in Linear | ||
| - New contact info (Slack handles, email addresses) for people profiles | ||
| - Customer or partner names that surfaced in conversation | ||
|
|
||
| ## What to skip | ||
|
|
||
| - Routine standup threads already covered by Granola | ||
| - Emoji reactions and short acknowledgments ("👍", "sounds good") | ||
| - HR, compensation, or personal channels (privacy) | ||
| - Anything already captured from a Granola meeting for the same day | ||
this line conflicts with the re-run guard above. i like this version better
Removed — the date-range logic in Step 1 already covers this (existing journal files become the last-run marker, so a date that's already enriched is simply last date + 1 and won't re-run).
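
The "last date + 1" rule described in this reply could be sketched as follows. Everything about the layout is an assumption: the sketch supposes journal files are named `YYYY-MM-DD.md` in one directory, which the thread does not confirm, and `next_enrich_date` is a hypothetical name. It relies on `date -d` accepting a date string and an `@epoch` value, as GNU `date` does.

```bash
# next_enrich_date JOURNAL_DIR: print the day after the newest YYYY-MM-DD.md file.
next_enrich_date() {
  last=$(ls "$1" 2>/dev/null \
    | sed -n 's/^\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)\.md$/\1/p' \
    | sort | tail -n 1)
  [ -n "$last" ] || return 1             # no journal yet; caller picks a start date
  last_s=$(date -u -d "$last" +%s) || return 1
  date -u -d "@$((last_s + 86400))" +%Y-%m-%d   # UTC has no DST, so +86400 is one day
}

# Usage sketch: START=$(next_enrich_date "$KB_ROOT/journal") || START=YYYY-MM-DD
```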