feat(kb-enrich): pluggable collector architecture by stephengolub · Pull Request #31 · athal7/dotfiles

stephengolub · 2026-06-26T17:25:23Z

Summary

Refactors /kb-enrich into a two-layer design: a thin orchestrator command plus a directory of self-contained collector files, one per data source.

Changes

`commands/kb-enrich.md` (rewritten as 5-step orchestrator)

The command no longer hard-codes any sources. It now:

1. Resolves `KB_ROOT` and a re-run guard
2. Resolves the date range (same gap-healing logic as before)
3. Loads collectors from `~/.config/opencode/kb-collectors/*.md`, reading each file's frontmatter (`name`, `enabled`, `priority`) and sorting by `priority`
4. Runs collectors in order, applying each one's embedded query recipe and triage rules
5. Writes journal, profiles, decisions, and action items
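The orchestrator itself is a prompt the agent follows, but the collector-loading step above can be sketched in shell to make the ordering rule concrete. This is a minimal sketch, assuming each collector starts with a `---`-delimited frontmatter block holding a plain `priority: N` line. The function names `collector_priority` and `list_collectors`, and the fallback priority of 999 for files without one, are hypothetical. It ignores `enabled`, since the review later made file presence the toggle.

```bash
#!/usr/bin/env bash
# Sketch: order collector files by the "priority" in their YAML frontmatter.

# Print the priority scalar from the frontmatter of file $1 (999 if absent).
collector_priority() {
  awk '
    NR == 1 && $0 == "---"     { in_fm = 1; next }  # opening delimiter
    in_fm && $0 == "---"       { exit }             # closing delimiter, no priority
    in_fm && $1 == "priority:" { print $2; found = 1; exit }
    END { if (!found) print 999 }
  ' "$1"
}

# Print every *.md path in directory $1, lowest priority first.
# No "enabled" check: a file's presence is the toggle.
list_collectors() {
  local f
  for f in "$1"/*.md; do
    [ -e "$f" ] || continue   # unmatched glob: directory has no collectors
    printf '%s\t%s\n' "$(collector_priority "$f")" "$f"
  done | sort -n -k1,1 | cut -f2-
}
```

With priorities 0 and 5, `openspec.md` sorts ahead of `opencode.md`, which is the one ordering the session exclusion actually depends on.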

`kb-collectors/` (new directory, 7 collector files)

| File | Priority | Source |
| --- | --- | --- |
| `openspec.md` | 0 | OpenSpec durable store; builds session exclusion set first |
| `granola.md` | 1 | Meeting notes, decisions, people-contact |
| `slack.md` | 2 | Informal decisions, action items |
| `linear.md` | 3 | Tickets, completed work |
| `gh.md` | 4 | Shipped PRs, reviews |
| `opencode.md` | 5 | Coding sessions (exclusion applied) |
| `google-chat.md` | 6 | Disabled by default; opt-in for Chat |

Each file is self-contained: YAML frontmatter (name, enabled, priority, authoritative_for) + a body with query recipe, triage rules, and extraction rules.

Why

- Adding a new source: drop a `.md` file in `kb-collectors/`. Zero changes to the orchestrator.
- Per-machine config (org lists, workspace slugs, token paths) lives in each collector's frontmatter, so there is no need to touch the orchestrator.
- Disabling a source: set `enabled: false` in frontmatter.
- The orchestrator stays stable; collectors evolve independently.

The session exclusion (OpenSpec dedup) logic is preserved and works the same way — the openspec collector runs at priority 0 and builds the exclusion set before opencode runs at priority 5.

Refactors /kb-enrich into a two-layer design: a thin orchestrator command plus a directory of self-contained collector files, one per data source.

Changes:

- commands/kb-enrich.md: rewritten as a 5-step orchestrator (resolve config, resolve date range, load collectors, run collectors, enrichment). Sources are no longer hard-coded; the orchestrator reads whatever collectors are present in kb-collectors/.
- kb-collectors/ (new): one .md file per source, each with YAML frontmatter (name, enabled, priority, authoritative_for) and a body with query recipe, triage rules, and extraction rules.
  - openspec.md (priority 0): builds session exclusion set first
  - granola.md (priority 1): meetings, decisions, people-contact
  - slack.md (priority 2): informal decisions, action items
  - linear.md (priority 3): tickets, completed work
  - gh.md (priority 4): shipped PRs, reviews
  - opencode.md (priority 5): coding sessions (with exclusion applied)
  - google-chat.md (priority 6, disabled by default): opt-in for Chat

To add a new source: drop a .md file into kb-collectors/. No changes to the orchestrator needed. To disable a source or configure per-machine values (org lists, workspace slugs, token paths), edit the collector's frontmatter.

athal7

i like the overall shape, a few things i would probably do differently, and a few things that aren't relevant to my dotfiles

athal7 · 2026-06-26T18:25:10Z

-1. **Collect excluded worktrees.** For each date being enriched, read every `~/.local/share/kb/openspec/*/changes/archive/<date>-*/kb-meta.yaml` and collect its `worktree:` value (the absolute repo/worktree root, stamped at archive). That set is the exclusion list.
-2. **Skip those sessions.** When scanning **opencode** sessions, a session is identified by its `directory` column in the `session` table of `~/.local/share/opencode/opencode.db`. SKIP any session whose `directory` is in the exclusion set — for those, narrate from the change's `design.md`/specs, not the transcript. Only sessions NOT covered by an archived change get a transcript read.
-3. **Filter at query time.** Pass the collected worktrees as the `NOT IN (...)` list and bound by the date window (`time_updated` is epoch-ms):
+Enrich the gap since the last run, not a hard single day. The most recent `$KB_ROOT/journal/YYYY-MM-DD.md` is the last-run marker: enrich each date from (last journal date + 1) through today, inclusive. This makes a Monday run sweep the trailing weekend and lets a skipped run self-heal on the next run. If no prior journal exists, default to today. An explicit date or range in `$ARGUMENTS` overrides this.


this line conflicts with the re-run guard above. i like this version better

Removed — the date-range logic in Step 1 already covers this (existing journal files become the last-run marker, so a date that's already enriched is simply last date + 1 and won't re-run).
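The gap-healing rule in the diff above (enrich from the most recent journal date + 1 through today, defaulting to today when no journal exists) can be sketched as two shell functions. Assumptions: journal files are named `YYYY-MM-DD.md` directly under the journal directory, GNU `date` is available for `-d` date arithmetic (BSD/macOS `date` takes different flags), and the function names are hypothetical. Explicit dates from `$ARGUMENTS` are not handled here.

```bash
#!/usr/bin/env bash
# Sketch of the gap-healing date window. Assumes GNU date (-d).

# Newest YYYY-MM-DD among "<dir>/YYYY-MM-DD.md" journal files, or empty if none.
last_journal_date() {
  local f last=""
  for f in "$1"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].md; do
    [ -e "$f" ] || continue
    f=${f##*/}   # strip directory
    f=${f%.md}   # strip extension
    if [[ $f > $last ]]; then last=$f; fi
  done
  printf '%s\n' "$last"
}

# Print each date to enrich, (last journal date + 1) through $2 inclusive.
# $1 = journal dir, $2 = today as YYYY-MM-DD.
dates_to_enrich() {
  local today=$2 last d
  last=$(last_journal_date "$1")
  if [ -z "$last" ]; then
    printf '%s\n' "$today"   # no prior journal: default to today
    return
  fi
  d=$(date -u -d "$last + 1 day" +%F)
  while [[ ! $d > $today ]]; do   # ISO dates compare correctly as strings
    printf '%s\n' "$d"
    d=$(date -u -d "$d + 1 day" +%F)
  done
}
```

For example, `dates_to_enrich "$KB_ROOT/journal" "$(date +%F)"`. Once a run has written today's journal, a second run the same day prints nothing, which is why a separate re-run guard is redundant.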

athal7 · 2026-06-26T18:27:24Z


-## Enrichment Steps
+1. Parse the YAML frontmatter to get `name`, `enabled`, `priority`, `authoritative_for`.
+2. Skip any collector with `enabled: false`.


i think external config may be better, so as to not dirty chezmoi state

how do you mean?

Dropped enabled from frontmatter entirely — presence of the file is now the toggle. Disabling a collector means not adding/removing the file, which never touches a chezmoi-tracked source file.

athal7 · 2026-06-26T18:28:46Z

+# orgs: GitHub orgs to scope searches to. When set, each org is added as
+# `org:NAME` to the search query. Leave empty for no org filter (searches
+# across all repos you have access to).
+orgs: []


there is a separate github org config in local vars, would be great to reuse that

oh yea, you have that set up for chezmoi, I don't. I probably SHOULD.

Updated: now reads from `chezmoi data --format json | jq '[.orgs | keys[]]'`. Falls back to no org filter if `orgs` is empty.
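A sketch of how the chezmoi-sourced org list could feed `gh search prs`. It assumes `chezmoi` and `jq` are installed and that `.orgs` in chezmoi data is a map keyed by org name (which the `keys[]` expression implies); `jq`'s `// {}` makes a missing `.orgs` behave like an empty one. The function names are hypothetical, and the `--author`/`--merged-at` flags are an assumption to check against `gh search prs --help`.

```bash
#!/usr/bin/env bash
# Sketch: scope gh PR searches to the orgs in chezmoi data.

# stdin: org names, one per line. stdout: one "org:NAME" qualifier per line.
org_qualifiers() {
  local org
  while IFS= read -r org || [ -n "$org" ]; do
    if [ -n "$org" ]; then printf 'org:%s\n' "$org"; fi
  done
}

# Org names from chezmoi data; prints nothing if .orgs is missing or empty.
chezmoi_orgs() {
  chezmoi data --format json | jq -r '.orgs // {} | keys[]'
}

# Hypothetical helper: PRs I authored that merged on or after $1 (YYYY-MM-DD).
# With no orgs configured, the qualifier list is empty and the search is unscoped.
search_my_merged_prs() {
  local since=$1 q
  local -a quals=()
  while IFS= read -r q; do quals+=("$q"); done < <(chezmoi_orgs | org_qualifiers)
  gh search prs "${quals[@]}" --author=@me --merged-at=">=$since"
}
```

Keeping `org_qualifiers` separate from the `chezmoi` call makes the formatting testable without chezmoi installed.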

athal7 · 2026-06-26T18:29:32Z

+name: gh
+enabled: true
+priority: 4
+authoritative_for: [shipped-code, reviews]


not sure i understand how this is used

It's metadata the orchestrator doesn't act on — mainly documentation of what each collector is the source of truth for (to help a reader understand dedup decisions, like why openspec sessions take precedence over opencode session transcripts). Happy to drop it if it feels like overhead without benefit.

athal7 · 2026-06-26T18:31:11Z

+---
+name: gh
+enabled: true
+priority: 4


not sure we're getting a ton from priority?

The main place it matters is openspec (0) needing to run before opencode (5) so the session exclusion set is built first. The middle-ground ones (slack, linear, gh) don't have hard ordering requirements. Happy to simplify — could drop priority and just document the openspec-before-opencode dependency explicitly in the orchestrator instead.

athal7 · 2026-06-26T18:31:45Z

@@ -0,0 +1,38 @@
+---


i think collectors should be user specific, and i don't use this (or granola)

Oh, yea, those shouldn't have gotten shipped to you, I'll remove them.

Removed both. Agreed — collectors should be user-specific files you add to your own dotfiles, not shipped in the base set.

athal7 · 2026-06-26T18:32:30Z

+If `workspace` is set in frontmatter, pass it with `--workspace`:
+
+```bash
+linear issue mine --updated-after YYYY-MM-DD --all-states --no-pager


i don't use linear cli :)

Switched to the GraphQL API via the linear skill — same approach as your other Linear integrations.

- Remove re-run guard (redundant with Step 1 date-range logic)
- Drop 'enabled' frontmatter field; presence/absence of file is the toggle
- gh.md: read orgs from chezmoi data instead of hardcoded frontmatter
- linear.md: switch from linear CLI to GraphQL API via 'linear' skill
- Remove google-chat.md and granola.md (user-specific collectors)

Copilot

Pull request overview

Refactors the /kb-enrich OpenCode command into an orchestrator-style recipe that discovers and runs per-source “collector” markdown files under ~/.config/opencode/kb-collectors/, enabling source-specific query/extraction rules and preserving the OpenSpec-first session dedup flow.

Changes:

- Rewrites kb-enrich into stepwise orchestration: resolve config, resolve date window, load collectors, apply OpenSpec-based session exclusion, run collectors, then write KB outputs.
- Adds collector recipe files for OpenSpec, OpenCode sessions, Slack, Linear, and GitHub.
- Moves source-specific querying/triage/extraction guidance into self-contained collector documents.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.


| File | Description |
| --- | --- |
| dot_config/opencode/commands/kb-enrich.md | Orchestrator-style KB enrichment flow, including collector loading + OpenSpec dedup sequencing. |
| dot_config/opencode/kb-collectors/openspec.md | Priority-0 collector defining OpenSpec archive reads and building the session exclusion set. |
| dot_config/opencode/kb-collectors/opencode.md | Collector for querying the local OpenCode session DB with exclusion applied upstream. |
| dot_config/opencode/kb-collectors/slack.md | Collector recipe for Slack search + thread reads and extraction/skip guidance. |
| dot_config/opencode/kb-collectors/linear.md | Collector recipe for Linear issue activity via the Linear skill/GraphQL. |
| dot_config/opencode/kb-collectors/gh.md | Collector recipe for GitHub PR/review activity via `gh search prs`. |


+Collectors live in `~/.config/opencode/kb-collectors/`. Each file is a self-contained markdown recipe that describes one data source. Read all `*.md` files in that directory and sort them by the `priority` field in their YAML frontmatter (lower number = runs first). Read each file's body now so you can apply its query recipe and extraction rules during collection.

-**Benign failure modes** (neither loses correctness): a missed match (stale/absent `kb-meta.yaml`) just wastes one transcript read; an over-match (a session in an excluded worktree that wasn't really part of the change) just relies on the better, distilled artifact instead of the transcript.
+Some collectors perform a **runtime enabled check** at the start of their body (e.g. verifying a token or config value exists before proceeding). Honor those checks: if a collector's body says to skip, log the reason and move on.

-## Enrichment Steps
+> **To add a new data source:** drop a new `.md` file into `~/.config/opencode/kb-collectors/`. No changes to this orchestrator needed. To disable a source, remove or don't add its collector file.


+## Step 2 — Load collectors

-   ```sql
-   SELECT id, directory, title, time_updated
-   FROM session
-   WHERE time_updated BETWEEN :start_ms AND :end_ms
-     AND directory NOT IN ('/abs/worktree/a', '/abs/worktree/b');
-   -- returned sessions are the ONLY ones that need a transcript read;
-   -- excluded directories are covered by the durable change artifacts instead.
-   ```


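+# $date: each YYYY-MM-DD being enriched
+for meta in "$KB_ROOT"/openspec/*/changes/archive/"$date"-*/kb-meta.yaml; do
+  [ -e "$meta" ] || continue   # skip the literal pattern when nothing matches
+  awk '/^worktree:/ { print $2 }' "$meta"
+done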


+
+Before running the `opencode` collector, build the session exclusion set from the `openspec` collector output (priority 0, runs first):
+
+For each date being enriched, read `$KB_ROOT/openspec/*/changes/archive/<date>-*/kb-meta.yaml` and collect every `worktree:` value. This set is passed into the `opencode` collector as the `NOT IN (...)` list. Sessions in excluded worktrees are already covered by the durable OpenSpec change artifacts (`design.md`/specs); they do not need a transcript read.
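The exclusion step above can be sketched in shell. Assumptions: each `kb-meta.yaml` has a top-level, unquoted `worktree: /abs/path` line; the `session` table columns match the SQL in the earlier diff (`id`, `directory`, `title`, and `time_updated` in epoch ms); and `exclusion_set`, `sql_in_list`, and `session_query` are hypothetical names. Paths are embedded as SQL string literals with single quotes doubled.

```bash
#!/usr/bin/env bash
# Sketch: build the OpenSpec exclusion set and the session query for one date.

# Unique worktree paths from archived changes for one date.
# $1 = KB_ROOT, $2 = date (YYYY-MM-DD).
exclusion_set() {
  local meta
  for meta in "$1"/openspec/*/changes/archive/"$2"-*/kb-meta.yaml; do
    [ -e "$meta" ] || continue   # no archived changes for this date
    awk '/^worktree:/ { sub(/^worktree:[ \t]*/, ""); print; exit }' "$meta"
  done | sort -u
}

# stdin: one path per line. stdout: SQL string list like '/a', '/b'
# (embedded single quotes are doubled, per SQL string-literal rules).
sql_in_list() {
  awk -v q="'" '
    NF == 0 { next }   # skip blank lines
    { gsub(q, q q); out = out sep q $0 q; sep = ", " }
    END { printf "%s", out }
  '
}

# Print the session query; the NOT IN clause is omitted when nothing is excluded.
# $1/$2 = window bounds in epoch ms, $3 = sql_in_list output.
session_query() {
  printf 'SELECT id, directory, title, time_updated\nFROM session\n'
  printf 'WHERE time_updated BETWEEN %s AND %s' "$1" "$2"
  if [ -n "$3" ]; then printf '\n  AND directory NOT IN (%s)' "$3"; fi
  printf ';\n'
}
```

One way to run it for a single date `$d` with precomputed bounds: `sqlite3 ~/.local/share/opencode/opencode.db "$(session_query "$start_ms" "$end_ms" "$(exclusion_set "$KB_ROOT" "$d" | sql_in_list)")"`. Dropping the `NOT IN` clause for an empty set keeps the query valid without relying on how a given database treats an empty `IN ()` list.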


+Token is in `~/.config/team-context-mcp/.env` as `SLACK_USER_TOKEN`. Set `SLACK_USER_ID` to your Slack user ID (find it in your Slack profile → "Copy member ID").
+
+```bash
+SLACK_TOKEN=$(grep '^SLACK_USER_TOKEN=' ~/.config/team-context-mcp/.env | cut -d= -f2-)
+SLACK_USER_ID="<your-slack-user-id>"  # e.g. U01ABC23DEF
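The runtime enabled check described for collectors, applied to this token file, might look like the following. It is a sketch: `env_get` and `slack_enabled` are hypothetical helpers, and the dotenv parsing is deliberately simple (one `KEY=value` line per key, no quote stripping, no `export` prefix).

```bash
#!/usr/bin/env bash
# Sketch of a collector's runtime enabled check, using the Slack token file.

# Print the value of KEY ($1) from dotenv-style FILE ($2); empty if absent.
# Keeps any "=" inside the value; does not strip surrounding quotes.
env_get() {
  [ -r "$2" ] || return 0
  awk -v k="$1" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$2"
}

# Return 0 if the Slack token is configured; otherwise log why and return 1.
slack_enabled() {
  local env_file=${1:-$HOME/.config/team-context-mcp/.env} token
  token=$(env_get SLACK_USER_TOKEN "$env_file")
  if [ -z "$token" ]; then
    echo "slack collector: SLACK_USER_TOKEN not found in $env_file, skipping" >&2
    return 1
  fi
}
```

A caller would skip the Slack collector whenever `slack_enabled` returns non-zero; the log line goes to stderr so it does not mix with collected output.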


+# skip_bots: commit authors / PR actors to ignore
+skip_bots: [dependabot]
+---
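`skip_bots` could be applied as a filter over author rows. This sketch assumes the rows arrive as tab-separated `author<TAB>...` lines (producing them from `gh` output is not shown) and uses the hypothetical name `drop_bots`. It treats `app/NAME` and `NAME[bot]` as the login `NAME`, since a bot account may appear in either form depending on the API.

```bash
#!/usr/bin/env bash
# Sketch: drop PR/commit rows whose author is listed in skip_bots.

# stdin: TSV rows "author<TAB>anything". Args: bot logins to skip.
drop_bots() {
  awk -F '\t' -v bots="$*" '
    BEGIN { n = split(bots, b, " "); for (i = 1; i <= n; i++) skip[b[i]] = 1 }
    {
      login = $1
      sub(/^app\//, "", login)     # e.g. app/dependabot
      sub(/\[bot\]$/, "", login)   # e.g. dependabot[bot]
      if (!(login in skip)) print
    }
  '
}
```

For example, `drop_bots dependabot` keeps a row authored by `alice` and drops rows authored by `dependabot[bot]` or `app/dependabot`.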


athal7 reviewed Jun 26, 2026


athal7 requested a review from Copilot June 27, 2026 12:11

Copilot started reviewing on behalf of athal7 June 27, 2026 12:11

Copilot AI reviewed Jun 27, 2026



