69 changes: 36 additions & 33 deletions dot_config/opencode/commands/kb-enrich.md
@@ -7,44 +7,51 @@ Run the daily knowledge base enrichment. By default enrich every date since the

$ARGUMENTS

## Sources
## Step 0 — Resolve configuration

Check activity across all available sources:
**KB_ROOT** — the knowledge base root directory. Resolve in this order:
1. The env var `KB_ROOT` if set and non-empty
2. Default: `~/.local/share/kb`

- **opencode** coding sessions
- **slack** chat messages and threads
- **zoom** meeting transcripts — captions live at `~/Documents/Zoom/YYYY-MM-DD HH.MM.SS <Title>/meeting_saved_closed_caption.txt`. For each transcript whose dir-date falls in the enrich window, distill it with the local on-device model first (see Extract step) instead of reading the full raw caption text.
- **linear** issues and comments
- **gh** code reviews, PRs, and issues
- **openspec durable store (AUTHORITATIVE for `/implement` work)** — each worktree's `openspec/` carries two narrow symlinks into a durable per-repo store at `~/.local/share/kb/openspec/<repo-slug>/` (`openspec/specs` → store `specs/`, `openspec/changes/archive` → store `changes/archive/`). At Ship, `openspec archive` moves a completed change through the `changes/archive` symlink into the store, so its artifacts persist regardless of when work shipped. For each date being enriched, read `~/.local/share/kb/openspec/*/changes/archive/<date>-*/design.md` for decisions, the "why", and rejected alternatives, and read each store's durable `specs/` for the standing requirements. These structured artifacts are the source of truth for the reasoning behind completed `/implement` work — use them instead of reconstructing it from full (token-expensive, lossy) session transcripts.
All KB reads and writes in this run use `$KB_ROOT`. Every collector file also receives `$KB_ROOT` as context.
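
The resolution order above can be sketched as a one-line parameter expansion. This is a minimal illustration, not part of the command file; the function name `resolve_kb_root` is hypothetical.

```bash
# Hypothetical helper: print the knowledge base root.
# ${KB_ROOT:-default} falls back when KB_ROOT is unset OR empty,
# which matches the "set and non-empty" rule above.
resolve_kb_root() {
  printf '%s\n' "${KB_ROOT:-$HOME/.local/share/kb}"
}
```

The `:-` form (rather than `-`) is what treats an empty `KB_ROOT` the same as an unset one.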

### Session exclusion — the core token-saving dedup
## Step 1 — Resolve the date range

The openspec store is authoritative for `/implement` work, so the sessions that PRODUCED an archived change must be EXCLUDED from transcript reads. Build the exclusion set and skip those sessions:
Enrich the gap since the last run, not a fixed single day. The most recent `$KB_ROOT/journal/YYYY-MM-DD.md` is the last-run marker: enrich each date from (last journal date + 1) through today, inclusive. This makes a Monday run sweep the trailing weekend and lets a skipped run self-heal on the next run. If today already has a journal, the range is empty and nothing is re-enriched. If no prior journal exists, default to today. An explicit date or range in `$ARGUMENTS` overrides this.
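
As a rough sketch of the gap calculation, the following assumes GNU `date` (for `-d` arithmetic; BSD/macOS `date` needs a different invocation) and treats dates as UTC days. The function name `enrich_dates` and its arguments are hypothetical, and the `$ARGUMENTS` override is not handled here.

```bash
# Hypothetical helper: print each date to enrich, one per line.
# Usage: enrich_dates <journal_dir> <today as YYYY-MM-DD>
enrich_dates() {
  local journal_dir="$1" today="$2" last="" f d
  # Globs expand in sorted order, so the last match is the newest journal.
  for f in "$journal_dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].md; do
    [ -e "$f" ] && last=$(basename "$f" .md)
  done
  if [ -z "$last" ]; then
    echo "$today"   # no prior journal: default to today
    return 0
  fi
  d=$(date -u -d "$last + 1 day" +%F)
  # Compare as epoch seconds so the loop stops once d passes today.
  while [ "$(date -u -d "$d" +%s)" -le "$(date -u -d "$today" +%s)" ]; do
    echo "$d"
    d=$(date -u -d "$d + 1 day" +%F)
  done
}
```

When the newest journal is already today's, the loop body never runs and the function prints nothing, so an already-enriched day is not repeated.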

Owner

this line conflicts with the re-run guard above. i like this version better

Contributor Author

Removed — the date-range logic in Step 1 already covers this (existing journal files become the last-run marker, so a date that's already enriched is simply last date + 1 and won't re-run).


1. **Collect excluded worktrees.** For each date being enriched, read every `~/.local/share/kb/openspec/*/changes/archive/<date>-*/kb-meta.yaml` and collect its `worktree:` value (the absolute repo/worktree root, stamped at archive). That set is the exclusion list.
2. **Skip those sessions.** When scanning **opencode** sessions, a session is identified by its `directory` column in the `session` table of `~/.local/share/opencode/opencode.db`. SKIP any session whose `directory` is in the exclusion set — for those, narrate from the change's `design.md`/specs, not the transcript. Only sessions NOT covered by an archived change get a transcript read.
3. **Filter at query time.** Pass the collected worktrees as the `NOT IN (...)` list and bound by the date window (`time_updated` is epoch-ms):
## Step 2 — Load collectors

```sql
SELECT id, directory, title, time_updated
FROM session
WHERE time_updated BETWEEN :start_ms AND :end_ms
AND directory NOT IN ('/abs/worktree/a', '/abs/worktree/b');
-- returned sessions are the ONLY ones that need a transcript read;
-- excluded directories are covered by the durable change artifacts instead.
```
Collectors live in `~/.config/opencode/kb-collectors/`. Each file is a self-contained markdown recipe that describes one data source. Read all `*.md` files in that directory and sort them by the `priority` field in their YAML frontmatter (lower number = runs first). Read each file's body now so you can apply its query recipe and extraction rules during collection.

**Benign failure modes** (neither loses correctness): a missed match (stale/absent `kb-meta.yaml`) just wastes one transcript read; an over-match (a session in an excluded worktree that wasn't really part of the change) just relies on the better, distilled artifact instead of the transcript.
Some collectors perform a **runtime enabled check** at the start of their body (e.g. verifying a token or config value exists before proceeding). Honor those checks: if a collector's body says to skip, log the reason and move on.
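
One way to produce the priority ordering is sketched below. It reads `priority:` only from the frontmatter (between the first two `---` lines). The function name `list_collectors` is hypothetical, and the fallback of 999 for a file with no `priority` is an assumption, not something the orchestrator specifies.

```bash
# Hypothetical helper: print collector files, lowest priority number first.
# Usage: list_collectors <collectors_dir>
list_collectors() {
  local dir="$1" f prio tab
  tab=$(printf '\t')
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue   # directory has no collector files
    # n counts '---' delimiter lines; n == 1 means "inside the frontmatter".
    prio=$(awk '/^---[[:space:]]*$/ { n++; next }
                n == 1 && /^priority:/ { print $2; exit }' "$f")
    # Assumption: collectors without a priority run last (999).
    printf '%s\t%s\n' "${prio:-999}" "$f"
  done | sort -t "$tab" -k1,1n | cut -f2-
}
```

For example, `list_collectors ~/.config/opencode/kb-collectors` would list `openspec.md` (priority 0) before `opencode.md` (priority 5).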

## Enrichment Steps
> **To add a new data source:** drop a new `.md` file into `~/.config/opencode/kb-collectors/`. No changes to this orchestrator are needed. To disable a source, remove its collector file.

1. **Extract** people facts, project updates, and decisions from each source.
- **Zoom transcripts:** for each in-window transcript, run `~/.config/opencode/bin/kb-distill <caption-file> "<title>" <date>` and use the returned JSON facts (participants, topics, decisions, action_items, open_questions, summary) in place of the raw caption text when extracting people facts, decisions, and action items. The raw transcript is sent only to the on-device local model (privacy positive); the authoritative do-not-store privacy filter below still applies at the WRITE step. **If `kb-distill` exits non-zero, read the raw transcript yourself instead and note the fallback in the journal.**
2. **Journal** — write one cross-project rollup journal file per enriched date, each with diff stats. By construction each is THIN: feed it only from the NON-excluded sessions (those not covered by an archived change) plus git diff-stats. For `/implement` work, do NOT re-narrate the openspec change — reference the durable store artifacts (`design.md`/specs already in the kb via the symlink). The journal's role is the cross-project rollup + non-`/implement` activity, not a reconstruction of openspec work. Keep it; just don't duplicate the store.
3. **Profiles** — merge new facts into knowledge-base people and project profiles
4. **Decisions** — add any decisions to the decisions log. Pull key design decisions and rejected alternatives from the durable store's `~/.local/share/kb/openspec/*/changes/archive/<date>-*/design.md` (READ, don't copy — the artifacts are already in the kb). The decisions log is a distilled record anchored to its product/project, not a dump of the design files.
5. **Action items** — extract action items from the enriched window's activity. Cross-reference within the same activity data — if the activity shows you already took the action (replied to the thread, reviewed the PR, closed the issue), skip the reminder. Only create reminders for items that were not resolved within the enriched window.
## Step 3 — Session exclusion (cross-collector dedup)

Before running the `opencode` collector, build the session exclusion set from the `openspec` collector output (priority 0, runs first):

For each date being enriched, read `$KB_ROOT/openspec/*/changes/archive/<date>-*/kb-meta.yaml` and collect every `worktree:` value. This set is passed into the `opencode` collector as the `NOT IN (...)` list. Sessions in excluded worktrees are already covered by the durable OpenSpec change artifacts (`design.md`/specs); they do not need a transcript read.

**Benign failure modes:** a missed match (stale/absent `kb-meta.yaml`) just wastes one transcript read. An over-match just relies on the better, distilled artifact. Neither loses correctness.
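
To turn the collected `worktree:` values into the `NOT IN (...)` list, something like the sketch below could work. The function name `sql_quote_list` is hypothetical. It reads one path per line on stdin and doubles embedded single quotes so each path is a valid SQL string literal. Emitting `''` for an empty set is an assumption made here to keep the clause syntactically valid; it relies on no session having an empty `directory`.

```bash
# Hypothetical helper: stdin = one worktree path per line,
# stdout = a comma-separated list of SQL string literals.
sql_quote_list() {
  local line out=""
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    # Double any single quote so the path is a valid SQL string literal.
    line=$(printf '%s' "$line" | sed "s/'/''/g")
    out="${out:+$out, }'$line'"
  done
  # Assumption: an empty set becomes '' so NOT IN (...) stays valid SQL.
  [ -n "$out" ] || out="''"
  printf '%s\n' "$out"
}
```

Piping the collected paths through `sort -u` first would drop duplicates when several archived changes share a worktree.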

## Step 4 — Run collectors

For each date in the enrichment window, run each collector in priority order. Apply the collector's own query recipe, triage rules, and extraction rules exactly as written in its body. Carry the results forward into Step 5.

## Step 5 — Enrichment

### Journal
Write one cross-project rollup journal file per enriched date at `$KB_ROOT/journal/YYYY-MM-DD.md`. By construction each is THIN: feed it only from the non-excluded sessions (not covered by an archived OpenSpec change) plus git diff-stats plus any meeting summaries from collectors. For `/implement` work, do NOT re-narrate the OpenSpec change — reference the durable store artifacts (`design.md`/specs already in the KB via the symlink). The journal's role is the cross-project rollup and non-`/implement` activity, not a reconstruction of OpenSpec work.

### Profiles
Merge new facts into `$KB_ROOT/people/` and `$KB_ROOT/projects/` profiles. Load the `knowledge-base` skill for the canonical profile shape and merge rules.

### Decisions
Add any decisions to the decisions log. Pull key design decisions and rejected alternatives from `$KB_ROOT/openspec/*/changes/archive/<date>-*/design.md` (READ, don't copy — the artifacts are already in the KB). Also extract any decisions surfaced by meeting or chat collectors. The decisions log is a distilled record anchored to its product/project, not a dump of design files or transcripts.

### Action items
Extract action items from the enriched window's activity. Cross-reference within the same activity data — if the activity shows the action was already taken (replied, reviewed, closed, appeared as done in a later meeting), skip the reminder. Only create reminders for items not resolved within the enriched window.

## Privacy

@@ -54,7 +61,3 @@ Do not extract or store:
- Performance evaluations
- Legal or attorney-client privileged content
- Content from HR-related discussions

## Date range

Enrich the gap since the last run, not a hard single day. The most recent `~/.local/share/kb/journal/YYYY-MM-DD.md` is the last-run marker: enrich each date from (last journal date + 1) through today, inclusive. This makes a Monday run sweep the trailing weekend and lets a skipped run self-heal on the next run. If no prior journal exists, default to today. An explicit date or range in arguments overrides this.
43 changes: 43 additions & 0 deletions dot_config/opencode/kb-collectors/gh.md
@@ -0,0 +1,43 @@
---
name: gh
priority: 4

Owner

not sure we're getting a ton from priority?

Contributor Author

The main place it matters is openspec (0) needing to run before opencode (5) so the session exclusion set is built first. The middle-ground ones (slack, linear, gh) don't have hard ordering requirements. Happy to simplify — could drop priority and just document the openspec-before-opencode dependency explicitly in the orchestrator instead.

authoritative_for: [shipped-code, reviews]

Owner

not sure i understand how this is used

Contributor Author

It's metadata the orchestrator doesn't act on — mainly documentation of what each collector is the source of truth for (to help a reader understand dedup decisions, like why openspec sessions take precedence over opencode session transcripts). Happy to drop it if it feels like overhead without benefit.

description: GitHub PRs you authored or reviewed in the enrichment window
# skip_bots: commit authors / PR actors to ignore
skip_bots: [dependabot]
---

## Enabled check

Read GitHub orgs from chezmoi data:

```bash
# `// {}` keeps jq from erroring when no `orgs` key is defined; ORGS is then empty
ORGS=$(chezmoi data --format json | jq -r '[(.orgs // {}) | keys[]] | join(" ")')
```

If `ORGS` is empty, search across all repos you have access to (no org filter). If non-empty, scope searches with an `org:NAME` filter per org.

## How to query

```bash
# PRs you opened or updated (no --state flag: `gh search prs --state` takes
# only open|closed, and omitting it returns both)
gh search prs --author "@me" --updated ">=YYYY-MM-DD" \
  --json number,title,repository,state,updatedAt,body

# PRs you were asked to review
gh search prs --review-requested "@me" --updated ">=YYYY-MM-DD" \
  --json number,title,repository,state,updatedAt
```

Add `org:NAME` to each query for every org in `ORGS` (run one query per org, or combine with multiple `org:` terms in a single search string).
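
The one-query-per-org option can be sketched as a small generator of scope terms. The function name `org_terms` is hypothetical. It only produces the `org:NAME` strings described above and leaves it to the caller to attach each term to a search.

```bash
# Hypothetical helper: print one search-scope term per line.
# Usage: org_terms "$ORGS"   (ORGS is the space-separated list from above)
org_terms() {
  local orgs="$1" org
  if [ -z "$orgs" ]; then
    echo ""          # one empty term: a single unscoped search
    return 0
  fi
  # Intentionally unquoted: split the list on whitespace.
  for org in $orgs; do
    echo "org:$org"
  done
}
```

A caller could loop with `org_terms "$ORGS" | while IFS= read -r term; do ...; done` and run each query once per term, which also covers the empty-`ORGS` case with a single unscoped pass.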

## What to extract

- Merged PRs — what shipped
- Review comments that surfaced decisions
- Linked issues

## What to skip

- Draft PRs
- Commits and PRs authored by bots listed in `skip_bots`
50 changes: 50 additions & 0 deletions dot_config/opencode/kb-collectors/linear.md
@@ -0,0 +1,50 @@
---
name: linear
priority: 3
authoritative_for: [tickets, completed-work]
description: Linear issues you touched in the enrichment window
---

## Enabled check

Load the `linear` skill. If no Linear API token is available (the skill cannot authenticate), skip this collector and log "linear: no auth, skipping".

## How to query

Use the Linear GraphQL API (endpoint `https://api.linear.app/graphql`) via the `linear` skill. Query issues assigned to or created by you that were updated within the enrichment window:

```graphql
{
  issues(
    filter: {
      updatedAt: { gte: "YYYY-MM-DDT00:00:00Z" }
      or: [
        { assignee: { isMe: { eq: true } } }
        { creator: { isMe: { eq: true } } }
      ]
    }
    first: 50
  ) {
    nodes {
      identifier
      title
      state { name }
      updatedAt
      description
      url
    }
  }
}
```

## What to extract

- Newly created tickets
- Status changes (especially to Done/Completed)
- Decisions captured in descriptions or comments
- Any ticket closed in the window (signals completed work not otherwise visible in git)

## What to skip

- Bot-generated or auto-updated tickets
- Tickets you are only a watcher on with no direct activity
34 changes: 34 additions & 0 deletions dot_config/opencode/kb-collectors/opencode.md
@@ -0,0 +1,34 @@
---
name: opencode
priority: 5
authoritative_for: [coding-sessions]
description: opencode coding sessions from the local session store
---

## How to query

Query the session store SQLite database at `~/.local/share/opencode/opencode.db`.

> **Note:** The session exclusion / dedup step is handled by the orchestrator before this collector runs. By the time this collector is called, it receives a list of directories to exclude (sessions covered by archived OpenSpec changes). Apply the `NOT IN (...)` clause below.

```sql
SELECT id, directory, title, time_updated
FROM session
WHERE time_updated BETWEEN :start_ms AND :end_ms
AND directory NOT IN ('/abs/worktree/a', '/abs/worktree/b');
-- The excluded directories are passed in by the orchestrator.
-- Only sessions NOT in the exclusion set get a transcript read.
-- For excluded sessions, the orchestrator narrates from the change's design.md/specs instead.
```

`time_updated` is epoch-milliseconds.
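
Because `BETWEEN` is inclusive on both ends, the `:start_ms` and `:end_ms` bounds for one day can be derived as sketched below. This assumes GNU `date` and UTC day boundaries (a local-timezone window would need a different conversion). The function name `day_window_ms` is hypothetical.

```bash
# Hypothetical helper: print "<start_ms> <end_ms>" for one UTC day.
# Usage: day_window_ms YYYY-MM-DD
day_window_ms() {
  local day="$1" start_s
  start_s=$(date -u -d "$day 00:00:00" +%s)
  # End is the last millisecond of the day, since BETWEEN includes both bounds.
  echo "$(( start_s * 1000 )) $(( (start_s + 86400) * 1000 - 1 ))"
}
```

For a multi-day window, the start of the first date and the end of the last date give the overall bounds.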

## What to extract

- Work done in sessions not covered by an OpenSpec archived change
- Project context, technical decisions made interactively, approaches tried

## What to skip

- Sessions whose `directory` is in the exclusion set (covered by the durable OpenSpec change artifacts — see the orchestrator's dedup step)
- Sessions with no substantive content (e.g. very short duration, no meaningful tool calls)
44 changes: 44 additions & 0 deletions dot_config/opencode/kb-collectors/openspec.md
@@ -0,0 +1,44 @@
---
name: openspec
priority: 0
authoritative_for: [implement-work, design-decisions, rejected-alternatives]
description: OpenSpec durable store — authoritative source for /implement work; read BEFORE other collectors to build the session exclusion set
---

## Why priority 0

This collector runs first. Its primary job is building the **session exclusion set** used by the `opencode` collector — a list of worktree paths whose sessions are already covered by an archived OpenSpec change and should not get a redundant (token-expensive) transcript read.

## How to query

For each date being enriched, read every `$KB_ROOT/openspec/*/changes/archive/<date>-*/kb-meta.yaml`. Collect the `worktree:` value from each. That set is the exclusion list passed to the `opencode` collector.

```bash
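# Collect worktrees for a given date (set DATE to the YYYY-MM-DD being enriched)
DATE="YYYY-MM-DD"
for meta in "$KB_ROOT"/openspec/*/changes/archive/"$DATE"-*/kb-meta.yaml; do
  [ -e "$meta" ] || continue   # the glob matched nothing for this date
  # Print everything after "worktree:" so paths containing spaces survive
  sed -n 's/^worktree:[[:space:]]*//p' "$meta"
done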
```

Then read each archived change's `design.md`:

```
$KB_ROOT/openspec/*/changes/archive/<date>-*/design.md
```

READ these (do not copy them — the artifacts are already in the KB via the symlink). Extract decisions, the "why", and rejected alternatives for the decisions log.

Also read the durable `specs/` for standing requirements:

```
$KB_ROOT/openspec/*/specs/
```

## What to extract

- Decisions, rationale, and rejected alternatives from `design.md` files
- The set of worktree paths (→ exclusion list for the `opencode` collector)

## What to skip

- Re-narrating or duplicating the full design content in the journal — reference the durable store artifacts instead
43 changes: 43 additions & 0 deletions dot_config/opencode/kb-collectors/slack.md
@@ -0,0 +1,43 @@
---
name: slack
priority: 2
authoritative_for: [informal-decisions, action-items, contact-info]
description: Slack messages — your sent messages and mentions, in the enrichment window
---

## How to query

Token is in `~/.config/team-context-mcp/.env` as `SLACK_USER_TOKEN`. Set `SLACK_USER_ID` to your Slack user ID (find it in your Slack profile → "Copy member ID").

```bash
# Anchor the match and keep everything after the first "=" (-f2-) so a token containing "=" is not truncated
SLACK_TOKEN=$(grep '^SLACK_USER_TOKEN=' ~/.config/team-context-mcp/.env | cut -d= -f2-)
SLACK_USER_ID="<your-slack-user-id>"  # e.g. U01ABC23DEF

# Your messages in the date window
curl -s "https://slack.com/api/search.messages?query=from:me+after:YYYY-MM-DD+before:YYYY-MM-DD&count=20&sort=timestamp" \
-H "Authorization: Bearer $SLACK_TOKEN"

# Mentions of you in the date window
curl -s "https://slack.com/api/search.messages?query=%3C${SLACK_USER_ID}%3E+after:YYYY-MM-DD+before:YYYY-MM-DD&count=20&sort=timestamp" \
-H "Authorization: Bearer $SLACK_TOKEN"

# Read a thread (get replies)
curl -s "https://slack.com/api/conversations.replies?channel=CHANNEL_ID&ts=THREAD_TS" \
-H "Authorization: Bearer $SLACK_TOKEN"
```

Slack is high-volume; read selectively to keep token cost manageable.

## What to extract

- Decisions announced or confirmed in Slack that didn't appear in a Granola meeting
- Action items assigned to you or by you that aren't already in Linear
- New contact info (Slack handles, email addresses) for people profiles
- Customer or partner names that surfaced in conversation

## What to skip

- Routine standup threads already covered by Granola
- Emoji reactions and short acknowledgments ("👍", "sounds good")
- HR, compensation, or personal channels (privacy)
- Anything already captured from a Granola meeting for the same day