faxik/codebugs
codebugs

Persistent code finding & requirements tracker for AI assistants. SQLite-backed, MCP server + CLI.

AI assistants lose context between sessions. codebugs gives them persistent memory for code review findings, bug reports, tech debt, and requirements tracking — with minimal token overhead.

Session 1:  Review code → log 50 findings → forget them
Session 2:  summary → instant orientation → fix 20 → update status
Session 3:  summary → "30 open, 20 fixed" → continue

No context lost. No re-reading files. No token-heavy recaps.

Install

# Global install (recommended)
pipx install codebugs

# Or with pip/uv
pip install codebugs

Setup

Claude Code (MCP)

Add to ~/.claude.json (global) or .mcp.json (per-project):

{
  "mcpServers": {
    "codebugs": {
      "command": "codebugs-mcp"
    }
  }
}

The database lives at .codebugs/findings.db in the current working directory — each project gets its own. Add .codebugs/ to your .gitignore.

Running Modules Independently

Use --mode to load only the tools you need:

{
  "mcpServers": {
    "codebugs": {
      "command": "codebugs-mcp",
      "args": ["--mode", "findings"]
    }
  }
}

Available modes: findings (7 tools), reqs (11 tools), sweep (9 tools), all (27 tools, default).

The CLI supports the same flag: codebugs --mode findings summary.

Other MCP Clients

Any MCP-compatible client can connect to codebugs-mcp via stdio transport.

Usage

MCP Tools (for AI assistants)

Findings (code review, bugs, tech debt):

| Tool | Purpose |
| --- | --- |
| `summary` | Dashboard overview — start here for orientation |
| `add` | Log a finding with severity, category, file, description |
| `batch_add` | Log multiple findings at once |
| `update` | Change status, add notes, update tags or metadata |
| `query` | Search/filter with pagination and group-by |
| `stats` | Cross-tabulated counts (severity × category/file/status) |
| `categories` | List existing categories — call before `add` for consistency |

Requirements (specification tracking):

| Tool | Purpose |
| --- | --- |
| `reqs_summary` | Requirements dashboard — start here |
| `reqs_add` | Add a requirement (FR-001, priority, status, test coverage) |
| `reqs_update` | Change status, description, priority, test coverage |
| `reqs_query` | Search/filter by status, priority, section, free text |
| `reqs_stats` | Cross-tabulated counts (status × priority) |
| `reqs_verify` | Automated checks: ghost test files, duplicate IDs, status contradictions |
| `reqs_import` | Import from REQUIREMENTS.md (parses markdown tables) |
| `reqs_embed` | Store an embedding vector for a requirement |
| `reqs_batch_embed` | Store embeddings for multiple requirements |
| `reqs_search_similar` | Semantic search across requirements by cosine similarity |
| `reqs_embedding_stats` | Report on embedding coverage |

Sweeps (batch iteration with recurrence-aware lifecycle tracking):

| Tool | Purpose |
| --- | --- |
| `codesweep_create` | Create a new sweep. Optional `lifecycle=[...]`, `terminal_states=[...]`, `transitions={state: [allowed_next, ...]}` for state-machine sweeps (defaults to `["pending","done"]`) |
| `codesweep_add` | Add items. Atomic upsert: existing items have `recurrence_count` bumped, `last_seen` refreshed, `archived_at` cleared (re-detection un-archives) |
| `codesweep_next` | Get next batch of unprocessed (non-terminal, non-archived) items |
| `codesweep_mark` | Mark items by state. Legacy `processed=True/False` still works; `state="..."` for explicit transitions, validated against the lifecycle DAG |
| `codesweep_status` | Progress overview — total, processed, remaining, archived, per-tag and per-state breakdowns |
| `codesweep_archive` | Archive an entire sweep |
| `codesweep_archive_items` | Selectively archive entries (soft-delete) by `items=`, `where_status=`, or `older_than="30d"`. Re-adding an archived entry un-archives it with recurrence carried forward |
| `codesweep_list_items` | List entries in a sweep, filterable by state/tag/archived |
| `codesweep_list` | List all sweeps with summary counts |

CLI (for humans)

Findings:

# Add a finding
codebugs add -s high -c n_plus_one -f src/api.py -d "Query in loop at line 42"

# Dashboard
codebugs summary

# Search
codebugs query --status open --severity critical
codebugs query --group-by file
codebugs query --category n_plus_one

# Update
codebugs update CB-1 --status fixed --notes "Fixed in PR #42"

# Check categories before adding (avoids inconsistent naming)
codebugs categories

# Import/export
codebugs import-csv findings.csv
codebugs export-csv

Requirements:

# Import from existing REQUIREMENTS.md
codebugs reqs-import REQUIREMENTS.md

# Dashboard
codebugs reqs-summary

# Verify — find ghost test files, duplicate IDs, status contradictions
codebugs reqs-verify
codebugs reqs-verify --checks tests,status --project-dir /path/to/project

# Search
codebugs reqs-query --status Implemented --priority Must
codebugs reqs-query --search "entity" --group-by section

# Update
codebugs reqs-update FR-090 --status Superseded --notes "Replaced by vault architecture"

# Add
codebugs reqs-add FR-700 -d "System shall support licensing" --section "1.72 Licensing" --priority Must

# Export back to markdown
codebugs reqs-export REQUIREMENTS.md

Sweeps:

# Create a sweep
codebugs sweep-create --name lint-pass --batch-size 5

# Add items (with optional tags)
codebugs sweep-add lint-pass src/*.py --tags critical
codebugs sweep-add lint-pass tests/*.py --tags test

# Iterate in batches
codebugs sweep-next lint-pass
codebugs sweep-next lint-pass --limit 10 --tags critical

# Mark items as processed
codebugs sweep-mark lint-pass src/api.py src/db.py

# Undo a mark
codebugs sweep-mark lint-pass src/api.py --undo

# Check progress
codebugs sweep-status lint-pass

# Archive when done
codebugs sweep-archive lint-pass

# List active sweeps
codebugs sweep-list
codebugs sweep-list --all  # include archived

Sweeps with custom lifecycle (e.g. retro findings):

# Create a sweep with a state-machine lifecycle
codebugs sweep-create --name retro-findings \
    --lifecycle DETECTED,CONFIRMED,ESCALATED,RESOLVED,DROPPED \
    --terminal-states RESOLVED,DROPPED

# Add a finding (re-adding bumps recurrence_count)
codebugs sweep-add retro-findings finding-2026-04-todo-bypassed --tags silent_abandonment

# Transition state explicitly
codebugs sweep-mark retro-findings finding-2026-04-todo-bypassed --state CONFIRMED
codebugs sweep-mark retro-findings finding-2026-04-todo-bypassed --state RESOLVED

# Selectively archive resolved findings older than 30 days (soft-delete —
# re-detection in a future retro un-archives the entry with recurrence carried forward)
codebugs sweep-archive-items retro-findings --state RESOLVED --older-than 30d

# Inspect entries (filter by state/tag, include archived)
codebugs sweep-list-items retro-findings --state RESOLVED
codebugs sweep-list-items retro-findings --archived-only

How It Works

The Problem

AI code review sessions produce findings that get lost:

  • Findings live in chat context → gone when the session ends
  • Re-reading files wastes tokens on re-discovery
  • No way to track progress across sessions

The Solution

codebugs stores findings in a local SQLite database. AI assistants write findings as they discover them, then query the database in future sessions for instant context recovery.

Token savings: A summary call returns a structured JSON overview in ~200 tokens. Without codebugs, re-establishing the same context costs 2,000-10,000+ tokens of file reading and conversation history.

Typical Workflow

  1. Review: AI reviews code, calls categories to check naming, then add for each finding
  2. Fix: Next session, AI calls summary → sees 50 open findings → query --severity critical → fixes the worst ones → update each as fixed
  3. Track: Over time, categories reveals patterns — "12 tz_naive_datetime fixed across 9 files → time for a lint rule"

Schema

Both tables share the same SQLite database (.codebugs/findings.db) with flexible JSON columns.

Findings

| Field | Type | Description |
| --- | --- | --- |
| `id` | text | Auto-generated (CB-1, CB-2, ...) or user-provided |
| `severity` | text | `critical`, `high`, `medium`, `low` |
| `category` | text | User-defined (e.g. `n_plus_one`, `missing_validation`) |
| `file` | text | File path relative to project root |
| `status` | text | `open`, `fixed`, `not_a_bug`, `wont_fix`, `stale` |
| `description` | text | What's wrong |
| `source` | text | Who created it (`claude`, `ruff`, `human`, `mypy`, ...) |
| `tags` | json | Array of strings for ad-hoc grouping |
| `meta` | json | Anything else: lines, module, rule_code, cwe_id, ... |
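The JSON columns above can be queried directly with SQLite's JSON1 functions (bundled with modern Python builds). A minimal sketch of filtering findings by a tag — the table name and exact column layout here are assumptions for illustration, not necessarily codebugs' internal schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE findings (
        id TEXT PRIMARY KEY,
        severity TEXT, category TEXT, file TEXT, status TEXT,
        description TEXT, source TEXT,
        tags TEXT,   -- JSON array of strings
        meta TEXT    -- JSON object for everything else
    )
""")
conn.execute(
    "INSERT INTO findings VALUES (?,?,?,?,?,?,?,?,?)",
    ("CB-1", "high", "n_plus_one", "src/api.py", "open",
     "Query in loop", "claude",
     json.dumps(["perf", "db"]), json.dumps({"lines": [42]})),
)

# expand the JSON tags array with json_each and filter on a single tag
rows = conn.execute("""
    SELECT f.id FROM findings f, json_each(f.tags)
    WHERE json_each.value = 'perf'
""").fetchall()
# rows == [("CB-1",)]
```

This is why flexible `tags`/`meta` columns cost nothing at query time: SQLite can index into the JSON without a schema migration.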

Requirements

| Field | Type | Description |
| --- | --- | --- |
| `id` | text | User-provided (FR-001, FR-002, ...) |
| `section` | text | Grouping (e.g. 1.10 Document Sorting) |
| `description` | text | What the system shall do |
| `priority` | text | `Must`, `Should`, `Could` |
| `status` | text | `Planned`, `Partial`, `Implemented`, `Verified`, `Superseded`, `Obsolete` |
| `source` | text | Origin (e.g. Take 26, NEW, R&A) |
| `test_coverage` | text | Test file name(s) |
| `embedding` | blob | Optional float32 vector for semantic search |
| `tags` | json | Array of strings |
| `meta` | json | Anything else: notes, superseded_by, ... |

Sweeps

| Field | Type | Description |
| --- | --- | --- |
| `sweep_id` | text | Auto-generated (SW-1, SW-2, ...) |
| `name` | text | Optional human-readable name (unique) |
| `description` | text | What this sweep is for |
| `default_batch_size` | int | Default items per batch (default: 10) |
| `status` | text | `active`, `archived` |
| `lifecycle` | json | Ordered list of allowed entry states (default `["pending","done"]`) |
| `terminal_states` | json | States that count as "processed" (default `["done"]`) |
| `transitions` | json | Optional `{state: [allowed_next, ...]}` DAG. `null` = unconstrained |
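Enforcing the `transitions` DAG amounts to a dictionary lookup. A hypothetical sketch of the rule that `codesweep_mark` is described as applying (this is not codebugs' actual implementation, just the stated semantics):

```python
def validate_transition(transitions, current, target):
    """Return True if moving current -> target is allowed.

    transitions is the optional {state: [allowed_next, ...]} DAG from
    the sweep schema; None means transitions are unconstrained.
    """
    if transitions is None:
        return True
    return target in transitions.get(current, [])

# DAG matching the retro-findings lifecycle from the CLI examples
dag = {
    "DETECTED": ["CONFIRMED", "DROPPED"],
    "CONFIRMED": ["ESCALATED", "RESOLVED", "DROPPED"],
    "ESCALATED": ["RESOLVED", "DROPPED"],
}
validate_transition(dag, "DETECTED", "CONFIRMED")  # allowed
validate_transition(dag, "DETECTED", "RESOLVED")   # rejected: must confirm first
```

Terminal states (`RESOLVED`, `DROPPED`) simply have no outgoing edges, so no further transition validates.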

Sweep items:

| Field | Type | Description |
| --- | --- | --- |
| `item` | text | Arbitrary string identifier — also the stable key. Re-adding bumps recurrence |
| `tags` | json | Array of strings for filtering |
| `state` | text | Current state (must be in the sweep's lifecycle) |
| `processed` | int | 0 or 1, mirrors `state IN terminal_states` |
| `recurrence_count` | int | Bumped atomically on every re-add (R2) |
| `first_seen` / `last_seen` | text | Timestamps; `last_seen` updates on each re-add |
| `archived_at` / `archive_reason` | text | Soft-delete metadata. Archived entries are excluded from next_batch/status but still match on add for un-archive (R5) |
| `position` | int | Insertion order within the sweep |
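The recurrence-aware upsert described above maps naturally onto SQLite's `ON CONFLICT` clause. A minimal sketch under an assumed table layout (codebugs' real schema may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sweep_items (
        item TEXT PRIMARY KEY,
        recurrence_count INTEGER NOT NULL DEFAULT 1,
        first_seen TEXT NOT NULL,
        last_seen TEXT NOT NULL,
        archived_at TEXT
    )
""")

def add_item(conn, item, now):
    # Atomic upsert: a new row starts at recurrence 1; re-adding the same
    # item bumps the counter, refreshes last_seen, and clears archived_at,
    # so re-detection un-archives the entry in the same statement.
    conn.execute("""
        INSERT INTO sweep_items (item, first_seen, last_seen)
        VALUES (?, ?, ?)
        ON CONFLICT(item) DO UPDATE SET
            recurrence_count = recurrence_count + 1,
            last_seen = excluded.last_seen,
            archived_at = NULL
    """, (item, now, now))

add_item(conn, "finding-a", "2026-04-01")
add_item(conn, "finding-a", "2026-05-01")  # re-detection of the same item
row = conn.execute(
    "SELECT recurrence_count, first_seen, last_seen FROM sweep_items"
).fetchone()
# row == (2, "2026-04-01", "2026-05-01")
```

Doing the bump and the un-archive in one statement is what makes the operation atomic: there is no read-modify-write window for a concurrent writer to race.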

Pattern Detection

The killer feature for findings emerges over time. Categories reveal systemic issues:

$ codebugs categories
category                        total  open  fixed
tz_naive_datetime                  15     3     12
n_plus_one                          8     2      6
missing_input_validation            6     4      2

If you keep fixing the same category → it's time for a lint rule, pre-commit check, or architectural fix. codebugs turns reactive bug-fixing into proactive prevention.
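The category roll-up shown above is a plain `GROUP BY` over the findings table. A self-contained sketch reproducing those counts — table name and columns are assumptions matching the schema section:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE findings (category TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO findings VALUES (?, ?)",
    [("tz_naive_datetime", "fixed")] * 12
    + [("tz_naive_datetime", "open")] * 3
    + [("n_plus_one", "fixed")] * 6
    + [("n_plus_one", "open")] * 2,
)

# SQLite sums boolean comparisons as 0/1, giving per-status counts in one pass
rows = conn.execute("""
    SELECT category,
           COUNT(*)              AS total,
           SUM(status = 'open')  AS open_count,
           SUM(status = 'fixed') AS fixed_count
    FROM findings
    GROUP BY category
    ORDER BY total DESC
""").fetchall()
# rows == [("tz_naive_datetime", 15, 3, 12), ("n_plus_one", 8, 2, 6)]
```

Once the top category's fixed count keeps climbing, that SQL result is the evidence for promoting the fix to a lint rule.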

Requirements Verification

The killer feature for requirements is reqs-verify — automated detection of documentation rot:

$ codebugs reqs-verify
Verified 683 requirements.

12 issue(s) found:

check   sev       id      message
------  --------  ------  --------------------------------------------------
tests   high      FR-350  Test file not found: test_entity_graph.py
tests   high      FR-351  Test file not found: test_entity_graph.py
status  high      FR-090  Description mentions 'superseded' but status is 'Planned'
status  medium    FR-006  Must-priority requirement implemented without test coverage
ids     medium    --      Numbering gaps (5+): FR-025..FR-029, FR-316..FR-329

Run it after any documentation change to catch contradictions before they become misleading.

Semantic Requirements Search

Store embeddings for requirements to enable semantic search — find related requirements even when the wording is different:

# Via MCP: store embeddings (caller generates vectors via embedding API)
reqs_embed(req_id="FR-001", embedding=[0.1, 0.2, ...])
reqs_batch_embed(embeddings={"FR-001": [...], "FR-002": [...]})

# Search for similar requirements
reqs_search_similar(query_embedding=[0.1, 0.2, ...], limit=5, min_similarity=0.3)

Embeddings are stored as float32 BLOBs in the same SQLite database. Search uses brute-force cosine similarity — fast enough for thousands of requirements.
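The stated mechanics — float32 BLOBs plus brute-force cosine similarity — can be sketched in a few lines of stdlib Python. The helper names here are hypothetical, not codebugs' API:

```python
import math
import struct

def to_blob(vec):
    # Pack a vector as little-endian float32, matching the documented
    # BLOB storage format (4 bytes per dimension).
    return struct.pack(f"<{len(vec)}f", *vec)

def from_blob(blob):
    return struct.unpack(f"<{len(blob) // 4}f", blob)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Round-trip through the BLOB encoding, then score against a query vector.
stored = from_blob(to_blob([1.0, 0.0, 0.0]))
cosine_similarity(stored, [1.0, 0.0, 0.0])  # identical direction -> 1.0
cosine_similarity(stored, [0.0, 1.0, 0.0])  # orthogonal -> 0.0
```

Brute force means scoring the query against every stored vector; at a few thousand requirements with small vectors, that is a handful of milliseconds, which is why no vector index is needed.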

Requirements

  • Python 3.11+
  • No external dependencies beyond the MCP SDK (for the server)
  • SQLite (bundled with Python)

License

MIT
