SimonOyaneder/canon

Canon

Automated code review that enforces your repo's existing patterns, so humans can focus on what matters.

Code reviews are the bottleneck of modern development. Teams ship faster than ever, but every PR still waits in a queue for a senior dev to check naming conventions, duplicated utilities, and style inconsistencies. That's hours of human time spent on things a machine can catch in seconds.

The problem gets worse with new team members. A developer joining a large codebase doesn't know that fetchUser should follow the getEntity pattern, or that there's already a formatCurrency helper three directories away. The result: review bouncebacks, frustration, and slower ramp-up, often weeks of back-and-forth before new devs internalize the unwritten rules.

Canon fixes this. It reads your codebase, learns its patterns, and automatically reviews every PR against them. There are no configuration files to maintain and no linter rules to write: Canon learns directly from the code that already exists.


How it works

  1. A PR is opened in any repo where Canon is installed
  2. Canon fetches the diff and generates embeddings for the changed files
  3. It combines semantic search and lexical matching to find the most relevant existing files in the repo: the ones that should inform how the new code is written
  4. A RAG prompt is built with the diff, the similar files as context, team conventions (CANON.md), and any learned rules from past feedback
  5. An LLM analyzes the diff for pattern deviations against the existing codebase
  6. Inline review comments are posted on the exact lines, with one-click suggested fixes
  7. When a human replies to a Canon comment β€” agreeing, correcting, or adding context β€” that feedback is captured and linked to the original finding
  8. Once enough feedback accumulates, Canon distills it into learned rules via LLM: concise team preferences that are injected into all future reviews for that repo

The more your team interacts with Canon, the better it gets. No setup is required beyond installing the GitHub App: Canon works on day one and improves from day two.


See it in action

In this example PR, a new file is submitted with several pattern violations. Canon reviewed it automatically and flagged 4 findings (2 high, 2 medium), requesting changes:

  • 🔴 Duplicated hashing logic: reimplements sha256Hash inline instead of importing the existing shared utility from src/lib/hash
  • 🔴 Duplicated concurrency helper: rewrites mapWithConcurrency locally instead of using the one from src/lib/concurrency
  • 🟡 Wrong naming convention: interfaces use snake_case (file_analysis_result) when the entire codebase uses PascalCase (IndexProgress, RepoParams)
  • 🟡 Catch variable pattern: uses err instead of error in transaction catch blocks, breaking the repo's established convention

Each finding is posted as an inline comment on the exact diff line, with a one-click suggested fix. See the full review on the PR.


Key features

Pattern-aware reviews, not generic linting

Canon doesn't check for "best practices"; it checks for your practices. If your repo uses camelCase for services and snake_case for DB columns, Canon knows. If you have a retry() wrapper and someone reimplements retry logic, Canon catches it.

One-click suggested fixes

Every finding includes a GitHub suggestion block. Reviewers (or the author) can apply the fix with a single click, with no copy-paste and no manual edits.
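GitHub renders a fenced block tagged `suggestion` inside an inline review comment as a change that can be committed with one click; its contents replace the commented line(s). A minimal sketch of assembling such a comment body, where `buildSuggestionComment` is a hypothetical helper rather than Canon's commenter API and the import path is illustrative:

```typescript
// Hypothetical helper: build the body of an inline review comment that
// GitHub renders with a one-click "Commit suggestion" button.
function buildSuggestionComment(message: string, replacementLines: string[]): string {
  // Built with repeat() so this example's own code fence is not closed early.
  const fence = "`".repeat(3);
  return [message, "", `${fence}suggestion`, ...replacementLines, fence].join("\n");
}

const body = buildSuggestionComment(
  "Use the shared helper from src/lib/hash instead of hashing inline.",
  ['import { sha256Hash } from "../lib/hash";'],
);
console.log(body);
```

The suggestion only applies cleanly if the comment is anchored to the exact line(s) it replaces, which is why findings must map onto lines that are present in the diff.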

Learns from your team

When a human replies to a Canon comment (agreeing, disagreeing, or clarifying), Canon captures that feedback. After enough feedback accumulates, it distills team preferences into learned rules that improve future reviews. Canon gets smarter the more your team uses it.

CANON.md β€” codify your conventions

Run /canon init on any repo to automatically generate a CANON.md file that documents your team's patterns. Canon samples representative files, analyzes them with an LLM, and opens a PR with proposed conventions. Your team reviews, edits, and merges. From then on, Canon enforces those conventions with higher priority.
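The README does not say how "representative" files are chosen. One plausible heuristic, shown here purely as an assumption, is to round-robin across directories so a single large folder does not dominate the sample sent to the LLM; `sampleRepresentativeFiles` is a hypothetical name.

```typescript
// Hypothetical sketch: sample files round-robin across their parent
// directories until the budget is reached. Not Canon's actual selection logic.
function sampleRepresentativeFiles(paths: string[], maxFiles: number): string[] {
  const byDir = new Map<string, string[]>();
  for (const filePath of [...paths].sort()) {
    const slash = filePath.lastIndexOf("/");
    const dir = slash === -1 ? "." : filePath.slice(0, slash);
    const group = byDir.get(dir) ?? [];
    group.push(filePath);
    byDir.set(dir, group);
  }
  const groups = [...byDir.values()];
  const sample: string[] = [];
  // Pass i takes the i-th file from every directory that still has one.
  for (let i = 0; sample.length < maxFiles; i++) {
    let tookAny = false;
    for (const group of groups) {
      if (i < group.length && sample.length < maxFiles) {
        sample.push(group[i]);
        tookAny = true;
      }
    }
    if (!tookAny) break; // every directory is exhausted
  }
  return sample;
}
```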

Multi-repo, single instance

One Canon deployment serves every repo where the GitHub App is installed. Embeddings are stored per-repo in PostgreSQL with pgvector. No per-repo configuration needed.
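Per-repo similarity search in pgvector comes down to filtering by repo and ordering by a distance operator. A sketch of such a query, using the repo_embeddings columns from the architecture diagram; the function name and exact SQL are assumptions. pgvector's `<=>` operator returns cosine distance, so similarity is `1 - distance`.

```typescript
// Hypothetical sketch: build a parameterized pgvector query for one repo.
// The { text, values } shape is what node-postgres' query() accepts.
interface SimilarityQuery {
  text: string;
  values: (string | number)[];
}

function buildSimilarityQuery(
  owner: string,
  repo: string,
  embedding: number[],
  topK: number,
): SimilarityQuery {
  return {
    text: `SELECT file_path, 1 - (embedding <=> $3::vector) AS similarity
           FROM repo_embeddings
           WHERE owner = $1 AND repo = $2
           ORDER BY embedding <=> $3::vector
           LIMIT $4`,
    // pgvector parses vectors from the text form "[x,y,z]".
    values: [owner, repo, `[${embedding.join(",")}]`, topK],
  };
}
```

Ordering by the raw distance expression (ascending) with a LIMIT is the query shape that pgvector's HNSW indexes accelerate (the index must be built with the cosine operator class), which is why the sketch does not order by the computed similarity.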

Smart review rounds

Canon tracks review iterations per PR. On each subsequent review the confidence threshold rises, so only new, high-confidence findings are surfaced and noise stays low. The count resets automatically, or manually with /canon reset.
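The README does not specify how fast the threshold rises or how "new" findings are recognized. This sketch assumes a linear step on top of MIN_CONFIDENCE (default 0.9), capped at 0.99, and treats a finding as new when its path, line, and rule were not reported in an earlier round; `Finding`, `thresholdForRound`, `filterFindings`, and the 0.03 step are hypothetical.

```typescript
// Hypothetical sketch of round-aware filtering; not Canon's actual logic.
interface Finding {
  path: string;
  line: number;
  rule: string;
  confidence: number;
}

// Round 1 uses the base threshold; each later round raises it, capped at 0.99.
function thresholdForRound(round: number, minConfidence = 0.9, step = 0.03): number {
  return Math.min(0.99, minConfidence + step * (round - 1));
}

function filterFindings(
  findings: Finding[],
  round: number,
  previouslyReported: Set<string>, // keys like "path:line:rule" from earlier rounds
  maxFindings = 10, // mirrors the MAX_FINDINGS default
): Finding[] {
  const threshold = thresholdForRound(round);
  return findings
    .filter((f) => f.confidence >= threshold)
    .filter((f) => !previouslyReported.has(`${f.path}:${f.line}:${f.rule}`))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, maxFindings);
}
```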

Automatic index updates

When code is pushed to the default branch, Canon incrementally updates its embeddings index. No manual reindexing is needed; the similarity search stays current as the codebase evolves.
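GitHub's push webhook payload includes the pushed `ref`, the repository's `default_branch`, and per-commit `added`, `modified`, and `removed` path lists, which is enough to decide what to re-embed. A sketch under that assumption; `changedPathsFromPush` is a hypothetical name, and Canon's push-handler may gather this information differently.

```typescript
// Hypothetical sketch: derive which files to re-embed or delete from a
// GitHub push webhook payload (only the fields used here are typed).
interface PushCommit {
  added: string[];
  modified: string[];
  removed: string[];
}

interface PushPayload {
  ref: string;
  repository: { default_branch: string };
  commits: PushCommit[];
}

function changedPathsFromPush(payload: PushPayload): { upsert: string[]; remove: string[] } {
  // Only pushes to the default branch should touch the index.
  if (payload.ref !== `refs/heads/${payload.repository.default_branch}`) {
    return { upsert: [], remove: [] };
  }
  const upsert = new Set<string>();
  const remove = new Set<string>();
  // Assumes commits are listed oldest first, so the last change to a path wins.
  for (const commit of payload.commits) {
    for (const p of [...commit.added, ...commit.modified]) {
      remove.delete(p);
      upsert.add(p);
    }
    for (const p of commit.removed) {
      upsert.delete(p);
      remove.add(p);
    }
  }
  return { upsert: [...upsert], remove: [...remove] };
}
```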


Commands

Canon responds to comments on PRs:

Command        Description
/canon review  Re-trigger a review on the current PR
/canon index   Full repo re-indexing (regenerate all embeddings)
/canon init    Analyze repo patterns and propose a CANON.md via PR
/canon reset   Reset review count and trigger a fresh review

Canon reacts with 👀 when processing and 🚀 when done.
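A sketch of how a comment body might be matched against the commands above. The parsing rules (one command per comment, at the start of a line, case-insensitive) are assumptions, `parseCanonCommand` is a hypothetical name, and @canon mentions are not covered.

```typescript
// Hypothetical parser for the /canon commands; not Canon's comment-handler.
type CanonCommand = "review" | "index" | "init" | "reset";

const COMMANDS: readonly CanonCommand[] = ["review", "index", "init", "reset"];

function parseCanonCommand(commentBody: string): CanonCommand | null {
  // Match "/canon <word>" at the start of any line, case-insensitively.
  const match = commentBody.match(/^\s*\/canon\s+(\w+)/im);
  if (!match) return null;
  const command = match[1].toLowerCase();
  // Unknown subcommands are ignored rather than treated as errors.
  return (COMMANDS as readonly string[]).includes(command) ? (command as CanonCommand) : null;
}
```

In GitHub's Reactions API, 👀 and 🚀 correspond to the reaction contents `eyes` and `rocket`.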


Tech stack

Component           Technology
Runtime             Node.js + TypeScript (strict mode)
GitHub integration  Probot v14
LLM                 OpenAI gpt-5.3-codex (structured outputs via Zod)
Embeddings          OpenAI text-embedding-3-small
Vector storage      PostgreSQL + pgvector (cosine similarity, HNSW index)
Schema validation   Zod (structured LLM outputs)
Tests               Jest + ts-jest (330 tests, 30 suites) + testcontainers
Containerization    Docker Compose

Canon supports both the Responses API (gpt-5.3-codex, o3, o4-mini) and the Chat Completions API (gpt-4o, gpt-4.1, etc.); which API to call is auto-detected from the model name.
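The detection rule itself is not documented. This sketch derives it only from the model names listed above: o-series models and GPT-5 Codex models go to the Responses API, everything else to Chat Completions. `detectApi` is a hypothetical name, and a real implementation would need to track OpenAI's model lineup.

```typescript
// Hypothetical sketch of model-name-based API selection. The patterns come
// from the example model names in this README, not from Canon's source.
type OpenAIApi = "responses" | "chat-completions";

function detectApi(model: string): OpenAIApi {
  const name = model.toLowerCase();
  const isOSeries = /^o\d/.test(name); // o3, o4-mini, ...
  const isGpt5Codex = name.startsWith("gpt-5") && name.includes("codex");
  return isOSeries || isGpt5Codex ? "responses" : "chat-completions";
}
```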


Architecture

┌───────────────────────────────────────────────────────────────────────┐
│                         GitHub Events                                 │
│                                                                       │
│ PR Opened  Comment (/canon *)  Push (default branch)  Reply to Canon  │
└──────┬──────────────┬─────────────────────┬──────────────────────┬────┘
       │              │                     │                      │
       ▼              ▼                     ▼                      ▼
┌─────────────────────────────┐   ┌─────────────┐    ┌──────────────────────┐
│     src/index.ts            │   │ push-handler│    │  feedback-handler    │
│     Event Router            │   │             │    │                      │
└──────────┬──────────────────┘   │ Incremental │    │ Links human replies  │
           │                      │ embedding   │    │ to original findings │
           ▼                      │ updates     │    │                      │
┌─────────────────────────────┐   └─────────────┘    │ Triggers distillation│
│     pr-handler.ts           │                      │ when threshold met   │
│     Review Orchestrator     │                      └───────────┬──────────┘
└──────────┬──────────────────┘                                  │
           │                                                     ▼
           │  ┌────────────────────────────────────────────────────────────────┐
           │  │                   PostgreSQL + pgvector                        │
           │  │                                                                │
           │  │  repo_embeddings     │ pr_reviews         │ pr_review_feedback │
           │  │  (owner, repo,       │ pr_review_findings │ repo_learned_rules │
           │  │   file_path,         │                    │                    │
           │  │   embedding,         │                    │                    │
           │  │   content_hash)      │                    │                    │
           │  └────────────────────────────────────────────────────────────────┘
           │
           ├─── 1. Fetch PR files ──────────────────── github-client.ts
           │
           ├─── 2. Parse diff ──────────────────────── diff-parser.ts
           │
           ├─── 3. Generate embeddings ─────────────── embeddings.ts ──► OpenAI Embeddings API
           │
           ├─── 4. Find similar files ──────────────── similarity.ts ──► pgvector cosine search
           │
           ├─── 5. Fetch file contents ─────────────── github-client.ts
           │
           ├─── 6. Load context ────────────────────── CANON.md + learned rules
           │
           ├─── 7. Build RAG prompt ────────────────── prompt-builder.ts
           │
           ├─── 8. LLM analysis ────────────────────── reviewer.ts ────► OpenAI
           │
           ├─── 9. Post inline comments ────────────── commenter.ts
           │
           └── 10. Persist review ──────────────────── review-store.ts

Feedback loop

Human replies to Canon comment
        │
        ▼
feedback-handler.ts ──► storeFeedback() ──► pr_review_feedback table
        │
        │  (when feedback count >= DISTILL_THRESHOLD)
        ▼
distiller.ts ──► LLM summarizes feedback into rules
        │
        ▼
repo_learned_rules table ──► injected into future review prompts
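On GitHub, a reply in a review-comment thread carries an `in_reply_to_id` pointing at the thread's original comment, which is one way to link a human reply back to the Canon finding it answers. A sketch of that linking plus the threshold check from the diagram; the in-memory inputs, the bot filter, and the function names are assumptions.

```typescript
// Hypothetical sketch of the feedback loop. Canon persists feedback in the
// pr_review_feedback table; here the caller supplies plain data instead.
interface ReviewCommentPayload {
  id: number;
  in_reply_to_id?: number; // set when the comment is a reply in a thread
  body: string;
  user: { login: string; type: string };
}

interface LinkedFeedback {
  findingCommentId: number;
  author: string;
  body: string;
}

// Link a human reply to the Canon comment (finding) it answers, if any.
function linkFeedback(
  comment: ReviewCommentPayload,
  canonCommentIds: Set<number>,
): LinkedFeedback | null {
  const parentId = comment.in_reply_to_id;
  if (parentId === undefined || !canonCommentIds.has(parentId)) return null;
  // Assumption: replies from bots (including Canon itself) are not feedback.
  if (comment.user.type === "Bot") return null;
  return { findingCommentId: parentId, author: comment.user.login, body: comment.body };
}

// Mirrors the diagram: distill once stored feedback reaches DISTILL_THRESHOLD (default 5).
function shouldDistill(feedbackCount: number, distillThreshold = 5): boolean {
  return feedbackCount >= distillThreshold;
}
```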

Getting started

Prerequisites

  • Node.js >= 18
  • npm >= 9
  • OpenAI API key
  • GitHub account with permissions to create GitHub Apps
  • PostgreSQL with pgvector (provided via Docker Compose, or install manually)

1. Clone and install

git clone <repo-url>
cd canon
npm install

2. Create a GitHub App

  1. Go to github.com/settings/apps and click New GitHub App
  2. Fill in:
    • Name: Canon (or any name)
    • Homepage URL: any URL
    • Webhook URL: for local dev, create a channel at smee.io/new and use that URL
    • Webhook secret: generate a random string and save it
  3. Set permissions:
    • Pull requests: Read & Write
    • Contents: Read & Write
    • Issues: Read
  4. Subscribe to events:
    • Pull request
    • Issue comment
    • Push
    • Pull request review comment
  5. Click Create GitHub App
  6. Note the App ID from the app page
  7. Under Private keys, click Generate a private key, then save the .pem file
  8. Go to Install App and install on the repositories you want Canon to review

3. Configure environment

cp .env.example .env

Required

Variable          Description
APP_ID            GitHub App ID
PRIVATE_KEY_PATH  Path to the .pem file (local dev)
WEBHOOK_SECRET    Secret from GitHub App creation
OPENAI_API_KEY    OpenAI API key
DATABASE_URL      PostgreSQL connection string

In production, use PRIVATE_KEY (the full .pem content as a string) instead of PRIVATE_KEY_PATH. Probot reads PRIVATE_KEY first.

Optional

Variable              Default                 Description
PORT                  3000                    Server port
OPENAI_MODEL          gpt-5.3-codex           Model for reviews
EMBEDDING_MODEL       text-embedding-3-small  Model for embeddings
MIN_CONFIDENCE        0.9                     Minimum confidence to report a finding
MAX_FINDINGS          10                      Max findings per review
MAX_REVIEW_ROUNDS     3                       Max review iterations per PR
TOP_K_SIMILAR         5                       Similar files used as context
MAX_FILES_TO_EMBED    20                      Max files to embed per review
MAX_REPO_FILES        10000                   Max files to traverse during indexing
MAX_PROMPT_TOKENS     120000                  Token budget for the prompt
RETRY_MAX_ATTEMPTS    3                       Retries for transient OpenAI errors
RETRY_BASE_DELAY_MS   1000                    Base delay for exponential backoff
EMBEDDING_DIMENSIONS  auto                    Override embedding dimensions
DISTILL_THRESHOLD     5                       Feedback count before distilling learned rules
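retry.ts is described as exponential backoff with jitter, driven by RETRY_MAX_ATTEMPTS and RETRY_BASE_DELAY_MS. This sketch uses the "full jitter" variant (a random delay between zero and base * 2^attempt), treats RETRY_MAX_ATTEMPTS as the total number of attempts, and leaves the transient-versus-fatal decision to a caller-supplied predicate standing in for openai-errors.ts. All three choices are assumptions.

```typescript
// Hypothetical sketch of retry with exponential backoff and full jitter.
// attempt is 0-based, so the delay cap grows as base, 2*base, 4*base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * baseDelayMs * 2 ** attempt);
}

async function withRetry<T>(
  fn: () => Promise<T>,
  isTransient: (error: unknown) => boolean,
  maxAttempts = 3, // RETRY_MAX_ATTEMPTS default
  baseDelayMs = 1000, // RETRY_BASE_DELAY_MS default
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up on fatal errors or once the attempt budget is used.
      if (!isTransient(error) || attempt + 1 >= maxAttempts) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt, baseDelayMs)));
    }
  }
}
```

Full jitter spreads retries from many concurrent reviews across the whole window instead of having them all retry at the same instant after a rate-limit error.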

4. Run

Local development with Docker

  1. Set WEBHOOK_PROXY_URL in .env to your smee.io channel
  2. Make sure the GitHub App's Webhook URL points to the same channel
  3. Start the stack with docker compose up
  4. Open a PR in an installed repo to test

Production

npm run build
npm start

Scripts

Script                    Description
npm run dev               Hot-reload dev server (nodemon watches src/)
npm run build             Compile TypeScript to dist/
npm start                 Run compiled build with Probot
npm run lint              Type-check without compiling (tsc --noEmit)
npm test                  Run all tests (330 tests, 30 suites)
npm run test:integration  Integration tests with real PostgreSQL (testcontainers)

Project structure

src/
├── index.ts                     # Entry point: routes GitHub events to handlers
├── config/
│   └── index.ts                 # Environment variable loading and validation
├── github/
│   ├── github-client.ts         # GitHub API helpers, CanonContext type
│   ├── pr-handler.ts            # Review orchestrator (the main flow)
│   ├── comment-handler.ts       # /canon commands and @canon mentions
│   ├── feedback-handler.ts      # Captures human replies to Canon comments
│   ├── init-handler.ts          # /canon init: generates CANON.md
│   └── push-handler.ts          # Incremental embedding updates on push
├── review/
│   ├── diff-parser.ts           # Parses PR diffs, builds valid line map
│   ├── prompt-builder.ts        # RAG prompt with numbered file content
│   ├── reviewer.ts              # LLM call with Zod structured outputs
│   └── commenter.ts             # Posts inline comments with deduplication
├── intelligence/
│   ├── embeddings.ts            # Generates and stores embeddings (pgvector)
│   ├── similarity.ts            # Cosine similarity search
│   ├── indexer.ts               # Full repo indexing
│   └── distiller.ts             # Distills feedback into learned rules
├── db/
│   ├── migrate.ts               # Schema migration (pgvector setup)
│   ├── review-store.ts          # Persists reviews + findings + feedback
│   ├── review-counts.ts         # Tracks review rounds per PR
│   └── learned-rules.ts         # Team preference rules storage
└── lib/
    ├── concurrency.ts           # Parallel execution with concurrency limit
    ├── content-cache.ts         # In-memory file content cache
    ├── db-client.ts             # PostgreSQL connection pool
    ├── hash.ts                  # SHA256 content hashing
    ├── openai-client.ts         # Singleton OpenAI client
    ├── openai-errors.ts         # Transient vs fatal error handling
    ├── openai-models.ts         # Model type detection
    ├── openai-runner.ts         # Dual-API text generation wrapper
    ├── retry.ts                 # Exponential backoff with jitter
    └── truncate.ts              # Text and token truncation

tests/                           # Mirrors src/ (30 suites, 330 tests)
tests/integration/               # Real PostgreSQL via testcontainers
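diff-parser.ts builds a "valid line map" because GitHub only accepts inline review comments on lines that appear in the pull request diff. A sketch that walks the per-file `patch` text GitHub returns for a PR and collects new-file line numbers for added and context lines; `commentableLines` is a hypothetical name, and the sketch assumes the patch starts at its first `@@` hunk header (no `---`/`+++` file headers).

```typescript
// Hypothetical sketch: map a unified-diff patch to the new-file line numbers
// that can carry an inline comment on the new side of the diff.
function commentableLines(patch: string): Set<number> {
  const lines = new Set<number>();
  let newLine = 0;
  for (const raw of patch.split("\n")) {
    // Hunk header, e.g. "@@ -10,2 +11,3 @@ optional context"
    const hunk = raw.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
    if (hunk) {
      newLine = Number(hunk[1]); // first new-file line of this hunk
      continue;
    }
    if (raw.startsWith("+") || raw.startsWith(" ")) {
      // Added and unchanged context lines both exist in the new file.
      lines.add(newLine++);
    }
    // "-" lines exist only in the old file; "\ No newline" markers are skipped.
  }
  return lines;
}
```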

Multi-repo

Canon is multi-repo by design. One deployment serves all repos where the GitHub App is installed. Embeddings are stored per-repo in PostgreSQL, keyed by (owner, repo, file_path) with SHA256 content hashing for cache invalidation. Run /canon index on any repo to index it upfront.
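Content hashing lets re-indexing skip files that have not changed: hash the current content, compare it with the stored content_hash, and re-embed only on a mismatch. A minimal sketch using Node's built-in crypto module; `sha256Hex` and `filesNeedingEmbedding` are hypothetical names (the repo's own helper lives in src/lib/hash.ts, whose API is not shown here).

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: decide which files need new embeddings by comparing
// SHA256 hashes of current contents with the hashes stored alongside them.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

function filesNeedingEmbedding(
  current: Map<string, string>, // file_path -> current file content
  stored: Map<string, string>, // file_path -> content_hash already in repo_embeddings
): { stale: string[]; deleted: string[] } {
  const stale: string[] = [];
  for (const [filePath, content] of current) {
    // New files have no stored hash, so they are always stale.
    if (stored.get(filePath) !== sha256Hex(content)) stale.push(filePath);
  }
  // Stored rows whose file no longer exists can be dropped from the index.
  const deleted = [...stored.keys()].filter((filePath) => !current.has(filePath));
  return { stale, deleted };
}
```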
