Automated code review that enforces your repo's existing patterns, so humans can focus on what matters.
Code reviews are the bottleneck of modern development. Teams ship faster than ever, but every PR still waits in a queue for a senior dev to check naming conventions, duplicated utilities, and style inconsistencies. That's hours of human time spent on things a machine can catch in seconds.
The problem gets worse with new team members. A developer joining a large codebase doesn't know that `fetchUser` should follow the `getEntity` pattern, or that there's already a `formatCurrency` helper three directories away. The result: review bouncebacks, frustration, and slower ramp-up, often weeks of back-and-forth before new devs internalize the unwritten rules.
Canon fixes this. It reads your codebase, understands its patterns, and reviews every PR against them automatically. No configuration files to maintain, no linter rules to write. Canon learns directly from the code that already exists.
- A PR is opened in any repo where Canon is installed
- Canon fetches the diff and generates embeddings for the changed files
- It combines semantic search and lexical matching to find the most relevant existing files in the repo, the ones that should inform how the new code is written
- A RAG prompt is built with the diff, the similar files as context, team conventions (`CANON.md`), and any learned rules from past feedback
- An LLM analyzes the diff for pattern deviations against the existing codebase
- Inline review comments are posted on the exact lines, with one-click suggested fixes
- When a human replies to a Canon comment (agreeing, correcting, or adding context), that feedback is captured and linked to the original finding
- Once enough feedback accumulates, Canon uses an LLM to distill it into learned rules: concise team preferences that are injected into all future reviews for that repo
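
The retrieval step in the list above combines a semantic signal with a lexical one. The following is a minimal sketch of one way such a hybrid ranking could work, assuming a simple weighted sum. The `Candidate` shape, the Jaccard token overlap used as the lexical score, the `0.7` weight, and all function names are illustrative assumptions, not Canon's actual implementation.

```typescript
// Hypothetical shape for an indexed file (not Canon's real type).
interface Candidate {
  filePath: string;
  embedding: number[];
  content: string;
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Split text into a set of lowercase identifier-like tokens.
function tokenize(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z_][a-z0-9_]*/g) ?? []);
}

// Lexical score: Jaccard overlap (shared tokens / all distinct tokens).
function lexicalOverlap(a: string, b: string): number {
  const tokensA = tokenize(a);
  const tokensB = tokenize(b);
  if (tokensA.size === 0 || tokensB.size === 0) return 0;
  let shared = 0;
  tokensA.forEach((token) => {
    if (tokensB.has(token)) shared++;
  });
  return shared / (tokensA.size + tokensB.size - shared);
}

// Rank candidates by a weighted sum of both signals and keep the top K.
// The default weight is an assumption chosen for illustration only.
function rankSimilarFiles(
  queryEmbedding: number[],
  queryText: string,
  candidates: Candidate[],
  topK = 5,
  semanticWeight = 0.7,
): { filePath: string; score: number }[] {
  return candidates
    .map((candidate) => ({
      filePath: candidate.filePath,
      score:
        semanticWeight * cosineSimilarity(queryEmbedding, candidate.embedding) +
        (1 - semanticWeight) * lexicalOverlap(queryText, candidate.content),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

The lexical term helps when a changed file reuses an exact identifier (for example a helper name) that an embedding alone might not weight heavily.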
The more your team interacts with Canon, the better it gets. No setup beyond installing the GitHub App: Canon works on day one and improves from day two.
In this example PR, a new file is submitted with several pattern violations. Canon reviewed it automatically and flagged 4 findings (2 high, 2 medium), requesting changes:
- 🔴 Duplicated hashing logic: reimplements `sha256Hash` inline instead of importing the existing shared utility from `src/lib/hash`
- 🔴 Duplicated concurrency helper: rewrites `mapWithConcurrency` locally instead of using the one from `src/lib/concurrency`
- 🟡 Wrong naming convention: interfaces use `snake_case` (`file_analysis_result`) when the entire codebase uses PascalCase (`IndexProgress`, `RepoParams`)
- 🟡 Catch variable pattern: uses `err` instead of `error` in transaction catch blocks, breaking the repo's established convention
Each finding is posted as an inline comment on the exact diff line, with a one-click suggested fix. See the full review on the PR.
Canon doesn't check for "best practices"; it checks for your practices. If your repo uses camelCase for services and snake_case for DB columns, Canon knows. If you have a `retry()` wrapper and someone reimplements retry logic, Canon catches it.
Every finding includes a GitHub suggestion block. Reviewers (or the author) can apply the fix with a single click: no copy-paste, no manual edits.
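
GitHub renders a fenced block tagged `suggestion` inside a review comment as a change that can be applied with one click. As a rough sketch of how a finding could be turned into such a comment body (the `Finding` shape and `formatFindingComment` are hypothetical names, not Canon's real API):

```typescript
// Hypothetical finding shape, for illustration only.
interface Finding {
  severity: "high" | "medium";
  message: string;
  suggestedCode: string; // replacement for the commented line(s)
}

// Three backticks, built at runtime so this example can itself sit in a fence.
const FENCE = "`".repeat(3);

// Build a comment body: a one-line summary followed by a suggestion block.
function formatFindingComment(finding: Finding): string {
  return [
    `**[${finding.severity}]** ${finding.message}`,
    "",
    `${FENCE}suggestion`,
    finding.suggestedCode,
    FENCE,
  ].join("\n");
}
```

The suggestion replaces exactly the line range the comment is attached to, so the comment has to be anchored to the right diff lines for the one-click fix to apply cleanly.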
When a human replies to a Canon comment (agreeing, disagreeing, or clarifying), Canon captures that feedback. After enough feedback accumulates, it distills team preferences into learned rules that improve future reviews. Canon gets smarter the more your team uses it.
Run /canon init on any repo to automatically generate a CANON.md file that documents your team's patterns. Canon samples representative files, analyzes them with an LLM, and opens a PR with proposed conventions. Your team reviews, edits, and merges. From then on, Canon enforces those conventions with higher priority.
One Canon deployment serves every repo where the GitHub App is installed. Embeddings are stored per-repo in PostgreSQL with pgvector. No per-repo configuration needed.
Canon tracks review iterations per PR. On each subsequent review, the confidence threshold increases, so only new, high-confidence findings are surfaced and noise stays low. The review count resets automatically, or manually with `/canon reset`.
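
To make the escalating threshold concrete, here is a small sketch. It assumes a linear increase per round starting from the `MIN_CONFIDENCE` default of `0.9`; the step size, the cap, and the function names are assumptions made for illustration, since the exact schedule is not documented here.

```typescript
// Hypothetical scored finding.
interface ScoredFinding {
  message: string;
  confidence: number; // 0..1, as reported by the model
}

// Threshold for a given review round (1-based).
// base mirrors the MIN_CONFIDENCE default; step and cap are assumed values.
function confidenceThreshold(round: number, base = 0.9, step = 0.03, cap = 0.99): number {
  return Math.min(cap, base + step * (round - 1));
}

// Keep only findings that clear the threshold for this round.
function filterFindings(findings: ScoredFinding[], round: number): ScoredFinding[] {
  const threshold = confidenceThreshold(round);
  return findings.filter((finding) => finding.confidence >= threshold);
}
```

Under these assumed numbers, a finding with confidence `0.91` would be reported on the first review but suppressed by the third.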
When code is pushed to the default branch, Canon incrementally updates its embeddings index. No manual reindexing is needed: the similarity search stays current as the codebase evolves.
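
An incremental update only needs to re-embed files whose content actually changed. A minimal sketch of that check, assuming stored SHA-256 content hashes are compared against the pushed content (the `PushedFile` shape and `filesNeedingReembedding` are hypothetical; `sha256Hash` is named after the shared utility mentioned earlier, but its signature here is a guess):

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of a file's content.
function sha256Hash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical shape for a file touched by a push.
interface PushedFile {
  filePath: string;
  content: string;
}

// Return the paths whose content hash differs from the stored one
// (or that have no stored hash yet), i.e. the files to re-embed.
function filesNeedingReembedding(
  pushed: PushedFile[],
  storedHashes: Map<string, string>, // file_path -> content_hash
): string[] {
  return pushed
    .filter((file) => storedHashes.get(file.filePath) !== sha256Hash(file.content))
    .map((file) => file.filePath);
}
```

Skipping unchanged files keeps embedding API calls proportional to the size of the push rather than the size of the repo.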
Canon responds to comments on PRs:
| Command | Description |
|---|---|
| `/canon review` | Re-trigger a review on the current PR |
| `/canon index` | Full repo re-indexing (regenerate all embeddings) |
| `/canon init` | Analyze repo patterns and propose a `CANON.md` via PR |
| `/canon reset` | Reset review count and trigger a fresh review |
Canon adds an emoji reaction when it starts processing a command and another when it is done.
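
As an illustration of how a comment body could be mapped to one of the four commands, here is a small parser sketch. It assumes the command must appear at the start of the comment and is case-insensitive; both are assumptions, and `parseCanonCommand` is a hypothetical name.

```typescript
// The four commands from the table above.
const COMMANDS = ["review", "index", "init", "reset"] as const;
type CanonCommand = (typeof COMMANDS)[number];

// Return the command named in a PR comment, or null if there is none.
function parseCanonCommand(commentBody: string): CanonCommand | null {
  // Assumption: "/canon <command>" must start the comment.
  const match = /^\/canon\s+([a-z]+)\b/i.exec(commentBody.trim());
  if (!match) return null;
  const name = match[1].toLowerCase();
  const found = COMMANDS.filter((command) => command === name);
  return found.length > 0 ? found[0] : null;
}
```

Returning `null` for unknown subcommands lets the caller ignore ordinary comments without posting an error.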
| Component | Technology |
|---|---|
| Runtime | Node.js + TypeScript (strict mode) |
| GitHub integration | Probot v14 |
| LLM | OpenAI gpt-5.3-codex (structured outputs via Zod) |
| Embeddings | OpenAI text-embedding-3-small |
| Vector storage | PostgreSQL + pgvector (cosine similarity, HNSW index) |
| Schema validation | Zod (structured LLM outputs) |
| Tests | Jest + ts-jest (330 tests, 30 suites) + testcontainers |
| Containerization | Docker Compose |
Supports both the Responses API (`gpt-5.3-codex`, `o3`, `o4-mini`) and the Chat Completions API (`gpt-4o`, `gpt-4.1`, etc.), auto-detected from the model name.
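
A sketch of what name-based detection might look like. The prefix rules below are an assumption inferred only from the model names listed above; the real logic in `openai-models.ts` may differ.

```typescript
type ApiKind = "responses" | "chat-completions";

// Guess which OpenAI API to call from the model name alone.
// Assumed rule: o-series models and the gpt-5 family use the Responses API,
// everything else falls back to Chat Completions.
function detectApi(model: string): ApiKind {
  const name = model.trim().toLowerCase();
  if (/^o\d/.test(name) || name.indexOf("gpt-5") === 0) {
    return "responses";
  }
  return "chat-completions";
}
```

With this rule, setting `OPENAI_MODEL=gpt-4.1` would route calls to Chat Completions without any other configuration change.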

    ┌───────────────────────────────────────────────────────────────────────┐
    │                             GitHub Events                             │
    │                                                                       │
    │  PR Opened Comment (/canon *)   Push (default branch)  Reply to Canon │
    └──────┬──────────────┬─────────────────────┬──────────────────────┬────┘
           │              │                     │                      │
           ▼              ▼                     ▼                      ▼
    ┌─────────────────────────────┐ ┌─────────────┐  ┌──────────────────────┐
    │        src/index.ts         │ │ push-handler│  │   feedback-handler   │
    │        Event Router         │ │             │  │                      │
    └──────────┬──────────────────┘ │ Incremental │  │ Links human replies  │
               │                    │  embedding  │  │ to original findings │
               ▼                    │   updates   │  │                      │
    ┌─────────────────────────────┐ └─────────────┘  │ Triggers distillation│
    │        pr-handler.ts        │                  │ when threshold met   │
    │     Review Orchestrator     │                  └───────────┬──────────┘
    └──────────┬──────────────────┘                              │
               │                                                 ▼
               │ ┌──────────────────────────────────────────────────────────┐
               │ │                  PostgreSQL + pgvector                   │
               │ │                                                          │
               │ │ repo_embeddings │ pr_reviews         │ pr_review_feedback│
               │ │  (owner, repo,  │ pr_review_findings │ repo_learned_rules│
               │ │   file_path,    │                    │                   │
               │ │   embedding,    │                    │                   │
               │ │   content_hash) │                    │                   │
               │ └──────────────────────────────────────────────────────────┘
               │
               ├── 1. Fetch PR files ────────────────────── github-client.ts
               │
               ├── 2. Parse diff ────────────────────────── diff-parser.ts
               │
               ├── 3. Generate embeddings ───────────────── embeddings.ts ──► OpenAI Embeddings API
               │
               ├── 4. Find similar files ────────────────── similarity.ts ──► pgvector cosine search
               │
               ├── 5. Fetch file contents ───────────────── github-client.ts
               │
               ├── 6. Load context ──────────────────────── CANON.md + learned rules
               │
               ├── 7. Build RAG prompt ──────────────────── prompt-builder.ts
               │
               ├── 8. LLM analysis ──────────────────────── reviewer.ts ────► OpenAI
               │
               ├── 9. Post inline comments ──────────────── commenter.ts
               │
               └── 10. Persist review ───────────────────── review-store.ts

Feedback from human replies flows through a separate path:

    Human replies to Canon comment
            │
            ▼
    feedback-handler.ts ──► storeFeedback() ──► pr_review_feedback table
            │
            │  (when feedback count >= DISTILL_THRESHOLD)
            ▼
    distiller.ts ──► LLM summarizes feedback into rules
            │
            ▼
    repo_learned_rules table ──► injected into future review prompts

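
The threshold check in the feedback flow can be sketched as follows. This assumes that only feedback not yet folded into a rule counts toward `DISTILL_THRESHOLD`; that detail, the `FeedbackEntry` shape, and the function names are assumptions for illustration.

```typescript
// Hypothetical row from the pr_review_feedback table.
interface FeedbackEntry {
  findingId: number;
  reply: string;
  distilled: boolean; // assumed flag: already summarized into a learned rule
}

// Default taken from the DISTILL_THRESHOLD configuration value.
const DEFAULT_DISTILL_THRESHOLD = 5;

// Feedback that has not been distilled yet.
function pendingFeedback(entries: FeedbackEntry[]): FeedbackEntry[] {
  return entries.filter((entry) => !entry.distilled);
}

// True once enough new feedback has accumulated to justify an LLM call.
function shouldDistill(
  entries: FeedbackEntry[],
  threshold = DEFAULT_DISTILL_THRESHOLD,
): boolean {
  return pendingFeedback(entries).length >= threshold;
}
```

Batching feedback this way means the distillation LLM call happens occasionally rather than on every reply.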
- Node.js >= 18
- npm >= 9
- OpenAI API key
- GitHub account with permissions to create GitHub Apps
- PostgreSQL with pgvector (provided via Docker Compose, or install manually)
Clone the repository and install dependencies:

    git clone <repo-url>
    cd canon
    npm install

Create the GitHub App:

- Go to github.com/settings/apps and click New GitHub App
- Fill in:
  - Name: Canon (or any name)
  - Homepage URL: any URL
  - Webhook URL: for local dev, create a channel at smee.io/new and use that URL
  - Webhook secret: generate a random string and save it
- Set permissions:
  - Pull requests: Read & Write
  - Contents: Read & Write
  - Issues: Read
- Subscribe to events:
  - Pull request
  - Issue comment
  - Push
  - Pull request review comment
- Click Create GitHub App
- Note the App ID from the app page
- Under Private keys, click Generate a private key and save the `.pem` file
- Go to Install App and install on the repositories you want Canon to review
Copy the example environment file and fill in the required variables:

    cp .env.example .env

| Variable | Description |
|---|---|
| `APP_ID` | GitHub App ID |
| `PRIVATE_KEY_PATH` | Path to the `.pem` file (local dev) |
| `WEBHOOK_SECRET` | Secret from GitHub App creation |
| `OPENAI_API_KEY` | OpenAI API key |
| `DATABASE_URL` | PostgreSQL connection string |

In production, use `PRIVATE_KEY` (the full `.pem` content as a string) instead of `PRIVATE_KEY_PATH`. Probot reads `PRIVATE_KEY` first.

| Variable | Default | Description |
|---|---|---|
| `PORT` | `3000` | Server port |
| `OPENAI_MODEL` | `gpt-5.3-codex` | Model for reviews |
| `EMBEDDING_MODEL` | `text-embedding-3-small` | Model for embeddings |
| `MIN_CONFIDENCE` | `0.9` | Minimum confidence to report a finding |
| `MAX_FINDINGS` | `10` | Max findings per review |
| `MAX_REVIEW_ROUNDS` | `3` | Max review iterations per PR |
| `TOP_K_SIMILAR` | `5` | Similar files used as context |
| `MAX_FILES_TO_EMBED` | `20` | Max files to embed per review |
| `MAX_REPO_FILES` | `10000` | Max files to traverse during indexing |
| `MAX_PROMPT_TOKENS` | `120000` | Token budget for the prompt |
| `RETRY_MAX_ATTEMPTS` | `3` | Retries for transient OpenAI errors |
| `RETRY_BASE_DELAY_MS` | `1000` | Base delay for exponential backoff |
| `EMBEDDING_DIMENSIONS` | auto | Override embedding dimensions |
| `DISTILL_THRESHOLD` | `5` | Feedback count before distilling learned rules |
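
To show how two of these settings might interact, the sketch below reads `RETRY_MAX_ATTEMPTS` and `RETRY_BASE_DELAY_MS` with their documented defaults and derives a backoff delay. The jitter formula (a random factor between 0.5 and 1.0 of the exponential delay) and every function name are assumptions, not the code in `retry.ts`.

```typescript
type Env = Record<string, string | undefined>;

// Read a non-negative integer from the environment, or use the fallback.
function readIntEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = parseInt(raw, 10);
  if (isNaN(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

interface RetryConfig {
  maxAttempts: number;
  baseDelayMs: number;
}

// Defaults mirror the table above.
function loadRetryConfig(env: Env): RetryConfig {
  return {
    maxAttempts: readIntEnv(env, "RETRY_MAX_ATTEMPTS", 3),
    baseDelayMs: readIntEnv(env, "RETRY_BASE_DELAY_MS", 1000),
  };
}

// Delay before retry number `attempt` (1-based): the base delay doubles each
// attempt, then is scaled by a random factor in [0.5, 1.0) as jitter.
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseDelayMs: number,
  random: () => number = Math.random,
): number {
  const exponential = baseDelayMs * Math.pow(2, attempt - 1);
  return Math.floor(exponential * (0.5 + random() * 0.5));
}
```

In an application this would typically be called as `loadRetryConfig(process.env)`. With the defaults, the third retry would wait between roughly 2 and 4 seconds under this assumed formula.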

For local development with Docker Compose:

    docker compose up

- Set `WEBHOOK_PROXY_URL` in `.env` to your smee.io channel
- The GitHub App's Webhook URL must match the same channel
- Open a PR in an installed repo to test
To build and run the compiled app:

    npm run build
    npm start

| Script | Description |
|---|---|
| `npm run dev` | Hot-reload dev server (nodemon watches `src/`) |
| `npm run build` | Compile TypeScript to `dist/` |
| `npm start` | Run compiled build with Probot |
| `npm run lint` | Type-check without compiling (`tsc --noEmit`) |
| `npm test` | Run all tests (330 tests, 30 suites) |
| `npm run test:integration` | Integration tests with real PostgreSQL (testcontainers) |

    src/
    ├── index.ts                 # Entry point: routes GitHub events to handlers
    ├── config/
    │   └── index.ts             # Environment variable loading and validation
    ├── github/
    │   ├── github-client.ts     # GitHub API helpers, CanonContext type
    │   ├── pr-handler.ts        # Review orchestrator (the main flow)
    │   ├── comment-handler.ts   # /canon commands and @canon mentions
    │   ├── feedback-handler.ts  # Captures human replies to Canon comments
    │   ├── init-handler.ts      # /canon init: generates CANON.md
    │   └── push-handler.ts      # Incremental embedding updates on push
    ├── review/
    │   ├── diff-parser.ts       # Parses PR diffs, builds valid line map
    │   ├── prompt-builder.ts    # RAG prompt with numbered file content
    │   ├── reviewer.ts          # LLM call with Zod structured outputs
    │   └── commenter.ts         # Posts inline comments with deduplication
    ├── intelligence/
    │   ├── embeddings.ts        # Generates and stores embeddings (pgvector)
    │   ├── similarity.ts        # Cosine similarity search
    │   ├── indexer.ts           # Full repo indexing
    │   └── distiller.ts         # Distills feedback into learned rules
    ├── db/
    │   ├── migrate.ts           # Schema migration (pgvector setup)
    │   ├── review-store.ts      # Persists reviews + findings + feedback
    │   ├── review-counts.ts     # Tracks review rounds per PR
    │   └── learned-rules.ts     # Team preference rules storage
    └── lib/
        ├── concurrency.ts       # Parallel execution with concurrency limit
        ├── content-cache.ts     # In-memory file content cache
        ├── db-client.ts         # PostgreSQL connection pool
        ├── hash.ts              # SHA256 content hashing
        ├── openai-client.ts     # Singleton OpenAI client
        ├── openai-errors.ts     # Transient vs fatal error handling
        ├── openai-models.ts     # Model type detection
        ├── openai-runner.ts     # Dual-API text generation wrapper
        ├── retry.ts             # Exponential backoff with jitter
        └── truncate.ts          # Text and token truncation
    tests/                       # Mirrors src/ (30 suites, 330 tests)
    tests/integration/           # Real PostgreSQL via testcontainers

Canon is multi-repo by design. One deployment serves all repos where the GitHub App is installed. Embeddings are stored per-repo in PostgreSQL, keyed by (owner, repo, file_path) with SHA256 content hashing for cache invalidation. Run /canon index on any repo to index it upfront.
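
Because embeddings are keyed per repo, a similarity lookup has to scope the search to one `(owner, repo)` pair. The sketch below builds such a query using pgvector's `<=>` cosine-distance operator. The table and column names come from the architecture diagram above; the function names, the parameter layout, and the exact SQL are assumptions, not the query in `similarity.ts`.

```typescript
// A parameterized query in the { text, values } form many PostgreSQL clients accept.
interface SqlQuery {
  text: string;
  values: (string | number)[];
}

// pgvector accepts vectors as text literals such as "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Find the topK files in one repo closest to the query embedding.
// "<=>" is cosine distance, so similarity is 1 minus the distance.
function buildSimilarityQuery(
  owner: string,
  repo: string,
  queryEmbedding: number[],
  topK = 5,
): SqlQuery {
  return {
    text: [
      "SELECT file_path, 1 - (embedding <=> $1::vector) AS similarity",
      "FROM repo_embeddings",
      "WHERE owner = $2 AND repo = $3",
      "ORDER BY embedding <=> $1::vector",
      "LIMIT $4",
    ].join("\n"),
    values: [toVectorLiteral(queryEmbedding), owner, repo, topK],
  };
}
```

Ordering by the raw distance expression (rather than the computed similarity) is what lets PostgreSQL use an HNSW index built for cosine distance.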