feat(github): connect commit SHAs in traces to GitHub via a GitHub App #50
JeremyFunk wants to merge 8 commits into
Adds a full GitHub integration so the deployment.commit_sha attribute on spans resolves to author + message + GitHub deep link in the UI. Supports multi-installation across GH orgs/users, syncs all branches, and works out of the box on both Cloud and self-hosted deploys.

New infra:
- Cloudflare Queue + 6h cron, declared in apps/api/alchemy.run.ts so alchemy deploy provisions everything idempotently
- 5 D1 tables: github_installations, github_repositories, github_commits, github_releases (groundwork), github_unresolved_shas (DLQ-by-DB)
- 4 services: GithubAppJwtService (RS256 JWT + HMAC verify), GithubInstallationClient (REST + pagination + rate-limit handling), GithubAppService (install/disconnect/list + commit-count enrichment), GithubSyncService (backfill, push, resolve-unknown-sha, reconcile)
- New HTTP groups: integrations.github.* and commits.*

UI:
- CommitChip with hover-card (author avatar, message, GitHub link)
- AttributeRow auto-detects 7 commit-SHA attribute keys and renders CommitChip (high leverage, applies everywhere)
- Service filter sidebar, services table, span detail, trace header all wired to use CommitChip
- Settings card: connect GH App, manage repos, live status + commit count

Test coverage: 87 new tests, 8 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
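The pagination that GithubInstallationClient is described as handling relies on GitHub's `Link` response header. The following is a minimal sketch of extracting the `rel="next"` URL from that header. `parseNextLink` is a hypothetical helper name and the PR's actual implementation is not shown here.

```typescript
// Sketch only: parseNextLink is a hypothetical name, not taken from the PR.
// Expected header shape:
//   <https://api.github.com/...&page=2>; rel="next", <https://api.github.com/...&page=7>; rel="last"
function parseNextLink(linkHeader: string | null): string | null {
  if (!linkHeader) return null
  // Assumption: URLs contain no raw commas, so splitting on "," is safe.
  for (const part of linkHeader.split(",")) {
    const match = part.match(/^\s*<([^>]+)>\s*;(.*)$/)
    if (!match) continue
    const url = match[1] ?? null
    const params = match[2] ?? ""
    // Accept rel="next" only as a whole parameter, not as a prefix of another value.
    if (/(?:^|;)\s*rel="?next"?\s*(?:;|$)/.test(params)) {
      return url
    }
  }
  return null
}
```

A caller would keep requesting the returned URL until the helper yields `null`, which signals the last page.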
This run croaked 😵 The workflow encountered an error before any progress could be reported. Please check the link below for details.
Ingest Rust Test + Benchmark Results
Commit:
Load Benchmark
| Metric | main (median) | PR (median) | Delta |
|---|---|---|---|
| Requests/sec | 2945.33 | 2685.12 | -8.8% worse |
| Rows/sec | 29453.27 | 26851.25 | -8.8% worse |
| p50 latency | 21.63 ms | 22.29 ms | +3.0% worse |
| p95 latency | 38.34 ms | 24.86 ms | -35.2% better |
| p99 latency | 39.92 ms | 43.48 ms | +8.9% worse |
| Export catch-up | 0.026 s | 0.026 s | -0.4% better |
| Max RSS | 100.58 MiB | 99.48 MiB | -1.1% better |
| Failures | 0 | 0 | same |
Same code path on both sides (same LOAD_TEST_INGEST_MODE), so the delta column is meaningful. Numbers come from ubuntu-latest, which is noisy — treat single-digit-percent deltas as noise.
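The medians and deltas in the table can be re-derived from the per-iteration JSON. This is a small sketch, assuming the delta is computed as (PR - main) / main; the helper names are illustrative and not part of the benchmark tooling.

```typescript
// Sketch: recompute a table row from three per-iteration samples.
function median(values: ReadonlyArray<number>): number {
  if (values.length === 0) throw new Error("median of an empty list is undefined")
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1 ? sorted[mid]! : (sorted[mid - 1]! + sorted[mid]!) / 2
}

// Assumed definition of the Delta column: relative change of PR versus main, in percent.
function percentDelta(main: number, pr: number): number {
  return ((pr - main) / main) * 100
}

// request_rps samples copied from the JSON in this comment.
const prRps = median([2685.1248413552726, 2846.4402418335626, 2506.171440907807])
const mainRps = median([2562.197885322259, 2945.3271859190822, 3152.642363315547])
const rpsDelta = percentDelta(mainRps, prRps).toFixed(1) // "-8.8"
```

With only three iterations per side, one slow run (such as the 118 ms p99 sample on the PR side) does not move the median, which is why the table reports medians rather than means.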
PR load benchmark JSON (per-iteration)
[
{
"ingest_mode": "tinybird",
"requests": 2000,
"successes": 2000,
"failures": 0,
"rows_sent": 20000,
"rows_exported": 20000,
"imports": 23,
"duration_seconds": 0.744844325,
"export_catchup_seconds": 0.025920767,
"request_rps": 2685.1248413552726,
"row_rps": 26851.248413552727,
"p50_ms": 23.132,
"p95_ms": 26.508,
"p99_ms": 43.48,
"max_rss_mb": 100.0859375,
"max_cpu_percent": 82.1,
"avg_cpu_percent": 57.699999999999996
},
{
"ingest_mode": "tinybird",
"requests": 2000,
"successes": 2000,
"failures": 0,
"rows_sent": 20000,
"rows_exported": 20000,
"imports": 22,
"duration_seconds": 0.702632,
"export_catchup_seconds": 0.026215542,
"request_rps": 2846.4402418335626,
"row_rps": 28464.402418335627,
"p50_ms": 22.29,
"p95_ms": 24.703,
"p99_ms": 41.409,
"max_rss_mb": 98.80859375,
"max_cpu_percent": 87.5,
"avg_cpu_percent": 60.4
},
{
"ingest_mode": "tinybird",
"requests": 2000,
"successes": 2000,
"failures": 0,
"rows_sent": 20000,
"rows_exported": 20000,
"imports": 23,
"duration_seconds": 0.798030002,
"export_catchup_seconds": 0.025834487,
"request_rps": 2506.171440907807,
"row_rps": 25061.71440907807,
"p50_ms": 22.164,
"p95_ms": 24.855,
"p99_ms": 118.015,
"max_rss_mb": 99.48046875,
"max_cpu_percent": 73.2,
"avg_cpu_percent": 44.900000000000006
}
]
main load benchmark JSON (per-iteration)
[
{
"ingest_mode": "tinybird",
"requests": 2000,
"successes": 2000,
"failures": 0,
"rows_sent": 20000,
"rows_exported": 20000,
"imports": 23,
"duration_seconds": 0.780579834,
"export_catchup_seconds": 0.025566511,
"request_rps": 2562.197885322259,
"row_rps": 25621.97885322259,
"p50_ms": 22.313,
"p95_ms": 40.482,
"p99_ms": 51.556,
"max_rss_mb": 101.8046875,
"max_cpu_percent": 76.7,
"avg_cpu_percent": 55.0
},
{
"ingest_mode": "tinybird",
"requests": 2000,
"successes": 2000,
"failures": 0,
"rows_sent": 20000,
"rows_exported": 20000,
"imports": 22,
"duration_seconds": 0.679041707,
"export_catchup_seconds": 0.026033005,
"request_rps": 2945.3271859190822,
"row_rps": 29453.271859190823,
"p50_ms": 21.631,
"p95_ms": 22.655,
"p99_ms": 39.01,
"max_rss_mb": 100.578125,
"max_cpu_percent": 89.2,
"avg_cpu_percent": 61.25
},
{
"ingest_mode": "tinybird",
"requests": 2000,
"successes": 2000,
"failures": 0,
"rows_sent": 20000,
"rows_exported": 20000,
"imports": 21,
"duration_seconds": 0.634388481,
"export_catchup_seconds": 0.026565177,
"request_rps": 3152.642363315547,
"row_rps": 31526.42363315547,
"p50_ms": 19.987,
"p95_ms": 38.34,
"p99_ms": 39.916,
"max_rss_mb": 99.50390625,
"max_cpu_percent": 94.6,
"avg_cpu_percent": 55.599999999999994
}
]
WAL-acked microbench (cargo bench --bench ingest_bench)
Compiling maple-ingest v0.1.0 (/home/runner/work/maple/maple/apps/ingest)
Finished `bench` profile [optimized] target(s) in 31.14s
Running benches/ingest_bench.rs (target/release/deps/ingest_bench-9a4eb1301687a1fe)
Gnuplot not found, using plotters backend
test ingest_accept/logs_10_rows_wal_ack ... bench: 355805 ns/iter (+/- 12531)
test ingest_accept/traces_10_spans_wal_ack ... bench: 392083 ns/iter (+/- 51436)
cargo test
Updating crates.io index
Compiling maple-ingest v0.1.0 (/home/runner/work/maple/maple/apps/ingest)
Finished `test` profile [unoptimized + debuginfo] target(s) in 12.04s
Running unittests src/lib.rs (target/debug/deps/maple_ingest-930f70660f5b2a30)
running 22 tests
test telemetry::tests::apply_attribute_mappings_rewrites_span_attributes ... ok
test telemetry::tests::hex_empty_for_zero_ids ... ok
test otel::tests::build_resource_sets_runtime_and_sdk_type ... ok
test telemetry::tests::log_encoder_matches_tinybird_row_shape ... ok
test telemetry::tests::logs_emit_exactly_the_jsonpaths_declared_in_datasources_ts ... ok
test telemetry::tests::logs_use_observed_time_when_time_unix_nano_is_zero ... ok
test telemetry::tests::logs_severity_text_falls_back_to_mapped_number ... ok
test telemetry::tests::custom_datasource_names_propagate_to_frames ... ok
test telemetry::tests::metrics_summary_data_points_are_dropped ... ok
test telemetry::tests::metric_encoder_matches_all_tinybird_datasource_shapes ... ok
test telemetry::tests::metrics_emit_exactly_the_jsonpaths_declared_in_datasources_ts ... ok
test telemetry::tests::sampling_keeps_errors_even_when_ratio_low ... ok
test telemetry::tests::timestamp_has_nano_precision ... ok
test telemetry::tests::timestamps_match_clickhouse_datetime64_nine_format ... ok
test telemetry::tests::trace_encoder_matches_tinybird_row_shape ... ok
test telemetry::tests::traces_emit_exactly_the_jsonpaths_declared_in_datasources_ts ... ok
test telemetry::tests::wal_partial_drain_advances_cursor_without_truncating ... ok
test telemetry::tests::wal_round_trips_frame ... ok
test telemetry::tests::wal_truncates_after_full_drain_allowing_further_appends ... ok
test telemetry::tests::pipeline_e2e_exports_gzip_ndjson_to_fake_tinybird ... ok
test telemetry::tests::pipeline_e2e_exports_metrics_to_fake_tinybird ... ok
test telemetry::tests::pipeline_e2e_exports_traces_to_fake_tinybird ... ok
test result: ok. 22 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.09s
Running unittests src/bin/load_test.rs (target/debug/deps/load_test-cc7d714b800573a7)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/debug/deps/maple_ingest-68405942ee8a90b6)
running 18 tests
test tests::cloudflare_validation_payload_is_detected ... ok
test tests::cloudflare_log_record_maps_body_severity_and_attributes ... ok
test tests::cloudflare_timestamps_support_rfc3339_unix_and_unix_nano ... ok
test tests::cloudflare_ndjson_payload_parses_multiple_records ... ok
test tests::d1_response_parses_empty_results_as_no_match ... ok
test tests::d1_response_parses_failure_with_errors ... ok
test tests::d1_truthy_accepts_int_and_bool_self_managed ... ok
test tests::d1_response_parses_success_with_rows ... ok
test tests::extract_ingest_key_returns_sentinel_literal_unchanged ... ok
test tests::enrichment_overwrites_tenant_fields ... ok
test tests::hash_is_deterministic ... ok
test tests::non_self_managed_goes_to_shared_pool ... ok
test tests::resolve_ingest_key_returns_none_when_hash_missing ... ok
test tests::resolve_ingest_key_returns_self_managed_true_when_active_settings_row ... ok
test tests::resolve_ingest_key_returns_self_managed_false_when_no_settings_row ... ok
test tests::self_managed_degrades_to_shared_when_endpoint_unset ... ok
test tests::sentinel_token_matches_only_exact_literal ... ok
test tests::self_managed_goes_to_self_managed_pool_when_configured ... ok
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests maple_ingest
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
The pendingConnect state was only cleared by the postMessage from the callback page. Users who dismissed the popup without completing the install were stuck showing 'Connecting…' indefinitely. Now polls the popup's closed state every 500ms and clears the pill on close. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
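A minimal sketch of the polling described in this commit, assuming a helper that takes the popup handle and a callback that clears the pending state. `watchPopupClosed` and `PopupLike` are hypothetical names; the real component code is not shown in this conversation.

```typescript
// Sketch only: names are illustrative, not taken from the PR.
// PopupLike covers the one property we need from the Window returned by window.open.
interface PopupLike {
  readonly closed: boolean
}

function watchPopupClosed(
  popup: PopupLike | null,
  onClosed: () => void,
  intervalMs = 500, // the commit message states a 500ms poll
): () => void {
  // Assumption: a blocked popup (null) or an already-closed one clears the pill immediately.
  if (popup === null || popup.closed) {
    onClosed()
    return () => {}
  }
  const id = setInterval(() => {
    if (popup.closed) {
      clearInterval(id)
      onClosed()
    }
  }, intervalMs)
  // The caller stops polling when the postMessage arrives or the component unmounts.
  return () => clearInterval(id)
}
```

Returning a stop function keeps the success path (the postMessage from the callback page) and the dismissal path from both firing.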
Adds a parallel dev:no-portless script to each app and a root-level runner. Useful when an external tool (cloudflared tunnel, ngrok, a mobile client, an integration test) needs a stable local URL; portless assigns subdomain ports dynamically, which breaks those workflows.

Run 'bun dev' for the existing portless URLs, or 'bun dev:no-portless' for fixed-port local URLs:
web http://localhost:3471
api http://localhost:3472
ingest http://localhost:3473
chat-agent http://localhost:8787
alerting http://localhost:8788
landing http://localhost:3391

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- GithubInstallationClient: extract expectOk + parseJson helpers, drop ~150 lines of duplicated fetch/decode boilerplate across the 6 endpoint methods.
- GithubSyncService: insertCommits now takes an optional branches list and is used by both backfill and webhook-push paths. Removes the duplicated upsert SET block. Adds touchRepoSynced helper.
- worker.ts: collapse the queue and scheduled handlers into a single runBackground call via a logged() wrapper. ManagedRuntime lifecycle + telemetry.flush is now centralized.
- github-webhook.http.ts: drop redundant queue enqueue after inline reconcile/push. The inline call already does the work, so the queue job was a duplicate. Cron sweep still catches missed deliveries.
- github-callback.http.ts: actually trigger commit backfill after a fresh install. Previously the install reconcile created the repo rows but never enqueued BackfillRepo jobs, so commits only appeared after a manual click of Backfill or the 6h cron sweep.
- cron-scheduler.ts: drop unused DatabaseClientLike re-export.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Author meta collapsed to one line with @login + relative time (Intl.RelativeTimeFormat, full timestamp on hover title).
- Commit message gets its own block (line-clamp-3, text-sm) and a subtle divider before the metadata row.
- Repo path is now its own link to the GitHub repo (separate from the commit link). Branches render as pills with +N truncation.
- Loading state is a matching skeleton instead of a single 'Looking up…' string. Unresolved state shares the avatar-block layout for consistency.
- Card widened slightly to fit the date + branches row without wrap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
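As an illustration of the relative-time rendering mentioned above, here is a sketch built on Intl.RelativeTimeFormat. The unit thresholds and the `formatRelativeTime` name are assumptions; the component's actual rounding rules are not shown in this PR text.

```typescript
// Sketch only: unit cutoffs are assumptions (30-day months, 365-day years).
const relativeFormatter = new Intl.RelativeTimeFormat("en", { numeric: "always" })

const UNIT_SECONDS: ReadonlyArray<[Intl.RelativeTimeFormatUnit, number]> = [
  ["year", 365 * 24 * 3600],
  ["month", 30 * 24 * 3600],
  ["day", 24 * 3600],
  ["hour", 3600],
  ["minute", 60],
  ["second", 1],
]

function formatRelativeTime(date: Date, now: Date = new Date()): string {
  // Negative values are in the past, which Intl renders as "... ago".
  const diffSeconds = Math.round((date.getTime() - now.getTime()) / 1000)
  const magnitude = Math.abs(diffSeconds)
  for (const [unit, secondsPerUnit] of UNIT_SECONDS) {
    // Pick the largest unit that fits; "second" is the fallback.
    if (magnitude >= secondsPerUnit || unit === "second") {
      return relativeFormatter.format(Math.trunc(diffSeconds / secondsPerUnit), unit)
    }
  }
  return relativeFormatter.format(0, "second")
}

const committedAt = new Date("2024-01-01T09:00:00Z")
const label = formatRelativeTime(committedAt, new Date("2024-01-01T12:00:00Z")) // "3 hours ago"
const hoverTitle = committedAt.toISOString() // full timestamp for the title attribute
```

Passing `now` explicitly keeps the function deterministic in tests.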
The HoverCardTrigger was rendering with the default text-edit cursor. Add cursor-pointer so it's clear the chip is interactive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three correlated changes that reduce GitHub API pressure substantially:

1. Webhook push: use the embedded commit data
The `push` event payload already contains commit message, author, committer, timestamp, and url per commit. We were throwing all of that away and calling `getCommit` once per SHA, so 20 commits in a push meant 20 API calls. Now small pushes use the payload data directly (zero calls). We lose avatar URLs and numeric user IDs in the fast path; the chip falls back to author initials, and the 6h reconcile fills in the rich data later.

2. Webhook push: enqueue when commits.length > 5 or forced
Force-pushes / branch-creates can touch up to 250 commits via the compareRefs fallback. Doing that inline blows GitHub's 10s webhook timeout. Above the threshold we now enqueue a SyncWebhookPush job; the consumer makes a single compareRefs call (returns full rich data, not just SHAs) and upserts everything in one pass.

3. runResolveUnknownSha: GitHub's commit search API
Was: iterate every connected repo, call `getCommit` until one returns 200. An org with 50 repos meant up to 50 calls per unresolved-SHA hover.
Now: one /search/commits call per installation finds the SHA across every accessible repo in that installation. Org with 50 repos across 1 installation = 1 call. Same end state, far cheaper. The search response includes full commit data so no follow-up getCommit is needed. Indexing delay (~minutes) is acceptable because the chip re-queries on next hover.

Tests:
- 3 new tests on GithubInstallationClient.searchCommitBySha (200, empty, 422 rate-limit handling).
- runResolveUnknownSha tests updated to stub searchCommitBySha; new test for the sync-disabled-repo case.
- runWebhookPush tests updated to exercise the inline path (no API calls) and the queue fallback (compareRefs).
- All 66 GitHub-service tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
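The routing rule from points 1 and 2 can be sketched as a pure function over the push payload. The payload fields (`ref`, `forced`, `commits[].id`, and so on) are GitHub's; `planPush`, `PushPlan`, and the `CommitRow` shape are hypothetical names for illustration, since the real github_commits columns are not listed here.

```typescript
// Subset of GitHub's push webhook payload used by this sketch.
interface PushCommit {
  id: string
  message: string
  timestamp: string
  url: string
  author: { name: string; email: string; username?: string }
}

interface PushPayload {
  ref: string
  forced: boolean
  commits: ReadonlyArray<PushCommit>
}

// Hypothetical row shape: avatar is always null on the fast path, as the commit message notes.
interface CommitRow {
  sha: string
  message: string
  authorName: string
  authorLogin: string | null
  authoredAt: string
  htmlUrl: string
  avatarUrl: null
  branch: string | null
}

type PushPlan =
  | { kind: "inline"; rows: CommitRow[] }
  | { kind: "enqueue"; reason: "forced" | "too-many-commits" }

const INLINE_COMMIT_LIMIT = 5 // threshold stated in the commit message

function planPush(payload: PushPayload): PushPlan {
  if (payload.forced) return { kind: "enqueue", reason: "forced" }
  if (payload.commits.length > INLINE_COMMIT_LIMIT) {
    return { kind: "enqueue", reason: "too-many-commits" }
  }
  const prefix = "refs/heads/"
  const branch = payload.ref.startsWith(prefix) ? payload.ref.slice(prefix.length) : null
  // Fast path: build rows from the payload alone, with zero GitHub API calls.
  const rows = payload.commits.map(
    (commit): CommitRow => ({
      sha: commit.id,
      message: commit.message,
      authorName: commit.author.name,
      authorLogin: commit.author.username ?? null,
      authoredAt: commit.timestamp,
      htmlUrl: commit.url,
      avatarUrl: null,
      branch,
    }),
  )
  return { kind: "inline", rows }
}
```

Keeping the decision separate from the I/O makes the threshold easy to unit-test without stubbing the queue or the GitHub client.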
The disconnect mutation already deletes installation rows, repo rows, commits, releases, and tombstones from D1 (verified by the existing hard-delete test). The bug was purely client-side: the commit-lookup atom's reactivityKeys weren't invalidated by the disconnect mutation, so any CommitChip already mounted (e.g. on /services while the user disconnected from /settings) kept showing the cached author/message. Adds a shared 'commitLookup' reactivity key and threads it into the disconnect mutation's reactivityKeys, forcing every commit-lookup atom to refetch — which then returns null from the server, and the chip correctly flips to its unresolved state. Also retags the toast text: 'GitHub integration removed' (was 'suspended', misleading after the hard-delete cascade). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.execute((db) =>
  db.select().from(githubInstallations).where(isNull(githubInstallations.suspendedAt)),
)
.pipe(Effect.orDie)) as ReadonlyArray<GithubInstallationRow>
fyi orDie is kinda forbidden, it's like unwrap in Rust
})

const parseBranches = (json: string): ReadonlyArray<string> => {
try {
never use try/catch in Effect land; use Effect Schema for stuff like this, it's very powerful
const SHA_REGEX = /^[0-9a-f]{7,40}$/i

const toPersistenceError = (cause: unknown) =>
this is already kind of a hack since it maps a cause; should rather create some sort of client repo
return out
}

export const HttpCommitsLive = HttpApiBuilder.group(MapleApi, "commits", (handlers) =>
probably commits is not the best route name here; would rather be something like github, or idk, just something GitHub-specific
const queue = yield* GithubSyncQueue

const dbExecute = <T>(fn: (db: DatabaseClient) => Promise<T>) =>
  database.execute(fn).pipe(Effect.mapError(toPersistenceError))
this is kind of a hack but fine, also use it in some places
.handle("commitsLookupBySha", ({ payload }) =>
Effect.gen(function* () {
const tenant = yield* CurrentTenant.Context
const validShas = Array.from(
could maybe use an Effect HashMap here, probably easier but could be wrong
})
}
const commits: GithubCommitRow[] = []
for (const batch of chunk(validShas, 50)) {
never use for loops here in Effect; use Effect.forEach or similar, since that way you can tune concurrency
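For context on what the handler under review does before any lookup, the SHA validation and batching can be sketched independently of Effect. SHA_REGEX and the batch size of 50 come from the diff. `normalizeShas`, the lower-casing and dedupe step, and this `chunk` implementation are assumptions. Running the batches with tunable concurrency, as suggested above, would sit on top of this.

```typescript
// From the diff: accepts abbreviated (7+) through full (40) hex SHAs.
const SHA_REGEX = /^[0-9a-f]{7,40}$/i

// Assumption: SHAs are trimmed, lower-cased, and de-duplicated before lookup.
function normalizeShas(input: ReadonlyArray<string>): string[] {
  const candidates = input.map((raw) => raw.trim().toLowerCase())
  return Array.from(new Set(candidates.filter((sha) => SHA_REGEX.test(sha))))
}

// Hypothetical stand-in for the chunk helper referenced in the diff.
function chunk<T>(items: ReadonlyArray<T>, size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer")
  }
  return Array.from({ length: Math.ceil(items.length / size) }, (_, index) =>
    items.slice(index * size, (index + 1) * size),
  )
}

const batches = chunk(normalizeShas(["ABCDEF1", "abcdef1", "not-a-sha"]), 50)
```

Each resulting batch is an independent unit of work, which is what makes a concurrency-aware iteration primitive a natural fit.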
inArray(githubRepositories.id, repoIds),
),
),
)) as ReadonlyArray<GithubRepositoryRow>
`as` casts are kind of a hack imo, satisfies is fine though
// We run reconciliation inline rather than enqueueing because the
// payload is small, the user expects immediate UI updates after
// connect/push, and the 6h cron sweep catches anything we miss.
switch (event) {

Summary
Adds a GitHub integration so the deployment.commit_sha resource attribute on incoming spans resolves to author, message, avatar, and a GitHub deep link across the services table, trace headers, span detail panels, attribute tables, the filter sidebar, and (via AttributeRow auto-detection) anywhere else attributes are rendered.

Supports multi-installation across multiple GitHub orgs and personal accounts, syncs all branches (default branch backfilled on connect, others picked up via push webhooks), and works out of the box on both Cloud Maple and self-hosted deploys.

Architecture
What we added
- packages/db/drizzle/0014_github_integration.sql): github_installations, github_repositories, github_commits, github_releases (groundwork only: schema in place, no sync logic yet), github_unresolved_shas (DB-side tombstone for SHAs we tried to resolve and failed).
- GithubAppJwtService: RS256 JWT minting + installation token cache + webhook HMAC signature verification (constant-time via Web Crypto).
- GithubInstallationClient: REST client with Link: rel="next" pagination, 404/422/409 normalization, per-installation rate-limit handling.
- GithubAppService: install/disconnect/list orchestration; hard-delete cascade on disconnect (installation + repos + commits + releases + tombstones) so re-connecting starts from a clean slate.
- GithubSyncService: the 4 job handlers: runBackfill (paginated, queue-resumable), runWebhookPush (commits from push payload or compare fallback), runResolveUnknownSha (best-effort SHA lookup across connected repos, tombstone after 3 failures), runReconcile (refresh installation metadata + repo list).
- integrations.github.* (install flow, list installations/repos, set sync, backfill, disconnect) and commits.* (batched SHA lookup + manual resync).
- CommitChip with hover-card (author avatar, message, GitHub link), CommitLookupProvider for route-level SHA batching, GitHubIntegrationCard for settings, plus 3 new icons.
- AttributeRow auto-detection: 7 commit-SHA attribute keys (deployment.commit_sha, git.commit.sha, etc.) automatically render as CommitChip whenever an attribute table is shown. This means span detail, log detail, future error/alert panels (anywhere attributes appear) get the chip for free.

Why Cloudflare Queues
The four job types are all async work the user's request shouldn't block on:
We considered:
- ctx.waitUntil: works for happy-path inline calls but no durability, no retries, and 30s CPU ceiling per request can truncate large backfills.

Queues hit the sweet spot: durable retries with exponential backoff, no per-job CPU ceiling, and ctx.waitUntil callsites that stay simple. The queue's batch handler is the same Worker as the HTTP API (same MainLive layer, no version-skew between producer and consumer).

We intentionally do NOT configure a dead-letter queue. All four job types have application-level recovery paths (the 6h reconcile cron re-fetches missed commits, the tombstone table tracks failed SHA lookups, the UI Backfill button can re-trigger any repo). A queue-level DLQ would only be an inspection aid, at the cost of forcing operators to provision a second queue. After 5 retries, messages are dropped and recovery falls to the application layer.
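The retry-then-drop policy can be sketched as a small decision function. The 5-retry limit comes from the text above. The base delay, the cap, and the `decideRetry` name are assumptions, and where the backoff is actually configured in this PR is not shown.

```typescript
// Sketch only: delay values are hypothetical; only the retry limit comes from the description.
const MAX_RETRIES = 5
const BASE_DELAY_SECONDS = 10 // assumption
const MAX_DELAY_SECONDS = 600 // assumption

type RetryDecision =
  | { action: "retry"; delaySeconds: number }
  | { action: "drop" }

// attempts is the number of deliveries so far, starting at 1 for the first delivery.
function decideRetry(attempts: number): RetryDecision {
  const retriesUsed = attempts - 1
  if (retriesUsed >= MAX_RETRIES) {
    // No dead-letter queue: the cron sweep, tombstones, or the Backfill button recover from here.
    return { action: "drop" }
  }
  // Exponential backoff: 10s, 20s, 40s, 80s, 160s with the values assumed above.
  const delaySeconds = Math.min(BASE_DELAY_SECONDS * 2 ** retriesUsed, MAX_DELAY_SECONDS)
  return { action: "retry", delaySeconds }
}
```

In a queue consumer, a failed job would presumably pass its delivery count into a function like this and either schedule a delayed retry or acknowledge the message so it is dropped.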
Infrastructure as code
The queue + cron + env-var bindings are declared in apps/api/alchemy.run.ts. Running alchemy deploy provisions everything idempotently, with no manual wrangler queues create step required. Local dev (bun dev) uses miniflare's built-in queue simulator via wrangler.jsonc.

New processes / setup required
For Cloud Maple
None at the application level: alchemy deploy handles infrastructure. Operations only needs to set 4 secrets in the deploy environment:
- GITHUB_APP_ID
- GITHUB_APP_SLUG
- GITHUB_APP_PRIVATE_KEY (multi-line PEM)
- GITHUB_APP_WEBHOOK_SECRET

Plus register the Maple GitHub App once on github.com with webhook URL https://<app-domain>/api/webhooks/github and setup URL https://<app-domain>/api/integrations/github/callback.

For self-hosted deployments
Documented in docs/github-integration.md:
alchemy deploy.
That's it. The queue, cron trigger, and all bindings come up automatically.
Webhook event subscriptions on the GitHub App
Only Push, Release (groundwork), and Pull request (groundwork) need to be checked. installation and installation_repositories are delivered automatically to every GitHub App.

Out of scope for v1 (groundwork ready, not implemented)
- github_releases table + release webhook subscription in place; sync logic + UI to follow.
- github_commits.pr_number column opportunistically populated; no UI surface yet.
- CommitChip).

Test plan
🤖 Generated with Claude Code