Add Gazelle Clojure plugin (babashka-based parser) #100

Draft
miridius wants to merge 5 commits into main from dave/faster-gen-srcs-bb

@miridius miridius commented May 17, 2026

Background

Bazel reads BUILD.bazel files to figure out what to build. In a large Clojure repo those files describe every clojure_library / clojure_test / clojure_binary target plus the deps between them. Hand-maintaining them at scale is impractical, so we generate them from the source tree.

We've been doing that with gen_srcs (a standalone bazel run target). It walks the repo, parses every Clojure namespace, computes deps, writes BUILD files. It works, but it's outside the normal Bazel lifecycle: no incremental updates, no path-scoped runs, no per-package directives, no interop with other languages also generating BUILD files in the same repo.

This PR adds an experimental Gazelle plugin as an alternative. gen_srcs continues to exist until the Gazelle plugin is deemed stable (we'd eventually drop it).

bazel run @rules_clojure//gazelle:gazelle_bin                  # whole-repo update
bazel run @rules_clojure//gazelle:gazelle_bin -- src/foo bar/  # path-scoped

What's Gazelle?

Gazelle is a BUILD-file generator that runs as a Bazel build target. You write a language plugin (in Go, Gazelle's own implementation language) that tells it how to find your source files and what rules to emit; Gazelle handles the rest (directory walk, BUILD-file parsing, rule merging, cross-package dep resolution). Most modern Bazel rule sets ship a Gazelle plugin (Go, Java, Python, Rust, …) so users can keep their BUILD files in sync with one command.

Why a subprocess?

Gazelle plugins are Go code. But namespace parsing belongs in Clojure (edamame already handles reader conditionals, (ns ...) forms, splice forms, etc.). There aren't good Clojure parsers for Go, plus it's MUCH easier for us to maintain Clojure code.

The plugin follows the Java Gazelle plugin architecture: the Go side is glue, the actual parsing lives in a long-running Clojure subprocess it talks to over stdio.

Why babashka, not JVM Clojure?

bb's ~30ms cold start (vs ~1s for the JVM) makes path-scoped runs viable: a per-file-save Gazelle invocation completes in under a second, versus multiple seconds for the JVM variant.

The cost is that bb can't load tools.deps (Java reflection blocks AOT), so the bb side has its own rule-construction code (gazelle_server.bb) rather than reusing rules-clojure.gen-build. The two implementations are kept in sync via a shared parity fixture (test/rules_clojure/rollup_rules_fixtures.edn) loaded by both sides' tests.

flowchart LR
    Bazel[bazel run //gazelle:gazelle_bin] --> Plugin[Go plugin<br/>gazelle/]
    Plugin <-->|JSON lines<br/>on stdio| Server[bb subprocess<br/>gazelle_server.bb]
    Plugin --> Builds[BUILD.bazel files]

Life of a Gazelle run

Gazelle walks the repo depth-first, calling hook points on every package: Configure on the way down, GenerateRules on the way back up. The plugin implements all four:

| Hook | Job |
| --- | --- |
| Configure | Read per-package # gazelle: directives. On the root call, auto-discover deps.edn and boot the bb subprocess. |
| GenerateRules | RPC the bb server with the package's file list. Translate returned {kind, attrs} specs into *rule.Rule. |
| Resolve | For each clojure_library, walk its :requires and fill in :deps against Gazelle's cross-package index. |
| AfterResolvingDeps | Shut the subprocess down. |

Wire protocol

The bb server speaks newline-delimited JSON on stdio. One request per line, one response per line.

| Request | When | Response |
| --- | --- | --- |
| init | Once at startup | Resolved dep graph (dep_ns_labels per platform), deps_bazel overrides, source_paths, ignore_paths |
| parse | Once per package | NamespaceInfo per basename group + the __clj_lib / __clj_files rollup rules |

Stdio (rather than gRPC, sockets, etc.) because it needs zero setup, is crash-safe (if the subprocess dies, the Go side sees EOF and calls log.Fatalf), and is easy to debug (run bb gazelle_server.bb and paste JSON at it).
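To make the one-line-in, one-line-out contract concrete, here is a minimal Go sketch of a client for such a protocol. It is not the plugin's actual code: the `Request` / `Response` field names, `Client`, and `newDemoClient` are assumptions for illustration, and the client returns errors where the real Go side is described as calling log.Fatalf.

```go
package main

import (
	"bufio"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Request and Response are hypothetical envelopes. Only the "type" and
// "message" fields of the error response appear in this description.
type Request struct {
	Type  string   `json:"type"`
	Dir   string   `json:"dir,omitempty"`
	Files []string `json:"files,omitempty"`
}

type Response struct {
	Type    string `json:"type"`
	Message string `json:"message,omitempty"`
}

var errDead = errors.New("parser subprocess is dead")

// Client writes one JSON line per request and reads one per response.
// bufio.Scanner caps a line at 64 KiB by default; a real client would
// likely raise that limit with Scanner.Buffer.
type Client struct {
	w    io.Writer
	r    *bufio.Scanner
	dead bool
}

func NewClient(w io.Writer, r io.Reader) *Client {
	return &Client{w: w, r: bufio.NewScanner(r)}
}

// Call sends req and waits for the matching response line. Any transport
// failure marks the client dead so later calls short-circuit.
func (c *Client) Call(req Request) (Response, error) {
	var resp Response
	if c.dead {
		return resp, errDead
	}
	line, err := json.Marshal(req)
	if err != nil {
		return resp, fmt.Errorf("encode request: %w", err)
	}
	if _, err := c.w.Write(append(line, '\n')); err != nil {
		c.dead = true
		return resp, fmt.Errorf("write request: %w", err)
	}
	if !c.r.Scan() {
		c.dead = true
		err := c.r.Err()
		if err == nil {
			err = io.EOF // the subprocess closed its stdout
		}
		return resp, fmt.Errorf("read response: %w", err)
	}
	if err := json.Unmarshal(c.r.Bytes(), &resp); err != nil {
		c.dead = true
		return resp, fmt.Errorf("decode response: %w", err)
	}
	if resp.Type == "error" {
		return resp, fmt.Errorf("server error: %s", resp.Message)
	}
	return resp, nil
}

// newDemoClient wires a client to canned response lines instead of a
// real subprocess.
func newDemoClient(canned string) *Client {
	return NewClient(io.Discard, strings.NewReader(canned))
}

func main() {
	c := newDemoClient(`{"type":"ok"}` + "\n")
	resp, err := c.Call(Request{Type: "parse", Dir: "src/foo"})
	fmt.Println(resp.Type, err)
	_, err = c.Call(Request{Type: "parse", Dir: "src/bar"}) // stream is at EOF
	fmt.Println(err)
	_, err = c.Call(Request{Type: "parse"}) // short-circuits on the dead flag
	fmt.Println(err)
}
```

Taking an `io.Writer` and `io.Reader` rather than an `exec.Cmd` keeps the sketch testable without a subprocess; in practice the two would be the child's stdin and stdout pipes.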

What lives where

Rule construction is bb-side (gazelle_server.bb's ns-rules): AOT-vs-plain decisions (:bazel/clojure_library ns-meta), test attr passthrough (:bazel/clojure_test for size/tags/timeout), :require → dep label mapping, clojure_binary for :bazel/clojure_binary, java_library for .js siblings, and rollup composition. ns-rules returns [{:type :clojure_library :attrs {...}} ...]; Go translates verbatim into *rule.Rule.
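As a rough illustration of the "translates verbatim" step, the sketch below decodes one rule spec from JSON and converts its attribute values into Go types, rejecting non-integer numbers for integer attributes. Everything here is assumed for the example: the JSON key names, the `wireRule` type, the helper names, and the `shard_count` entry in `intAttrs`. The real plugin builds Gazelle `*rule.Rule` values, which this stdlib-only sketch leaves out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// wireRule mirrors the {:type ... :attrs {...}} shape described above.
// The JSON key names are assumptions.
type wireRule struct {
	Type  string                 `json:"type"`
	Attrs map[string]interface{} `json:"attrs"`
}

// intAttrs lists attributes that must be integers. Hypothetical: the
// plugin's real list is not shown in this description.
var intAttrs = map[string]bool{"shard_count": true}

// convertAttr maps a decoded JSON value onto a Go value suitable for a
// rule attribute. encoding/json decodes every number as float64, so
// integer attributes need an explicit integrality check.
func convertAttr(name string, v interface{}) (interface{}, error) {
	switch x := v.(type) {
	case string, bool:
		return x, nil
	case float64:
		if !intAttrs[name] {
			return nil, fmt.Errorf("attr %q: number not allowed here", name)
		}
		if math.IsInf(x, 0) || x != math.Trunc(x) {
			return nil, fmt.Errorf("attr %q: %v is not an integer", name, x)
		}
		return int(x), nil
	case []interface{}:
		out := make([]string, 0, len(x))
		for _, e := range x {
			s, ok := e.(string)
			if !ok {
				return nil, fmt.Errorf("attr %q: non-string list element %v", name, e)
			}
			out = append(out, s)
		}
		return out, nil
	default:
		return nil, fmt.Errorf("attr %q: unsupported value %v", name, v)
	}
}

// decodeRule parses one wire rule and converts all of its attributes.
func decodeRule(line []byte) (string, map[string]interface{}, error) {
	var wr wireRule
	if err := json.Unmarshal(line, &wr); err != nil {
		return "", nil, err
	}
	out := make(map[string]interface{}, len(wr.Attrs))
	for name, v := range wr.Attrs {
		cv, err := convertAttr(name, v)
		if err != nil {
			return "", nil, err
		}
		out[name] = cv
	}
	return wr.Type, out, nil
}

func main() {
	kind, attrs, err := decodeRule([]byte(
		`{"type":"clojure_test","attrs":{"name":"core_test","shard_count":2,"deps":[":core"]}}`))
	fmt.Println(kind, attrs["name"], attrs["shard_count"], attrs["deps"], err)

	_, _, err = decodeRule([]byte(
		`{"type":"clojure_test","attrs":{"name":"t","shard_count":2.5}}`))
	fmt.Println(err)
}
```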

The Go side does three things:

  1. Subprocess plumbing. Start the bb server, marshal requests, parse responses, mark the runner dead on any I/O error so subsequent calls short-circuit instead of racing a corpse. Tees the subprocess stderr through an in-memory ring buffer so a crash's final lines surface in processExitInfo.
  2. Wire-format translation. {:type :clojure_library :attrs {:name "core" …}} → rule.NewRule("clojure_library", "core") + r.SetAttr(...).
  3. Dep resolution. Per clojure_library: intra-repo index first (matches (:require [my.foo]) to //src/my:foo when another package generated it), then init's dep_ns_labels per platform, plus per-target overrides from deps_bazel.
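The stderr tee from item 1 can be approximated with a small bounded writer. This is a sketch under assumptions, not the plugin's implementation: `tailBuffer` is a hypothetical name, and it uses a simple sliding slice rather than a true ring, which is enough to show the idea of keeping only the last bytes a crashing subprocess wrote.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sync"
)

// tailBuffer is an io.Writer that retains only the last max bytes
// written to it. The mutex lets one goroutine write while another
// reads the tail.
type tailBuffer struct {
	mu  sync.Mutex
	max int
	buf []byte
}

func (t *tailBuffer) Write(p []byte) (int, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.buf = append(t.buf, p...)
	if over := len(t.buf) - t.max; over > 0 {
		// Slide the newest max bytes to the front of the same array.
		t.buf = append(t.buf[:0], t.buf[over:]...)
	}
	return len(p), nil
}

// Tail returns a copy of the retained bytes as a string.
func (t *tailBuffer) Tail() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return string(t.buf)
}

func main() {
	tail := &tailBuffer{max: 16}
	// In a real runner this writer would be assigned to exec.Cmd.Stderr
	// so output still reaches the terminal while the tail is retained.
	stderr := io.MultiWriter(os.Stderr, tail)
	fmt.Fprintln(stderr, "loading deps.edn")
	fmt.Fprintln(stderr, "fatal: out of memory")
	fmt.Printf("retained tail: %q\n", tail.Tail())
}
```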

Static deps that don't need Gazelle's index (org_clojure_clojure, import-deps, gen-class-deps, ns-library-meta extras) are pre-merged bb-side and seeded into the dep set from the rule's existing :deps; Resolve only adds what genuinely needs the cross-package index.
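The lookup order described above (seed from the rule's existing deps, then the intra-repo index, then the external namespace map) can be sketched as follows. The function name, the map shapes, and the `@deps//:...` labels are illustrative assumptions; per-target deps_bazel overrides and per-platform maps are not modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveDeps returns the sorted dep labels for one rule plus any
// required namespaces that could not be resolved.
//   existing:  deps already present on the rule (pre-merged bb-side)
//   requires:  namespaces from the rule's :requires
//   repoIndex: namespace -> label for rules generated in this repo
//   external:  namespace -> label from the init response
func resolveDeps(existing, requires []string, repoIndex, external map[string]string) (deps, unresolved []string) {
	set := make(map[string]bool)
	for _, d := range existing {
		set[d] = true
	}
	for _, ns := range requires {
		if label, ok := repoIndex[ns]; ok {
			set[label] = true // the intra-repo index wins
		} else if label, ok := external[ns]; ok {
			set[label] = true
		} else {
			unresolved = append(unresolved, ns)
		}
	}
	for d := range set {
		deps = append(deps, d)
	}
	sort.Strings(deps) // deterministic output keeps BUILD diffs stable
	return deps, unresolved
}

func main() {
	repoIndex := map[string]string{"my.foo": "//src/my:foo"}
	// Illustrative external labels, not taken from a real @deps repo.
	external := map[string]string{
		"my.foo":            "@deps//:shadowed",
		"clojure.data.json": "@deps//:org_clojure_data_json",
	}
	deps, unresolved := resolveDeps(
		[]string{"@deps//:org_clojure_clojure"},
		[]string{"my.foo", "clojure.data.json", "missing.ns"},
		repoIndex, external)
	fmt.Println(deps)
	fmt.Println(unresolved)
}
```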

Configuration

Per-package directives in BUILD-file comments:

# gazelle:clojure_enabled false        # skip this directory tree entirely
# gazelle:clojure_deps_edn deps.edn    # which deps.edn to use (default: root)
# gazelle:clojure_deps_repo @other     # use a different repo tag for external labels
# gazelle:clojure_aliases :dev,:test   # deps.edn aliases to activate at init time
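A minimal sketch of how these four directives might be folded into a per-package config on the Go side. The `Config` struct, its field names, and the `apply` method are hypothetical; only the directive keys come from the list above. Passing `Config` by value mirrors the usual Gazelle pattern of a child package inheriting a copy of its parent's settings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Config holds the per-package settings. Field names are illustrative.
type Config struct {
	Enabled  bool
	DepsEdn  string
	DepsRepo string
	Aliases  []string
}

// apply returns a copy of c with one directive applied. Keys this
// sketch does not know are ignored, since they may belong to other
// language plugins.
func (c Config) apply(key, value string) (Config, error) {
	switch key {
	case "clojure_enabled":
		b, err := strconv.ParseBool(value)
		if err != nil {
			return c, fmt.Errorf("clojure_enabled: %w", err)
		}
		c.Enabled = b
	case "clojure_deps_edn":
		c.DepsEdn = value
	case "clojure_deps_repo":
		c.DepsRepo = value
	case "clojure_aliases":
		// Build a fresh slice so the parent's aliases are not mutated.
		c.Aliases = nil
		for _, a := range strings.Split(value, ",") {
			if a = strings.TrimSpace(a); a != "" {
				c.Aliases = append(c.Aliases, a)
			}
		}
	}
	return c, nil
}

func main() {
	root := Config{Enabled: true}
	child, err := root.apply("clojure_aliases", ":dev,:test")
	fmt.Println(child.Aliases, err)
	child, err = child.apply("clojure_enabled", "false")
	fmt.Println(child.Enabled, root.Enabled, err)
	_, err = child.apply("clojure_enabled", "maybe")
	fmt.Println(err)
}
```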

Important

Migrating from gen_srcs? gen_srcs takes aliases via CLI args. The plugin reads them from # gazelle:clojure_aliases :a,:b,... in the root BUILD.bazel. If unset, the plugin defaults to every alias in deps.edn (matching gen_srcs's typical deps.install(aliases = [...]) invocation pattern).

Failure semantics

An empty GenerateResult for a previously-rule-bearing package looks to Gazelle like "delete every rule". A green run that wipes the build graph is worse than a noisy exit, so the plugin fails loud on:

  • Parser startup / transport / shutdown errors (subprocess stderr tail included in the message)
  • Walk errors under subdirHasClojureFiles (permission denied, broken symlink, etc.)
  • Unknown rule kinds from the bb server (closed RuleKind enum)
  • Malformed wire shapes: missing dep_ns_labels.clj / .cljs, or NamespaceInfo mixing the Clojure-group and JS-only shapes
  • Missing rules_clojure bazel_dep in MODULE.bazel

bb side: the request loop catches Throwable (not just Exception) so OOM / VirtualMachineError can't silently kill the subprocess; non-fatal errors return a {type:"error", message: <full cause chain>} envelope so the actionable root cause surfaces.
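The "closed RuleKind enum" check might look something like the sketch below: any kind string the Go side does not recognize becomes an error rather than a silently skipped rule. The constant and function names are assumptions; the four kinds are the ones this description mentions.

```go
package main

import "fmt"

// RuleKind is a closed set: only the kinds listed here are accepted.
type RuleKind string

const (
	KindClojureLibrary RuleKind = "clojure_library"
	KindClojureTest    RuleKind = "clojure_test"
	KindClojureBinary  RuleKind = "clojure_binary"
	KindJavaLibrary    RuleKind = "java_library"
)

var knownKinds = map[RuleKind]bool{
	KindClojureLibrary: true,
	KindClojureTest:    true,
	KindClojureBinary:  true,
	KindJavaLibrary:    true,
}

// parseRuleKind validates a kind string received from the server.
// Returning an error (which the caller treats as fatal) avoids dropping
// a rule and letting Gazelle delete it from the BUILD file.
func parseRuleKind(s string) (RuleKind, error) {
	k := RuleKind(s)
	if !knownKinds[k] {
		return "", fmt.Errorf("unknown rule kind %q from parser", s)
	}
	return k, nil
}

func main() {
	for _, s := range []string{"clojure_library", "clojure_libary"} {
		k, err := parseRuleKind(s)
		fmt.Println(k, err)
	}
}
```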

Subdir rollup

Each package emits a __clj_lib / __clj_files rollup that aggregates its own rules plus the rollups of any Clojure-bearing subdirectories. Naively that's an O(n) WalkDir per package, which is quadratic across the tree. Because Gazelle generates rules bottom-up, the plugin can record hasClojureContent[rel] for each visited package and consult it as an O(1) lookup when the parent gets generated. It falls back to the on-disk walk for any subdir not yet visited (defensive; this shouldn't happen in normal Gazelle ordering).

Intermediate-only directories (no direct .clj/.cljs/.cljc/.js files, but Clojure-bearing subdirs) still emit rollup rules so consumers of //foo:__clj_lib keep working when foo/ is an aggregator with code only in foo/bar/.
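The memoized lookup with a walking fallback could be sketched like this. It is an illustration under assumptions, not the plugin's code: `rollupIndex`, `walkHasClojure`, and `demoFS` are hypothetical names, and an in-memory `fstest.MapFS` stands in for the real source tree. The `walks` counter exists only to show when the fallback fires.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

var clojureExts = map[string]bool{".clj": true, ".cljs": true, ".cljc": true, ".js": true}

// errFound is a sentinel used to stop the walk at the first match.
var errFound = errors.New("found")

// walkHasClojure reports whether any file under rel has a Clojure-ish
// extension. Walk errors are returned rather than swallowed.
func walkHasClojure(fsys fs.FS, rel string) (bool, error) {
	err := fs.WalkDir(fsys, rel, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && clojureExts[path.Ext(p)] {
			return errFound
		}
		return nil
	})
	if errors.Is(err, errFound) {
		return true, nil
	}
	return false, err
}

// rollupIndex remembers the answer for each package already generated.
type rollupIndex struct {
	fsys              fs.FS
	hasClojureContent map[string]bool
	walks             int // number of fallback walks performed
}

func newRollupIndex(fsys fs.FS) *rollupIndex {
	return &rollupIndex{fsys: fsys, hasClojureContent: map[string]bool{}}
}

// record is called when a package has been generated.
func (ix *rollupIndex) record(rel string, has bool) { ix.hasClojureContent[rel] = has }

// has answers from the map when possible and walks only as a fallback.
func (ix *rollupIndex) has(rel string) (bool, error) {
	if v, ok := ix.hasClojureContent[rel]; ok {
		return v, nil
	}
	ix.walks++
	v, err := walkHasClojure(ix.fsys, rel)
	if err != nil {
		return false, err
	}
	ix.hasClojureContent[rel] = v
	return v, nil
}

// demoFS is a tiny in-memory tree: foo/ has code only in foo/bar/.
func demoFS() fs.FS {
	return fstest.MapFS{
		"foo/bar/core.clj": &fstest.MapFile{Data: []byte("(ns foo.bar.core)")},
		"docs/readme.md":   &fstest.MapFile{Data: []byte("docs")},
	}
}

func main() {
	ix := newRollupIndex(demoFS())
	ix.record("foo/bar", true) // visited first in a bottom-up walk
	for _, rel := range []string{"foo/bar", "foo", "docs"} {
		v, err := ix.has(rel)
		fmt.Println(rel, v, err, "walks so far:", ix.walks)
	}
}
```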

Tests

  • //test/rules_clojure:gazelle-server-bb-test wraps bb-side unit tests covering ns parsing (reader conditionals, splice forms), libspec shapes, @deps/BUILD.bazel multi-line parsing, rule rollup, cache invariants (corrupt sha, corrupt transit), find-output-base cache validation, scan-jar .cljc / .cljs paths, parse-group failure modes, the -main request loop, and handle-init source-path tiebreaker.
  • //gazelle:gazelle_test is the Go-side Configure / GenerateRules / Resolve suite, including an end-to-end GenerateRules test against a stubbed bb script that exercises the full Configure → GenerateRules → Resolve pipeline.
  • //gazelle/clojureparser:clojureparser_test holds the wire-protocol round-trip tests, using a self-contained tempdir fixture (tiny deps.edn + empty @deps/BUILD.bazel set via GAZELLE_DEPS_BUILD).

Cross-process parity between gazelle_server.bb's rollup-rules and gen_build.clj's rollup-rules is pinned by test/rules_clojure/rollup_rules_fixtures.edn, loaded by tests on both sides.

Relation to #84

Supersedes #84 (the JVM-Clojure parser variant). That PR can be closed once this one is reviewed.

miridius force-pushed the dave/faster-gen-srcs-bb branch 12 times, most recently from e60ac8a to a84aa7d, on May 18, 2026 07:46
miridius added a commit that referenced this pull request May 18, 2026
Bugs fixed:
- parse-ns-form / read-ns-from-jar-entry now use edamame/parse-string-all
  so a leading (set! *warn-on-reflection* true) before (ns ...) no longer
  hides the namespace.
- gen-dir's rollup-rules :lib-deps now uses emitted rule :name attrs
  rather than path basenames, so an ns-binary-meta :name override still
  produces valid deps.
- gazelle/clojureparser.Runner.Shutdown sets dead=true to short-circuit a
  subsequent Parse that would otherwise race a closed stdin.
- receive() copies the scanner buffer before returning so callers can
  hold the slice safely.
- find-output-base throws on non-zero bazel info exit instead of
  returning empty and propagating a misleading "@deps/BUILD.bazel not
  found under /external" later.
- handle-parse throws when no source-path matches rel-dir (was silently
  emitting rules with resource_strip_prefix="").
- Cache transit-read is wrapped so a corrupt cache file (truncated
  transit, partial-write from a killed prior run) triggers a clean
  rebuild instead of crashing handle-init.
- applyAttr rejects non-integer floats for int Bazel attrs (would have
  silently truncated).
- ClojureExtensionDirective renamed to ClojureEnabledDirective so the
  Go-side name matches the user-facing directive value.

Test improvements:
- parse-deps-build-multi-aot-entries now builds a real jar so all three
  AOT namespaces actually appear in clj-ns->label (was a smoke test).
- resolve-deps-build-override + probe-bzlmod-deps-build extracted as
  testable helpers; new tests for canonical / apparent / missing
  branches and the override existence checks.
- New tests for ApparentLoads (remapped module, missing module fatal,
  Loads/ApparentLoads delegation) and subdirHasClojureFiles walkErr
  fatal path.
- TestImportsRuleNsHit now asserts ImportSpec.Lang.
- TestGenerateRulesFatalsOnParserDeath pins the actual fatal message.
- Exception-chain ExecutionException-without-cause now asserts the
  wrapper message is preserved.
- fatal-error-detection JDK-class-hierarchy tautology removed.

Code quality:
- Dead `resolved` atom in resolve-ns-deps dropped.
- basename / file-ext delegate to babashka.fs/strip-ext and /extension.
- rule-spec->wire uses update-keys.
- handle-parse threads rel-dir into resolve-ns-deps so the
  unresolved-requires warning shows which directory.

Comments and docs:
- Go comments referencing rule construction now point at
  gazelle_server.bb's ns-rules, not gen-build / gen_build.clj.
- DepsEdn() docstring corrected (absolute path, not workspace-relative).
- Platform-keys cross-reference fixed to Platform* constants.
- Cache key docstring at top of gazelle_server.bb names all three
  inputs (BUILD content + format version + no-aot set).
- "pick the platform-appropriate one and stop" replaced with the actual
  behaviour (both labels emitted, map semantics dedupe).
- emdashes in docstrings / println output replaced with parens per
  project convention.
- "Bazel built-ins" comment for java_library qualified.
- clojure_test rule kind declares MergeableAttrs for env/tags/jvm_flags/
  size/timeout so user edits aren't clobbered.

Test infrastructure:
- bb tests guard their entry point with (when (= *file* ...)) so
  load-file callers don't trigger System/exit.
- CircleCI installs babashka and sets GAZELLE_INTEGRATION_TEST_REQUIRED=1
  so the bb-side integration tests can't silently no-op in CI.

PR description (#100):
- Drop "13 bb-side unit tests" (actual count is 42).
- Clarify relation to #98 / #99 (this PR is standalone; parity claims
  assume those land).
- "Alternative to gen_srcs" rather than "Replaces".
miridius force-pushed the dave/faster-gen-srcs-bb branch 4 times, most recently from 032e41f to bedd65e, on May 22, 2026 18:46
A Gazelle Clojure language plugin keeping `BUILD.bazel` files in sync with
Clojure source. Bundled into `gazelle_bin`; the Go plugin spawns a `bb`
subprocess (fetched via `rules_multitool`) that parses
`.clj` / `.cljc` / `.cljs` files and `@deps/BUILD.bazel` over a newline-JSON
wire protocol.

Why bb
- Cold start ~30ms vs ~1s for a JVM-based parser; no daemon needed for
  incremental use.
- Substantially faster full-repo regen than `gen_srcs`, plus a
  sub-second path-scoped mode that `gen_srcs` can't service at all.
- edamame picks up reader-conditional / macro-heavy CLJS namespaces in
  jar contents that `clojure.tools.reader` silently dropped.

How it works
- Long-lived `bb` subprocess speaking newline-JSON. On `init` it parses
  `@deps/BUILD.bazel` and caches the per-jar ns scan to disk; cache key
  mixes BUILD content, cache-format version, and `:bazel :no-aot` so any
  of those changing invalidates.
- bb's ns-rules mirrors `cljs.analyzer/aliasable-clj-ns?`: a `clojure.X`
  require from a CLJS source rewrites to `cljs.X` when the original has
  no CLJS-loadable form and the replacement does.
- Gazelle merges with hand-written `BUILD` rules cleanly; only the
  rules_clojure load line is touched.

Co-authored-by: Daniel Compton <desk+github@danielcompton.net>
Co-authored-by: Claude <noreply@anthropic.com>
miridius force-pushed the dave/faster-gen-srcs-bb branch from bedd65e to 0d8133f on May 26, 2026 20:07
miridius added 4 commits May 28, 2026 17:07
`parse-deps-build` only matched buildifier-canonical blocks where `(`
and `)` sit on their own lines. rules_clojure's @deps extension emits
the compact form (open-paren + first arg on one line, close on the last
arg line); every block silently failed to match, producing an empty
dep_ns_labels map and a wall of bogus 'unresolved' warnings for every
require in any consumer with a substantial CLJS surface.

Broaden the open regex to allow content after the paren, and let block-
end match any line whose trailing `)` closes the call. Pinned both
shapes with a compact-format `java_import` test and a compact-format
`clojure_library` AOT test.

Closure stdlib namespaces (goog.string, goog.object, ...) live as raw
JavaScript inside the closure-library jar, with no `(ns ...)` form for
scan-jar to index. A CLJS file requiring goog.string would generate a
BUILD with no dep entry for it, leaving the cljs compiler unable to link.

cljs.core's wrapper label (org_clojure_clojurescript) transitively
depends on closure-library, so routing any goog.* require to that label
gets the goog code on the compile classpath without a fragile hardcoded
closure-library label name.

`GenerateResult.Empty` was unset, so a clojure_library whose .clj was deleted kept its
BUILD entry across runs. Add orphan stubs; lift test_ns / main_class
into MergeableAttrs so IsEmpty fires after merge.