Code Organization

Joshua Shinavier edited this page May 30, 2026 · 18 revisions

This page describes the code organization pattern that Hydra uses across its language implementations.

The packages/heads/dist pattern

Hydra consistently separates hand-written code from generated code, using three top-level directories.

Principle

  • packages/ holds a package's DSL-based module definitions, plus source-language helpers used to write them.
  • heads/ holds per-host runtimes that run those modules after they have been translated to a target language.
  • dist/ holds generated and copied artifacts, never edited by hand.

The test for whether a file belongs in packages/ or heads/: does it describe (or help describe) Hydra modules, or does it run them after translation? Description goes in packages/; running goes in heads/.

A package's source-language helpers (such as extra DSLs convenient for specifying that package's module definitions) live alongside the package's DSL sources in packages/, written in the same language as the sources. In some cases these helpers can be exported and reused by other packages written in the same source language.

The Hydra repository is not a place for general-purpose utilities written in a specific host language. Host-specific code that is not part of writing or running Hydra modules belongs elsewhere. The one deliberate exception is bindings/, which holds host-specific third-party integrations — adapters and utilities that connect Hydra to external systems in a particular host language. Code in bindings/ is not for runtime or bootstrapping and is not subject to the packages/ vs heads/ rule.
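
The placement rule above can be read as a small decision procedure. The following Java sketch is illustrative only: the `Role` enum and the `Placement.placeFor` method are hypothetical names introduced here and are not part of Hydra.

```java
// Hypothetical sketch of the placement rule; none of these names exist in Hydra.
enum Role {
    DESCRIBES_MODULES,       // DSL module definitions and source-language helpers
    RUNS_MODULES,            // per-host runtime code that runs translated modules
    GENERATED,               // generated or copied artifacts, never edited by hand
    THIRD_PARTY_INTEGRATION, // host-specific adapters to external systems
    GENERAL_UTILITY          // host-language code unrelated to writing or running modules
}

final class Placement {
    private Placement() {}

    /** Returns the top-level directory for a file with the given role. */
    static String placeFor(Role role) {
        switch (role) {
            case DESCRIBES_MODULES:
                return "packages/";
            case RUNS_MODULES:
                return "heads/";
            case GENERATED:
                return "dist/";
            case THIRD_PARTY_INTEGRATION:
                return "bindings/";
            default:
                // General-purpose host utilities do not belong in the repository.
                return "elsewhere";
        }
    }
}
```

Under these assumptions, `Placement.placeFor(Role.RUNS_MODULES)` returns `"heads/"`, and a general-purpose utility maps to `"elsewhere"`.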

About bindings/

bindings/ is a third category of hand-written code alongside packages/ and heads/. It was introduced after the earlier design, which rolled everything into hydra-java, proved unworkable. The rules:

  • Each binding is a hand-written Maven/PyPI/etc. artifact (no DSL definition, no JSON pipeline, not in hydra.json's package list).
  • Each binding depends on exactly one Hydra package (e.g., hydra-rdf4j depends on hydra-rdf) and optionally on the third-party library it wraps.
  • Bindings are independently versioned and publishable. In a multi-project Gradle build they participate as project(':hydra-rdf4j') references; downstream consumers pull the published artifact.
  • Bindings are not consumed by the bootstrap demo or by any Hydra package. They sit at the leaves of the dependency graph, not in the spine.
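
The "exactly one Hydra package" rule lends itself to a mechanical check. The Java sketch below is a minimal illustration under stated assumptions: the `BindingCheck` class, the representation of direct dependencies as a list of artifact names, and the contents of `HYDRA_PACKAGES` (a subset of package names mentioned on this page) are all hypothetical and do not describe any existing Hydra tooling.

```java
import java.util.List;
import java.util.Set;

// Hypothetical check; Hydra does not necessarily ship such a validator.
final class BindingCheck {
    private BindingCheck() {}

    // Assumed subset of Hydra package names, taken from examples on this page.
    static final Set<String> HYDRA_PACKAGES =
        Set.of("hydra-kernel", "hydra-pg", "hydra-rdf", "hydra-ext");

    /** Counts how many of the direct dependencies are Hydra packages. */
    static long hydraDependencyCount(List<String> directDependencies) {
        return directDependencies.stream().filter(HYDRA_PACKAGES::contains).count();
    }

    /** A binding is well-formed here if it depends on exactly one Hydra package. */
    static boolean isValidBinding(List<String> directDependencies) {
        return hydraDependencyCount(directDependencies) == 1;
    }
}
```

For example, a dependency list of `hydra-rdf` plus the wrapped third-party library passes this check, while a list naming both `hydra-rdf` and `hydra-pg` does not.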

Two flavors of binding exist:

  1. Third-party adapters — wrap an external library against a Hydra package (e.g., hydra-rdf4j connects hydra.rdf.syntax.* to Eclipse rdf4j; hydra-neo4j parses Cypher/GQL via ANTLR and converts to hydra.pg.query.*). Most bindings are this shape.
  2. Per-package host DSL helpers — hand-written host-language code that provides DSL surface for a Hydra package, with no third-party dependency (e.g., hydra-pg-dsl provides Java fluent builders for hydra.pg.{model,query}). These exist in bindings/ rather than heads/<host>/ because they're tied to one Hydra package, not to the host language's Hydra runtime.

If hand-written host-language code needs a third-party library (rdf4j, ANTLR, Neo4j, Apache Jena, TinkerPop, etc.) or provides a Java/Python/etc. DSL surface for a specific Hydra package, it belongs in a binding, not in a heads/<lang>/ runtime. The runtime stays free of third-party dependencies apart from the host standard library and minimal build tooling.

Directory structure

  • packages/ - DSL source packages. Most are Haskell-based, but hydra-java and hydra-python are now host-language-native (Java and Python sources respectively); see the per-package notes below.

    • packages/hydra-kernel/ - Kernel type and term modules (the heart of Hydra, written in the Hydra DSL — Haskell-based)
    • packages/hydra-haskell/ - Haskell coder DSL sources (Haskell-based)
    • packages/hydra-java/ - Java coder DSL sources (Java-based as of 0.15; legacy Haskell sources retained as backup until 0.16)
    • packages/hydra-python/ - Python coder DSL sources (Python-based as of 0.15; legacy Haskell sources retained as backup until 0.16)
    • packages/hydra-scala/, packages/hydra-lisp/ - Per-target coder DSL sources (Haskell-based)
    • packages/hydra-pg/ - Property graph models and coders (Pg, Cypher, Tinkerpop, Graphviz)
    • packages/hydra-rdf/ - RDF, SHACL, OWL, ShEx, and XML schema models
    • packages/hydra-ext/ - Long-tail extension coders (Avro, Protobuf, GraphQL, Cpp, Csharp, Go, Rust, TypeScript, Yaml, ...)
    • packages/hydra-bench/ - Synthetic inference benchmark workloads (opt-in via bin/sync-bench.sh)
    • packages/hydra-coq/, packages/hydra-typescript/, packages/hydra-go/, packages/hydra-wasm/ - Additional targets (Coq complete; TypeScript complete per #126; Go and Wasm are "head buds" with partial runtimes)
  • heads/ - Hand-written runtime code per language

    • heads/haskell/ - Haskell primitives, DSL helpers, code generation utilities, tests
    • heads/java/ - Java primitives, utilities, framework classes, tests
    • heads/python/ - Python primitives, DSL utilities, tests
    • heads/scala/ - Scala primitives, tests
    • heads/lisp/ - Per-dialect Lisp runtimes (clojure/, scheme/, common-lisp/, emacs-lisp/) sharing the hydra-lisp coder
    • heads/typescript/ - TypeScript runtime (#126)
    • heads/go/, heads/wasm/ - Head buds; partial runtimes pending completion
  • dist/ - Generated code per language

    • dist/haskell/hydra-kernel/ - Generated Haskell kernel
    • dist/java/hydra-kernel/ - Generated Java kernel
    • dist/python/hydra-kernel/ - Generated Python kernel
    • dist/scala/hydra-kernel/ - Generated Scala kernel
    • dist/typescript/hydra-kernel/ - Generated TypeScript kernel
    • dist/go/hydra-kernel/ - Generated Go kernel (head bud — kernel only)
    • dist/clojure/, dist/scheme/, dist/common-lisp/, dist/emacs-lisp/ - Per-dialect Lisp kernels
    • dist/json/ - JSON kernel modules (canonical interchange format; tracked in git)

Only dist/json/ and dist/haskell/ are tracked in git; the rest regenerate from dist/json/ on demand.
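
The tracking rule can be expressed as a simple path test. This is a minimal Java sketch, not Hydra tooling: the `DistTracking` class and its method name are hypothetical, and the two prefixes are the only facts taken from this page.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class DistTracking {
    private DistTracking() {}

    // The two dist/ subtrees that this page says are tracked in git.
    private static final String[] TRACKED_PREFIXES = {"dist/json/", "dist/haskell/"};

    /** True if the path lies under dist/ but outside the tracked subtrees. */
    static boolean isRegeneratedOnDemand(String path) {
        if (!path.startsWith("dist/")) {
            return false; // not generated output at all
        }
        for (String prefix : TRACKED_PREFIXES) {
            if (path.startsWith(prefix)) {
                return false; // tracked in git
            }
        }
        return true;
    }
}
```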

Benefits

This separation serves several purposes:

  1. Clear distinction - Easy to identify what is hand-written vs. generated
  2. Multi-language parity - Same DSL sources generate Haskell, Java, Python, Scala, and Lisp implementations
  3. Reproducibility - Generated code can be recreated from sources at any time
  4. Version control - Sources and the tracked generated output (dist/json/ and dist/haskell/) are checked in, enabling:
    • Tracking changes and reviewing diffs
    • Understanding the impact of DSL changes
    • Bisecting regressions across generations
  5. Separation of concerns - Language-specific runtime code stays in heads/, while the kernel remains pure in dist/

Generated code markers

All generated files include a header comment indicating that they should not be edited by hand. For example, in Java:

// Note: this is an automatically generated file. Do not edit.

Or in Haskell:

-- Note: this is an automatically generated file. Do not edit.
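
Because the marker text is the same across languages and only the comment prefix differs, a tool can detect generated files by searching for the text itself. The Java sketch below is an assumption-laden illustration: `GeneratedMarker` is a hypothetical class, and the choice to scan only the first five lines is an assumption, not a documented guarantee about where the header appears.

```java
import java.util.List;

// Hypothetical detector; not part of Hydra.
final class GeneratedMarker {
    private GeneratedMarker() {}

    // Marker text as quoted on this page, without the language-specific comment prefix.
    static final String MARKER = "Note: this is an automatically generated file. Do not edit.";

    /** Returns true if the marker appears in the first few lines of a file. */
    static boolean isGenerated(List<String> lines) {
        // Assumption: the header comment sits within the first five lines.
        return lines.stream().limit(5).anyMatch(line -> line.contains(MARKER));
    }
}
```

A pre-commit hook could use such a check to reject manual edits to generated files, for instance by reading a file with `java.nio.file.Files.readAllLines` and passing the result to `isGenerated`.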

Regenerating code

Regenerate generated code whenever the DSL sources change. See the README of each package for details.

Test code

Generated tests follow the same pattern as generated main code:

  • dist/<lang>/hydra-kernel/src/test/ - Generated test code
    • Common test suite ensuring parity across implementations
    • Generated from the same test specifications
    • Validates that all language implementations behave identically

Implementation-specific details

Each Hydra package adapts this pattern to its language and purpose:

Hydra-Haskell

The Haskell implementation serves as the bootstrapping implementation for the entire Hydra project. Hand-written sources are split between packages/hydra-kernel/ (kernel type and term specifications), packages/hydra-haskell/ (Haskell coder DSL sources), and heads/haskell/ (runtime: primitives, DSL helpers, code generation drivers, tests). Generated code lives under dist/haskell/.

  • packages/hydra-kernel/src/main/haskell/ contains:

    • Kernel type and term DSL specifications (Hydra/Sources/Kernel/Types/, Hydra/Sources/Kernel/Terms/)
    • Canonical primitive registry — one PrimitiveDefinition-emitting module per hydra.lib.<sub> namespace (Hydra/Sources/Kernel/Lib/)
    • Host-side primitive bindings — pairs primitive names with native impls (Hydra/Sources/Libraries.hs)
  • packages/hydra-haskell/src/main/haskell/ contains:

    • Haskell coder DSL sources (Hydra/Sources/Haskell/)
  • heads/haskell/src/main/haskell/ contains:

    • DSL helpers and wrappers (Hydra/Dsl/Meta/)
    • Native primitive implementations (Hydra/Haskell/Lib/)
    • Code generation utilities (Hydra/Generation.hs)
  • dist/haskell/hydra-kernel/src/main/haskell/ contains:

    • Complete generated kernel implementation
    • Generated DSL modules (Hydra/Dsl/) with constructors, accessors, and updaters for all Hydra types
    • Generated from DSL sources via writeHaskell and writeDslHaskell

See Hydra-Haskell README for details.

Hydra-Java

The Java implementation provides a Java API for Hydra with the same kernel semantics. The Java coder DSL sources are themselves written in Java (as of 0.15); hand-written runtime lives under heads/java/; generated code under dist/java/hydra-kernel/.

  • packages/hydra-java/src/main/java/hydra/sources/java/ contains:

    • The Java coder DSL sources: Syntax.java, Language.java, Coder.java, Serde.java, Names.java, Utils.java, Environment.java, Testing.java
    • Support classes (JavaHelpers.java, SourceDsl.java)
    • These are the source of truth for hydra.java.* modules; the self-host entry point is bin/generate-hydra-java-from-java.sh.

    A legacy Haskell-DSL copy of the same modules still lives under packages/hydra-java/src/main/haskell/Hydra/Sources/Java/ and produces byte-identical output. It will be dropped before 0.16; the main sync sequence will switch over to the Java-native pipeline in the meantime.

  • heads/java/src/main/java/ contains:

    • Hand-written primitive function implementations (hydra/lib/)
    • Core utilities (hydra/util/)
    • Framework classes (hydra/tools/)
    • Core algorithms (Rewriting.java, Reduction.java)
    • The native Java DSL → JSON driver (hydra/UpdateJavaJson.java)
    • Language-specific parsers
  • dist/java/hydra-kernel/src/main/java/ contains:

    • Generated Java code from Hydra DSL sources
    • Core types (hydra/core/)
    • Graph and module structures
    • Type adapters and computational abstractions
    • Generated via writeJava in heads/haskell

Hydra-Java uses the visitor pattern to represent algebraic data types in Java.
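
For readers unfamiliar with the technique, here is a minimal sketch of how the visitor pattern encodes a sum type in Java. The `Shape`, `Circle`, `Rectangle`, and `Areas` names are hypothetical and chosen for illustration; they are not taken from Hydra's generated code, whose exact class layout may differ.

```java
// Hypothetical Shape type illustrating the pattern; not Hydra's generated code.
abstract class Shape {
    /** Dispatches to the visitor method that matches this variant. */
    abstract <R> R accept(Visitor<R> visitor);

    /** One method per variant, so a new variant forces every visitor to handle it. */
    interface Visitor<R> {
        R visit(Circle circle);
        R visit(Rectangle rectangle);
    }

    static final class Circle extends Shape {
        final double radius;

        Circle(double radius) {
            this.radius = radius;
        }

        @Override
        <R> R accept(Visitor<R> visitor) {
            return visitor.visit(this);
        }
    }

    static final class Rectangle extends Shape {
        final double width;
        final double height;

        Rectangle(double width, double height) {
            this.width = width;
            this.height = height;
        }

        @Override
        <R> R accept(Visitor<R> visitor) {
            return visitor.visit(this);
        }
    }
}

final class Areas {
    private Areas() {}

    /** Computes the area by case analysis over the variants of Shape. */
    static double area(Shape shape) {
        return shape.accept(new Shape.Visitor<Double>() {
            @Override
            public Double visit(Shape.Circle c) {
                return Math.PI * c.radius * c.radius;
            }

            @Override
            public Double visit(Shape.Rectangle r) {
                return r.width * r.height;
            }
        });
    }
}
```

The visitor plays the role that pattern matching plays in Haskell: each `visit` overload is one case, and the compiler rejects a visitor that omits a variant.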

See Hydra-Java README for details.

Domain packages (hydra-pg, hydra-rdf, hydra-ext, hydra-bench)

Extension modules are organized into four domain-specific packages:

  • packages/hydra-pg/ - Property graph models, coders, and related tools. See the Hydra-PG README.

    • PG data model (Hydra/Sources/Pg/)
    • GraphSON, GQL, Cypher, TinkerPop syntax models
    • Graphviz support
  • packages/hydra-rdf/ - RDF, SHACL, OWL, ShEx, and XML schema models. See the Hydra-RDF README.

    • RDF syntax model (Hydra/Sources/Rdf/)
    • SHACL model and coder (Hydra/Sources/Shacl/)
    • OWL 2 syntax model (Hydra/Sources/Owl/)
  • packages/hydra-ext/ - Long-tail extension coders

    • Avro, Protobuf, GraphQL, Pegasus
    • Cpp, Csharp, Go, Rust, TypeScript syntax models
    • Kusto, Delta, Datalog, JSON Schema, YAML, and other miscellaneous models
  • packages/hydra-bench/ - Synthetic inference benchmark workloads (hydra.bench.*). See the Hydra-Bench README. Deliberately stress-shaped; not regenerated by the default sync. Use bin/sync-bench.sh to refresh on demand before running bin/run-inference-bench.sh.

Generated code for these packages lives in dist/*/hydra-pg/, dist/*/hydra-rdf/, dist/*/hydra-ext/, and dist/*/hydra-bench/. Demos are in demos/ at the repository root.

Hydra-Python

The Python implementation uses the same pattern as other implementations. The Python coder DSL sources are themselves written in Python (as of 0.15); hand-written runtime lives under heads/python/; generated code under dist/python/hydra-kernel/.

  • packages/hydra-python/src/main/python/hydra/sources/python/ contains:

    • The Python coder DSL sources: syntax.py, language.py, coder.py, serde.py, names.py, utils.py, environment.py, testing.py
    • Support modules (_python_helpers.py, _kernel_refs.py)
    • These are the source of truth for hydra.python.* modules; the self-host entry point is bin/generate-hydra-python-from-python.sh.

    A legacy Haskell-DSL copy of the same modules still lives under packages/hydra-python/src/main/haskell/Hydra/Sources/Python/ and produces byte-identical output. It will be dropped before 0.16; the main sync sequence will switch over to the Python-native pipeline in the meantime.

  • heads/python/src/main/python/ contains:

    • Hand-written primitive implementations (hydra/lib/)
    • DSL utilities (hydra/dsl/)
    • Language-specific parsers and extensions
  • dist/python/hydra-kernel/src/main/python/ contains:

    • Generated Python code from Hydra DSL sources
    • Core types (hydra/core.py)
    • Graph and module structures (hydra/graph.py, hydra/module.py)
    • Type inference and checking (hydra/inference.py, hydra/checking.py)
    • Term transformations (hydra/reduction.py, hydra/rewriting.py, hydra/hoisting.py)
    • Generated via writePython in heads/haskell
  • dist/python/hydra-kernel/src/test/python/ contains:

    • Generated test suite ensuring parity with the Haskell, Java, Scala, and Lisp implementations
    • Generation tests (terms generated to Python and executed)

See Hydra-Python README for details.

Related topics

  • Implementation - Detailed implementation guide including the bootstrap process
  • Testing - Common test suite and language-specific testing
  • Concepts - Core concepts and design principles
