Testing

Testing in Hydra

Hydra maintains a comprehensive test suite to ensure correctness and parity across all language implementations. Testing is organized into two categories: the common test suite (shared across all implementations) and language-specific tests.

The common test suite (hydra.test.testSuite) is designed to run identically in each Hydra implementation. Passing all test cases in the common test suite is the criterion for a true Hydra implementation, ensuring that all implementations behave identically and can interoperate in heterogeneous environments.

Prerequisites

To run tests, you need:

Hydra built locally (see main README)
For Haskell: Stack installed
For Java: JDK 11+ and Gradle
For Python: Python 3.12+ and pytest

This guide is for:

Contributors adding new tests to the common suite
Developers implementing new language backends
Anyone debugging test failures

See also:

Test Suite Architecture - Detailed architecture of the test kernel and module structure
Implementation - How tests are code-generated
Concepts - Understanding Flow and primitives

Common test suite

The common test suite is part of Hydra's "test kernel" and is written in the Hydra DSL at Hydra/Sources/Test. Like the rest of Hydra's kernel, these test definitions are code-generated into Haskell, Java, Python, Scala, TypeScript, and the Lisp dialects using Hydra's own coders.

Kernel tests vs generation tests

Hydra's common test suite gives rise to two kinds of tests: kernel tests and generation tests. Both are derived from the same test cases (defined in Hydra/Sources/Test/), but they are instantiated differently to exercise different modes of operation:

Kernel tests validate that Hydra's runtime works correctly. The test cases are code-generated into each target language, and each implementation runs them against its own kernel (primitives, type checker, reducer, etc.). This answers: "Does the Hydra kernel behave correctly in this language?"
Generation tests validate that Hydra's code generators produce correct output. The same test cases are used to generate code in each target language, then verify that the generated code compiles and produces the expected results. This answers: "Does code generated by Hydra work correctly in the target language?"

In other words, kernel tests exercise Hydra-as-interpreter, while generation tests exercise Hydra-as-compiler.

Aspect	Kernel Tests	Generation Tests
Purpose	Test the Hydra runtime	Test code generation
Source	Common test suite DSL	Common test suite DSL
Instantiation	Generated test harness runs against kernel	Generated target-language code is compiled and executed
Regeneration	bin/sync.sh (or per-language bin/sync-<lang>.sh) emits both kernel-test and generation-test artifacts to dist/<lang>/hydra-kernel/src/test/	Same command as kernel tests

Both test types are generated to dist/<lang>/hydra-kernel/src/test/ but serve complementary purposes in ensuring Hydra's correctness across both modes of operation.
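
The difference between the two modes can be sketched in Python. Everything named here (TestCase, KERNEL, run_kernel_test, emit_generation_test) is hypothetical and is not Hydra's API. The sketch only shows one shared test case instantiated two ways: evaluated against a stand-in runtime, or rendered into standalone source text that is then executed.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a shared test case: a primitive name, its
# arguments, and the expected result. Not Hydra's actual TestCase type.
@dataclass(frozen=True)
class TestCase:
    name: str
    primitive: str
    args: tuple
    expected: object

# A toy "kernel": a table of primitives that are evaluated at runtime.
KERNEL: dict[str, Callable] = {
    "lists.reverse": lambda xs: list(reversed(xs)),
    "lists.length": lambda xs: len(xs),
}

def run_kernel_test(case: TestCase) -> bool:
    """Kernel-test style: evaluate the case against the runtime table."""
    return KERNEL[case.primitive](*case.args) == case.expected

def emit_generation_test(case: TestCase) -> str:
    """Generation-test style: render the case as standalone source code."""
    fn = {
        "lists.reverse": "lambda xs: list(reversed(xs))",
        "lists.length": "len",
    }[case.primitive]
    args = ", ".join(repr(a) for a in case.args)
    return f"result = ({fn})({args}) == {case.expected!r}\n"

case = TestCase("three elements", "lists.length", ([1, 2, 3],), 3)
kernel_ok = run_kernel_test(case)

# "Compile and execute" the generated source in a fresh namespace.
namespace: dict = {}
exec(emit_generation_test(case), namespace)
generated_ok = namespace["result"]
```

In the first path a bug in the runtime table would surface; in the second, a bug in the emitter would. That is the complementary coverage the table describes.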

Test categories

The common test suite currently includes:

1. Primitive function tests

Tests for Hydra's standard library functions in hydra.lib:

List primitives (Test/Lib/Lists.hs):

apply, bind, concat, cons, drop, elem
filter, find, foldl, foldr, group, intercalate, intersperse
length, map, maybeAt, maybeHead, maybeInit, maybeLast, maybeTail
nub, null, partition, pure, replicate, reverse
singleton, sort, span, take, transpose, uncons, zip

Example test cases:

lists.maybeAt: Access elements at specific indices; Nothing on out-of-bounds
lists.map: Apply functions over lists
lists.concat: Concatenate nested lists
lists.reverse: Reverse list order
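
As an illustration of what the lists.maybeAt cases check, here is a minimal Python sketch. The function maybe_at and its argument order are assumptions made for this example, None stands in for Nothing, and treating negative indices as out of bounds is also an assumption; the authoritative cases are the ones in Test/Lib/Lists.hs.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

def maybe_at(index: int, items: Sequence[T]) -> Optional[T]:
    """Return items[index], or None (standing in for Nothing) when the
    index is out of bounds. Negative indices count as out of bounds,
    which is an assumption of this sketch."""
    if 0 <= index < len(items):
        return items[index]
    return None

# Table of (description, index, input, expected), mirroring how a test
# group pairs inputs with expected outputs.
cases = [
    ("first element", 0, [10, 20, 30], 10),
    ("last element", 2, [10, 20, 30], 30),
    ("past the end", 3, [10, 20, 30], None),
    ("empty list", 0, [], None),
]

# Names of any cases whose actual result differs from the expected one.
failures = [name for name, i, xs, want in cases if maybe_at(i, xs) != want]
```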

String primitives (Test/Lib/Strings.hs):

cat, cat2, fromList, intercalate
length, lines, maybeCharAt, null, splitOn, toList
toLower, toUpper, unlines

Example test cases:

Unicode handling: "ñ世🌍" (combining characters, multi-byte)
Empty string edge cases
Control characters and special characters
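
The Unicode sample is a useful parity check because its "length" depends on the unit being counted. The Python sketch below assumes the ñ is the precomposed code point U+00F1. Which unit Hydra's string primitives count is not stated on this page, so the numbers describe the string itself, not Hydra's expected results.

```python
# The sample string written with explicit escapes so the code points are
# unambiguous: U+00F1 (ñ), U+4E16 (世), U+1F30D (🌍).
# Assumption: ñ is precomposed; a decomposed "n" + U+0303 would add one
# code point.
sample = "\u00f1\u4e16\U0001F30D"

code_points = len(sample)                           # Python counts code points
utf8_bytes = len(sample.encode("utf-8"))            # 2 + 3 + 4 bytes
utf16_units = len(sample.encode("utf-16-le")) // 2  # 1 + 1 + 2 code units

# Case mapping should change only the cased character and leave the
# ideograph and the emoji untouched.
upper = sample.upper()
```

A runtime whose native strings are UTF-16 (such as the JVM) and one that counts code points (such as Python) report different native lengths for this input, so a shared test case forces each implementation to agree on one answer.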

2. Formatting tests

Case convention conversion tests (Test/Formatting.hs):

lower_snake_case ↔ UPPER_SNAKE_CASE ↔ camelCase ↔ PascalCase
Handles numbers and edge cases: "a_hello_world_42_a42_42a_b"
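
To show why an input like the one above is a good edge case, here is a small Python sketch of case convention conversion. The helper names are hypothetical and the outputs are what this sketch produces; Hydra's expected values are the ones recorded in Test/Formatting.hs and may differ.

```python
def split_snake(name: str) -> list[str]:
    """Split a lower_snake_case or UPPER_SNAKE_CASE name into lowercase words."""
    return [part.lower() for part in name.split("_") if part]

def to_upper_snake(words: list[str]) -> str:
    return "_".join(w.upper() for w in words)

def to_pascal(words: list[str]) -> str:
    # Uppercase only the first character of each word; digits are unchanged.
    return "".join(w[:1].upper() + w[1:] for w in words)

def to_camel(words: list[str]) -> str:
    if not words:
        return ""
    head, *tail = words
    return head + to_pascal(tail)

words = split_snake("a_hello_world_42_a42_42a_b")
upper_snake = to_upper_snake(words)
camel = to_camel(words)
pascal = to_pascal(words)
```

In this sketch the camelCase result is aHelloWorld42A4242aB: words that start with a digit have no capital letter to mark their boundary, so "42", "a42", and "42a" run together and the conversion cannot be reversed without extra rules. Inputs of this shape expose implementations that disagree on those rules.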

3. Type inference tests

Tests for Hydra's type inference system (Test/Inference):

AlgebraicTypes.hs: Sum types, product types, union/record inference
NominalTypes.hs: Named type definitions
Fundamentals.hs: Basic inference cases
Classes.hs: Type-class constraint propagation (equality, ordering)
AlgorithmW.hs: Algorithm W implementation tests
KernelExamples.hs: Real kernel code examples
Failures.hs: Expected failure cases

Test case types

The common test suite supports multiple test case types:

Evaluation tests: Verify that a term reduces to an expected result
Case conversion tests: Verify string case convention transformations
Inference tests: Verify that type inference produces the expected type
Inference failure tests: Verify that type inference fails as expected for invalid terms
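
One way to picture the four kinds is as a tagged union that a runner dispatches on. The Python classes and field names below are hypothetical mirrors written for this example, with terms and types reduced to plain strings; they are not the generated Hydra types.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class EvaluationCase:
    term: str
    expected: str

@dataclass(frozen=True)
class CaseConversionCase:
    source_convention: str
    target_convention: str
    text: str
    expected: str

@dataclass(frozen=True)
class InferenceCase:
    term: str
    expected_type: str

@dataclass(frozen=True)
class InferenceFailureCase:
    term: str

AnyCase = Union[EvaluationCase, CaseConversionCase,
                InferenceCase, InferenceFailureCase]

def describe(tc: AnyCase) -> str:
    """Dispatch on the case kind, as a runner would before executing it."""
    if isinstance(tc, EvaluationCase):
        return f"reduce {tc.term} and compare with {tc.expected}"
    if isinstance(tc, CaseConversionCase):
        return (f"convert {tc.text!r} from {tc.source_convention} "
                f"to {tc.target_convention}, expect {tc.expected!r}")
    if isinstance(tc, InferenceCase):
        return f"infer the type of {tc.term}, expect {tc.expected_type}"
    if isinstance(tc, InferenceFailureCase):
        return f"expect inference to fail for {tc.term}"
    raise TypeError(f"unknown test case kind: {tc!r}")
```

The final TypeError matters in practice: when the suite gains a new case kind, a runner that has not been updated fails loudly instead of silently skipping cases.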

How it works

1. Test Definition (DSL)

Tests are defined in the Hydra DSL in packages/hydra-kernel/src/main/haskell/Hydra/Sources/Test/:

-- Example from Test/Lib/Lists.hs
listsReverse = TestGroup "reverse" Nothing [] [
    test "basic list" [1, 2, 3, 4, 5] [5, 4, 3, 2, 1],
    test "single element" [42] [42],
    test "empty list" [] []]
  where
    test name input output =
        primCase name _lists_reverse [intList input] (intList output)
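
For readers less familiar with Haskell, the same pattern can be sketched in Python: a local helper fixes the primitive and wraps the arguments so that each case stays on one line. PrimCase, TestGroup, and the field layout are assumptions inferred from the example above, and the primitive is identified by a plain string for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PrimCase:
    name: str
    primitive: str   # plain-string identifier, assumed for this sketch
    args: tuple
    expected: object

@dataclass(frozen=True)
class TestGroup:
    name: str
    description: Optional[str]
    subgroups: tuple = ()
    cases: tuple = ()

def lists_reverse_group() -> TestGroup:
    def test(name, input_, output):
        # Plays the role of the `where` helper: one primitive, one
        # list-valued argument, one list-valued expected result.
        return PrimCase(name, "lists.reverse", (tuple(input_),), tuple(output))

    return TestGroup("reverse", None, (), (
        test("basic list", [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]),
        test("single element", [42], [42]),
        test("empty list", [], []),
    ))

group = lists_reverse_group()

# Check each case against a stand-in implementation of the primitive.
results = {c.name: tuple(reversed(c.args[0])) == c.expected
           for c in group.cases}
```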

2. Code Generation

Test definitions are generated into each target language using Hydra's coders:

Haskell: dist/haskell/hydra-kernel/src/test/haskell/Hydra/Test/TestSuite.hs
Java: dist/java/hydra-kernel/src/test/java/hydra/test/
Python: dist/python/hydra-kernel/src/test/python/hydra/test/

Regenerating tests: kernel-test and generation-test artifacts are produced by the standard sync pipeline. Running bin/sync-haskell.sh (or the full bin/sync.sh) emits both kinds via bootstrap-from-json, writing to dist/<lang>/hydra-kernel/src/test/.

For a Haskell-only refresh of the generation tests after editing test sources:

heads/haskell/bin/update-generation-tests.sh

See the "Test generation architecture" section below for details on test sources.

3. Test runners

Each language has a test runner that executes the generated test suite:

Haskell (TestSuiteSpec.hs):

Uses HSpec framework
Runs: stack test in heads/haskell/
Interactive: stack ghci hydra:lib hydra:hydra-test then Test.Hspec.hspec Hydra.TestSuiteSpec.spec

Java (TestSuiteRunner.java):

Uses JUnit 5 with parameterized tests
Runs: ./gradlew test from root directory
Executes evaluation tests via term reduction

Python (test_suite_runner.py):

Uses pytest framework
Runs: pytest src/test/python/test_suite_runner.py in heads/python/
Dynamically generates pytest test functions from the test suite
Includes evaluation tests via term reduction
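
The idea of generating one test per case from a nested suite can be sketched as follows. This is not the code in test_suite_runner.py, which is not shown on this page; Group, Case, and flatten are hypothetical names, and each case simply carries a precomputed actual value in place of real term reduction.

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass(frozen=True)
class Case:
    name: str
    actual: object
    expected: object

@dataclass(frozen=True)
class Group:
    name: str
    subgroups: tuple = ()
    cases: tuple = ()

def flatten(group: Group, prefix: str = "") -> Iterator[tuple[str, Case]]:
    """Yield (test_id, case) pairs depth-first, one per leaf case."""
    path = f"{prefix}/{group.name}" if prefix else group.name
    for leaf in group.cases:
        yield f"{path}/{leaf.name}", leaf
    for sub in group.subgroups:
        yield from flatten(sub, path)

suite = Group("lists", subgroups=(
    Group("reverse", cases=(
        Case("basic list", [3, 2, 1], [3, 2, 1]),
        Case("empty list", [], []),
    )),
    Group("length", cases=(Case("three elements", 3, 3),)),
))

collected = list(flatten(suite))
ids = [test_id for test_id, _ in collected]

# With pytest installed, the pairs could drive one test per case:
#   @pytest.mark.parametrize("case", [c for _, c in collected], ids=ids)
#   def test_suite(case):
#       assert case.actual == case.expected
```

Path-style IDs keep failures traceable to the group and case in the DSL source, whichever language reports them.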

Guaranteeing parity

The common test suite ensures that all Hydra implementations behave identically:

Same test definitions: All implementations run exactly the same tests (generated from one source)
Same primitive functions: Each language implements the same set of primitives with identical behavior
Same reduction semantics: Term evaluation produces identical results across languages
Continuous validation: Tests run on every build to catch regressions

This is critical for heterogeneous environments like Apache TinkerPop where the same logic must execute identically across multiple languages.

Test generation architecture

Test cases live in DSL-defined modules under packages/hydra-kernel/src/main/haskell/Hydra/Sources/Test/. Each test module is a regular Hydra module that constructs TestCase / TestGroup values; the same module is code-generated into every target language alongside the kernel, and each language's runner walks the resulting TestSuite tree to execute the cases.

The kernel test modules cover:

hydra.test.checking.* — type checker tests (algebraic types, collections, fundamentals, nominal types, advanced cases, expected failures).
hydra.test.inference.* — type inference tests (algorithm W, algebraic types, classes, collection terms, fundamentals, kernel examples, nominal types, expected failures).
hydra.test.lib.* — primitive-library tests (per-namespace).
Topical suites: hydra.test.formatting, hydra.test.generation, hydra.test.hoisting, hydra.test.json.*, hydra.test.ordering, hydra.test.reduction, hydra.test.rewriting, hydra.test.serialization, hydra.test.sorting, hydra.test.strip, hydra.test.substitution, hydra.test.unification, hydra.test.validate.*, hydra.test.variables.
The umbrella hydra.test.testSuite module aggregates every test group above into a single TestSuite tree.

Adding a new test case means adding a TestCase/TestGroup value to one of these modules (or creating a new module and wiring it into hydra.test.testSuite). Every language picks up the new case the next time its tests are regenerated by a sync. See the Extending tests recipe for step-by-step instructions.

Setting up and running tests

This section provides quick-start instructions for running tests in each language. For complete setup instructions (installing dependencies, configuring your environment), see the language-specific READMEs linked below.

Haskell

See Hydra-Haskell README for full setup instructions.

Run all tests (kernel tests + generation tests + language-specific tests):

cd heads/haskell
stack test

Interactive testing (useful for debugging):

stack ghci hydra:lib hydra:hydra-test

-- Run kernel tests
Test.Hspec.hspec Hydra.TestSuiteSpec.spec

-- Run generation tests
Test.Hspec.hspec Generation.Spec.spec

Regenerate tests after modifying test sources:

# Regenerate Haskell kernel + generation tests via the standard sync pipeline
heads/haskell/bin/sync-haskell.sh        # or bin/sync-haskell.sh from worktree root

# Generation tests only (e.g. for fast iteration on hspec specs)
heads/haskell/bin/update-generation-tests.sh

# Run updated tests
stack test

Test directory structure:

heads/haskell/
├── src/test/haskell/           # Hand-written test infrastructure
│   ├── Spec.hs                 # Entry point (hspec-discover)
│   └── Hydra/TestSuiteSpec.hs  # Kernel test runner
dist/haskell/hydra-kernel/
├── src/test/haskell/           # Generated tests
│   ├── Hydra/Test/             # Kernel test data (TestGroup structures)
│   └── Generation/             # Generation tests (executable specs)

Java

See Hydra-Java README for full setup instructions.

Run all tests (kernel tests + generation tests + Java-specific tests, from the repository root):

./gradlew :hydra-java:test

Run a specific test class:

./gradlew test --tests "hydra.VisitorTest"

Regenerate tests (from the worktree root):

./bin/sync-java.sh

This wraps bin/sync.sh --hosts java --targets java, which regenerates Java code across every package (kernel modules, eval libs, coders, tests) and runs the Java test suite.

For manual generation, see the Hydra-Java README.

Test directory structure:

heads/java/
├── src/test/java/              # Hand-written test infrastructure
│   ├── hydra/TestSuiteRunner.java  # Kernel test runner (JUnit 5)
│   └── hydra/VisitorTest.java      # Java-specific tests
dist/java/hydra-kernel/
├── src/test/java/              # Generated tests
│   ├── hydra/test/             # Kernel test data (TestGroup structures)
│   └── generation/             # Generation tests (executable JUnit tests)

Python

See Hydra-Python README for full setup instructions.

Run all tests (kernel tests + Python-specific tests):

cd heads/python
pytest

Run only the common test suite (kernel tests):

pytest src/test/python/test_suite_runner.py

Run generation tests (tests generated code correctness):

pytest ../../dist/python/hydra-kernel/src/test/python/generation/

Run a specific test file:

pytest src/test/python/test_grammar.py

Generate a categorized summary report:

python src/test/python/test_summary_report.py

Regenerate tests (from the worktree root):

./bin/sync-python.sh

This wraps bin/sync.sh --hosts python --targets python, which regenerates Python code across every package (kernel modules, eval libs, coders, tests) and runs the pytest suite.

For manual regeneration using bootstrap-from-json:

cd heads/haskell
stack build hydra:exe:bootstrap-from-json
stack exec bootstrap-from-json -- --target python --include-coders --include-tests --include-gentests +RTS -K256M -A32M -RTS

Known limitations: The Python coder does not yet support all literal types (e.g., float32), so some generation tests may fail to generate. Rerun the regeneration command above to see the current status.

Test directory structure:

heads/python/
├── src/test/python/              # Hand-written tests
│   ├── test_suite_runner.py      # Kernel test runner
│   ├── test_grammar.py           # Python-specific tests
│   └── ...
dist/python/hydra-kernel/
├── src/test/python/              # Generated tests
│   ├── hydra/test/               # Kernel test data (TestGroup structures)
│   └── generation/               # Generation tests (pytest test files)
│       └── hydra/test/lib/       # e.g., test_chars.py, test_lists.py

Language-specific tests

In addition to the common test suite, each implementation has language-specific tests:

Haskell-specific tests

Located in heads/haskell/src/test/haskell:

Haskell coder tests
DSL tests
Property-based tests (QuickCheck)
Integration tests

Generation tests (in dist/haskell/hydra-kernel/src/test/haskell/Generation/) verify Haskell code generation.

Java-specific tests

Located in heads/java/src/test/java:

Java coder tests
Serialization tests
Integration tests
Performance tests

Python-specific tests

Located in heads/python/src/test/python:

test_python.py - Python-specific functionality
test_json.py - JSON encoding/decoding
test_grammar.py - Grammar parsing
test_generated_code.py - Generated code validation

Run with: pytest in the heads/python/ directory

Adding new tests

To the common test suite

Create test definitions in packages/hydra-kernel/src/main/haskell/Hydra/Sources/Test/:
- Add to existing test groups or create new ones
- Use the testing DSL (TestGroup, primCase, etc.)

Register tests in TestSuite.hs:
- Add a test group binding
- Include it in the appropriate parent group

Regenerate tests for all implementations via the standard sync pipeline:

# Single language (host == target)
bin/sync-haskell.sh       # also bin/sync-java.sh, bin/sync-python.sh, ...

# Full matrix (every host × every target)
bin/sync.sh --hosts all --targets all

# Triad (Haskell, Java, Python)
bin/sync-default.sh

Each invocation emits both kernel-test data and executable generation tests under dist/<lang>/hydra-kernel/src/test/.

Update Haskell generation tests in isolation (faster iteration):

```
heads/haskell/bin/update-generation-tests.sh
```

Run the tests in each language to verify the change.

See also:

DSL guide - Learn the testing DSL syntax (TestGroup, primCase, etc.)
Developer recipes - Step-by-step guides for common tasks

Language-specific tests

Add tests directly to the language-specific test directory using that language's testing framework.

Current coverage

The common test suite currently provides:

✅ List primitives: ~30 functions with multiple test cases each
✅ String primitives: ~13 functions including Unicode handling
✅ Case conversion: All major case conventions
✅ Type inference: Basic inference, algebraic types, nominal types, failures
🚧 Future expansion: More primitive functions, coders, adapters, and kernel functionality

The test suite is continuously expanding to cover more of Hydra's functionality.
