Java implementation #1

Merged
samjanny merged 6 commits into main from java-impl on May 29, 2026

Conversation

@samjanny
Owner

No description provided.

samjanny added 6 commits May 29, 2026 14:02
Set up the Maven project (JDK 21, JUnit 5) for an independent Java
implementation of Entangled v1.0, built solely from the specification at
samjanny/entangled v1.0-rc.27.

Includes:
- the full section 11 diagnostic code catalog as an enum with severity and
  pipeline stage;
- Verdict, Diagnostic, and an internal RejectException used to enforce
  first-failing-stage precedence (section 10);
- Stage 2 input checks (byte cap, strict UTF-8, no BOM);
- a strict Stage 3 JSON parser enforcing the parser limits (depth, string,
  array, object keys), duplicate-key rejection, and the section 04 integer
  grammar classification (deferring non-integer rejection to Stage 5);
- the conformance corpus checked in verbatim as test resources.
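
The Stage 2 input checks listed above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name, the `Result` labels, and the order of the BOM and UTF-8 checks are assumptions, and the real diagnostic codes come from the section 11 catalog.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class InputCheckSketch {
    /** Hypothetical outcome labels, not the spec's diagnostic codes. */
    enum Result { OK, TOO_LARGE, HAS_BOM, INVALID_UTF8 }

    static Result check(byte[] input, int maxBytes) {
        // Enforce the byte cap first so oversized input is never decoded.
        if (input.length > maxBytes) {
            return Result.TOO_LARGE;
        }
        // A UTF-8 byte order mark is the three bytes EF BB BF.
        if (input.length >= 3
                && (input[0] & 0xFF) == 0xEF
                && (input[1] & 0xFF) == 0xBB
                && (input[2] & 0xFF) == 0xBF) {
            return Result.HAS_BOM;
        }
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
        } catch (CharacterCodingException e) {
            return Result.INVALID_UTF8;
        }
        return Result.OK;
    }
}
```

Passing the cap as a parameter keeps the check independent of how the caller chooses the limit.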

Tests cover input and parse vectors (100-115) and integer classification.

Implement RFC 8785 JSON canonicalization over the restricted Entangled input
space (section 04): object members sorted by UTF-16 code-unit comparison,
minimal string escaping, exact decimal integer serialization with no binary64
round-trip, and whitespace elimination.
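
To illustrate the canonicalization rules just described, here is a minimal sketch over an already-parsed value tree. It assumes a hypothetical representation (`Map` for objects, `List` for arrays, `String`, `BigInteger`, `Boolean`, and Java `null`) that may differ from the one in this PR, and it does not handle lone surrogates or non-integer numbers.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class JcsSketch {
    static String canonicalize(Object value) {
        StringBuilder out = new StringBuilder();
        write(value, out);
        return out.toString();
    }

    private static void write(Object v, StringBuilder out) {
        if (v == null || v instanceof Boolean) {
            out.append(v); // appends "null", "true", or "false"
        } else if (v instanceof BigInteger) {
            out.append(v.toString()); // exact decimal digits, no double involved
        } else if (v instanceof String) {
            writeString((String) v, out);
        } else if (v instanceof List) {
            out.append('[');
            boolean first = true;
            for (Object item : (List<?>) v) {
                if (!first) out.append(',');
                first = false;
                write(item, out);
            }
            out.append(']');
        } else if (v instanceof Map) {
            // String.compareTo compares UTF-16 code units, which is the
            // member ordering RFC 8785 asks for.
            TreeMap<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                sorted.put((String) e.getKey(), e.getValue());
            }
            out.append('{');
            boolean first = true;
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                if (!first) out.append(',');
                first = false;
                writeString(e.getKey(), out);
                out.append(':');
                write(e.getValue(), out);
            }
            out.append('}');
        } else {
            throw new IllegalArgumentException("unsupported value: " + v.getClass());
        }
    }

    private static void writeString(String s, StringBuilder out) {
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        out.append(c); // everything else is emitted raw
                    }
            }
        }
        out.append('"');
    }
}
```

Sorting by code unit rather than code point matters for keys outside the Basic Multilingual Plane: a key starting with a surrogate pair (code units in D800-DFFF) sorts before a key starting with U+FB33, even though its code point is larger.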

Add a GitHub Actions workflow that builds and runs the conformance tests on
JDK 21.

Tests verify the section 04 canonicalization vector down to its exact 72-byte
output, large-integer exact-digit serialization above 2^53, minimal string
escaping, and raw UTF-8 emission of non-ASCII characters.

Implement the cryptographic layer from section 04 and section 05,
dependency-free for byte-level control:
- strict base64url decoding (RFC 4648 section 5): URL alphabet only, no
  padding, no whitespace, exact declared length, canonical trailing bits;
- SHA-256 and SHA-512 wrappers;
- strict Ed25519 verification (RFC 8032 plus the section 05 profile): canonical
  A and R, small-order rejection for both A and R, S < L, and the cofactorless
  equation [S]B = R + [k]A, implemented over BigInteger field arithmetic;
- BIP-39 PIP derivation from K_publisher.pub with the bundled English wordlist;
- Tor v3 onion address decoding (rend-spec-v3) with checksum and version checks.
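
As an illustration of the strict base64url rules in the first bullet, the following hand-rolled decoder is one possible shape. It is a sketch under assumptions: the class and method names are hypothetical and the exception type is chosen for brevity. A hand-written loop is used because the check on canonical trailing bits has to be explicit.

```java
final class Base64UrlSketch {
    /** Maps a character of the URL-safe alphabet to its 6-bit value, or -1. */
    private static int value(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '-') return 62;
        if (c == '_') return 63;
        return -1; // includes '=', '+', '/', and whitespace
    }

    static byte[] decode(String s, int expectedLen) {
        int n = s.length();
        if (n % 4 == 1) {
            throw new IllegalArgumentException("impossible unpadded length");
        }
        if ((long) n * 6 / 8 != expectedLen) {
            throw new IllegalArgumentException("unexpected decoded length");
        }
        byte[] out = new byte[expectedLen];
        int acc = 0;   // bit accumulator
        int bits = 0;  // number of pending bits in acc
        int o = 0;
        for (int i = 0; i < n; i++) {
            int v = value(s.charAt(i));
            if (v < 0) {
                throw new IllegalArgumentException("character outside URL alphabet");
            }
            acc = (acc << 6) | v;
            bits += 6;
            if (bits >= 8) {
                bits -= 8;
                out[o++] = (byte) (acc >> bits);
                acc &= (1 << bits) - 1; // keep only the bits not yet emitted
            }
        }
        // Leftover bits must be zero, otherwise two different strings
        // would decode to the same bytes.
        if (acc != 0) {
            throw new IllegalArgumentException("non-canonical trailing bits");
        }
        return out;
    }
}
```

For example, `"AQ"` decodes to the single byte `0x01`, while `"AR"` would decode to the same byte with nonzero leftover bits and is rejected.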

Tests anchor on corpus/keys.json (PIP and both onion addresses) and verify the
real manifest signature of vector 001 end to end through base64url, JCS, and
Ed25519, confirming a tampered message fails.

Implement closed-schema validation for all three document kinds (section 02,
section 06, section 07, section 08) and the cross-field semantic checks the
Stage 5 definition assigns:
- closed-schema discipline (unknown field, missing required, no null literal);
- reusable field validators: slug, RFC 3339 timestamp form, content path syntax,
  strict base64url length, NFC for user-visible text, control-character rules,
  byte-length caps, integer range;
- the eleven block kinds with inline content, marks, link targets, form fields,
  and the submit_form-not-in-transaction rule;
- manifest origin.not_after vs canary.issued_at bounds (E_ORIGIN_INVALID with
  reason), the state_policy submit-budget aggregate (E_SUBMIT_BUDGET, exact wire
  byte arithmetic), and manifest.updated future-skew (E_SCHEMA_FIELD_SYNTAX);
- Stage 4 kind discrimination (spec_version, kind, sig presence and values).
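
The closed-schema discipline in the first bullet (unknown field, missing required, no null literal) can be sketched as below. The names and message strings are hypothetical, a parsed JSON `null` is assumed to appear as a Java `null` map value, and the sketch collects every problem, whereas a first-failing-stage pipeline would stop at the first one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ClosedSchemaSketch {
    static List<String> check(Map<String, Object> object,
                              Set<String> required,
                              Set<String> optional) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Object> e : object.entrySet()) {
            String name = e.getKey();
            if (!required.contains(name) && !optional.contains(name)) {
                // Closed schema: anything not declared is an error.
                problems.add("unknown field: " + name);
            } else if (e.getValue() == null) {
                // Absence is expressed by omitting the field, never by null.
                problems.add("null literal: " + name);
            }
        }
        for (String name : required) {
            if (!object.containsKey(name)) {
                problems.add("missing required: " + name);
            }
        }
        return problems;
    }
}
```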

The integer grammar runs as a whole-document Stage 5 pre-pass before
closed-schema field checks, per section 04's requirement to validate numeric
tokens before any conversion; corpus vector 140 fixes this ordering.
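
The idea of classifying numeric tokens before converting them can be sketched as follows. The section 04 grammar is not reproduced in this PR description, so the pattern below is an assumption (an optional minus sign followed by `0` or a digit string without a leading zero, with no fraction or exponent), and all names are hypothetical.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

final class IntegerTokenSketch {
    enum Kind { INTEGER, NON_INTEGER }

    // Assumed integer form; the authoritative grammar is in section 04 of the spec.
    private static final Pattern INTEGER_FORM = Pattern.compile("-?(0|[1-9][0-9]*)");

    /** Classifies the raw token text without converting it to a number. */
    static Kind classify(String token) {
        return INTEGER_FORM.matcher(token).matches() ? Kind.INTEGER : Kind.NON_INTEGER;
    }

    /** Converts only tokens already classified as integers, never via double. */
    static BigInteger toExact(String token) {
        if (classify(token) != Kind.INTEGER) {
            throw new IllegalArgumentException("not an integer token: " + token);
        }
        return new BigInteger(token);
    }
}
```

Working on the token text means a value such as `9007199254740993` keeps every digit, which a round trip through binary64 would not guarantee.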

Tests drive the Stage 4 and Stage 5 corpus vectors and confirm the seven accept
vectors pass schema validation cleanly.

Wire the full 10-stage pipeline (section 10) with first-failing-stage
precedence:
- Stage 6 signature verification under publisher_pubkey (manifest) or the
  authorized runtime key (content/transaction; absent key is E_SIG_INVALID_KEY);
- Stage 8 canary resolution: issued_at future-skew (E_CANARY_INVALID),
  anti-downgrade (E_CANARY_DOWNGRADE), equal-issued_at conflict
  (E_CANARY_CONFLICT), and runtime-key reuse (E_CANARY_RUNTIME_REUSE) against
  seeded publisher history, with window_position details;
- Stage 9 binding: Tor v3 origin binding (E_BIND_ORIGIN), origin.not_after
  expiry (E_ORIGIN_EXPIRED), content path binding (E_BIND_PATH), transaction
  request_hash binding (E_BIND_REQUEST_HASH), and migration verification
  (E_MIGRATION_INVALID self-pointer / cycle / successor key mismatch,
  E_MIGRATION_MISMATCH on successor pipeline failure with underlying code).
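
First-failing-stage precedence maps naturally onto an exception thrown by the first stage that rejects. The sketch below shows that control flow only. `Stage`, the `"ACCEPT"` string, and the shape of `RejectException` are assumptions and may not match the classes in this PR.

```java
import java.util.List;

final class PipelineSketch {
    /** Carries the diagnostic code of the stage that rejected the document. */
    static final class RejectException extends Exception {
        final String code;

        RejectException(String code) {
            super(code);
            this.code = code;
        }
    }

    /** One pipeline stage; it either returns normally or rejects. */
    interface Stage {
        void run(byte[] input) throws RejectException;
    }

    /** Returns "ACCEPT", or the code thrown by the first failing stage. */
    static String evaluate(List<Stage> stages, byte[] input) {
        try {
            for (Stage stage : stages) {
                stage.run(input); // later stages never run after a rejection
            }
            return "ACCEPT";
        } catch (RejectException e) {
            return e.code;
        }
    }
}
```

Because stages run in order and the loop exits on the first throw, a document that would fail both signature verification and path binding reports only the signature diagnostic.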

Add the conformance harness that mocks the clock to corpus clock_now, maps each
vector's context onto the pipeline, and asserts verdict, diagnostic code, and
structured details for all 62 vectors. All 62 pass.
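
Mocking the clock is straightforward when time-dependent checks take a `java.time.Clock`. The sketch below shows a future-skew check driven by a fixed clock. The method names and the allowed-skew parameter are hypothetical, and the actual tolerance is defined by the spec rather than here.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

final class ClockSketch {
    /** Builds a clock frozen at the given RFC 3339 UTC instant, e.g. a vector's clock_now. */
    static Clock fixedAt(String instant) {
        return Clock.fixed(Instant.parse(instant), ZoneOffset.UTC);
    }

    /** True if the timestamp lies further in the future than the allowed skew. */
    static boolean isTooFarInFuture(Instant timestamp, Clock clock, Duration allowedSkew) {
        return timestamp.isAfter(clock.instant().plus(allowedSkew));
    }
}
```

In a harness, each vector would construct its own fixed clock, so results do not depend on when the tests run.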

The Stage 2 byte cap is selected by the expected document kind from the fetch
context, matching the spec requirement that the kind-specific cap is enforced
before parsing.

Add the conformance step to CI.

Describe the independence rationale, how to build and run the conformance
suite, the 62-vector status with a note on the 60-vs-62 count, the key design
choices (in-tree strict crypto, first-failing-stage precedence, the
whole-document integer-grammar pre-pass, the kind-selected byte cap), the two
spec ambiguities filed upstream with the readings chosen here, and the layout.

@samjanny samjanny merged commit a33a7c5 into main May 29, 2026
2 checks passed
@samjanny samjanny deleted the java-impl branch May 29, 2026 12:33