Phase 2 architectural decisions + core ID types #11
Open
nnunley wants to merge 7 commits into
…ipfian vocab, query fixtures) using rapidhash::v1
…paths)
Completes ITER-0000 walking skeleton (T6-T9) atop the leit_wind_tunnel
harness:
- leit_wind_tunnel_index: index_build/{1k,10k} indexing-throughput benches
- leit_wind_tunnel_query: five execution paths (single/OR/AND/fielded +
BM25F cross-field) x {1k,10k}, index built once outside the timed region,
ExecutionWorkspace reused across iterations
- Criterion isolated to the two bench crates (dev-dependencies only);
primary crates and leit_benchmark untouched
- CI: exclude the three wind-tunnel crates from the no_std/wasm jobs
(std-only, mirroring leit_benchmark); no cargo bench step added
- harness docs: note the relationship to leit_benchmark (smoke test vs
performance lab)
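
The query-bench shape described above (index built once outside the timed region, one workspace reused across iterations) can be sketched without Criterion. This is a minimal std-only illustration: `ToyIndex`, `Workspace`, and `bench_or_query` are hypothetical names, not the leit_wind_tunnel API, and the real benches would hand the inner call to Criterion rather than timing by hand.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical toy index: postings[t] lists the doc ids divisible by t + 1.
pub struct ToyIndex {
    postings: Vec<Vec<u32>>,
}

/// Hypothetical scratch space, standing in for a reusable execution workspace.
#[derive(Default)]
pub struct Workspace {
    scratch: Vec<u32>,
}

impl ToyIndex {
    /// Setup work: this must stay outside the timed region.
    pub fn build(doc_count: u32) -> Self {
        let postings = (0..4u32)
            .map(|t| (0..doc_count).filter(|d| d % (t + 1) == 0).collect())
            .collect();
        Self { postings }
    }

    /// OR query: number of docs matching any term, reusing the workspace buffer.
    pub fn search_or(&self, terms: &[usize], ws: &mut Workspace) -> usize {
        ws.scratch.clear(); // keeps the allocation from earlier iterations
        for &t in terms {
            ws.scratch.extend_from_slice(&self.postings[t]);
        }
        ws.scratch.sort_unstable();
        ws.scratch.dedup();
        ws.scratch.len()
    }
}

/// Times only the query loop; the index is borrowed, so it was built beforehand.
pub fn bench_or_query(index: &ToyIndex, iterations: u32) -> (Duration, usize) {
    let mut ws = Workspace::default(); // created once, reused every iteration
    let mut hits = 0;
    let start = Instant::now();
    for _ in 0..iterations {
        hits = black_box(index.search_or(black_box(&[1, 2]), &mut ws));
    }
    (start.elapsed(), hits)
}
```

Building the index inside the loop would fold indexing cost into the query numbers, which is the separation the two bench crates are meant to keep.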
… (STORY-0096)
ITER-0001 dependency hygiene per the usage-site rule: the leit_wind_tunnel harness uses only rapidhash in its library surface. leit_core, leit_index, and leit_text are used solely by its #[cfg(test)] integration tests, so they move to [dev-dependencies] and no longer appear in the harness's production dependency graph. The bench crates were already correct (empty lib; all deps dev). Library build, 17 unit tests, and both bench crates verified green.
… (STORY-0112)
ITER-0001: BlockId, FilterExprId, SegmentOrd, SegmentLocalDocId in leit_core, each a #[repr(transparent)] newtype over a [u8; 4] little-endian inner value deriving bytemuck Pod/Zeroable. The on-disk form is the in-memory form: a &[u8] slice from an mmap'd buffer casts in place to &[Id] with no allocation or deserialization (zero-copy), and the result is stable across host endianness. Ordering is numeric.
bytemuck was chosen over zerocopy because zerocopy's derives emit internal #[allow(non_ascii_idents)]/#[allow(non_local_definitions)] attributes that conflict with the workspace's forbid-level Linebender lints (E0453); bytemuck is no_std and lint-clean under the same forbid set.
Proven by SCENARIO-0005 (6 unit tests: value + slice + unaligned round-trip, numeric ordering, LE byte layout).
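
To make the layout and ordering claims concrete, here is a minimal std-only sketch of one such ID type. It is an illustration under assumptions, not the leit_core source: the method names `new`, `get`, and `to_bytes` are hypothetical, and the bytemuck derives are omitted so the block stands alone. One point worth seeing in code: a derived `Ord` on a little-endian `[u8; 4]` would compare bytes lexicographically, which disagrees with numeric order, so the sketch assumes a hand-written `Ord` that compares the decoded `u32`.

```rust
use core::cmp::Ordering;

/// Hypothetical stand-in for one of the leit_core ID types.
/// The inner array holds the value in little-endian byte order.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct BlockId([u8; 4]);

impl BlockId {
    /// Encodes a native integer into the fixed little-endian form.
    pub const fn new(value: u32) -> Self {
        Self(value.to_le_bytes())
    }

    /// Decodes back to a native integer on any host endianness.
    pub const fn get(self) -> u32 {
        u32::from_le_bytes(self.0)
    }

    /// The exact bytes that would be written to disk.
    pub const fn to_bytes(self) -> [u8; 4] {
        self.0
    }
}

// Manual impl: compare decoded values, not raw bytes.
// Byte-wise comparison would rank 256 ([0, 1, 0, 0]) below 1 ([1, 0, 0, 0]).
impl Ord for BlockId {
    fn cmp(&self, other: &Self) -> Ordering {
        self.get().cmp(&other.get())
    }
}

impl PartialOrd for BlockId {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

Because the wrapper is `#[repr(transparent)]` over a byte array, its size is 4 and its alignment is 1, which is what allows a view at any byte offset of a mapped file.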
Records the design-decidable decisions for the Phase 2 segment format (DEC-01..10) with rationale, a Phase 3 forward-compatibility audit, and decision->enforcement traceability.
Human-confirmed key calls:
- DEC-01 segment offsets: u64 (no size cap; removes the only Phase 3 format-migration risk)
- DEC-10 integrity: single footer checksum, verified in Full validation mode
- DEC-06 block-aware API: public dedicated BlockCursor trait (Phase 3 WAND consumes it without a format/API break)
- DEC-05 header: fixed-layout little-endian POD, absolute u64 section offsets, magic + version + format_flags, reserved stored-fields/columnar slots
Decision-documentation ACs of STORY-0078/0081-0084/0090/0043-0047 are satisfied here (decided:ITER-0001); their code-enforcement ACs are deferred to ITER-0003/0004.
Forward constraint recorded for ITER-0005: block-metadata schema must carry per-block max_score + doc-range for Phase 3 WAND/MaxScore.
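
As a rough illustration of what DEC-05 describes, the sketch below encodes and parses a fixed-layout little-endian header with a magic, a version, format flags, and absolute u64 section offsets. Every concrete detail here is an assumption made for the example: the magic value `LEIT`, the field widths, the 40-byte length, and the section names are hypothetical and are not taken from the decisions document.

```rust
/// Hypothetical magic and header length, chosen only for this sketch.
pub const MAGIC: [u8; 4] = *b"LEIT";
pub const HEADER_LEN: usize = 40;

/// Assumed layout: magic(4) version(2) flags(2), then four absolute u64 offsets.
#[derive(Debug, PartialEq, Eq)]
pub struct SegmentHeader {
    pub version: u16,
    pub format_flags: u16,
    pub postings_offset: u64,
    pub block_meta_offset: u64,
    pub stored_fields_offset: u64, // reserved slot, 0 when unused
    pub columnar_offset: u64,      // reserved slot, 0 when unused
}

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated,
    BadMagic,
}

fn read_u16(bytes: &[u8], at: usize) -> u16 {
    u16::from_le_bytes([bytes[at], bytes[at + 1]])
}

fn read_u64(bytes: &[u8], at: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&bytes[at..at + 8]);
    u64::from_le_bytes(raw)
}

impl SegmentHeader {
    /// Writes every field at a fixed position in little-endian order.
    pub fn encode(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0..4].copy_from_slice(&MAGIC);
        out[4..6].copy_from_slice(&self.version.to_le_bytes());
        out[6..8].copy_from_slice(&self.format_flags.to_le_bytes());
        out[8..16].copy_from_slice(&self.postings_offset.to_le_bytes());
        out[16..24].copy_from_slice(&self.block_meta_offset.to_le_bytes());
        out[24..32].copy_from_slice(&self.stored_fields_offset.to_le_bytes());
        out[32..40].copy_from_slice(&self.columnar_offset.to_le_bytes());
        out
    }

    /// Checks length and magic before reading any field.
    pub fn parse(bytes: &[u8]) -> Result<Self, HeaderError> {
        if bytes.len() < HEADER_LEN {
            return Err(HeaderError::Truncated);
        }
        if bytes[..4] != MAGIC {
            return Err(HeaderError::BadMagic);
        }
        Ok(Self {
            version: read_u16(bytes, 4),
            format_flags: read_u16(bytes, 6),
            postings_offset: read_u64(bytes, 8),
            block_meta_offset: read_u64(bytes, 16),
            stored_fields_offset: read_u64(bytes, 24),
            columnar_offset: read_u64(bytes, 32),
        })
    }
}
```

With absolute u64 offsets, a reader can jump straight to any section from the header alone, and the reserved slots let a later format revision add sections without moving existing fields.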
…ORY-0112 AC-2)
ITER-0001 audit corrective: SCENARIO-0005 now also exercises try_from_bytes/try_cast_slice (Ok on well-formed input, Err on malformed input), per AC-2's validated-read obligation.
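
The validated-read behaviour can be sketched as follows. This is a std-only stand-in for the checked slice cast that bytemuck provides, written out by hand so the block has no dependencies; `try_cast_ids` and `CastError` are hypothetical names, and the real tests call the bytemuck functions named above rather than this helper.

```rust
/// Hypothetical ID type: a transparent wrapper over 4 little-endian bytes.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct SegmentLocalDocId([u8; 4]);

impl SegmentLocalDocId {
    pub const fn get(self) -> u32 {
        u32::from_le_bytes(self.0)
    }
}

/// Hypothetical error for input that is not a whole number of IDs.
#[derive(Debug, PartialEq, Eq)]
pub enum CastError {
    TrailingBytes,
}

/// Views a byte slice as a slice of IDs in place, or rejects it.
pub fn try_cast_ids(bytes: &[u8]) -> Result<&[SegmentLocalDocId], CastError> {
    const WIDTH: usize = core::mem::size_of::<SegmentLocalDocId>();
    if bytes.len() % WIDTH != 0 {
        return Err(CastError::TrailingBytes);
    }
    // SAFETY: SegmentLocalDocId is repr(transparent) over [u8; 4], so it has
    // size 4, alignment 1, and no invalid bit patterns. The length was checked
    // to be a multiple of 4, and the returned slice borrows from `bytes`.
    let ids = unsafe {
        core::slice::from_raw_parts(
            bytes.as_ptr().cast::<SegmentLocalDocId>(),
            bytes.len() / WIDTH,
        )
    };
    Ok(ids)
}
```

No alignment check is needed because the element type has alignment 1, so the only way a buffer can be malformed for this cast is by having a length that is not a multiple of four.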
This was referenced May 30, 2026
Summary
The Phase 2 segment-format architectural decisions (design-decidable now, without measurement) plus the core ID types they depend on.
Core ID types (leit_core)
BlockId, FilterExprId, SegmentOrd, SegmentLocalDocId: each a #[repr(transparent)] newtype over a [u8; 4] little-endian value deriving bytemuck Pod/Zeroable. The on-disk form is the in-memory form: a &[u8] slice from an mmap'd buffer casts in place to &[Id] with no allocation or deserialization, stable across host endianness, viewable from any byte offset (alignment 1). Ordering is numeric. Proven by unit tests (single/slice/unaligned round-trip, numeric ordering, LE layout).
bytemuck (not serde, not zerocopy): serde isn't zero-copy on the mmap read path, and zerocopy's derives emit #[allow]s that conflict with the workspace's forbid-level lints; bytemuck is no_std and lint-clean under the same set.
Phase 2 architectural decisions
docs/2026-05-30-phase2-architectural-decisions.md records DEC-01..10 with rationale, a Phase 3 forward-compatibility audit, and decision→enforcement traceability. Headline calls:
- Segment offsets: u64 (no size cap)
- Header: fixed-layout little-endian POD, absolute u64 section offsets, magic + version + format_flags, reserved stored-fields/columnar slots
- Integrity: single footer checksum, verified in Full validation mode (3 modes: HeaderOnly / Structural / Full)
- Block-aware API: public dedicated BlockCursor trait so a future WAND/MaxScore path consumes block metadata without a format/API break
These are decisions of record; the code that enforces them lands in the segment-format / cursor iterations.
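
To show why a dedicated block-level cursor helps a WAND/MaxScore-style path, here is a minimal sketch of such a trait and one consumer. The trait shape, `BlockMeta`, `SliceBlockCursor`, and `blocks_above` are all hypothetical and are not the leit API; the only detail taken from the PR is that block metadata carries a per-block max score and a doc range.

```rust
/// Hypothetical per-block metadata: doc range plus a score upper bound.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BlockMeta {
    pub first_doc: u32,
    pub last_doc: u32,
    pub max_score: f32,
}

/// Hypothetical block-aware cursor: exposes metadata without decoding postings.
pub trait BlockCursor {
    /// Metadata of the current block, or None once exhausted.
    fn current(&self) -> Option<BlockMeta>;
    /// Moves to the next block.
    fn advance(&mut self);
}

/// Simple in-memory implementation over a slice of block metadata.
pub struct SliceBlockCursor<'a> {
    blocks: &'a [BlockMeta],
    pos: usize,
}

impl<'a> SliceBlockCursor<'a> {
    pub fn new(blocks: &'a [BlockMeta]) -> Self {
        Self { blocks, pos: 0 }
    }
}

impl BlockCursor for SliceBlockCursor<'_> {
    fn current(&self) -> Option<BlockMeta> {
        self.blocks.get(self.pos).copied()
    }

    fn advance(&mut self) {
        self.pos += 1;
    }
}

/// Keeps only blocks whose upper bound can still reach `threshold`.
/// Blocks below the threshold are skipped without touching their postings.
pub fn blocks_above<C: BlockCursor>(cursor: &mut C, threshold: f32) -> Vec<BlockMeta> {
    let mut kept = Vec::new();
    while let Some(meta) = cursor.current() {
        if meta.max_score >= threshold {
            kept.push(meta);
        }
        cursor.advance();
    }
    kept
}
```

Because the consumer depends only on the trait, a later dynamic-pruning implementation could be added behind the same interface without changing the stored format.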
Also
Wind-tunnel dependency hygiene: the harness's test-only leit_* deps moved to [dev-dependencies] (used only by its #[cfg(test)] integration tests), keeping them out of its production dependency graph.