Skip to content

tar + optimization pass#53

Merged
samcday merged 8 commits into
mainfrom
fastboop
May 1, 2026
Merged

tar + optimization pass#53
samcday merged 8 commits into
mainfrom
fastboop

Conversation

@samcday

@samcday samcday commented Apr 30, 2026

Copy link
Copy Markdown
Owner

.tar support so we can hopefully support Mobian artifacts soon.

Pipeline optimization stuff because it should be here and not fastboop.

Related to samcday/fastboop#101

samcday added 2 commits April 30, 2026 10:29
Signed-off-by: Sam Day <me@samcday.com>
Signed-off-by: Sam Day <me@samcday.com>
@samcday

samcday commented Apr 30, 2026

Copy link
Copy Markdown
Owner Author

@claude review

@claude

claude Bot commented Apr 30, 2026

Copy link
Copy Markdown

Claude finished @samcday's task in 3m 7s —— View job


Code Review

  • Gather context and read CLAUDE.md
  • Review crates/gibblox-tar/src/lib.rs (tar support)
  • Review crates/gibblox-optimizer/src/lib.rs (pipeline optimizer)
  • Review pipeline changes (gibblox-pipeline)
  • Review CLI and schema changes
  • Post final review

Overall this is clean, well-structured work. Good test coverage on the tar crate, the hint path integrates consistently across both std and web materializers, and the PAX/GNU extension handling is solid. One issue stands out as a likely bug.

Binary format compatibility break (v3 → v4)

PipelineSourceBin in bin.rs now has Tar inserted between Xz (index 3) and AndroidSparseImg (former index 4, now 5). Postcard encodes enum variants by position index.

The decode path treats v3 and v4 payloads identically:

// crates/gibblox-pipeline/src/lib.rs:89
3 | 4 => postcard::from_bytes::<PipelineSourceBin>(payload)?,

A v3-encoded AndroidSparseImg payload carries variant tag 4. When decoded against the v4 struct, tag 4 maps to Tar { entry: String, source: ..., content: ... } — mismatched field layout, so postcard will error or silently decode garbage. Same applies to Mbr and Gpt variants in v3 payloads (they shift from 5/6 to 6/7).

PIPELINE_BIN_SUPPORTED_VERSIONS includes 3, which signals the intent to remain backward-compatible with v3.

Fix options:

  • Move Tar to the end of PipelineSourceBin to keep all pre-existing variant indices stable. This is the simplest fix.
  • Or add a PipelineSourceBinV3 mirror (like the existing PipelineSourceBinV2) and decode v3 through it.
  • Or explicitly drop v3 from PIPELINE_BIN_SUPPORTED_VERSIONS and the decode match — a clean intentional break, consistent with the AGENTS.md v0 policy. Just be explicit about it.

Fix this →


Minor issues

parse_octal_field — spaces/NULs not treated as terminators (crates/gibblox-tar/src/lib.rs:712–745)

The current loop processes all bytes in the field, treating 0 | b' ' as "skip" regardless of position. The spec says the first NUL or space after the digits ends the field. In practice this permits a technically-malformed value like "7 3" to be parsed as 3 rather than 7. Unlikely to bite on real archives but diverges from the standard.

validate_index redundant condition (crates/gibblox-tar/src/lib.rs:431)

if !is_regular_entry_type(header_meta.entry_type) || header_meta.entry_type != index.entry_type

The first condition is redundant: by the time we reach this line, index.entry_type has already been validated as a regular type (line 370). If header_meta.entry_type != index.entry_type is true, the "does not match" error is the right one — the !is_regular_entry_type clause can't add new information. Could just be header_meta.entry_type != index.entry_type.

pipeline_hints_cache_key re-encodes hints on every call (crates/gibblox-web-worker/src/lib.rs)

fn pipeline_hints_cache_key(options: &OpenWebPipelineOptions) -> String {
    let encoded = match encode_pipeline_hints(hints) { ... };
    let mut hasher = BlockIdentityHasher32::new();
    hasher.write(encoded.as_slice());
    ...
}

This runs a full encode_pipeline_hints round-trip every time pipeline_endpoint_key is called. The hints don't change between calls for the same options. Low impact given wasm call frequency, but worth noting.

CLI execute mode can't consume hints

The pipeline optimize subcommand produces a .gbxhint0 file, but the default execute path (gibblox <pipeline>) has no --hints flag to supply it. You can produce but not consume hints from the CLI. Probably out of scope for this PR, but worth tracking.


samcday added 5 commits May 1, 2026 07:03
The previous 3s connect / 6s total defaults were inherited from the
smoo port and are too aggressive for callers like one-shot probes
over slow links, where there's no benefit to fast-fail + retry.

Signed-off-by: Sam Day <me@samcday.com>
Adds optional connect_timeout / request_timeout fields (with
with_connect_timeout / with_request_timeout builders) so callers can
override the library defaults per-reader. Defaults are unchanged and
exported as DEFAULT_CONNECT_TIMEOUT / DEFAULT_REQUEST_TIMEOUT.

Wasm targets ignore the values (fetch has no equivalent reqwest hook).

Signed-off-by: Sam Day <me@samcday.com>
Signed-off-by: Sam Day <me@samcday.com>
Signed-off-by: Sam Day <me@samcday.com>
Signed-off-by: Sam Day <me@samcday.com>
@samcday

samcday commented Apr 30, 2026

Copy link
Copy Markdown
Owner Author

@claude another review please. I opted to break schema compatibility again in v0.0.2, in order to reset everything back to a sane+pristine v0 state. The only consumer of gibblox, fastboop, has not yet had a tagged release, so this looks scary but I promise you it ain't

@claude

claude Bot commented Apr 30, 2026

Copy link
Copy Markdown

Claude finished @samcday's task in 3m 33s —— View job


Code Review

  • Read CLAUDE.md and gather context
  • Review crates/gibblox-pipeline/src/bin.rs (compat handling)
  • Review crates/gibblox-tar/src/lib.rs
  • Review crates/gibblox-optimizer/src/lib.rs
  • Review pipeline, CLI, and schema changes
  • Post final review

The intentional v0 break is the right call given the pre-release context, and the implementation is clean. Both of the action items from the first review are fixed. The remaining issues are minor.

Binary format compatibility — resolved cleanly

PIPELINE_BIN_SUPPORTED_VERSIONS is now &[PIPELINE_BIN_FORMAT_VERSION] (v0 only), PIPELINE_BIN_FORMAT_VERSION is reset to 0, and the magic is GBXPIPE0. The decode path is a single postcard::from_bytes call with no migration branches. The code comment at bin.rs:19 documents the instability policy explicitly. Nothing more needed here.

CLI hints consumption — resolved

run_default_pipeline_execute now accepts hints_path: Option<&str> and the top-level Cli struct has --hints. The stdin conflict guard at main.rs:209 is a nice touch.


Minor issues (carried from previous review, still present)

validate_index redundant condition (crates/gibblox-tar/src/lib.rs:431)

if !is_regular_entry_type(header_meta.entry_type) || header_meta.entry_type != index.entry_type

index.entry_type is already validated as regular at line 370, so if header_meta.entry_type is non-regular then != index.entry_type is necessarily true. The first clause adds nothing. Just header_meta.entry_type != index.entry_type would be sufficient and clearer.

Fix this →

parse_octal_field doesn't stop on space/NUL after digits (crates/gibblox-tar/src/lib.rs:712–744)

When saw_digit is true and a 0 | b' ' byte is encountered, the code continues — which means digits appearing after the terminator would still be accumulated. Real tar tools write only spaces/NULs after the value, so this is harmless against well-formed archives. Noting it again in case a fuzzer ever lands here.


New observation

validate_index ignores PAX metadata when re-reading the header (crates/gibblox-tar/src/lib.rs:430)

let header_meta = parse_header(&header, &PendingMetadata::default())?;

locate_entry accumulates PAX metadata (including a size override) into PendingMetadata before calling parse_header. But validate_index calls parse_header with PendingMetadata::default(), so it uses the raw octal/base-256 size from the header block, not any PAX-extended size.

For entries whose PAX header specifically overrides size (e.g. files approaching or exceeding the ~8 GiB octal-field limit), the size comparison at line 438 would fail with a false mismatch:

if header_meta.size_bytes != index.size_bytes {  // uses raw header size vs PAX-extended size

In practice this is only reachable if:

  1. An entry was opened initially via locate_entry (which read the PAX size), producing an index
  2. That index is later passed to open_with_index
  3. The entry is large enough that PAX encoded a different size than the raw header field

OS images inside tarballs at that scale are unusual but not impossible (consider a 16 GiB rootfs.img). Worth a note in case Mobian artifacts ever hit this.

Fix this →


Everything else

The optimizer is clean. visited deduplication, insert_pipeline_hint type-keyed dedup, and the Box::pin recursive traversal all look correct. Schema changes are well-structured. The hints encode/decode round-trip with sort+validate is solid. The tar integration in both std and web materializers is consistent: both check tar_entry_index_hint before falling back to a full scan.

Signed-off-by: Sam Day <me@samcday.com>
@samcday samcday merged commit 2fd996d into main May 1, 2026
2 checks passed
@samcday samcday deleted the fastboop branch May 1, 2026 03:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant