Skip to content

Fix critical data corruption bugs and improve robustness#7

Open
peatey wants to merge 7 commits into
mainfrom
claude/fix-bingen-issues-Wu5m7
Open

Fix critical data corruption bugs and improve robustness#7
peatey wants to merge 7 commits into
mainfrom
claude/fix-bingen-issues-Wu5m7

Conversation

@peatey
Copy link
Copy Markdown

@peatey peatey commented Apr 28, 2026

Summary

This PR fixes multiple critical bugs that caused silent data corruption, improves input validation, and enhances code generation robustness. The changes address issues in binary serialization, string interning, code generation, and error handling.

Key Changes

Critical Data Corruption Fixes

  • Int/Uint encoding: Changed from 32-bit to 64-bit encoding to prevent silent truncation of values outside the int32/uint32 range. Added overflow detection with proper error reporting.
  • String length encoding: Upgraded from uint16 to uint32 to support strings larger than 64 KiB without silent truncation.
  • LRU string bank eviction: Fixed bug where LoadOrStore deleted by value instead of key, causing the cache to grow unbounded. The eviction now correctly uses the map key.
  • String bank semantics: Fixed NoOpStringBank.LoadOrStore to return false (not loaded) instead of true, matching the contract that nothing is cached.

Input Validation & Security

  • Slice/map length validation: Added bounds checking in unmarshalling code to reject negative or oversized length prefixes that could cause panics or huge allocations.
  • String table validation: Added validation in NewSliceStringTableReaderFrom and NewFileStringTableReaderFrom to reject invalid length values.
  • Code injection prevention: Added validation in ToDefaultValue and ToRawDefault to parse user-supplied default values as Go expressions, preventing arbitrary code injection at go generate time.
  • CLI input validation: Added checks for package names containing path separators or .. traversal, and version range validation (0-255).
  • Annotation validation: Added detection of duplicate @bingen annotations and mismatched brackets in annotation syntax.

Code Generation Improvements

  • Deterministic output: Sorted imports, package keys, and file keys during type collection and annotation loading to ensure byte-identical output across runs.
  • Error handling: Wrapped Generate() to catch panics and return proper errors instead of crashing the process. Improved error messages throughout code generation.
  • Type resolution: Added panic for unsupported AST expressions instead of silently falling back to bool codec.
  • Embedded field handling: Fixed potential panic when building error messages for embedded fields with no names.

Buffer & Utility Improvements

  • Buffer error tracking: Added Err() method to surface errors from io.Reader drain operations that previously silently failed.
  • String reading bounds: Added allocation bounds checking in ReadString to prevent crafted payloads from triggering huge allocations.
  • Buffer pool optimization: Changed to store *[]byte instead of []byte to avoid interface allocation overhead on every Put() call.

Testing & Documentation

  • Added comprehensive test coverage for buffer round-trips across full int64/uint64 ranges
  • Added tests for large string handling and rejection of malicious payloads
  • Added fuzzing for buffer read operations
  • Added tests for code injection prevention in default values
  • Added test for LRU eviction correctness
  • Added test for goroutine cleanup in string bank updates
  • Added determinism test for code generation
  • Updated README with breaking changes documentation for v0.2 wire format

Notable Implementation Details

  • The int/uint encoding change is a breaking wire format change (v0.1 → v0.2)
  • String length validation uses buffer.Remaining() to prevent allocation attacks
  • Default value validation uses Go's parser.ParseExpr to ensure only valid expressions are accepted
  • LRU bank now stores the key in lruEntry to fix the eviction bug
  • Code generation now properly propagates errors instead of panicking

https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1

claude added 3 commits April 27, 2026 19:26
Several wire-format choices in v0.1 silently corrupted data or opened DoS
attack surface. Bump to v0.2 (breaking) and tighten:

- int / uint encode as 8 bytes (previously truncated to 32 bits on the wire,
  silently dropping values outside int32 range on 64-bit hosts).
- WriteString length prefix is now uint32 (previously uint16, with strings
  > 64 KiB silently truncated).
- ReadString / ReadBytes / generated slice and map readers / string-table
  reader now bound the decoded length against Buffer.Remaining() before
  allocating, so a crafted blob with len = MaxInt32 returns an error or
  empty value rather than triggering a huge allocation.
- Default value rendering for @bingen:field[default=...] is validated:
  string defaults are strconv.Quote'd, bool defaults are restricted to
  true/false, numeric and non-basic-type defaults must parse as Go
  expressions. This closes a code-injection path through user-supplied
  annotations.
- Buffer.Bytes() no longer prints to os.Stderr on drain failure; errors
  surface via the new Buffer.Err() accessor.
- bufferpool now stores *[]byte to avoid sync.Pool's per-Put interface
  allocation on a slice value.

Adds buffer round-trip / negative tests, a fuzz target on the read paths,
and tests for the default-value validators. Test codecs are regenerated.

https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
…ned parsing

The generator and annotation parser had several places where bad input
caused process-wide panics, silent fall-throughs, or non-deterministic
output. This commit replaces panics with errors, sorts iteration order so
codec output is byte-identical across runs, and tightens validation:

- toGenType returns a clear error (panic with explanation) for unsupported
  AST expressions instead of falling through to the bool codec, which
  produced a compiling but data-corrupting codec for func/chan/generic
  fields.
- LoadTypes returns errors instead of panicking on parse failure, and
  sorts package and file iteration so generation is deterministic.
- Generate returns an error and recovers panics from template rendering
  so callers see a normal failure rather than a process crash.
- cmd/bingen validates -version (must fit in uint8), rejects -package
  values containing path separators or "..", checks the os.Remove error
  on the existing codec file, and propagates Generate's error.
- @bingen:define inputs are validated: name must be a Go identifier and
  type must parse as a Go expression before being spliced into a parser
  input.
- The annotation parser rejects mismatched square brackets instead of
  silently folding the bracketed remainder into the command name.
- Duplicate (command, target) annotations within a version set return an
  error instead of being silently overwritten.
- Embedded fields (no Names) no longer trip an index-out-of-range panic
  while building an already-failing parse error.

Adds a determinism test that re-runs the generator on the same input and
asserts byte-identical output.

https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
…leak

Three related issues in the string interner:

- The default StringBank was unbounded, so a workload that decoded many
  unique strings grew memory without limit. Switch the default to a
  bounded LRU bank (100k entries, 30s eviction interval).
- The LRU bank evicted by `value` rather than `key`. LoadOrStoreFunc
  uses value == key, so it appeared to work in tests, but LoadOrStore
  with key != value silently no-op'd in eviction and let the map grow
  past capacity. Track the key on each entry and delete by key.
- Stop() was defined on lruStringBank but not on the StringBank
  interface, so UpdateStringBank had no way to stop the previous
  bank's background goroutine. Every swap leaked one goroutine.
  Promote Close() error onto the interface (matching io.Closer) and
  call it on swap.

Adds tests covering the eviction map-size invariant and a goroutine-leak
check around UpdateStringBank.

https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
Copilot AI review requested due to automatic review settings April 28, 2026 00:13
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates bingen’s wire format and runtime utilities to prevent silent truncation/corruption during binary encode/decode, strengthens validation against hostile payloads, and makes code generation deterministic and more robust (returning errors instead of crashing).

Changes:

  • Bump wire encoding for int/uint to 64-bit and string length prefixes to uint32, adding overflow/length validation.
  • Make generator output deterministic (sorted imports/packages/files/sets) and harden generation (panic recovery, better error propagation, default-value expression validation).
  • Improve string interning robustness (bounded LRU + Close semantics), buffer pooling, and add tests (determinism, fuzzing, eviction, round-trips) + README breaking-change notes.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/opencost/opencost_codecs.go Adds length-prefix validation to generated codecs used by tests
tests/generator_test.go Propagates generator errors instead of discarding them
tests/determinism_test.go New test asserting byte-identical generator output across runs
tests/containerv2/containerv2_codecs.go Adds slice/string-table length-prefix validation in test codecs
tests/container/container_codecs.go Adds slice/string-table length-prefix validation in test codecs
tests/aliases/aliases_codecs.go Import ordering tweak + adds slice/map/string-table validation in test codecs
pkg/util/stringutil/stringutil.go Default string bank becomes bounded LRU; adds Close to StringBank and closes old bank on swap
pkg/util/stringutil/noopbank.go Fixes no-op bank semantics (always “not loaded”); adds Close
pkg/util/stringutil/mapbank.go Adds Close implementation (clears map)
pkg/util/stringutil/lrubank_test.go New tests for LRU eviction correctness + goroutine cleanup on UpdateStringBank
pkg/util/stringutil/lrubank.go Fixes eviction-by-key bug; implements Close; stores key in entries
pkg/util/bufferpool.go Stores pooled buffers as *[]byte to reduce interface boxing overhead
pkg/util/bufferhelper.go Switches int/uint read/write helpers to 64-bit encoding with overflow detection
pkg/util/buffer_test.go New round-trip, large-string, rejection, and fuzz tests for Buffer
pkg/util/buffer.go Updates string encoding to uint32; adds Err() and Remaining(); adds some bounds checks
pkg/types/types.go Deterministic imports and type collection; panic on unsupported AST expr; better error handling; fix embedded-field error message
pkg/types/helper.go Validates @bingen:define names/types to prevent splicing arbitrary text into generated Go
pkg/meta/annotationset.go Detects duplicate annotations; deterministic annotation loading order
pkg/meta/annotation.go Rejects malformed bracket syntax instead of misparsing annotation commands
pkg/generator/templates_test.go Tests default-value injection prevention and accepted defaults
pkg/generator/templates/go/unmarshaller.go.tmpl Adds slice/map length-prefix checks + uses ToRawDefault for non-basic defaults
pkg/generator/templates/go/support.go.tmpl Adds string table length validation
pkg/generator/templates/go/streamer.go.tmpl Adds streaming slice/map length-prefix checks + uses ToRawDefault for non-basic defaults
pkg/generator/templates.go Validates defaults via Go parser; quotes strings safely; adds ToRawDefault
pkg/generator/bingen.go Generate now returns errors and recovers panics; improves error messages
cmd/bingen/bingen.go Validates CLI inputs (package path traversal, version range) and propagates generator errors
README.md Documents v0.2 breaking wire-format changes
Comments suppressed due to low confidence (2)

pkg/util/buffer.go:372

  • In reader-mode (bw == nil), ReadString allocates a buffer of size l with bytePool.Get(int(l)) without any upper bound. A crafted payload can advertise a multi-GB length prefix and trigger huge allocations/OOM. Consider adding a hard maximum string length (configurable), and/or recording an error in the Buffer and returning "" when l exceeds that bound (similar to the bw-mode check).
	readBuffUint32(b.b, &l)

	bytes := bytePool.Get(int(l))
	defer bytePool.Put(bytes)

pkg/generator/templates/go/support.go.tmpl:459

  • NewFileStringTableReaderFrom validates only tl < 0 and then immediately allocates refs := make([]fileStringRef, tl). A crafted payload can set tl to a huge positive value and trigger an OOM. Consider adding an upper bound (e.g., max entries cap and/or a byte-buffer-only Remaining()-based sanity check) before allocating the refs slice, and avoid panicking on attacker-controlled input if possible.
	// table length
	tl := buffer.ReadInt()
	if tl < 0 {
		panic(fmt.Errorf("%s: invalid string table length: %d", GeneratorPackageName, tl))
	}

	var refs []fileStringRef
	if tl > 0 {
		refs = make([]fileStringRef, tl)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/util/buffer.go Outdated
Comment on lines +404 to +413
// Remaining returns the number of unread bytes available in the buffer. For
// reader-mode buffers this returns the bufio.Reader's currently-buffered byte
// count, which is a lower bound on what's available without performing more
// I/O. Callers should treat the result as a sanity bound for length-prefix
// validation, not an exact remaining size.
func (b *Buffer) Remaining() int {
if b.bw != nil {
return b.bw.Len()
}
return b.b.Buffered()
Comment on lines +91 to 95
{{ $lenVar }} := buff.ReadInt() // slice len
if {{ $lenVar }} < 0 || {{ $lenVar }} > buff.Remaining() {
{{ .Context.HandleError (printf "fmt.Errorf(\"bingen: invalid slice length %%d (remaining=%%d)\", %s, buff.Remaining())" $lenVar) }}
}
{{ $sliceVar }} := make({{ .Type.Name }}, {{ $lenVar }})
Comment on lines +116 to 120
{{ $lenVar }} := buff.ReadInt() // map len
if {{ $lenVar }} < 0 || {{ $lenVar }} > buff.Remaining() {
{{ .Context.HandleError (printf "fmt.Errorf(\"bingen: invalid map length %%d (remaining=%%d)\", %s, buff.Remaining())" $lenVar) }}
}
{{ $mapVar }} := make(map[{{ .Type.KeyType.TypeName }}]{{ .Type.ValueType.TypeName }}, {{ $lenVar }})
Comment on lines +42 to +45
{{ $lenVar }} := buff.ReadInt() // slice len
if {{ $lenVar }} < 0 || {{ $lenVar }} > buff.Remaining() {
{{ .Context.HandleError (printf "fmt.Errorf(\"bingen: invalid slice length %%d (remaining=%%d)\", %s, buff.Remaining())" $lenVar) }}
}
Comment on lines +65 to +68
{{ $lenVar }} := buff.ReadInt() // map len
if {{ $lenVar }} < 0 || {{ $lenVar }} > buff.Remaining() {
{{ .Context.HandleError (printf "fmt.Errorf(\"bingen: invalid map length %%d (remaining=%%d)\", %s, buff.Remaining())" $lenVar) }}
}
Comment on lines 368 to +371
tl := buffer.ReadInt()
if tl < 0 || tl > buffer.Remaining() {
panic(fmt.Errorf("%s: invalid string table length: %d (remaining=%d)", GeneratorPackageName, tl, buffer.Remaining()))
}
… buffers

Copilot's review on PR #7 surfaced that the new length-prefix sanity
checks could falsely reject valid payloads when decoding from an
io.Reader. Buffer.Remaining() was implemented as bufio.Reader.Buffered(),
which is only the currently-buffered byte count — typically 4096 — not
the true remaining payload. A legitimate slice/map/string-table larger
than that buffer would be rejected with "invalid length" depending on
incidental bufio buffering.

- Buffer.Remaining() now returns -1 in reader-mode to signal "unknown".
  Byte-buffer mode still returns the unread byte count.
- Generated unmarshaller / streamer / string-table-reader checks now
  split the validation: a separate negative-length check always fires;
  the "exceeds remaining bytes" upper bound only fires when Remaining()
  returns a non-negative value (byte-buffer mode). In reader-mode the
  underlying read fails naturally if the stream is short.
- Adds tests covering the Remaining() contract in both modes and a
  generator-level round-trip of a slice larger than bufio.Reader's
  default buffer through UnmarshalBinaryFromReader.

https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
@peatey
Copy link
Copy Markdown
Author

peatey commented Apr 28, 2026

Looks good — covers most of the original review. Two things from Copilot's review that I'd want addressed before merge:

Reader-mode ReadString (buffer.go:372) and NewFileStringTableReaderFrom (support.go.tmpl) still allocate based on attacker-controlled length with no hard cap. After the v0.2 length prefix change, ReadString can be asked to allocate ~4 GB. Add a MaxStringLength / MaxStringTableEntries constant checked unconditionally before make.
The runtime string-table errors are still panics. Generate() now recovers panics, but those are generator panics — a bad input file at runtime still crashes the host process. Either fix here or file a follow-up explicitly.

H2 (sticky errors threaded through generated unmarshallers) is genuinely structural and worth its own PR, just want to make sure it doesn't get marked as done by association with Buffer.Err().

Addresses peatey's review on PR #7. Two related issues remained after the
v0.2 wire-format hardening landed:

(1) ReadString reader-mode and the file-backed string-table reader allocated
based on the attacker-controlled length prefix with no hard cap. With v0.2's
uint32 string length, ReadString could be asked to allocate ~4 GiB.

- Adds MaxStringLength (64 MiB) in pkg/util/buffer.go, checked
  unconditionally in both byte-buffer and reader-mode before any allocation.
  Rejection records ErrStringTooLarge on Buffer.Err() and returns "".
- Adds MaxStringTableEntries (1<<20) in the support template, checked in
  both NewSliceStringTableReaderFrom and NewFileStringTableReaderFrom
  before the make([]string, tl) / make([]fileStringRef, tl) call.

(2) The string-table readers panicked on malformed input. Generate()'s
panic recovery doesn't apply at runtime — a hostile input file decoded by
a long-running host process would crash it.

- NewSliceStringTableReaderFrom and NewFileStringTableReaderFrom now
  return (StringTableReader, error). The file-backed reader cleans up its
  temp file on every error path.
- NewDecodingContextFromBytes / NewDecodingContextFromReader return
  (*DecodingContext, error).
- The generated UnmarshalBinary / UnmarshalBinaryFromReader and
  New<Type>Stream functions propagate the error.
- StreamFactoryFunc signature widens to (BingenStream, error) and
  NewStreamFor passes it through.

Also adds a TODO(H2) on Buffer.Err() noting that full sticky-error threading
through every Read*/Write* and the generated UnmarshalBinaryWithContext
epilogue is deferred to its own PR.

Tests:
- TestReadStringRejectsHugeLengthReaderMode (cap fires in reader mode).
- TestReadStringAtCap (boundary value succeeds).
- TestUnmarshalRejectsOversizedStringTable (full UnmarshalBinary
  round-trip on a hand-crafted oversized string-table header returns a
  normal error rather than panicking).

https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
Copilot AI review requested due to automatic review settings April 28, 2026 13:34
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens bingen’s wire format and generation pipeline to prevent silent truncation/corruption and to make decoding/codegen more robust against malformed or hostile inputs.

Changes:

  • Updates the wire format (v0.2) to encode int/uint as 8 bytes and string lengths as uint32, with added bounds checks during decode.
  • Improves generator robustness and determinism (sorted imports/type collection; panic-to-error conversion; safer default-value handling).
  • Reworks string interning to a bounded LRU with lifecycle management (Close()), and adds new tests/fuzzing around buffer and codegen behavior.

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/string_table_caps_test.go Adds regression test for rejecting oversized string-table headers.
tests/reader_mode_test.go Validates reader-mode unmarshalling for large slices.
tests/opencost/opencost_codecs.go Updates generated codecs with new string-table validation, error-returning context/stream constructors, and length checks.
tests/generator_test.go Updates generator test helper for Generate(...) error signature.
tests/determinism_test.go Adds determinism test ensuring byte-identical generated output across runs.
tests/containerv2/containerv2_codecs.go Updates generated v2 container codecs for new decode/context/stream behavior and validations.
tests/container/container_codecs.go Updates generated container codecs similarly (length validation + error-returning context/streams).
tests/aliases/aliases_codecs.go Updates generated alias codecs (imports, context construction errors, length validation, stream factory signature).
pkg/util/stringutil/stringutil.go Makes default string bank a bounded LRU and closes previous bank on update.
pkg/util/stringutil/noopbank.go Fixes no-op bank LoadOrStore* semantics to report “not loaded”; adds Close().
pkg/util/stringutil/mapbank.go Adds Close() implementation for map-backed bank.
pkg/util/stringutil/lrubank_test.go Adds tests for correct LRU eviction and goroutine cleanup on bank swap.
pkg/util/stringutil/lrubank.go Fixes eviction to delete by key; adds Close(); stores key in entries.
pkg/util/bufferpool.go Changes pool storage to *[]byte to reduce interface overhead (plus corresponding Get/Put changes).
pkg/util/bufferhelper.go Switches int/uint encoding to 64-bit and adds host-width overflow detection.
pkg/util/buffer_test.go Adds round-trip tests for int/uint/string, Remaining() semantics, oversized string rejection, and fuzz target.
pkg/util/buffer.go Adds Err() tracking, Remaining() contract, uint32 string lengths with caps, and safer ReadBytes behavior.
pkg/types/types.go Sorts imports/type discovery for determinism; replaces silent unsupported-AST fallback with a panic; improves error handling.
pkg/types/helper.go Validates @bingen:define names/types to prevent injection via generated source reparse.
pkg/meta/annotationset.go Adds duplicate-annotation detection and deterministic iteration order when loading annotations.
pkg/meta/annotation.go Rejects mismatched bracket syntax in @bingen directives.
pkg/generator/templates_test.go Adds tests for default-value validation (code-injection prevention).
pkg/generator/templates/go/unmarshaller.go.tmpl Adds slice/map length validation and routes non-basic defaults through ToRawDefault; propagates decoding-context init errors.
pkg/generator/templates/go/support.go.tmpl Adds MaxStringTableEntries cap and makes string-table/context creation return errors; updates stream factory signature.
pkg/generator/templates/go/streamer.go.tmpl Adds streaming slice/map length checks and makes stream constructors return errors.
pkg/generator/templates.go Validates user-supplied defaults as Go expressions and introduces ToRawDefault.
pkg/generator/bingen.go Changes Generate to return errors and recovers panics into errors; improves error context.
cmd/bingen/bingen.go Adds CLI input validation and propagates generator errors instead of panicking/exiting silently.
README.md Documents v0.2 breaking wire-format changes.
Comments suppressed due to low confidence (1)

pkg/util/buffer.go:410

  • In reader-mode, if readBuffFull fails while reading the string payload, ReadString currently returns "" but does not set Buffer.err. This makes truncated streams indistinguishable from legitimate empty strings for callers. Record the read error (or io.ErrUnexpectedEOF) in b.err before returning so callers can detect corruption via Err().
	bytes := bytePool.Get(int(l))
	defer bytePool.Put(bytes)

	_, err := readBuffFull(b.b, bytes)
	if err != nil {
		return ""
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/util/buffer.go
Comment on lines +388 to +392
return ""
}
if uint64(l) > uint64(b.bw.Len()) {
return ""
}
Comment on lines +386 to +390
return nil, fmt.Errorf("%s: string table length %d exceeds MaxStringTableEntries %d", GeneratorPackageName, tl, MaxStringTableEntries)
}
if rem := buffer.Remaining(); rem >= 0 && tl > rem {
return nil, fmt.Errorf("%s: string table length %d exceeds remaining bytes %d", GeneratorPackageName, tl, rem)
}
Comment thread pkg/util/bufferpool.go Outdated
Comment on lines +69 to +72

i := putIndex(capacity)
bp.pools[i].Put(buf[:cap(buf)])
full := buf[:capacity]
bp.pools[i].Put(&full)
Comment thread tests/string_table_caps_test.go Outdated
// table header advertises a length above MaxStringTableEntries and verifies
// the unmarshal fails with a normal error rather than panicking the host.
//
// This exercises the path peatey called out: NewSliceStringTableReaderFrom
…und; revert pool indirection

Addresses four follow-up review comments on PR #7.

- Buffer.ReadString and Buffer.ReadBytes now record an error on Buffer.Err()
  on every silent rejection path: remaining-bytes shortfall in byte-buffer
  mode (io.ErrUnexpectedEOF), readBuffFull failure in reader-mode, and the
  existing MaxStringLength cap. Truncated/corrupt payloads previously turned
  into legitimate-looking empty strings; callers checking Err() can now
  distinguish.

- support.go.tmpl string-table check: change `tl > rem` (entries vs bytes,
  unit mismatch) to `tl > rem/4`. Each table entry has at least a 4-byte
  uint32 length prefix, so an honest table of tl entries needs >= tl*4
  bytes. The previous bound was 4x looser than reality.

- bufferpool.go: revert `*[]byte` storage back to `[]byte`. The earlier
  change took the address of a local in Put (`&full`), which escapes and
  allocates a new slice header on every Put — defeating the goal of
  avoiding the boxing. Plain []byte storage is simpler and equivalent in
  allocation count, since we have no way to recycle the original *[]byte.

- tests/string_table_caps_test.go: replace a personal-name reference in
  the doc comment with a generic description of the behavior.

Adds TestReadStringTruncatedByteBuffer and TestReadStringTruncatedReaderMode
verifying Err() is populated on truncated payloads in both modes.

https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens bingen’s wire format and decoding/generation pipeline to prevent silent data corruption, reduce memory/DoS risk from attacker-controlled inputs, and improve generator determinism and error handling.

Changes:

  • Updates wire encoding/decoding (64-bit int/uint, uint32 string lengths) and adds defensive bounds checks in generated unmarshallers and string table readers.
  • Improves robustness and determinism in the generator (sorted iteration, safer default value handling, panic-to-error conversion).
  • Reworks string interning to a bounded LRU with proper lifecycle management, plus adds/extends tests (incl. determinism and fuzz coverage).

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/string_table_caps_test.go Adds a regression test ensuring oversized string tables are rejected with an error.
tests/reader_mode_test.go Adds a regression test covering large slice decode in reader-mode buffers.
tests/opencost/opencost_codecs.go Updates generated codecs to new decoding context + string table validation + stream constructor errors.
tests/generator_test.go Updates generator test helper for new Generate(...) error signature.
tests/determinism_test.go Adds a determinism test for generator output stability across runs.
tests/containerv2/containerv2_codecs.go Updates generated containerv2 codecs for new string-table/error semantics.
tests/container/container_codecs.go Updates generated container codecs for new string-table/error semantics.
tests/aliases/aliases_codecs.go Updates generated alias codecs for new stream/context/string-table behaviors.
pkg/util/stringutil/stringutil.go Switches default interning to bounded LRU and closes prior bank on swap.
pkg/util/stringutil/noopbank.go Fixes no-op bank LoadOrStore semantics and implements Close().
pkg/util/stringutil/mapbank.go Adds Close() implementation to map-backed bank.
pkg/util/stringutil/lrubank_test.go Adds tests for correct LRU eviction keying and goroutine cleanup on bank swaps.
pkg/util/stringutil/lrubank.go Fixes LRU eviction key bug and adds Close() to stop background eviction goroutine.
pkg/util/bufferpool.go Updates buffer pool commentary regarding []byte storage strategy.
pkg/util/bufferhelper.go Changes int/uint encoding helpers to 64-bit and adds overflow detection.
pkg/util/buffer_test.go Adds extensive tests for Remaining(), int/uint round-trips, string caps, truncation, and fuzzing.
pkg/util/buffer.go Adds Remaining() semantics, Err() tracking, uint32 string lengths, and string allocation caps.
pkg/types/types.go Sorts imports / deterministic collection and fails loudly on unsupported AST expressions.
pkg/types/helper.go Validates @bingen:define name/type as Go syntax to prevent injection.
pkg/meta/annotationset.go Adds duplicate-annotation detection and deterministic annotation loading order.
pkg/meta/annotation.go Rejects mismatched bracket syntax in annotation parsing.
pkg/generator/templates_test.go Adds tests for default-value injection prevention and validation behavior.
pkg/generator/templates/go/unmarshaller.go.tmpl Adds slice/map length validation and routes defaults through safe parsing/escaping.
pkg/generator/templates/go/support.go.tmpl Updates support template for MaxStringTableEntries, stream factories returning errors, and safe string tables.
pkg/generator/templates/go/streamer.go.tmpl Adds length checks in streaming readers and stream constructors returning errors.
pkg/generator/templates.go Validates defaults as Go expressions (and quotes strings) to prevent code injection.
pkg/generator/bingen.go Changes Generate to return errors, adds panic recovery, and improves error propagation.
cmd/bingen/bingen.go Adds CLI input validation, improves error handling, and uses generator errors.
README.md Documents v0.2 wire-format breaking changes and validation behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/util/buffer_test.go
Comment on lines +42 to +52
values := []int{
0,
1,
-1,
math.MaxInt32,
math.MinInt32,
math.MaxInt32 + 1,
math.MinInt32 - 1,
math.MaxInt64,
math.MinInt64,
}
Comment thread pkg/util/buffer_test.go
Comment on lines +67 to +73
values := []uint{
0,
1,
math.MaxUint32,
math.MaxUint32 + 1,
math.MaxUint64,
}
Comment on lines 97 to 101
// we trust the underlying read to fail if the stream is short.
if rem := buff.Remaining(); rem >= 0 && {{ $lenVar }} > rem {
{{ .Context.HandleError (printf "fmt.Errorf(\"bingen: slice length %%d exceeds remaining bytes %%d\", %s, rem)" $lenVar) }}
}
{{ $sliceVar }} := make({{ .Type.Name }}, {{ $lenVar }})
Comment on lines 127 to 131
// the buffer is byte-backed and the value is meaningful.
if rem := buff.Remaining(); rem >= 0 && {{ $lenVar }} > rem {
{{ .Context.HandleError (printf "fmt.Errorf(\"bingen: map length %%d exceeds remaining bytes %%d\", %s, rem)" $lenVar) }}
}
{{ $mapVar }} := make(map[{{ .Type.KeyType.TypeName }}]{{ .Type.ValueType.TypeName }}, {{ $lenVar }})
Comment thread pkg/util/bufferhelper.go
*data = int(int32(order.Uint32(bs)))
v := int64(order.Uint64(bs))
if int64(int(v)) != v {
return fmt.Errorf("bingen: int value %d overflows host int width", v)
Comment thread pkg/util/bufferhelper.go
*data = uint(order.Uint32(bs))
v := order.Uint64(bs)
if uint64(uint(v)) != v {
return fmt.Errorf("bingen: uint value %d overflows host uint width", v)
Comment thread pkg/util/buffer.go Outdated
if b.err == nil {
b.err = fmt.Errorf("ReadBytes: %w", err)
}
return bytes
… test compat

Addresses seven follow-up review comments on PR #7.

- pkg/generator/templates/go/{unmarshaller,streamer}.go.tmpl: add an
  unconditional MaxContainerLength cap (1 << 24) on slice and map length
  prefixes. Without it, a payload decoded in reader-mode (where Remaining()
  returns -1 and the upper-bound check is skipped) could advertise a huge
  length and force make() to OOM before any read failed.

- pkg/util/buffer.go: Buffer.ReadInt and Buffer.ReadUInt now record the
  overflow error returned by readInt/readBuffInt/readUint/readBuffUint on
  Buffer.Err() instead of dropping it, so a 32-bit host decoding a value
  above MaxInt32 surfaces a real error instead of silently returning zero.

- pkg/util/buffer.go: Buffer.ReadBytes reader-mode now returns nil on read
  failure to match the docstring (was returning the partially-filled
  allocation).

- pkg/util/buffer_test.go: gate the over-int32/over-uint32 round-trip
  values on strconv.IntSize and construct them via int64/uint64 typed
  locals, so the file compiles on 32-bit platforms (where math.MaxInt64
  and math.MaxUint64 aren't representable as int/uint constants).

- pkg/util/buffer_test.go: add TestReadIntOverflowSetsErr (skipped on
  64-bit hosts, asserting Err() is set on a 32-bit overflow decode).

- tests/string_table_caps_test.go: add TestUnmarshalRejectsOversizedSlice
  exercising the reader-mode slice cap end-to-end through
  UnmarshalBinaryFromReader.

Verified clean build under GOARCH=386.

https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
@mbolt35
Copy link
Copy Markdown
Contributor

mbolt35 commented May 20, 2026

Hey @peatey, There are some changes in here that would break all compatibility with existing kubecost customers, as well as any previously stored versions of bingen file formats. The AI model at-will changed whatever it thought was better without taking any backwards compatibility into account. If there is anything of value in here, it's completely shrouded by the fact that breaking the existing format is a non-starter.

This format is also not really a wire format, despite the AI continuing to refer to it that way. I largely disagree with this wide net casting the AI without the highly crucial context of how it is leveraged throughout multiple projects. It's proposed "fixes" for a lot of these "critical issues" assume a hallucinated/made-up circumstance which isn't applicable in basic use-cases.

Int/Uint encoding: Changed from 32-bit to 64-bit encoding to prevent silent truncation of values outside the int32/uint32 range. Added overflow detection with proper error reporting.

Caution

This is a breaking change, and uint/ints have always been treated as 32-bit within the bingen format. You can't just make this change without breaking the format completely.

String length encoding: Upgraded from uint16 to uint32 to support strings larger than 64 KiB without silent truncation.

Caution

This is a breaking change, you can't just change the data type within the format without breaking all previously stored data. This is a design choice to silently truncate strings greater than max ushort. This format was not meant to store giant strings, so we effectively truncate and move on.

LRU string bank eviction: Fixed bug where LoadOrStore deleted by value instead of key, causing the cache to grow unbounded. The eviction now correctly uses the map key.

Caution

Again, another failure to understand compatibility decisions. The only reason this interface supports a key and value is due to a very old implementation that leveraged sync.Map -- the key and value passed in aren't different, and we don't validate that because we don't need to.

String bank semantics: Fixed NoOpStringBank.LoadOrStore to return false (not loaded) instead of true, matching the contract that nothing is cached.

Caution

This doesn't fix anything. It doesn't matter whether it returns true or false because it doesn't do anything. Thus, no-op.

Slice/map length validation: Added bounds checking in unmarshalling code to reject negative or oversized length prefixes that could cause panics or huge allocations.

Caution

Panic is preferred. This is a deterministic file format.

Code injection prevention: Added validation in ToDefaultValue and ToRawDefault to parse user-supplied default values as Go expressions, preventing arbitrary code injection at go generate time.

Caution

This is so absurd, I don't even feel like trying to justify why it's dumb. If you have the bingen annotations, you have access to the source. If you can "inject" source, you would just modify it directly. There is no decoupling of the annotations and the source code itself.


These two might actually be useful:

  • CLI input validation: Added checks for package names containing path separators or .. traversal, and version range validation (0-255).
  • Annotation validation: Added detection of duplicate @bingen annotations and mismatched brackets in annotation syntax.

Due to it's complete miss of prior assumptions, many many things it's labeling as "bad" were just intentional decisions that go along with the all or nothing approach to a deterministic format. We would always rather a panic on some non-deterministic encounter with the data. always

There are a few things we do care to surface, but not slice length validation. Similarly, it's obsession with the ReadString() call and purposefully wanting to allow support of huge strings, then adding a critique that "STRINGS CAN ALLOCATE UNBOUNDED" (which is wrong).

I would appreciate if we can discuss these types of things before we throw out a wide sweeping AI generated PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants