Fix critical data corruption bugs and improve robustness#7
Conversation
Several wire-format choices in v0.1 silently corrupted data or opened DoS attack surface. Bump to v0.2 (breaking) and tighten: - int / uint encode as 8 bytes (previously truncated to 32 bits on the wire, silently dropping values outside int32 range on 64-bit hosts). - WriteString length prefix is now uint32 (previously uint16, with strings > 64 KiB silently truncated). - ReadString / ReadBytes / generated slice and map readers / string-table reader now bound the decoded length against Buffer.Remaining() before allocating, so a crafted blob with len = MaxInt32 returns an error or empty value rather than triggering a huge allocation. - Default value rendering for @bingen:field[default=...] is validated: string defaults are strconv.Quote'd, bool defaults are restricted to true/false, numeric and non-basic-type defaults must parse as Go expressions. This closes a code-injection path through user-supplied annotations. - Buffer.Bytes() no longer prints to os.Stderr on drain failure; errors surface via the new Buffer.Err() accessor. - bufferpool now stores *[]byte to avoid sync.Pool's per-Put interface allocation on a slice value. Adds buffer round-trip / negative tests, a fuzz target on the read paths, and tests for the default-value validators. Test codecs are regenerated. https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
…ned parsing The generator and annotation parser had several places where bad input caused process-wide panics, silent fall-throughs, or non-deterministic output. This commit replaces panics with errors, sorts iteration order so codec output is byte-identical across runs, and tightens validation: - toGenType returns a clear error (panic with explanation) for unsupported AST expressions instead of falling through to the bool codec, which produced a compiling but data-corrupting codec for func/chan/generic fields. - LoadTypes returns errors instead of panicking on parse failure, and sorts package and file iteration so generation is deterministic. - Generate returns an error and recovers panics from template rendering so callers see a normal failure rather than a process crash. - cmd/bingen validates -version (must fit in uint8), rejects -package values containing path separators or "..", checks the os.Remove error on the existing codec file, and propagates Generate's error. - @bingen:define inputs are validated: name must be a Go identifier and type must parse as a Go expression before being spliced into a parser input. - The annotation parser rejects mismatched square brackets instead of silently folding the bracketed remainder into the command name. - Duplicate (command, target) annotations within a version set return an error instead of being silently overwritten. - Embedded fields (no Names) no longer trip an index-out-of-range panic while building an already-failing parse error. Adds a determinism test that re-runs the generator on the same input and asserts byte-identical output. https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
…leak Three related issues in the string interner: - The default StringBank was unbounded, so a workload that decoded many unique strings grew memory without limit. Switch the default to a bounded LRU bank (100k entries, 30s eviction interval). - The LRU bank evicted by `value` rather than `key`. LoadOrStoreFunc uses value == key, so it appeared to work in tests, but LoadOrStore with key != value silently no-op'd in eviction and let the map grow past capacity. Track the key on each entry and delete by key. - Stop() was defined on lruStringBank but not on the StringBank interface, so UpdateStringBank had no way to stop the previous bank's background goroutine. Every swap leaked one goroutine. Promote Close() error onto the interface (matching io.Closer) and call it on swap. Adds tests covering the eviction map-size invariant and a goroutine-leak check around UpdateStringBank. https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
There was a problem hiding this comment.
Pull request overview
This PR updates bingen’s wire format and runtime utilities to prevent silent truncation/corruption during binary encode/decode, strengthens validation against hostile payloads, and makes code generation deterministic and more robust (returning errors instead of crashing).
Changes:
- Bump wire encoding for
int/uintto 64-bit and string length prefixes touint32, adding overflow/length validation. - Make generator output deterministic (sorted imports/packages/files/sets) and harden generation (panic recovery, better error propagation, default-value expression validation).
- Improve string interning robustness (bounded LRU + Close semantics), buffer pooling, and add tests (determinism, fuzzing, eviction, round-trips) + README breaking-change notes.
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/opencost/opencost_codecs.go | Adds length-prefix validation to generated codecs used by tests |
| tests/generator_test.go | Propagates generator errors instead of discarding them |
| tests/determinism_test.go | New test asserting byte-identical generator output across runs |
| tests/containerv2/containerv2_codecs.go | Adds slice/string-table length-prefix validation in test codecs |
| tests/container/container_codecs.go | Adds slice/string-table length-prefix validation in test codecs |
| tests/aliases/aliases_codecs.go | Import ordering tweak + adds slice/map/string-table validation in test codecs |
| pkg/util/stringutil/stringutil.go | Default string bank becomes bounded LRU; adds Close to StringBank and closes old bank on swap |
| pkg/util/stringutil/noopbank.go | Fixes no-op bank semantics (always “not loaded”); adds Close |
| pkg/util/stringutil/mapbank.go | Adds Close implementation (clears map) |
| pkg/util/stringutil/lrubank_test.go | New tests for LRU eviction correctness + goroutine cleanup on UpdateStringBank |
| pkg/util/stringutil/lrubank.go | Fixes eviction-by-key bug; implements Close; stores key in entries |
| pkg/util/bufferpool.go | Stores pooled buffers as *[]byte to reduce interface boxing overhead |
| pkg/util/bufferhelper.go | Switches int/uint read/write helpers to 64-bit encoding with overflow detection |
| pkg/util/buffer_test.go | New round-trip, large-string, rejection, and fuzz tests for Buffer |
| pkg/util/buffer.go | Updates string encoding to uint32; adds Err() and Remaining(); adds some bounds checks |
| pkg/types/types.go | Deterministic imports and type collection; panic on unsupported AST expr; better error handling; fix embedded-field error message |
| pkg/types/helper.go | Validates @bingen:define names/types to prevent splicing arbitrary text into generated Go |
| pkg/meta/annotationset.go | Detects duplicate annotations; deterministic annotation loading order |
| pkg/meta/annotation.go | Rejects malformed bracket syntax instead of misparsing annotation commands |
| pkg/generator/templates_test.go | Tests default-value injection prevention and accepted defaults |
| pkg/generator/templates/go/unmarshaller.go.tmpl | Adds slice/map length-prefix checks + uses ToRawDefault for non-basic defaults |
| pkg/generator/templates/go/support.go.tmpl | Adds string table length validation |
| pkg/generator/templates/go/streamer.go.tmpl | Adds streaming slice/map length-prefix checks + uses ToRawDefault for non-basic defaults |
| pkg/generator/templates.go | Validates defaults via Go parser; quotes strings safely; adds ToRawDefault |
| pkg/generator/bingen.go | Generate now returns errors and recovers panics; improves error messages |
| cmd/bingen/bingen.go | Validates CLI inputs (package path traversal, version range) and propagates generator errors |
| README.md | Documents v0.2 breaking wire-format changes |
Comments suppressed due to low confidence (2)
pkg/util/buffer.go:372
- In reader-mode (bw == nil), ReadString allocates a buffer of size
lwithbytePool.Get(int(l))without any upper bound. A crafted payload can advertise a multi-GB length prefix and trigger huge allocations/OOM. Consider adding a hard maximum string length (configurable), and/or recording an error in the Buffer and returning "" whenlexceeds that bound (similar to the bw-mode check).
readBuffUint32(b.b, &l)
bytes := bytePool.Get(int(l))
defer bytePool.Put(bytes)
pkg/generator/templates/go/support.go.tmpl:459
NewFileStringTableReaderFromvalidates onlytl < 0and then immediately allocatesrefs := make([]fileStringRef, tl). A crafted payload can settlto a huge positive value and trigger an OOM. Consider adding an upper bound (e.g., max entries cap and/or a byte-buffer-only Remaining()-based sanity check) before allocating the refs slice, and avoid panicking on attacker-controlled input if possible.
// table length
tl := buffer.ReadInt()
if tl < 0 {
panic(fmt.Errorf("%s: invalid string table length: %d", GeneratorPackageName, tl))
}
var refs []fileStringRef
if tl > 0 {
refs = make([]fileStringRef, tl)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Remaining returns the number of unread bytes available in the buffer. For | ||
| // reader-mode buffers this returns the bufio.Reader's currently-buffered byte | ||
| // count, which is a lower bound on what's available without performing more | ||
| // I/O. Callers should treat the result as a sanity bound for length-prefix | ||
| // validation, not an exact remaining size. | ||
| func (b *Buffer) Remaining() int { | ||
| if b.bw != nil { | ||
| return b.bw.Len() | ||
| } | ||
| return b.b.Buffered() |
| {{ $lenVar }} := buff.ReadInt() // slice len | ||
| if {{ $lenVar }} < 0 || {{ $lenVar }} > buff.Remaining() { | ||
| {{ .Context.HandleError (printf "fmt.Errorf(\"bingen: invalid slice length %%d (remaining=%%d)\", %s, buff.Remaining())" $lenVar) }} | ||
| } | ||
| {{ $sliceVar }} := make({{ .Type.Name }}, {{ $lenVar }}) |
| {{ $lenVar }} := buff.ReadInt() // map len | ||
| if {{ $lenVar }} < 0 || {{ $lenVar }} > buff.Remaining() { | ||
| {{ .Context.HandleError (printf "fmt.Errorf(\"bingen: invalid map length %%d (remaining=%%d)\", %s, buff.Remaining())" $lenVar) }} | ||
| } | ||
| {{ $mapVar }} := make(map[{{ .Type.KeyType.TypeName }}]{{ .Type.ValueType.TypeName }}, {{ $lenVar }}) |
| {{ $lenVar }} := buff.ReadInt() // slice len | ||
| if {{ $lenVar }} < 0 || {{ $lenVar }} > buff.Remaining() { | ||
| {{ .Context.HandleError (printf "fmt.Errorf(\"bingen: invalid slice length %%d (remaining=%%d)\", %s, buff.Remaining())" $lenVar) }} | ||
| } |
| {{ $lenVar }} := buff.ReadInt() // map len | ||
| if {{ $lenVar }} < 0 || {{ $lenVar }} > buff.Remaining() { | ||
| {{ .Context.HandleError (printf "fmt.Errorf(\"bingen: invalid map length %%d (remaining=%%d)\", %s, buff.Remaining())" $lenVar) }} | ||
| } |
| tl := buffer.ReadInt() | ||
| if tl < 0 || tl > buffer.Remaining() { | ||
| panic(fmt.Errorf("%s: invalid string table length: %d (remaining=%d)", GeneratorPackageName, tl, buffer.Remaining())) | ||
| } |
… buffers Copilot's review on PR #7 surfaced that the new length-prefix sanity checks could falsely reject valid payloads when decoding from an io.Reader. Buffer.Remaining() was implemented as bufio.Reader.Buffered(), which is only the currently-buffered byte count — typically 4096 — not the true remaining payload. A legitimate slice/map/string-table larger than that buffer would be rejected with "invalid length" depending on incidental bufio buffering. - Buffer.Remaining() now returns -1 in reader-mode to signal "unknown". Byte-buffer mode still returns the unread byte count. - Generated unmarshaller / streamer / string-table-reader checks now split the validation: a separate negative-length check always fires; the "exceeds remaining bytes" upper bound only fires when Remaining() returns a non-negative value (byte-buffer mode). In reader-mode the underlying read fails naturally if the stream is short. - Adds tests covering the Remaining() contract in both modes and a generator-level round-trip of a slice larger than bufio.Reader's default buffer through UnmarshalBinaryFromReader. https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
|
Looks good — covers most of the original review. Two things from Copilot's review that I'd want addressed before merge: Reader-mode ReadString (buffer.go:372) and NewFileStringTableReaderFrom (support.go.tmpl) still allocate based on attacker-controlled length with no hard cap. After the v0.2 length prefix change, ReadString can be asked to allocate ~4 GB. Add a MaxStringLength / MaxStringTableEntries constant checked unconditionally before make. H2 (sticky errors threaded through generated unmarshallers) is genuinely structural and worth its own PR, just want to make sure it doesn't get marked as done by association with Buffer.Err(). |
Addresses peatey's review on PR #7. Two related issues remained after the v0.2 wire-format hardening landed: (1) ReadString reader-mode and the file-backed string-table reader allocated based on the attacker-controlled length prefix with no hard cap. With v0.2's uint32 string length, ReadString could be asked to allocate ~4 GiB. - Adds MaxStringLength (64 MiB) in pkg/util/buffer.go, checked unconditionally in both byte-buffer and reader-mode before any allocation. Rejection records ErrStringTooLarge on Buffer.Err() and returns "". - Adds MaxStringTableEntries (1<<20) in the support template, checked in both NewSliceStringTableReaderFrom and NewFileStringTableReaderFrom before the make([]string, tl) / make([]fileStringRef, tl) call. (2) The string-table readers panicked on malformed input. Generate()'s panic recovery doesn't apply at runtime — a hostile input file decoded by a long-running host process would crash it. - NewSliceStringTableReaderFrom and NewFileStringTableReaderFrom now return (StringTableReader, error). The file-backed reader cleans up its temp file on every error path. - NewDecodingContextFromBytes / NewDecodingContextFromReader return (*DecodingContext, error). - The generated UnmarshalBinary / UnmarshalBinaryFromReader and New<Type>Stream functions propagate the error. - StreamFactoryFunc signature widens to (BingenStream, error) and NewStreamFor passes it through. Also adds a TODO(H2) on Buffer.Err() noting that full sticky-error threading through every Read*/Write* and the generated UnmarshalBinaryWithContext epilogue is deferred to its own PR. Tests: - TestReadStringRejectsHugeLengthReaderMode (cap fires in reader mode). - TestReadStringAtCap (boundary value succeeds). - TestUnmarshalRejectsOversizedStringTable (full UnmarshalBinary round-trip on a hand-crafted oversized string-table header returns a normal error rather than panicking). https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
There was a problem hiding this comment.
Pull request overview
This PR hardens bingen’s wire format and generation pipeline to prevent silent truncation/corruption and to make decoding/codegen more robust against malformed or hostile inputs.
Changes:
- Updates the wire format (v0.2) to encode
int/uintas 8 bytes andstringlengths asuint32, with added bounds checks during decode. - Improves generator robustness and determinism (sorted imports/type collection; panic-to-error conversion; safer default-value handling).
- Reworks string interning to a bounded LRU with lifecycle management (
Close()), and adds new tests/fuzzing around buffer and codegen behavior.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/string_table_caps_test.go | Adds regression test for rejecting oversized string-table headers. |
| tests/reader_mode_test.go | Validates reader-mode unmarshalling for large slices. |
| tests/opencost/opencost_codecs.go | Updates generated codecs with new string-table validation, error-returning context/stream constructors, and length checks. |
| tests/generator_test.go | Updates generator test helper for Generate(...) error signature. |
| tests/determinism_test.go | Adds determinism test ensuring byte-identical generated output across runs. |
| tests/containerv2/containerv2_codecs.go | Updates generated v2 container codecs for new decode/context/stream behavior and validations. |
| tests/container/container_codecs.go | Updates generated container codecs similarly (length validation + error-returning context/streams). |
| tests/aliases/aliases_codecs.go | Updates generated alias codecs (imports, context construction errors, length validation, stream factory signature). |
| pkg/util/stringutil/stringutil.go | Makes default string bank a bounded LRU and closes previous bank on update. |
| pkg/util/stringutil/noopbank.go | Fixes no-op bank LoadOrStore* semantics to report “not loaded”; adds Close(). |
| pkg/util/stringutil/mapbank.go | Adds Close() implementation for map-backed bank. |
| pkg/util/stringutil/lrubank_test.go | Adds tests for correct LRU eviction and goroutine cleanup on bank swap. |
| pkg/util/stringutil/lrubank.go | Fixes eviction to delete by key; adds Close(); stores key in entries. |
| pkg/util/bufferpool.go | Changes pool storage to *[]byte to reduce interface overhead (plus corresponding Get/Put changes). |
| pkg/util/bufferhelper.go | Switches int/uint encoding to 64-bit and adds host-width overflow detection. |
| pkg/util/buffer_test.go | Adds round-trip tests for int/uint/string, Remaining() semantics, oversized string rejection, and fuzz target. |
| pkg/util/buffer.go | Adds Err() tracking, Remaining() contract, uint32 string lengths with caps, and safer ReadBytes behavior. |
| pkg/types/types.go | Sorts imports/type discovery for determinism; replaces silent unsupported-AST fallback with a panic; improves error handling. |
| pkg/types/helper.go | Validates @bingen:define names/types to prevent injection via generated source reparse. |
| pkg/meta/annotationset.go | Adds duplicate-annotation detection and deterministic iteration order when loading annotations. |
| pkg/meta/annotation.go | Rejects mismatched bracket syntax in @bingen directives. |
| pkg/generator/templates_test.go | Adds tests for default-value validation (code-injection prevention). |
| pkg/generator/templates/go/unmarshaller.go.tmpl | Adds slice/map length validation and routes non-basic defaults through ToRawDefault; propagates decoding-context init errors. |
| pkg/generator/templates/go/support.go.tmpl | Adds MaxStringTableEntries cap and makes string-table/context creation return errors; updates stream factory signature. |
| pkg/generator/templates/go/streamer.go.tmpl | Adds streaming slice/map length checks and makes stream constructors return errors. |
| pkg/generator/templates.go | Validates user-supplied defaults as Go expressions and introduces ToRawDefault. |
| pkg/generator/bingen.go | Changes Generate to return errors and recovers panics into errors; improves error context. |
| cmd/bingen/bingen.go | Adds CLI input validation and propagates generator errors instead of panicking/exiting silently. |
| README.md | Documents v0.2 breaking wire-format changes. |
Comments suppressed due to low confidence (1)
pkg/util/buffer.go:410
- In reader-mode, if readBuffFull fails while reading the string payload, ReadString currently returns "" but does not set Buffer.err. This makes truncated streams indistinguishable from legitimate empty strings for callers. Record the read error (or io.ErrUnexpectedEOF) in b.err before returning so callers can detect corruption via Err().
bytes := bytePool.Get(int(l))
defer bytePool.Put(bytes)
_, err := readBuffFull(b.b, bytes)
if err != nil {
return ""
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return "" | ||
| } | ||
| if uint64(l) > uint64(b.bw.Len()) { | ||
| return "" | ||
| } |
| return nil, fmt.Errorf("%s: string table length %d exceeds MaxStringTableEntries %d", GeneratorPackageName, tl, MaxStringTableEntries) | ||
| } | ||
| if rem := buffer.Remaining(); rem >= 0 && tl > rem { | ||
| return nil, fmt.Errorf("%s: string table length %d exceeds remaining bytes %d", GeneratorPackageName, tl, rem) | ||
| } |
|
|
||
| i := putIndex(capacity) | ||
| bp.pools[i].Put(buf[:cap(buf)]) | ||
| full := buf[:capacity] | ||
| bp.pools[i].Put(&full) |
| // table header advertises a length above MaxStringTableEntries and verifies | ||
| // the unmarshal fails with a normal error rather than panicking the host. | ||
| // | ||
| // This exercises the path peatey called out: NewSliceStringTableReaderFrom |
…und; revert pool indirection Addresses four follow-up review comments on PR #7. - Buffer.ReadString and Buffer.ReadBytes now record an error on Buffer.Err() on every silent rejection path: remaining-bytes shortfall in byte-buffer mode (io.ErrUnexpectedEOF), readBuffFull failure in reader-mode, and the existing MaxStringLength cap. Truncated/corrupt payloads previously turned into legitimate-looking empty strings; callers checking Err() can now distinguish. - support.go.tmpl string-table check: change `tl > rem` (entries vs bytes, unit mismatch) to `tl > rem/4`. Each table entry has at least a 4-byte uint32 length prefix, so an honest table of tl entries needs >= tl*4 bytes. The previous bound was 4x looser than reality. - bufferpool.go: revert `*[]byte` storage back to `[]byte`. The earlier change took the address of a local in Put (`&full`), which escapes and allocates a new slice header on every Put — defeating the goal of avoiding the boxing. Plain []byte storage is simpler and equivalent in allocation count, since we have no way to recycle the original *[]byte. - tests/string_table_caps_test.go: replace a personal-name reference in the doc comment with a generic description of the behavior. Adds TestReadStringTruncatedByteBuffer and TestReadStringTruncatedReaderMode verifying Err() is populated on truncated payloads in both modes. https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
There was a problem hiding this comment.
Pull request overview
This PR hardens bingen’s wire format and decoding/generation pipeline to prevent silent data corruption, reduce memory/DoS risk from attacker-controlled inputs, and improve generator determinism and error handling.
Changes:
- Updates wire encoding/decoding (64-bit int/uint, uint32 string lengths) and adds defensive bounds checks in generated unmarshallers and string table readers.
- Improves robustness and determinism in the generator (sorted iteration, safer default value handling, panic-to-error conversion).
- Reworks string interning to a bounded LRU with proper lifecycle management, plus adds/extends tests (incl. determinism and fuzz coverage).
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/string_table_caps_test.go | Adds a regression test ensuring oversized string tables are rejected with an error. |
| tests/reader_mode_test.go | Adds a regression test covering large slice decode in reader-mode buffers. |
| tests/opencost/opencost_codecs.go | Updates generated codecs to new decoding context + string table validation + stream constructor errors. |
| tests/generator_test.go | Updates generator test helper for new Generate(...) error signature. |
| tests/determinism_test.go | Adds a determinism test for generator output stability across runs. |
| tests/containerv2/containerv2_codecs.go | Updates generated containerv2 codecs for new string-table/error semantics. |
| tests/container/container_codecs.go | Updates generated container codecs for new string-table/error semantics. |
| tests/aliases/aliases_codecs.go | Updates generated alias codecs for new stream/context/string-table behaviors. |
| pkg/util/stringutil/stringutil.go | Switches default interning to bounded LRU and closes prior bank on swap. |
| pkg/util/stringutil/noopbank.go | Fixes no-op bank LoadOrStore semantics and implements Close(). |
| pkg/util/stringutil/mapbank.go | Adds Close() implementation to map-backed bank. |
| pkg/util/stringutil/lrubank_test.go | Adds tests for correct LRU eviction keying and goroutine cleanup on bank swaps. |
| pkg/util/stringutil/lrubank.go | Fixes LRU eviction key bug and adds Close() to stop background eviction goroutine. |
| pkg/util/bufferpool.go | Updates buffer pool commentary regarding []byte storage strategy. |
| pkg/util/bufferhelper.go | Changes int/uint encoding helpers to 64-bit and adds overflow detection. |
| pkg/util/buffer_test.go | Adds extensive tests for Remaining(), int/uint round-trips, string caps, truncation, and fuzzing. |
| pkg/util/buffer.go | Adds Remaining() semantics, Err() tracking, uint32 string lengths, and string allocation caps. |
| pkg/types/types.go | Sorts imports / deterministic collection and fails loudly on unsupported AST expressions. |
| pkg/types/helper.go | Validates @bingen:define name/type as Go syntax to prevent injection. |
| pkg/meta/annotationset.go | Adds duplicate-annotation detection and deterministic annotation loading order. |
| pkg/meta/annotation.go | Rejects mismatched bracket syntax in annotation parsing. |
| pkg/generator/templates_test.go | Adds tests for default-value injection prevention and validation behavior. |
| pkg/generator/templates/go/unmarshaller.go.tmpl | Adds slice/map length validation and routes defaults through safe parsing/escaping. |
| pkg/generator/templates/go/support.go.tmpl | Updates support template for MaxStringTableEntries, stream factories returning errors, and safe string tables. |
| pkg/generator/templates/go/streamer.go.tmpl | Adds length checks in streaming readers and stream constructors returning errors. |
| pkg/generator/templates.go | Validates defaults as Go expressions (and quotes strings) to prevent code injection. |
| pkg/generator/bingen.go | Changes Generate to return errors, adds panic recovery, and improves error propagation. |
| cmd/bingen/bingen.go | Adds CLI input validation, improves error handling, and uses generator errors. |
| README.md | Documents v0.2 wire-format breaking changes and validation behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| values := []int{ | ||
| 0, | ||
| 1, | ||
| -1, | ||
| math.MaxInt32, | ||
| math.MinInt32, | ||
| math.MaxInt32 + 1, | ||
| math.MinInt32 - 1, | ||
| math.MaxInt64, | ||
| math.MinInt64, | ||
| } |
| values := []uint{ | ||
| 0, | ||
| 1, | ||
| math.MaxUint32, | ||
| math.MaxUint32 + 1, | ||
| math.MaxUint64, | ||
| } |
| // we trust the underlying read to fail if the stream is short. | ||
| if rem := buff.Remaining(); rem >= 0 && {{ $lenVar }} > rem { | ||
| {{ .Context.HandleError (printf "fmt.Errorf(\"bingen: slice length %%d exceeds remaining bytes %%d\", %s, rem)" $lenVar) }} | ||
| } | ||
| {{ $sliceVar }} := make({{ .Type.Name }}, {{ $lenVar }}) |
| // the buffer is byte-backed and the value is meaningful. | ||
| if rem := buff.Remaining(); rem >= 0 && {{ $lenVar }} > rem { | ||
| {{ .Context.HandleError (printf "fmt.Errorf(\"bingen: map length %%d exceeds remaining bytes %%d\", %s, rem)" $lenVar) }} | ||
| } | ||
| {{ $mapVar }} := make(map[{{ .Type.KeyType.TypeName }}]{{ .Type.ValueType.TypeName }}, {{ $lenVar }}) |
| *data = int(int32(order.Uint32(bs))) | ||
| v := int64(order.Uint64(bs)) | ||
| if int64(int(v)) != v { | ||
| return fmt.Errorf("bingen: int value %d overflows host int width", v) |
| *data = uint(order.Uint32(bs)) | ||
| v := order.Uint64(bs) | ||
| if uint64(uint(v)) != v { | ||
| return fmt.Errorf("bingen: uint value %d overflows host uint width", v) |
| if b.err == nil { | ||
| b.err = fmt.Errorf("ReadBytes: %w", err) | ||
| } | ||
| return bytes |
… test compat Addresses seven follow-up review comments on PR #7. - pkg/generator/templates/go/{unmarshaller,streamer}.go.tmpl: add an unconditional MaxContainerLength cap (1 << 24) on slice and map length prefixes. Without it, a payload decoded in reader-mode (where Remaining() returns -1 and the upper-bound check is skipped) could advertise a huge length and force make() to OOM before any read failed. - pkg/util/buffer.go: Buffer.ReadInt and Buffer.ReadUInt now record the overflow error returned by readInt/readBuffInt/readUint/readBuffUint on Buffer.Err() instead of dropping it, so a 32-bit host decoding a value above MaxInt32 surfaces a real error instead of silently returning zero. - pkg/util/buffer.go: Buffer.ReadBytes reader-mode now returns nil on read failure to match the docstring (was returning the partially-filled allocation). - pkg/util/buffer_test.go: gate the over-int32/over-uint32 round-trip values on strconv.IntSize and construct them via int64/uint64 typed locals, so the file compiles on 32-bit platforms (where math.MaxInt64 and math.MaxUint64 aren't representable as int/uint constants). - pkg/util/buffer_test.go: add TestReadIntOverflowSetsErr (skipped on 64-bit hosts, asserting Err() is set on a 32-bit overflow decode). - tests/string_table_caps_test.go: add TestUnmarshalRejectsOversizedSlice exercising the reader-mode slice cap end-to-end through UnmarshalBinaryFromReader. Verified clean build under GOARCH=386. https://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1
|
Hey @peatey, There are some changes in here that would break all compatibility with existing kubecost customers, as well as any previously stored versions of bingen file formats. The AI model at-will changed whatever it thought was better without taking any backwards compatibility into account. If there is anything of value in here, it's completely shrouded by the fact that breaking the existing format is a non-starter. This format is also not really a wire format, despite the AI continuing to refer to it that way. I largely disagree with this wide net casting the AI without the highly crucial context of how it is leveraged throughout multiple projects. It's proposed "fixes" for a lot of these "critical issues" assume a hallucinated/made-up circumstance which isn't applicable in basic use-cases.
Caution This is a breaking change, and uint/ints have always been treated as 32-bit within the bingen format. You can't just make this change without breaking the format completely.
Caution This is a breaking change, you can't just change the data type within the format without breaking all previously stored data. This is a design choice to silently truncate strings greater than max ushort. This format was not meant to store giant strings, so we effectively truncate and move on.
Caution Again, another failure to understand compatibility decisions. The only reason this interface supports a key and value is due to a very old implementation that leveraged
Caution This doesn't fix anything. It doesn't matter whether it returns true or false because it doesn't do anything. Thus, no-op.
Caution Panic is preferred. This is a deterministic file format.
Caution This is so absurd, I don't even feel like trying to justify why it's dumb. If you have the bingen annotations, you have access to the source. If you can "inject" source, you would just modify it directly. There is no decoupling of the annotations and the source code itself. These two might actually be useful:
Due to it's complete miss of prior assumptions, many many things it's labeling as "bad" were just intentional decisions that go along with the all or nothing approach to a deterministic format. We would always rather a panic on some non-deterministic encounter with the data. always There are a few things we do care to surface, but not slice length validation. Similarly, it's obsession with the ReadString() call and purposefully wanting to allow support of huge strings, then adding a critique that "STRINGS CAN ALLOCATE UNBOUNDED" (which is wrong). I would appreciate if we can discuss these types of things before we throw out a wide sweeping AI generated PR. |
Summary
This PR fixes multiple critical bugs that caused silent data corruption, improves input validation, and enhances code generation robustness. The changes address issues in binary serialization, string interning, code generation, and error handling.
Key Changes
Critical Data Corruption Fixes
LoadOrStoredeleted by value instead of key, causing the cache to grow unbounded. The eviction now correctly uses the map key.NoOpStringBank.LoadOrStoreto returnfalse(not loaded) instead oftrue, matching the contract that nothing is cached.Input Validation & Security
NewSliceStringTableReaderFromandNewFileStringTableReaderFromto reject invalid length values.ToDefaultValueandToRawDefaultto parse user-supplied default values as Go expressions, preventing arbitrary code injection atgo generatetime...traversal, and version range validation (0-255).@bingenannotations and mismatched brackets in annotation syntax.Code Generation Improvements
Generate()to catch panics and return proper errors instead of crashing the process. Improved error messages throughout code generation.Buffer & Utility Improvements
Err()method to surface errors fromio.Readerdrain operations that previously silently failed.ReadStringto prevent crafted payloads from triggering huge allocations.*[]byteinstead of[]byteto avoid interface allocation overhead on everyPut()call.Testing & Documentation
Notable Implementation Details
buffer.Remaining()to prevent allocation attacksparser.ParseExprto ensure only valid expressions are acceptedlruEntryto fix the eviction bughttps://claude.ai/code/session_01N7NVPfVgS2JE9p149S44W1