Optimize JsonCompleter.parse append-only byte hot path #7

Merged: default-anton merged 1 commit into main from perf/parse-byte-hot-path on Mar 13, 2026

@default-anton (Collaborator)

Summary

Optimize JsonCompleter.parse for append-only streaming input.

The old hot path called input.start_with?(input_snapshot) on every growing chunk to prove that the new input shared the old prefix. On long streams that became a repeated memcmp over already-processed data, and it showed up as the dominant profiler frame. This change keeps the existing parsing behavior for append-only streams, switches the syntax dispatch and scanners to raw ASCII bytes, and adds inline comments around the byte-oriented path so the optimization stays reviewable.

Before / after

Benchmark command:

JSON_COMPLETER_BENCHMARK=1 bundle exec rspec spec/parse_benchmark_spec.rb

Benchmark settings from the spec run:

  • payload bytes: 77690
  • prefixes: 9712
  • iterations: 50
  • chunk size: 8

parse

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| total runtime | 1.2887s | 0.8435s | 34.5% faster |
| per iteration | 25.774ms | 16.870ms | 34.5% faster |
| allocated objects | 2,047,568 | 994,865 | 51.4% fewer |
| heap growth bytes | 196,608 | 0 | eliminated |
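
For anyone re-deriving the Change column, here is a small sketch of the arithmetic. It assumes "34.5% faster" means 34.5% less wall time; `percent_reduction` is a helper made up for this note, not part of the gem.

```ruby
# Hypothetical helper: relative reduction from `before` to `after`, in percent.
def percent_reduction(before, after)
  ((before - after) / before.to_f * 100).round(1)
end

runtime_reduction = percent_reduction(1.2887, 0.8435)
object_reduction  = percent_reduction(2_047_568, 994_865)

# The same runtime numbers expressed as a throughput ratio.
throughput_ratio = 1.2887 / 0.8435

puts "runtime reduction: #{runtime_reduction}%"
puts "object reduction:  #{object_reduction}%"
puts "throughput ratio:  #{throughput_ratio.round(2)}x"
```

Read this way, a 34.5% drop in runtime corresponds to roughly 1.5x the throughput.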

Relative to complete + JSON.parse

| Metric | Before | After |
| --- | --- | --- |
| speedup | 12.93x | 19.89x |
| allocation reduction | 8.83x | 17.14x |

What changed

  • removed the repeated full-prefix start_with? check from the append-only parse hot path
  • kept parse state resets for truncation, same-length edits, and completed top-level values
  • switched parser/completion dispatch and shared scanners to byte-oriented loops (bytesize/getbyte/byteslice)
  • added an incremental multibyte string parse spec to guard the byte path
  • documented the append-only stateful-instance contract in README.md and CHANGELOG.md
  • added inline comments explaining the byte constants and why the optimization is safe for UTF-8 payloads
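
A minimal sketch of how the first two bullets could fit together, assuming the reset decision is made from byte sizes alone. `AppendOnlyTracker`, `advance`, and `consumed_bytes` are hypothetical names for illustration and are not taken from the gem's source; the real reset conditions (including completed top-level values) may differ.

```ruby
# Hypothetical illustration only; not the gem's actual internals.
class AppendOnlyTracker
  attr_reader :consumed_bytes, :resets

  def initialize
    @consumed_bytes = 0
    @resets = 0
  end

  # Returns the byte range of `input` that still needs scanning.
  def advance(input)
    size = input.bytesize

    # A shorter or same-length input cannot be an append, so start over.
    # A strictly longer input is trusted to extend the previous one;
    # no start_with? comparison is made.
    if @consumed_bytes.positive? && size <= @consumed_bytes
      @consumed_bytes = 0
      @resets += 1
    end

    pending = @consumed_bytes...size
    @consumed_bytes = size
    pending
  end
end

tracker = AppendOnlyTracker.new
first  = tracker.advance('{"a":')    # fresh input: scan everything
second = tracker.advance('{"a":1}')  # append: scan only the new tail
third  = tracker.advance('{"b"')     # truncation: reset and rescan
```

The trade-off in this sketch is the contract itself: a caller that passes a longer but unrelated string to the same instance would not be detected, which is why documenting the append-only requirement matters.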

Why this helps

Most streaming updates are append-only and mostly boring bytes.

Before:

  • compare the whole prior prefix with start_with? on every chunk
  • branch on 1-character strings in the main dispatch/scanner loops
  • keep paying that cost as the stream grows
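
As a rough back-of-the-envelope sketch of that growth, the snippet below counts how many bytes a full-prefix comparison would touch under the benchmark settings. It assumes the stream is fed as prefixes of 8, 16, ... bytes followed by the full payload, and that every comparison walks the whole previous prefix (a matching prefix never exits early). The variable names are illustrative.

```ruby
# Benchmark settings quoted in this PR.
payload_bytes = 77_690
chunk_size = 8

# Assumed stream shape: prefixes of 8, 16, ... bytes, then the full payload.
prefix_lengths = (chunk_size...payload_bytes).step(chunk_size).to_a
prefix_lengths << payload_bytes

# Old path: each call re-compares the entire previous prefix.
recompared_bytes = prefix_lengths.each_cons(2).sum { |previous, _current| previous }

# New path: each byte is scanned once as it arrives.
scanned_bytes = payload_bytes

puts "prefixes:         #{prefix_lengths.size}"
puts "recompared bytes: #{recompared_bytes}"
puts "scanned bytes:    #{scanned_bytes}"
```

Under these assumptions the re-compared total grows quadratically with stream length, while the scan itself stays linear.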

After:

  • trust append-only stateful input and only reset on the cases that actually need it
  • branch on ASCII byte values instead of allocating/transcoding tiny strings
  • keep multibyte string content intact by copying plain runs with byteslice
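
The last two bullets can be sketched as follows. This is an illustrative scanner, not the gem's code: `scan_plain_run` and the constant names are assumptions. It relies on a property of UTF-8 that every byte of a multibyte sequence is 0x80 or higher, so comparing raw bytes against ASCII delimiters cannot match in the middle of a character.

```ruby
# ASCII byte values for the delimiters that end a plain run inside a JSON string.
QUOTE     = 0x22 # "
BACKSLASH = 0x5C # \

# Hypothetical helper: returns the index of the first quote or backslash byte
# at or after `pos`, or the input's bytesize if none is found.
def scan_plain_run(input, pos)
  size = input.bytesize
  while pos < size
    byte = input.getbyte(pos) # Integer; no one-character String is allocated
    break if byte == QUOTE || byte == BACKSLASH

    pos += 1
  end
  pos
end

input = '"héllo → wörld" tail'
start = 1 # skip the opening quote
stop  = scan_plain_run(input, start)

# Copy the whole run in one call; the multibyte characters come through intact.
run = input.byteslice(start, stop - start)
puts run
```

Because the run starts and stops on ASCII bytes, the `byteslice` boundaries always fall on character boundaries and the copied string remains valid UTF-8.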

Same result, less repeated work.

The profiler signal matched that story: rb_str_start_with dominated the sample before the change and disappeared from the hot-path sample after it.

Validation

  • bundle exec rubocop
  • bundle exec rspec
  • JSON_COMPLETER_BENCHMARK=1 bundle exec rspec spec/parse_benchmark_spec.rb

@default-anton default-anton merged commit 100bded into main Mar 13, 2026
5 checks passed
@default-anton default-anton deleted the perf/parse-byte-hot-path branch March 13, 2026 23:02