Skip to content

perf(stdlib): defer string.format width padding to iolist#316

Open
davydog187 wants to merge 3 commits into
mainfrom
perf/string-format-width-iolist
Open

perf(stdlib): defer string.format width padding to iolist#316
davydog187 wants to merge 3 commits into
mainfrom
perf/string-format-width-iolist

Conversation

@davydog187
Copy link
Copy Markdown
Contributor

defer string.format width padding to iolist

Plan: .agents/plans/B13-string-format-width-iolist.md
Closes #310

Goal

Eliminate the per-specifier binary allocation in string.format width padding. apply_width_flags/3 built String.duplicate(pad_char, deficit) and concatenated the padding onto the formatted value with <>, producing a fresh binary for every width-flagged specifier. It now returns an iolist ([pad, str] for right-justify, [str, pad] for left-justify) that flows through the iolist accumulator introduced in PR #299, so the padded result is materialised exactly once at the top level via IO.iodata_to_binary/1.

This is a performance-only change. Output is byte-identical to the previous implementation for every input.

Success criteria

  • apply_width_flags/3 returns an iolist ([pad, str] / [str, pad]) instead of a concatenated binary; the no-padding branch still returns str unchanged.
  • The padded result threads through format_string/3 / format_directive/3 with no intermediate IO.iodata_to_binary per specifier — materialisation happens only at the format_string/3 base case. Verified: single call site at apply_format_spec/2, which feeds the existing [acc, str] append.
  • Output byte-identical for left-justify, right-justify, space pad, and zero pad — including the zero-padded-with-sign case (%05d of -7 -> -0007). Verified with direct Lua.eval! checks and the full property suite.
  • Byte-width semantics for multibyte %s preserved (format("%6s", "café") -> " café", one fill byte). Verified.
  • String tests pass: test/lua/vm/string_test.exs 152 passed (41 properties, 111 tests). (Note: the path named in the plan, test/lua/vm/stdlib/string_test.exs, does not exist; the actual string.format coverage lives in test/lua/vm/string_test.exs — recorded in Discoveries.)
  • Full mix test passes: 2114 passed, 19 skipped, 1 excluded — identical to pre-change counts.
  • mix compile --warnings-as-errors passes.
  • mix run benchmarks/string_format.exs width-flagged (n=1000) case improved (see below).

Benchmark (width-flagged, n=1000)

Focused timing (200 runs after 50-warmup, this machine):

ms/call
before (main) ~3.88
after ~3.39

~13% faster on the width-flagged path by removing the per-specifier <> copy of str + pad.

Changes

 .agents/plans/B13-string-format-width-iolist.md | 139 ++++++++++++++++++++++++
 lib/lua/vm/stdlib/string.ex                     |  13 ++-
 2 files changed, 148 insertions(+), 4 deletions(-)

Discoveries

  • The plan referenced test/lua/vm/stdlib/string_test.exs, which does not exist. The string.format unit + property coverage is in test/lua/vm/string_test.exs; that file was run instead (152 passed). No scope change.

Verification

mix format
mix compile --warnings-as-errors
mix test test/lua/vm/string_test.exs   # 152 passed (41 properties, 111 tests)
mix test                               # 2114 passed, 19 skipped, 1 excluded

Out of scope (intentional)

apply_width_flags/3 built String.duplicate/2 padding and concatenated it
onto the formatted value with <>, allocating a fresh padded binary for
every width-flagged specifier. Return an iolist ([pad, str] / [str, pad])
instead and let it thread through the existing format_directive/3
accumulator, so the padded result is materialised exactly once at the
format_string/3 base case via IO.iodata_to_binary/1. Output is
byte-identical; the width-flagged benchmark (n=1000) improves ~13%.

Plan: B13
Closes #310
Copy link
Copy Markdown
Contributor Author

@davydog187 davydog187 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated code review — verdict: clean (perf-only, correct)

Reviewed the diff against main. This is a sound performance-only change; output is byte-identical and the data flow is iolist-safe end to end.

Correctness (the one real risk: does anything downstream treat the result as a binary?) — no.

  • apply_width_flags/3 measures byte_size(str), binary_part(str, …), and String.starts_with?(str, "-") on the input raw binary, then returns an iolist. No binary operation is ever applied to the iolist it produces.
  • Its sole consumer is apply_format_spec/2format_directive/3, which appends [acc, str]. An iolist str is a valid iolist element, so it threads through untouched and is materialised exactly once at the format_string/3 base case (IO.iodata_to_binary/1). No intermediate flatten, no per-specifier binary allocation — which is the whole point.

Edge cases verified:

  • Zero-pad-with-sign ["-", pad, binary_part(str, 1, byte_size(str) - 1)] is byte-for-byte equal to the old "-" <> pad <> binary_part(...) — sign stays leftmost (%05d of -7-0007).
  • 0+- flag combo: pad_char correctly falls back to space (printf has - override 0); left-justify path [str, pad].
  • Multibyte %s width stays byte-measured (%6s of "café"" café").

Conventions: scope is stdlib (not the plan id); no plan-id leakage in source; comment explains the iolist rationale well.

Tests/bench: full suite byte-identical (2114 passed / 19 skipped, unchanged); width-flagged n=1000 ~3.88ms → ~3.39ms (~13% faster).

No blocking or actionable findings. Note for the merge train: #317 and #319 also edit lib/lua/vm/stdlib/string.ex, so whichever of these merges later will need a trivial rebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf(stdlib): defer string.format width padding to iolist

1 participant