Skip to content

perf: patch midi-file writeBytes to in-place push (13.4× writer speedup)#110

Closed
elicwhite wants to merge 14 commits into
Geomitron:masterfrom
elicwhite:perf/midi-writebytes-inplace
Closed

perf: patch midi-file writeBytes to in-place push (13.4× writer speedup)#110
elicwhite wants to merge 14 commits into
Geomitron:masterfrom
elicwhite:perf/midi-writebytes-inplace

Conversation

@elicwhite
Copy link
Copy Markdown
Contributor

Summary

Writer.prototype.writeBytes in midi-file 1.2.4 is O(n²):

Writer.prototype.writeBytes = function(arr) {
  this.buffer = this.buffer.concat(Array.prototype.slice.call(arr, 0))
}

Every call allocates two new arrays and replaces this.buffer, so writing N bytes via M calls costs O(N·M) and churns the GC. Replace with an in-place push loop — each call is O(arr.length), total write is O(N).

Measurement

I set up an autoresearch bench that parses + writes + re-parses 2000 charts (500 .chart + 1500 .mid) across 8 workers, timing only the writeChartFile / writeMidiFile call and validating every re-parse against a baseline ScannedChart-equivalent hash. CPU profile at baseline showed 35.76% in Writer.writeUInt8 + 21.06% in Writer.writeBytes + 13.81% GC — all in midi-file itself.

metric baseline after delta
mean 90.115 ms 6.721 ms −92.5%
p50 59.944 ms 5.764 ms −90.4%
p95 274.443 ms 17.255 ms −93.7%
p99 560.784 ms 25.263 ms −95.5%
wall (8w) 24.972 s 5.496 s −78.0%

13.4× speedup on mean writer time. 0 hash mismatches, 442/442 tests pass — byte-identical output.

Test plan

  • yarn test green
  • 2000-chart round-trip (parse → write → re-parse) produces the identical structured ParsedChart

Stack (bottom → top)

  1. Move hasLyrics/hasVocals/hasForcedNotes from ParsedChart to ScannedChart (state-derived) #98 Move hasLyrics/hasVocals/hasForcedNotes to ScannedChart
  2. Extract shared natural-HOPO helpers into natural-hopo.ts #105 Extract shared natural-HOPO helpers
  3. Add createEmptyChart() counterpart to parseChartAndIni #99 createEmptyChart
  4. Add writeIniFile() for song.ini emission #100 writeIniFile
  5. writeChartFile: [Song] / [SyncTrack] / [Events] / unrecognized sections #101 writeChartFile core
  6. writeChartFile: emit instrument track sections #102 writeChartFile tracks
  7. Add writeMidiFile() with TEMPO/EVENTS/unrecognized-track emission #103 writeMidiFile core
  8. writeMidiFile: emit PART DRUMS instrument tracks #104 writeMidiFile drums
  9. writeMidiFile: emit PART GUITAR / GHL instrument tracks #106 writeMidiFile guitar
  10. writeMidiFile: emit PART VOCALS / HARM1-3 vocal tracks #107 writeMidiFile vocals
  11. Add same-format round-trip integration tests #108 round-trip integration tests
  12. Add ChartDocument + writeChartFolder orchestrator #109 ChartDocument + writeChartFolder
  13. this PR writeBytes O(n²) fix
  14. (followup) Uint8Array writer + Object.assign spreads

Widens ParsedChart.metadata to Partial<typeof defaultMetadata> & {
extraIniFields?: Record<string, string>}, and has parseChartAndIni merge
[Song]-section values + song.ini into it (ini wins, matching the
"song.ini is authoritative, [Song] is a legacy overlap" stance). Unknown
ini key/value pairs move from ParseChartAndIniResult.iniUnknownValues
onto parsedChart.metadata.extraIniFields for round-trip writing.

ParseChartAndIniResult.iniMetadata and iniUnknownValues are removed --
the same data is now reachable via result.parsedChart.metadata.
iniFolderIssues and iniMetadataIssues stay (they're diagnostics, not
duplicate data).

scanChart's diff-against-default checks now read the merged metadata via
parseResult.parsedChart.metadata; "set" is defined as "present AND not
equal to the default". This also fixes a pre-existing inconsistency
where chart.delay was taken from ini but chart.chart_offset was taken
from [Song] -- both now come from the merged metadata with ini winning.

Also derives defaultIniChartModifiers as a projection of defaultMetadata
so the 8 parse-behavior defaults can't drift from the 40-field source
of truth.
…art (state-derived)

The three derived boolean flags on ParsedChart (hasLyrics, hasVocals,
hasForcedNotes) were parse-time snapshots that went stale whenever
consumers mutated chart data post-parse. Remove them from ParsedChart
entirely; scanChart derives all three at scan time from the current
chart state, piggybacking on its existing single note-walk (no extra
iterations).

- hasLyrics / hasVocals: computed from parsedChart.vocalTracks.parts —
  constant-time existence check.

- hasForcedNotes: inverts resolveFretModifiers. A note is 'forced' iff
  its resolved flag disagrees with the natural HOPO state the parser
  would pick without any force events, or it carries the tap flag
  (tap can only come from explicit forceTap). Inlined helpers for
  isNaturalHopo, isFretChord, isSameFretNote, isInFretNote,
  computeHopoThresholdTicks in chart-scanner.ts (duplicated from
  notes-parser.ts; a follow-up can extract to a shared helper).

Semantic change to note: under the new definition, redundantly-applied
force events (e.g. explicit forceHopo on a naturally-HOPO note) no
longer contribute to hasForcedNotes — the chart plays identically with
or without them, so state-derived detection correctly says false. This
eliminates the need for any 'hasForcedNotes backstop' in writers since
the flag round-trips naturally: the writer emits force events exactly
when a flag disagrees with natural state, and the scanner detects
those same disagreements on re-parse.

ScannedChart.notesData's shape is unchanged (hasLyrics / hasVocals /
hasForcedNotes still present); only the population source changes.

Consumers that previously read parsedChart.hasLyrics /
parsedChart.hasVocals / parsedChart.hasForcedNotes must switch to
scanChart output.
The chart scanner, chart writer, and MIDI writer each had their own
near-identical copies of isFretChord / isSameFretNote / isInFretNote /
isNaturalHopo over NoteEvent[], and the parser had the same three
helpers again over TrackEvent[]. This consolidates all five callsites
into src/chart/natural-hopo.ts.

The helpers are generic (<T, E extends { type: T }> + an isFret
predicate); thin wrappers are exported for each concrete group type:

  - NoteEvent-based (scanner + writers): isFretChord / isSameFretNote /
    isInFretNote, plus the combined isNaturalHopo.
  - TrackEvent-based (parser's resolveFretModifiers):
    isFretChordRawEvents / isSameFretNoteRawEvents /
    isInFretNoteRawEvents. The parser still builds its natural-HOPO
    check inline — it compares effectiveNotes to lastNotes while
    passing the raw pre-coalesced events to isSameFretNoteRawEvents,
    a subtlety no NoteEvent callsite has.

The isFretNoteType / isFretEventType predicates are also exported,
so the parser's two standalone isFretNote(...) callsites (filtering
event lists outside the group helpers) share the same definition too.
The parser's four local helpers (isFretNote, isSameFretNote, isFretChord,
isInFretNote) are deleted.

Net: ~70 LOC of duplication removed across scanner, parser, and (in
the writer branches further up the stack) the two writers. One file
owns every natural-HOPO rule.

This commit updates the scanner + parser; the writers move to the
shared helpers in the two branches further up the stack that own them.
Builds a minimal valid ParsedChart from scratch: default 480 resolution,
120 BPM at tick 0, 4/4 time signature at tick 0, empty tracks/sections/
metadata/vocal parts/unrecognized events. chartBytes defaults to an empty
Uint8Array (no source bytes), format defaults to 'chart', iniChartModifiers
to the library defaults.

Options let callers override resolution, bpm, timeSignature, and format.

Useful for programmatic chart generation (e.g. downstream code that builds
charts up from scratch rather than parsing source bytes).
Serializes IniMetadata (Partial<typeof defaultMetadata> + extraIniFields)
back to song.ini text with:
- [song] header
- Known fields emitted in the canonical defaultMetadata order
- Undefined values skipped
- Booleans as True/False (Clone Hero convention)
- extraIniFields appended in insertion order
- CRLF line endings

Derives its field order from Object.keys(defaultMetadata) rather than
hardcoding a separate FIELD_ORDER list, so new fields added to
defaultMetadata automatically participate in writing.
…emission

Port of the non-instrument-track half of the chart writer into scan-chart:
- [Song] emits the subset of metadata the chart body supports (Name,
  Artist, Charter, Album, Genre, Year, Resolution, Offset, PreviewStart,
  Difficulty). Other fields live exclusively in song.ini.
- [SyncTrack] emits tempos as `B millibeats` and time signatures as
  `TS numerator [denominator-exponent]`, sorted by tick with TS before B
  at the same tick.
- [Events] emits section markers (wrapped as [section name] to survive
  the parser's greedy trailing-\] regex), end events, unrecognized global
  events (with bracket-stripping when source is .mid so round-trip to
  .chart emits naked text), and vocal phrase_start/phrase_end/lyric
  events from the normalized vocalTracks.parts.vocals.
- Unrecognized chart sections are re-emitted verbatim.

Instrument tracks ([ExpertSingle] etc.) land in a follow-up PR.
Completes writeChartFile with per-track section emission:

- [ExpertSingle], [HardDrums], etc. named from instrument + difficulty.
- Drum tracks: base notes (N 0..4 or 0..5 for 5-lane), cymbal markers
  (N 66/67/68) only in fourLanePro, accent (N 34..38) and ghost (N 40..44)
  markers matching the emitted note's eventType, one N 109 flam per group,
  double kick as N 32 only.
- 5-lane round-trip: greenDrum+cymbal emits as N 4 (orange) while plain
  greenDrum stays at N 5; blueDrums coinciding with greenDrum+cymbal emit
  as N 5 to keep the parser's drumType detection on fiveLane.
- Fret/GHL tracks: notes via the 5-fret and 6-fret maps, tap as N 6,
  forceUnnatural (N 5) when the note's hopo/strum flag disagrees with
  the natural-HOPO state.
- Star power S 2, activation lanes S 64, flex lanes S 65/66, versus
  phrases S 0/1, solo sections as E 'solo'/'soloend' (tick + length - 1
  for soloend to round-trip).
- Per-track text events (including disco flip mix markers).
- Disco-flip state transitions per difficulty from red/yellow drum
  disco/discoNoflip flags → 'mix <diff> drums0[d|dnoflip]' text events.
- Coda events: if no 'coda' in unrecognizedEvents, synthesize from
  drumFreestyleSections where isCoda=true.
- hasForcedNotes backstop: if hasForcedNotes is set but no forceUnnatural
  emitted, append N 5 0 at a vacant tick in the first fret track to
  preserve the flag on round-trip.
Port of the core MIDI writer infrastructure:
- writeMidiFile(chart) entry point: builds a Format-1 MIDI and returns
  Uint8Array via midi-file's writeMidi
- TEMPO TRACK: trackName + setTempo (from chart.tempos) + timeSignature
  (from chart.timeSignatures), all at their absolute ticks
- EVENTS track: trackName + section text events (unwrapped so YARG's
  NormalizeTextEvent doesn't strip names containing ]) + [end] events +
  global events (bracket-wrap when source is .chart so MIDI output
  follows convention) + [coda] derived from drumFreestyleSections when
  not already present in unrecognizedEvents
- Unrecognized MIDI tracks: verbatim pass-through with abs-tick →
  delta-tick conversion; duplicate track names suffixed to keep the
  internal trackMap unique
- finalizeMidiTrack helper: sorts by tick with a type-priority
  tiebreaker, converts absolute ticks to delta times, appends endOfTrack

Instrument tracks (PART DRUMS, PART GUITAR, …) and vocal tracks
(PART VOCALS, HARM1/2/3) land in follow-up PRs.
Adds drum emission to writeMidiFile:

- buildDrumTrack: groups one or more ParsedTrackData entries (one per
  difficulty) into a single PART DRUMS MIDI track. Emits trackName,
  delegates per-difficulty notes to emitDrumNotes, collects and dedupes
  instrument-wide star power (MIDI 116) / solo (MIDI 103) / activation
  lanes (MIDI 120), emits flex lanes (MIDI 126/127 with per-difficulty
  LDS velocity), passes through per-track text events and
  unrecognizedMidiEvents from the first difficulty.

- emitDrumNotes: per-difficulty note emission with full modifier support:
  base drum notes (MIDI 96..100 expert / 84..88 hard / 72..76 medium /
  60..64 easy); double-kick as MIDI 95 only, emitted AFTER regular kick
  at the same tick (YARG insertion order quirk); tom markers (MIDI
  110/111/112) only in fourLanePro; accent (velocity 127) / ghost
  (velocity 1) encoding; one flam marker (MIDI 109) per group; 5-lane
  green+cymbal remapped to offset 4 (orange pad); blue-at-same-tick-as-
  green+cymbal remapped to offset 5 so fiveLane detection succeeds;
  fourLanePro green+tom fallback to offset 5 when a conflicting
  green+cymbal exists at the same tick across difficulties; disco-flip
  state-transition text events (`[mix <diff> drums0[d|dnoflip]]`).

- fourLanePro sentinel greenTomMarker: when drumType=1 but no tom
  markers were emitted (all yellow/blue defaulted to cymbal, green-toms
  used offset-5 encoding), emit a MIDI 112 at a safe tick so scan-chart's
  drumType detection still picks fourLanePro.

- [ENABLE_CHART_DYNAMICS] text event at tick 0 when any accent or ghost
  was emitted.

- computeLengthOverrides: prevents scan-chart's trimSustains from
  collapsing adjacent short-sustain drum-note chains by attributing the
  combined chain length to the first note.

- Shared helpers: addNoteOnOff, addNoteOnOffWithChannel (with
  zero-length seq-pairing to survive finalizeMidiTrack's sort),
  deduplicateSections, instrumentTrackNames table.

Fret tracks (PART GUITAR, GHL) and vocal tracks (PART VOCALS, HARM1/2/3)
remain no-ops until the follow-up PRs land — the writeMidiFile entry
iterates trackData but skips any non-drum instrument group.
Adds fret-instrument emission to writeMidiFile.

- buildFretTrack: one MIDI track per instrument group (guitar / bass /
  guitarghl / bassghl / keys / guitarcoop / rhythm / GHL variants);
  multiple difficulties merge into the same track.
- emitFretNotes: 5-fret (95/83/71/59 diffStarts) and 6-fret/GHL
  (94/82/70/58) note-number mapping; open notes routed two ways:
  ENHANCED_OPENS mode (diffStart+0, preserves chord-with-open) when any
  group contains both open and non-open, else forceOpen-SysEx fallback
  (diffStart+1 + SysEx, collapses chords with open but safer next to
  animations at easy diffStart). Coalesces fret-note length with
  overlapping animations at the same tick:noteNumber via pre-computed
  seq slots so animation ordering survives re-parse.
- Force modifier ranges (forceHopo / forceStrum / forceTap) via
  emitFretModifierRanges — inverts resolveFretModifiers: natural HOPO
  state is re-derived with the MIDI threshold (1 + res/3), and a modifier
  emits only when the resolved flag disagrees. forceHopo = diffStart+6
  (5-fret) / +7 (GHL); forceStrum = +7 / +8; forceTap as SysEx.
- reconstructModifierRanges: collapses per-tick hit sets into minimal
  contiguous ranges over noteTicksInOrder.
- Instrument-wide emission: star power (MIDI 116), solo (MIDI 103),
  flex lanes (126/127 with per-difficulty LDS velocity), per-track text
  events, versus phrases (105/106), animations (with skip when the fret
  note above already emitted at the same tick:noteNumber), per-track
  unrecognizedMidiEvents.
- [ENHANCED_OPENS] text event when that mode is active.
- Helpers added: addSysExOnOff, reconstructModifierRanges,
  midiIsFretChord / midiIsSameFretNote / midiIsInFretNote.

Vocal tracks (PART VOCALS / HARM1-3) land in the follow-up PR.

Tests: 15 new test cases covering track naming, 5-fret + GHL
note-number mapping, open-note encoding (both modes), force modifier
emission (hopo/strum/tap + non-emission when natural matches),
instrument-wide sections, and round-trip through parseChartAndIni for
both 5-fret modifiers and GHL chord-with-open. 53 midi-writer tests,
397 total scan-chart tests passing at the tip.
Adds vocal-track emission to writeMidiFile. One MIDI track per vocal
part (vocals → PART VOCALS, harmony{1,2,3} → HARM{1,2,3}; HARM* not
PART HARM*, matching the convention used in the wild).

Phrase-marker emission is careful about YARG's CopyDownPhrases, which
at parse time copies HARM1's notePhrases onto HARM2/HARM3:

  - PART VOCALS: notePhrases → 105 (player:2 → 106)
  - HARM1: notePhrases → 105, staticLyricPhrases → 106
  - HARM2: only staticLyricPhrases → 106 (note 105 comes from HARM1
    via CopyDown on re-parse — would double-count otherwise)
  - HARM3: no phrase markers at all

Lyrics and notes are union'd across both phrase sets (note 105 and 106
can have different boundaries; emitting the union keeps all lyrics
that belong on the track). Zero-length vocal notes are preserved via
per-event `seq` tags so finalizeMidiTrack keeps noteOn immediately
before its matching noteOff.

Star power (MIDI 116) is suppressed on HARM2/HARM3 — CopyDown recreates
it from HARM1 on re-parse. Range shifts (MIDI 0) and lyric shifts (MIDI
1) are per-part for lossless round-trip, with fallback to the
track-level arrays for the part that owns them (PART VOCALS, or HARM1
when PART VOCALS is absent). Raw vocal-track text events (stance
markers, facial anim triggers) are re-emitted verbatim so that
stance-only tracks survive round-trip — YARG marks a VocalsPart
non-empty iff it has phrases or text events.

Tests: 18 new cases covering track naming (PART VOCALS, HARM1/2/3 with
no PART HARM* prefix), phrase-marker routing per part, pitched/
percussion/out-of-range note emission, lyric union across phrase sets,
star power suppression on HARM2/3, range+lyric shifts, text events,
and round-trip through parseChartAndIni for both PART VOCALS and the
full harmony stack with CopyDown semantics preserved.

71 midi-writer tests, 415 total scan-chart tests passing.
End-to-end coverage for the scan-chart writers: build a ParsedChart,
serialize via writeChartFile/writeMidiFile, re-parse via parseChartAndIni,
and assert that structured fields survive.

The per-feature tests (chart-writer.test.ts, midi-writer.test.ts) already
cover granular correctness. This file's job is interactions across tracks
— e.g. a chart with drums + guitar + vocals populated together — and the
"everything enabled at once" shape the earlier tests skip.

Coverage:
  - tempo/TS/sections/endEvents (both formats)
  - drums: kick, tom/cymbal, accent, ghost, 2x-kick (both formats)
    flam: .mid only (the .chart parser doesn't recognize N 109 — writer
    emits it, parser gap is pre-existing)
  - 5-fret: base colors + forceHopo + forceTap + sustains (both formats)
    star power + solo sections (both formats)
  - GHL: open + chord-with-open ENHANCED_OPENS path (both formats)
  - vocals: PART VOCALS full shape; HARM1/2/3 with CopyDown semantics (.mid)
  - multi-track: drums + guitar + bass (both formats); drums + guitar +
    vocals (.mid)
  - metadata: [Song] fields and chart_offset via .chart

18 new tests; 433 total passing.
Bundles a ParsedChart with non-chart files (audio, album art, extras)
into a single value, and provides the write-side companion to
parseChartAndIni's input shape:

    files (in)                    ChartDocument (out)
    ----------------              -----------------------
    notes.chart or notes.mid   →  parsedChart
    song.ini                   →  parsedChart.metadata
    song.ogg, album.png, ...   →  assets (passthrough)

writeChartFolder produces the reverse transformation:

  - notes.chart or notes.mid from writeChartFile / writeMidiFile based on
    parsedChart.format
  - song.ini from writeIniFile(parsedChart.metadata) — chart_offset is
    naturally skipped since writeIniFile only emits keys in defaultMetadata
  - every entry in doc.assets, in order

No separate `metadata` field on ChartDocument — it lives on
parsedChart.metadata (the consolidated shape produced by parseChartAndIni).

Tests: 9 cases covering format selection, ini content (chart_offset not
leaked, extraIniFields preserved), asset passthrough + ordering, and
full round-trip through parseChartAndIni for both .chart and .mid with
assets.

442 total scan-chart tests passing.
Writer.prototype.writeBytes was O(n²):
  this.buffer = this.buffer.concat(Array.prototype.slice.call(arr, 0))
Every call allocated a new array and replaced `this.buffer`, so writing N
bytes via M calls cost O(N·M) time and churned the GC.

Replace with an in-place push loop — each call is O(arr.length), total
write is O(N).

Measured on the writer autoresearch bench (2000 charts, 500 .chart +
1500 .mid, 8 workers):
  - mean:  90.115 ms → 6.721 ms   (13.4× faster)
  - p50:   59.944 ms → 5.764 ms   (10.4× faster)
  - p95:  274.443 ms → 17.255 ms  (15.9× faster)
  - p99:  560.784 ms → 25.263 ms  (22.2× faster)
  - wall:  24.972s   → 5.496s     (4.5× faster)
  - summed writer time: 162.206s → 12.098s (93% reduction)

0 hash mismatches, 442/442 tests still green — byte-identical output.

CPU profile at baseline showed:
  35.76% Writer.writeUInt8
  21.06% Writer.writeBytes
  13.81% GC
of total writer time. Fixing writeBytes removes both the direct cost
(it's a handful of the 19 writeBytes callsites downstream of every
track write) and the GC pressure from the allocate-and-replace pattern.
@elicwhite
Copy link
Copy Markdown
Contributor Author

Closing — these patches were for the upstream midi-file package, which scan-chart no longer uses. They've been incorporated into @geomitron/midi-file (the fork that scan-chart now depends on, swapped in via #117), so this PR is obsolete.

@elicwhite elicwhite closed this Apr 21, 2026
@elicwhite elicwhite deleted the perf/midi-writebytes-inplace branch April 21, 2026 04:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant