
Test262 #9: CharacterClassEscape parser bug — generated tests with full Unicode ranges (12 tests) #108

Description

@nickna

Motivation

Surfaced during the #69 RegExp rollout (commit b968a9e). 12 generated tests under test/built-ins/RegExp/CharacterClassEscapes/ consistently fail at parse time. All have the same shape: they call a buildString(...) helper from regExpUtils.js (in the Test262 harness) with a configuration that enumerates the full Unicode code-point range.

Sample failing tests

  • test/built-ins/RegExp/CharacterClassEscapes/character-class-digit-class-escape-negative-cases.js
  • character-class-digit-class-escape-positive-cases.js
  • character-class-non-digit-class-escape-{negative,positive}-cases.js
  • character-class-non-whitespace-class-escape-{negative,positive}-cases.js
  • character-class-non-word-class-escape-{negative,positive}-cases.js
  • character-class-whitespace-class-escape-{negative,positive}-cases.js
  • character-class-word-class-escape-{negative,positive}-cases.js

The body shape:

const str = buildString({
  loneCodePoints: [],
  ranges: [
    [0x00DC00, 0x00DFFF],
    [0x000000, 0x00002F],
    [0x00003A, 0x00DBFF],
    [0x00E000, 0x10FFFF]
  ]
});
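To make the failing shape concrete, here is a rough re-implementation of what a helper like buildString appears to do with this configuration. It concatenates every code point in loneCodePoints and in each [start, end] range. This is a hypothetical stand-in (buildStringSketch is an invented name), not the actual regExpUtils.js source. It assumes the ranges are inclusive, and it calls String.fromCodePoint on chunks so that each call receives a bounded number of arguments.

```javascript
// Hypothetical stand-in for the harness helper; NOT the regExpUtils.js source.
// Assumes each [start, end] range is inclusive.
function buildStringSketch({ loneCodePoints, ranges }) {
  const CHUNK_SIZE = 10000; // bound the number of arguments per fromCodePoint call
  const parts = [String.fromCodePoint(...loneCodePoints)];
  for (const [start, end] of ranges) {
    let chunk = [];
    for (let cp = start; cp <= end; cp++) {
      chunk.push(cp);
      if (chunk.length === CHUNK_SIZE) {
        parts.push(String.fromCodePoint(...chunk));
        chunk = [];
      }
    }
    parts.push(String.fromCodePoint(...chunk)); // flush the remainder
  }
  return parts.join("");
}

// Small usage example: one lone code point plus the range U+0030..U+0032.
console.log(buildStringSketch({ loneCodePoints: [0x41], ranges: [[0x30, 0x32]] })); // prints A012
```

With the configuration from the test body, the result contains every code point except U+0030 to U+0039. In that range order, no lead surrogate (U+D800 to U+DBFF) is ever directly followed by a trail surrogate (U+DC00 to U+DFFF), so the surrogate code points stay unpaired in the output.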

Impact

12 tests across both modes. Small bucket, but worth tracking: these are the only ParseErrors in built-ins/RegExp, and they form a coherent group.

Likely root causes (to investigate)

Possibilities, in rough order of likelihood:

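  1. includes: [regExpUtils.js] not loading correctly: the Test262 harness file may not be assembled into the test program. Check Test262HarnessAssembler.cs. A simply missing include would normally surface as a ReferenceError at run time, so a ParseError points more toward the assembled harness text itself failing to parse.
  2. Object literal containing an array of 2-element arrays ([[0x000000, 0x00002F], ...]) could trip a parser ambiguity around [[ (array vs tagged template).
  3. Hex code-point range extending beyond the BMP: 0x10FFFF is outside the Basic Multilingual Plane (U+0000 to U+FFFF). The parser may handle code-point literals differently from code-point values used as array elements.
  4. String size: buildString(...) eventually produces a string of about 1.1M code points, which is about 2.16M UTF-16 code units because every code point above 0xFFFF is stored as a surrogate pair. Some interpreter path may not handle strings that large; this is unlikely to matter at parse time, but worth checking if the helper inlines the result.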
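The size estimate in item 4 can be checked directly from the four ranges in the test body. The sketch below uses sizeOf, an illustrative helper that is not part of the harness. It counts both code points and UTF-16 code units, which is what String.prototype.length reports. Every code point above 0xFFFF takes two code units as a surrogate pair.

```javascript
// Ranges copied from the failing test body (inclusive [start, end]).
const ranges = [
  [0x00DC00, 0x00DFFF],
  [0x000000, 0x00002F],
  [0x00003A, 0x00DBFF],
  [0x00E000, 0x10FFFF],
];

// Illustrative helper: counts code points and UTF-16 code units.
function sizeOf(ranges) {
  let codePoints = 0;
  let codeUnits = 0;
  for (const [start, end] of ranges) {
    const n = end - start + 1;
    // Code points of this range that lie in the supplementary planes (>= 0x10000).
    const supplementary = Math.max(0, end - Math.max(start, 0x10000) + 1);
    codePoints += n;
    codeUnits += n + supplementary; // supplementary code points cost one extra unit
  }
  return { codePoints, codeUnits };
}

console.log(sizeOf(ranges)); // { codePoints: 1114102, codeUnits: 2162678 }
```

So the string is roughly 1.1M code points but roughly 2.16M UTF-16 code units. The second figure is the more relevant one if the engine stores strings as UTF-16, as .NET's System.String does.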

Suggested approach

  1. Run one failing test directly: dotnet run -- test/built-ins/RegExp/CharacterClassEscapes/character-class-digit-class-escape-negative-cases.js. Capture the actual parse-error diagnostic — it should localize the bug.
  2. Apply the targeted fix.
  3. Cross-check by running the other 11 tests (they'll likely all flip together).
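If the diagnostic from step 1 does not localize the bug, you can bisect by feeding the engine smaller programs, each isolating one hypothesis from the list above. The snippets below are hypothetical reductions, not Test262 files. Each one is valid ECMAScript, which you can confirm in a reference engine such as Node. The Function constructor parses a body without running it. Any reduction that the engine under test rejects then points at the faulty construct.

```javascript
// Hypothetical minimal reductions, one per hypothesis; not Test262 files.
const reductions = {
  // Hypothesis 2: object literal holding an array of 2-element arrays.
  nestedArrays: "const o = { ranges: [[0x000000, 0x00002F], [0x00003A, 0x00DBFF]] };",
  // Same shape with the line breaks used in the generated tests.
  nestedArraysMultiline: "const o = {\n  ranges: [\n    [0x00DC00, 0x00DFFF]\n  ]\n};",
  // Hypothesis 3: hex literals beyond the BMP are still plain Number literals.
  hexAboveBmp: "const max = 0x10FFFF; const arr = [0x00E000, 0x10FFFF];",
  // A call with an object-literal argument, as in buildString({...}).
  callWithObject: "f({ loneCodePoints: [], ranges: [[0x0, 0x1]] });",
};

// Parse each snippet in sloppy and strict mode; return any that fail to compile.
function checkReductions(snippets) {
  const failures = [];
  for (const [name, source] of Object.entries(snippets)) {
    for (const prefix of ["", '"use strict";\n']) {
      try {
        new Function(prefix + source); // parse only; the body is never called
      } catch (e) {
        failures.push(`${name}${prefix ? " (strict)" : ""}: ${e.message}`);
      }
    }
  }
  return failures;
}

console.log(checkReductions(reductions)); // [] in a conforming engine
```

To try them against the engine under test, write each snippet to its own .js file and pass it to the same dotnet run -- <path> command used in step 1. This assumes the runner accepts a plain script without Test262 front matter.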

Acceptance

  • All 12 ParseError tests in RegExp/CharacterClassEscapes/ advance to Pass or to a more specific bucket.
  • No regressions in the existing parser tests.

Related

Part of #69. The smallest and most narrowly scoped of the surfaced clusters.

Metadata


Assignees

No one assigned

    Labels

    deferred (De-prioritized; not planned for active work, see tracking comment), enhancement (New feature or request)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
