Context
Follow-up to #693, which added RewriteEcmaScriptShorthands (a JS→.NET pattern rewrite shared by interpreter SharpTSRegExp and emitted $RegExp) to pin the shorthand class escapes to the JS sets. #693 deliberately scoped the rewrite to:
- Positive shorthands (\d \w \s): expanded everywhere (inside and outside [...]).
- Negated shorthands (\D \W \S): expanded only outside a class. Inside a class they pass through to .NET unchanged (ExpandShorthand returns null for D/W/S when inClass).
This issue tracks the remaining inside-a-class negated divergence.
Repro (both interpreter and compiled modes)
/[\W]/.test("İ") // İ (U+0130); JS: true, SharpTS: false (.NET \w over-matches U+0130, so [\W] under-matches it)
/[\S]/.test("\u00A0") // NBSP; JS: false, SharpTS: true (.NET \s is narrower than the JS set, so [\S] over-matches Unicode whitespace such as NBSP)
/[\S]/.test("\u3000") // ideographic space; JS: false, SharpTS: true
/[\S]/.test("\u2028") // LS; JS: false, SharpTS: true
The positive in-class forms are already correct after #693:
/[\w]/.test("İ") // JS: false, SharpTS: false ✓
/[\s]/.test(" ") // JS: true, SharpTS: true ✓
/[^\w]/.test("İ") // JS: true, SharpTS: true ✓ (negated *class* with a positive shorthand inside — handled)
So the gap is strictly: a negated shorthand (\D/\W/\S) appearing inside a [...] character class. (\D is mostly harmless since .NET ECMAScript \d is already [0-9]; \W and \S carry the real divergence.)
Why it was deferred
A negated shorthand inside a class can't be expressed as a plain character set without engine-level class nesting/union-of-negation, which .NET's regex syntax doesn't offer in general. .NET does support class subtraction ([base-[subtract]]), so the sole-element case is tractable — e.g. [\S] → [^<wsSet>], [\W] → [^A-Za-z0-9_] — but the union case is not clean: [a\S] = a ∪ \S, and since \S already includes a, that collapses to \S ("anything but whitespace"); a general [X\S] has no simple rewrite. A correct general fix needs either careful case analysis (sole-element vs union, and interaction with other class members / ranges / ^ negation) or a different matching strategy.
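The sole-element and union cases described above can be checked against a JS engine directly. The following is a minimal sketch, not SharpTS code: JS_WS and JS_WORD are illustrative names for the explicit JS sets (written out from the ECMAScript definitions of \s and \w), and agreeOnBmp is a hypothetical helper that compares two regexes over every BMP code unit.

```javascript
// Explicit JS sets, written out from the ECMAScript definitions of \s and \w.
// These constant names are illustrative, not SharpTS identifiers.
const JS_WS = " \\t\\n\\v\\f\\r\\u00A0\\u1680\\u2000-\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000\\uFEFF";
const JS_WORD = "A-Za-z0-9_";

// Returns true when both regexes agree on every BMP code unit.
function agreeOnBmp(a, b) {
  for (let cp = 0; cp <= 0xffff; cp++) {
    const ch = String.fromCharCode(cp);
    if (a.test(ch) !== b.test(ch)) return false;
  }
  return true;
}

// Sole-element case: the negated shorthand is the complement of an explicit set.
const soleS = agreeOnBmp(/[\S]/, new RegExp(`[^${JS_WS}]`));
const soleW = agreeOnBmp(/[\W]/, new RegExp(`[^${JS_WORD}]`));

// Union case: [a\S] collapses to \S because "a" is already non-whitespace,
// while [ \S] does not, since it re-admits one whitespace character.
const unionCollapses = agreeOnBmp(/[a\S]/, /\S/);
const unionDiffers = !agreeOnBmp(/[ \S]/, /\S/);

console.log({ soleS, soleW, unionCollapses, unionDiffers });
```

The second union check is the awkward one: once a class member overlaps the excluded set, the result is no longer the plain complement of the shorthand's set, so the rewrite has to look at the other class members.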
Suggested approach
- Handle the common sole-element case in RewriteEcmaScriptShorthands: when a class body is exactly one negated shorthand (optionally with a leading ^), rewrite to the equivalent [^…]/[…]. This alone fixes the [\W]/[\S] repros above.
- Leave the union case ([x\S], multiple negated shorthands, ranges mixed in) for a later pass, or document it as a known edge.
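A rough sketch of the sole-element rewrite, written in JavaScript purely for illustration (the real change would live in the C# RewriteEcmaScriptShorthands and its emitted counterpart). The function name rewriteSoleNegatedShorthand and the set constants are hypothetical, and the sketch assumes the target engine accepts the same \t, \v and \uXXXX escapes inside a class.

```javascript
// Hypothetical explicit sets; the names are illustrative, not SharpTS identifiers.
const JS_WS = " \\t\\n\\v\\f\\r\\u00A0\\u1680\\u2000-\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000\\uFEFF";
const NEGATED_BODIES = { D: "0-9", W: "A-Za-z0-9_", S: JS_WS };

// Rewrites classes whose body is exactly one negated shorthand
// (optionally after a leading ^); every other class passes through unchanged.
function rewriteSoleNegatedShorthand(pattern) {
  let out = "";
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern[i];
    if (ch === "\\") {
      // Copy an escape outside a class verbatim so "\[" never opens a class.
      out += pattern.slice(i, i + 2);
      i += 2;
      continue;
    }
    if (ch !== "[") {
      out += ch;
      i += 1;
      continue;
    }
    // Find the closing "]" of this class, skipping escaped characters.
    let j = i + 1;
    while (j < pattern.length && pattern[j] !== "]") {
      j += pattern[j] === "\\" ? 2 : 1;
    }
    if (j >= pattern.length) {
      // Unterminated class: pass the rest through untouched.
      out += pattern.slice(i);
      break;
    }
    const body = pattern.slice(i + 1, j);
    const m = /^(\^?)\\([DWS])$/.exec(body);
    if (m === null) {
      out += pattern.slice(i, j + 1); // union or ordinary class: leave as-is
    } else {
      const set = NEGATED_BODIES[m[2]];
      // [\S] becomes [^set]; [^\S] becomes [set].
      out += m[1] === "^" ? `[${set}]` : `[^${set}]`;
    }
    i = j + 1;
  }
  return out;
}

console.log(rewriteSoleNegatedShorthand("[\\W]"));   // [^A-Za-z0-9_]
console.log(rewriteSoleNegatedShorthand("x[a\\S]")); // x[a\S] (union case left unchanged)
```

Matching the whole class body against a single shorthand keeps the change narrow: anything that looks like a union falls through to the existing behavior, which fits leaving that case for a later pass.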
Pointers
- Runtime/Types/SharpTSRegExp.cs: RewriteEcmaScriptShorthands / ExpandShorthand.
- Compilation/RuntimeEmitter.TSRegExp.cs: EmitTSRegExpRewriteShorthands / EmitTSRegExpExpandShorthand (must stay BCL-only / standalone).
- iu-mode case folding (K U+212A, ſ U+017F folding into \w).
Acceptance
- /[\W]/.test("İ") === true; /[\S]/.test("\u00A0") === false in both modes.
- No regressions in RegExp/ or String/ Test262 baselines.