
CORE-9082#14

Merged
ethan-wrasman-pkware merged 2 commits into master from ew/fix-shorthand on May 28, 2026


ethan-wrasman-pkware commented May 12, 2026

Add documentation for known limitations, and fix a long-standing bug (introduced in 2014) where shorthand classes such as \d inside a character class produced invalid generated strings.

A class such as [a-z\d] was rewritten by a replaceAll post-pass to [a-z[0-9]]. Brics, the underlying regex engine, does not support nested character classes: it parsed [a-z[0-9]] as the class [a-z[0-9] (with [ as a literal member) followed by a literal ], so every generated string ended with one or more stray ] characters.
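The failure mode fits in a few lines. This is a sketch, not the Generex code: naivePostPass is a hypothetical stand-in for the old replaceAll post-pass, and splitAtFirstClassClose is a hypothetical model of a parser that, as described above for Brics, ends a class at the first unescaped ] instead of supporting nesting.

```kotlin
// Hypothetical stand-in for the old post-pass: a blind textual rewrite that
// has no idea whether \d sits inside [...] or not.
fun naivePostPass(pattern: String): String = pattern.replace("\\d", "[0-9]")

// Hypothetical model of a parser WITHOUT nested classes: a class runs from the
// opening '[' to the first unescaped ']'; any '[' in between is just a member.
// Returns (class text, whatever follows the class).
fun splitAtFirstClassClose(pattern: String): Pair<String, String> {
    require(pattern.startsWith("[")) { "pattern must start with a class" }
    var i = 1
    while (i < pattern.length) {
        when (pattern[i]) {
            '\\' -> i += 2                                   // skip an escaped char
            ']' -> return pattern.substring(0, i + 1) to pattern.substring(i + 1)
            else -> i++
        }
    }
    error("unterminated class in $pattern")
}

// naivePostPass("[a-z\\d]")                  -> "[a-z[0-9]]"
// splitAtFirstClassClose("[a-z[0-9]]")       -> ("[a-z[0-9]", "]")
```

The trailing "]" is what ends up as a literal at the end of every generated string; with a quantifier after the class it repeats, hence "one or more".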

Shorthand expansion now happens inline while normalizing the pattern, with awareness of whether the cursor is inside a character class:

  • Outside a class: \d → [0-9] (as before).
  • Inside a class: \d → 0-9 (class-body form, no nested brackets).
  • Negated shorthands (\D, \S, \W) inside a class expand to explicit complementary Unicode BMP ranges.
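A minimal sketch of class-aware expansion, assuming ASCII semantics for \d, \w and \s (the java.util.regex defaults) and plain string output. expandShorthands, expansion, complementInBmp, classBody and the escaping rule are hypothetical names and choices, not the PR's implementation. Rendering a negated shorthand outside a class as [^...] is also an assumption, since the list above only spells out the in-class form. A leading ^ is copied once when a class opens.

```kotlin
// Hypothetical sketch of class-aware shorthand expansion (not the PR's code).
val BMP_MAX = 0xFFFF

// Code point ranges for each positive shorthand, sorted and non-overlapping.
val POSITIVE: Map<Char, List<IntRange>> = mapOf(
    'd' to listOf(0x30..0x39),                                     // 0-9
    'w' to listOf(0x30..0x39, 0x41..0x5A, 0x5F..0x5F, 0x61..0x7A), // 0-9 A-Z _ a-z
    's' to listOf(0x09..0x0D, 0x20..0x20),                         // \t \n \x0B \f \r, space
)

// Gaps between the given ranges, clipped to the BMP (U+0000..U+FFFF).
fun complementInBmp(ranges: List<IntRange>): List<IntRange> {
    val out = mutableListOf<IntRange>()
    var next = 0
    for (r in ranges.sortedBy { it.first }) {
        if (r.first > next) out += next until r.first
        next = maxOf(next, r.last + 1)
    }
    if (next <= BMP_MAX) out += next..BMP_MAX
    return out
}

// Conservative escaping (assumed): every ASCII non-alphanumeric is backslash-escaped.
fun classChar(cp: Int): String {
    val ch = cp.toChar()
    return if (cp < 0x80 && !ch.isLetterOrDigit()) "\\$ch" else ch.toString()
}

// Class-body form with no surrounding brackets, e.g. 0x30..0x39 -> "0-9".
fun classBody(ranges: List<IntRange>): String = ranges.joinToString("") { r ->
    if (r.first == r.last) classChar(r.first) else classChar(r.first) + "-" + classChar(r.last)
}

// Replacement text for the escape \e, or null if e is not a shorthand letter.
fun expansion(e: Char, inClass: Boolean): String? {
    val positive = POSITIVE[e.lowercaseChar()] ?: return null
    val negated = e.isUpperCase()
    return when {
        !negated && inClass -> classBody(positive)            // \d inside a class -> 0-9
        !negated -> "[" + classBody(positive) + "]"           // \d outside -> [0-9]
        inClass -> classBody(complementInBmp(positive))       // \D inside -> explicit BMP ranges
        else -> "[^" + classBody(positive) + "]"              // \D outside -> [^0-9] (assumed)
    }
}

// Single left-to-right pass that tracks whether the cursor is inside [...].
fun expandShorthands(pattern: String): String {
    val sb = StringBuilder()
    var inClass = false
    var i = 0
    while (i < pattern.length) {
        val c = pattern[i]
        if (c == '\\' && i + 1 < pattern.length) {
            val e = pattern[i + 1]
            sb.append(expansion(e, inClass) ?: "\\$e")        // other escapes pass through
            i += 2
            continue
        }
        when {
            c == '[' && !inClass -> {
                inClass = true
                sb.append('[')
                // Emit the '^' of a negated class exactly once, then skip it.
                if (pattern.getOrNull(i + 1) == '^') { sb.append('^'); i++ }
            }
            c == ']' && inClass -> { inClass = false; sb.append(']') }
            else -> sb.append(c)
        }
        i++
    }
    return sb.toString()
}
```

With this sketch, [a-z\d] becomes [a-z0-9], [^x\d] becomes [^x0-9], and \d{3} becomes [0-9]{3}; no output ever contains a [ nested inside another class, because in-class expansions never add brackets.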

Also fixes a related off-by-one in [^X...] where the leading ^ was being emitted twice.

Added parameterized tests covering every shorthand in every position inside [...] (alone, with literal neighbors, with explicit ranges, in negated outer classes, under quantifiers), plus regression tests for every entry in the new LIMITATIONS.md.
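One way to enumerate such a matrix, sketched under assumptions: SHORTHANDS, POSITIONS and caseMatrix are hypothetical names, and the templates are illustrative stand-ins for the positions listed above, not the PR's actual test data. With JUnit 5, a list like this could feed a @ParameterizedTest through @MethodSource.

```kotlin
// Hypothetical test matrix: every shorthand in every listed position inside [...].
val SHORTHANDS = listOf("\\d", "\\D", "\\w", "\\W", "\\s", "\\S")

// "%s" marks where the shorthand is substituted; the templates are illustrative.
val POSITIONS = linkedMapOf(
    "alone" to "[%s]",
    "with literal neighbors" to "[x%sy]",
    "with an explicit range" to "[a-f%s]",
    "in a negated outer class" to "[^%s]",
    "under a quantifier" to "[%s]{2,4}",
)

// One (display name, pattern) pair per combination, e.g. ("alone: \d", "[\d]").
fun caseMatrix(): List<Pair<String, String>> =
    SHORTHANDS.flatMap { sh ->
        POSITIONS.map { (position, template) -> "$position: $sh" to template.format(sh) }
    }
```

Six shorthands times five positions gives 30 cases, and adding a position is a one-line change to POSITIONS.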

Comment on lines +315 to +322
```kotlin
val generex = Generex("[\\W]")
repeat(10_000) {
    val result = generex.random()
    assertThat(result).hasLength(1)
    val c = result[0]
    val isWord = c == '_' || c in 'a'..'z' || c in 'A'..'Z' || c in '0'..'9'
    assertThat(isWord).isFalse()
}
```

Suggested change

```diff
 val generex = Generex("[\\W]")
+val expected = Pattern.compile("\\W")
 repeat(10_000) {
-    val result = generex.random()
-    assertThat(result).hasLength(1)
-    val c = result[0]
-    val isWord = c == '_' || c in 'a'..'z' || c in 'A'..'Z' || c in '0'..'9'
-    assertThat(isWord).isFalse()
+    assertThat(generex.random()).matches(expected)
 }
```

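Same for the other tests: compare against the equivalent java.util.regex Pattern rather than re-implementing the character class by hand.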
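Applied across many tests, the suggestion could be factored into a small helper. This is a sketch: assertAllMatch is a hypothetical name, it uses Kotlin's check instead of the assertion library in the test above, and generate stands in for something like Generex(regex)::random.

```kotlin
import java.util.regex.Pattern

// Hypothetical helper: use java.util.regex as the oracle for a string generator.
// Every sample must match the regex in full (Matcher.matches, not find).
fun assertAllMatch(regex: String, samples: Int = 10_000, generate: () -> String) {
    val oracle = Pattern.compile(regex)
    repeat(samples) {
        val s = generate()
        check(oracle.matcher(s).matches()) { "sample '$s' does not match /$regex/" }
    }
}

// Possible usage in a test (Generex assumed on the classpath):
// val generex = Generex("[\\W]")
// assertAllMatch("[\\W]") { generex.random() }
```

This only works for patterns both engines read the same way. [a-z[0-9]], for instance, is a union in java.util.regex but not a nested class in Brics, which is exactly the disagreement this PR fixes.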

ethan-wrasman-pkware and others added 2 commits May 28, 2026 07:17
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ethan-wrasman-pkware merged commit 8f35bd0 into master May 28, 2026
2 checks passed

3 participants