Commit f9172e1

Document non-obvious lexer dispatch conditions
Three review-noted spots that were terse in the code:

- The remaining_tokens() loop guard now spells out why both EOF and `null === token_type && bytes_already_read > 0` are needed (EOF on clean end-of-input vs invalid byte mid-stream, with the `> 0` guard letting the very first iteration through).
- The identifier/keyword fast path now explains `$byte > "\x7F"` (UTF-8 multi-byte starter; MySQL identifiers allow U+0080-U+FFFF) and `next_byte !== "'"` (only single quotes form the special hex/bin/n-char literal starters; `"` never does, regardless of SQL mode).

No behavior change.
1 parent 30c557c commit f9172e1

1 file changed

Lines changed: 6 additions & 1 deletion

packages/mysql-on-sqlite/src/mysql/class-wp-mysql-lexer.php

@@ -2301,6 +2301,8 @@ public function remaining_tokens(): array {
 		);
 
 		while ( true ) {
+			// Bail on EOF, or on a null token type once at least one byte has
+			// been consumed (read_next_token() hit invalid input mid-stream).
 			if (
 				self::EOF === $this->token_type
 				|| ( null === $this->token_type && $this->bytes_already_read > 0 )
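To see why the guard needs both conditions, here is a minimal sketch of a toy lexer. Only the `if` condition mirrors the hunk above; the class `Toy_Lexer`, its `WORD` token, and the way it consumes an invalid byte are hypothetical and are not taken from `class-wp-mysql-lexer.php`.

```php
<?php
// Hypothetical toy lexer. Only the loop guard mirrors the diff.
final class Toy_Lexer {
	const EOF  = -1;
	const WORD = 1;

	private string $input;
	private int $bytes_already_read = 0;
	private ?int $token_type        = null; // null before the first read and after invalid input.
	private string $token_text      = '';

	public function __construct( string $input ) {
		$this->input = $input;
	}

	// Reads one token: skips spaces, then takes a run of lowercase letters.
	private function read_next_token(): ?int {
		$length = strlen( $this->input );
		while ( $this->bytes_already_read < $length && ' ' === $this->input[ $this->bytes_already_read ] ) {
			++$this->bytes_already_read;
		}
		if ( $this->bytes_already_read >= $length ) {
			return self::EOF;
		}
		$run = strspn( $this->input, 'abcdefghijklmnopqrstuvwxyz', $this->bytes_already_read );
		if ( 0 === $run ) {
			// Assumption: the invalid byte is consumed, so bytes_already_read > 0 afterwards.
			++$this->bytes_already_read;
			return null;
		}
		$this->token_text          = substr( $this->input, $this->bytes_already_read, $run );
		$this->bytes_already_read += $run;
		return self::WORD;
	}

	public function remaining_tokens(): array {
		$tokens = array();
		while ( true ) {
			// First iteration: token_type is null but nothing has been read yet,
			// so the `> 0` check lets the loop proceed.
			if (
				self::EOF === $this->token_type
				|| ( null === $this->token_type && $this->bytes_already_read > 0 )
			) {
				break;
			}
			$this->token_type = $this->read_next_token();
			if ( self::WORD === $this->token_type ) {
				$tokens[] = $this->token_text;
			}
		}
		return $tokens;
	}
}

var_dump( ( new Toy_Lexer( 'ab cd' ) )->remaining_tokens() );  // clean end of input: stops on EOF
var_dump( ( new Toy_Lexer( 'ab ?cd' ) )->remaining_tokens() ); // invalid byte: stops on null, only "ab" is returned
```

Without the `> 0` check, the initial `null` token type would end the loop before anything is read. Without the `null` check, a lexer that keeps returning `null` on invalid input would never reach EOF and the loop would not terminate.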
@@ -2421,7 +2423,10 @@ private function read_next_token(): ?int {
 		);
 
 		// Fast path for keywords and identifiers.
-		// These are the most common token types in MySQL payloads.
+		// `$byte > "\x7F"` catches UTF-8 multi-byte starters (U+0080-U+FFFF).
+		// `"'" !== $next_byte` defers x'..', n'..' and similar special
+		// literals to their dedicated branches below; only single quotes
+		// form those, regardless of SQL mode.
 		if (
 			(
 				( $byte >= 'a' && $byte <= 'z' )
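The two documented conditions can be tried in isolation. The hunk is truncated after its first clause, so the rest of the predicate below is an assumption about its shape, and the function name `takes_identifier_fast_path()` is hypothetical. Only `$byte > "\x7F"` and `"'" !== $next_byte` come from the commit message and the new comments.

```php
<?php
// Hypothetical helper. The full condition in the lexer is not shown in the diff;
// the A-Z, `_` and `$` clauses are assumed for illustration.
function takes_identifier_fast_path( string $byte, ?string $next_byte ): bool {
	$can_start_identifier =
		( $byte >= 'a' && $byte <= 'z' )
		|| ( $byte >= 'A' && $byte <= 'Z' )
		|| '_' === $byte
		|| '$' === $byte
		// Single-byte string comparison is byte-wise, so this matches any byte
		// above 0x7F, which includes the lead bytes of multi-byte UTF-8 sequences.
		|| $byte > "\x7F";

	// A following single quote may start x'..', b'..' or n'..', so those inputs
	// are left to other branches. A double quote does not have that effect.
	return $can_start_identifier && "'" !== $next_byte;
}

var_dump( takes_identifier_fast_path( 's', 'e' ) );       // bool(true): plain keyword or identifier
var_dump( takes_identifier_fast_path( "\xC3", "\xA9" ) ); // bool(true): lead byte of UTF-8 "é"
var_dump( takes_identifier_fast_path( 'x', "'" ) );       // bool(false): possible hex literal x'..'
var_dump( takes_identifier_fast_path( 'x', '"' ) );       // bool(true): double quote is not a literal starter
```

In this sketch the single-quote check applies to every starting byte, not just `x`, `b`, or `n`. That keeps the fast-path test to one comparison, at the cost of sending some ordinary inputs such as `a'` down the slower branches.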
