Skip to content

Commit 0be2a68

Browse files
committed
Patch Doltlite to preserve original bytes on lossy-collation index entries
prollyBtCursorInsert stored no value for non-INTKEY inserts where every record field fits into the sort key (splitKey=0), relying on getCursorPayload to reconstruct the record from the sort key bytes on read. That round-trip is lossless for BINARY but the sort-key encoder in sortkey.c folds A-Z to a-z under NOCASE and strips trailing spaces under RTRIM, so when SQLite picks the auto-index as a covering index for a plain column read (e.g. `SELECT pkcol FROM t` on a composite-PK rowid table with a NOCASE pkcol), OP_Column emits the folded bytes instead of the original column value. Reproduces testAlterTableModifyColumnComplexChange — the driver rebuilds the table via CREATE/INSERT/RENAME and then reads back 'johnny' where it stored 'Johnny'. Patch: when any pKeyInfo collation is non-identity (NOCASE or RTRIM), preserve pPayload->pKey as the value alongside the sort key. getCursorPayload prefers the stored value over reconstruction, so reads return the original bytes. No behavior change for BINARY- only records.
1 parent e81b946 commit 0be2a68

2 files changed

Lines changed: 100 additions & 3 deletions

File tree

Lines changed: 72 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,72 @@
1+
Preserve the original record when the sort-key encoding is lossy
2+
3+
For a non-INTKEY cursor, prollyBtCursorInsert normally picks between
4+
two storage shapes:
5+
6+
- splitKey=1 — sort key holds a prefix of the record and the rest of
7+
the record is stored as the value (WITHOUT ROWID tables, secondary
8+
indexes);
9+
- splitKey=0 — sort key holds every field and no value is stored.
10+
11+
On read, getCursorPayload prefers the stored value; if none is
12+
present it reconstructs a SQLite record from the sort key bytes. That
13+
round-trip is lossless for BINARY collations but the sort-key encoder
14+
in sortkey.c lower-cases A-Z under NOCASE and strips trailing spaces
15+
under RTRIM.
16+
17+
When SQLite picks the auto-index as a covering index for a plain
18+
column read — e.g. `SELECT pkcol FROM t` on a composite-PK rowid
19+
table with a NOCASE pkcol — OP_Column reads the reconstructed (folded)
20+
bytes from getCursorPayload and returns them in place of the original
21+
column value. Reproduces on:
22+
23+
CREATE TABLE t (ID INT, pkcol TEXT COLLATE NOCASE, extra TEXT,
24+
PRIMARY KEY (ID, pkcol));
25+
INSERT INTO t VALUES (1, 'Johnny', 'x');
26+
SELECT pkcol FROM t; -- 'johnny' (wrong) without fix
27+
SELECT * FROM t; -- 'Johnny' (right) both ways
28+
29+
Fix: when any field in pKeyInfo uses a collation whose sort-key
30+
encoding is not a bijection over the original bytes (NOCASE, RTRIM),
31+
preserve the original record as the value alongside the sort key.
32+
getCursorPayload then prefers the original bytes over reconstruction.
33+
No behavior change when all collations are BINARY: the value side
34+
stays empty and storage footprint is unchanged.
35+
36+
--- a/src/prolly_btree.c
37+
+++ b/src/prolly_btree.c
38+
@@ -4716,7 +4716,33 @@
39+
pCur->pKeyInfo,
40+
&pSortKey, &nSortKey);
41+
if( rc==SQLITE_OK ){
42+
- if( splitKey ){
43+
+ /* When every field of the record is folded into the sort key
44+
+ ** (splitKey==0) we would normally store no value and, on read,
45+
+ ** reconstruct the SQLite record from the sort key. That is
46+
+ ** lossless for BINARY collations but the sort-key encoder
47+
+ ** lower-cases A-Z under NOCASE and strips trailing spaces
48+
+ ** under RTRIM. When SQLite picks the auto-index as a covering
49+
+ ** index for a plain column read (e.g. `SELECT pkcol FROM t`
50+
+ ** on a composite-PK rowid table with NOCASE pkcol), OP_Column
51+
+ ** reads the reconstructed (folded) bytes and returns them to
52+
+ ** the caller in place of the original column value. Preserve
53+
+ ** the original record as the value whenever any field in
54+
+ ** pKeyInfo uses a lossy collation so the read path prefers
55+
+ ** the original bytes over the reconstructed ones. */
56+
+ int lossy = 0;
57+
+ if( pCur->pKeyInfo ){
58+
+ int i;
59+
+ for( i = 0; i < pCur->pKeyInfo->nAllField; i++ ){
60+
+ CollSeq *pColl = pCur->pKeyInfo->aColl[i];
61+
+ if( pColl && pColl->zName
62+
+ && ( sqlite3StrICmp(pColl->zName, "NOCASE")==0
63+
+ || sqlite3StrICmp(pColl->zName, "RTRIM")==0 ) ){
64+
+ lossy = 1;
65+
+ break;
66+
+ }
67+
+ }
68+
+ }
69+
+ if( splitKey || lossy ){
70+
rc = prollyMutMapInsert(pCur->pMutMap,
71+
pSortKey, nSortKey, 0,
72+
(const u8*)pPayload->pKey, (int)pPayload->nKey);

.github/workflows/phpunit-tests-doltlite.yml

Lines changed: 28 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ jobs:
3232
path: doltlite-src
3333

3434
- name: Patch Doltlite for mysql-on-sqlite compatibility
35-
# Narrow, verifiable source-level patches applied before build:
35+
# Four narrow, verifiable source-level patches applied before build:
3636
#
3737
# 1) src/build.c — neutralize the auto-conversion of composite/TEXT-PK
3838
# tables to WITHOUT ROWID, so the driver's `ORDER BY ROWID`
@@ -55,6 +55,13 @@ jobs:
5555
# entry check the mutmap for a DELETE on that tree key; if
5656
# found, skip and advance. Without this, INSERT+DELETE+UPDATE
5757
# in one transaction trips "database disk image is malformed".
58+
#
59+
# 4) src/prolly_btree.c — preserve the original record bytes as
60+
# the index value when any pKeyInfo collation is lossy
61+
# (NOCASE, RTRIM). Without this, auto-index covering reads
62+
# reconstruct a SQLite record from the sort key and return
63+
# folded bytes (e.g. 'johnny' instead of 'Johnny') for plain
64+
# column reads on composite-PK rowid tables.
5865
run: |
5966
BUILD=doltlite-src/src/build.c
6067
MOVETO=doltlite-src/src/prolly_btree.c
@@ -74,8 +81,12 @@ jobs:
7481
patch -d doltlite-src -p1 --no-backup-if-mismatch \
7582
< .github/doltlite-patches/mergeScan-check-tree-delete.patch
7683
84+
# --- Patch 4: preserve original record bytes on lossy-collation index entries ---
85+
patch -d doltlite-src -p1 --no-backup-if-mismatch \
86+
< .github/doltlite-patches/preserve-original-value-when-sortkey-lossy.patch
87+
7788
echo "--- Patched lines ---"
78-
grep -n 'patched off in CI\|savedEqSeen\|walked past' "$BUILD" "$MOVETO"
89+
grep -n 'patched off in CI\|savedEqSeen\|walked past\|lossy = 1' "$BUILD" "$MOVETO"
7990
8091
- name: Build Doltlite shared library
8192
run: |
@@ -120,7 +131,7 @@ jobs:
120131
echo "ldd pdo_sqlite:"
121132
ldd "$(php -r 'echo ini_get("extension_dir");')/pdo_sqlite.so"
122133
123-
- name: Verify Doltlite is active and patches took effect
134+
- name: Verify Doltlite is active and both patches took effect
124135
run: |
125136
VERSION=$(php -r 'echo (new PDO("sqlite::memory:"))->query("SELECT SQLITE_VERSION()")->fetch()[0];')
126137
ENGINE=$(php -r 'echo (new PDO("sqlite::memory:"))->query("SELECT doltlite_engine()")->fetch()[0];')
@@ -183,6 +194,20 @@ jobs:
183194
echo "::error::mergeScan patch did not take effect: expected 'x', got '${VAL}'"
184195
exit 1
185196
fi
197+
# 4) lossy-collation patch: a covering-index read on a
198+
# composite-PK rowid table with a NOCASE text column no
199+
# longer returns folded bytes.
200+
NAME=$(php -r '
201+
$db = new PDO("sqlite::memory:");
202+
$db->exec("CREATE TABLE t (id INT, pkcol TEXT COLLATE NOCASE, extra TEXT, PRIMARY KEY (id, pkcol))");
203+
$db->exec("INSERT INTO t VALUES (1, \"Johnny\", \"x\")");
204+
echo $db->query("SELECT pkcol FROM t")->fetch()[0];
205+
' 2>&1)
206+
echo "covering-index pkcol = ${NAME}"
207+
if [ "${NAME}" != "Johnny" ]; then
208+
echo "::error::lossy-collation patch did not take effect: expected 'Johnny', got '${NAME}'"
209+
exit 1
210+
fi
186211
187212
- name: Install Composer dependencies (root)
188213
uses: ramsey/composer-install@v3

0 commit comments

Comments
 (0)