feat(mongodb-storage)!: chunked multi-op bucket documents with range-merging compaction and invariant tests#617
Sleepful wants to merge 63 commits into
Renames all class, function, type, and collection accessor names in the duplicated v5 storage implementation from V3→V5:
- MongoBucketBatchV3 → MongoBucketBatchV5
- MongoChecksumsV3 → MongoChecksumsV5
- MongoCompactorV3 → MongoCompactorV5
- MongoParameterCompactorV3 → MongoParameterCompactorV5
- MongoParameterLookupV3 → MongoParameterLookupV5
- MongoSyncBucketStorageV3 → MongoSyncBucketStorageV5
- PersistedBatchV3 → PersistedBatchV5
- SingleBucketStoreV3 → SingleBucketStoreV5
- SourceRecordStoreV3 → SourceRecordStoreV5
- VersionedPowerSyncMongoV3 → VersionedPowerSyncMongoV5
Also adds compressedBucketStorage to StorageConfig and wires up MongoSyncBucketStorageV5 selection in createMongoSyncBucketStorage. This is a pure mechanical rename with no behavior changes.
Change BucketDataDocumentV5 to store arrays of operations per document:
- Add BucketOperationV5 interface with per-op fields including op_id
- Add aggregated fields: min_op, checksum, count, size
- Implement serializeBucketDataV5() to group ops and compute aggregates
- Implement loadBucketDataDocumentV5() as generator yielding from ops array
Add chunking logic in PersistedBatchV5.flushBucketData():
- Group operations by bucket then chunk by 1MB size threshold
- Single-op chunks remain valid for backward compatibility
Update read path in MongoSyncBucketStorageV5 to iterate merged docs.
Update SingleBucketStoreV5 for new generator-based load function.
Overrides compactSingleBucket in MongoCompactorV5 to handle the compressed bucket storage model:
1. Reads all documents in a bucket sorted by _id.o ascending
2. Loads all ops via loadBucketDataDocumentV5()
3. Filters superseded operations using the same row_id tracking logic as v3 (newest-to-oldest pass, keeps only latest PUT/REMOVE per row)
4. Re-chunks surviving ops by 1MB data-size threshold
5. Replaces old documents with new chunked docs in a transaction
6. Updates bucket_state with recomputed checksums, counts, and bytes
Unlike v3, v5 does not create MOVE/CLEAR ops during compaction. Instead, superseded ops are dropped and surviving ops are fully restructured into new documents.
…egation and activate v5 in test matrix
- Override MongoChecksumsV5.computePartialChecksumsForCollection to use document-level checksum field instead of expanding ops arrays
- Handle partial ranges correctly by filtering ops when start > min_op
- Fix getBucketDataBatchV5 to respect op-level limits instead of document limits
- Update PowerSyncMongo.versioned to create VersionedPowerSyncMongoV5 for v5
- Add STORAGE_VERSION_5 to SUPPORTED_STORAGE_VERSIONS and STORAGE_VERSION_CONFIG
- Update getMongoStorageConfig to enable compressedBucketStorage for v5
- Fix v3-specific tests to only run on storageVersion == 3
🦋 Changeset detected. Latest commit: 03d8bd6. The changes in this PR will be included in the next version bump.
f4f82ee to b4d71e3
b4d71e3 to 755fad1
…tractMongoSyncBucketStorage and MongoSyncBucketStorageBase → MongoSyncBucketStorage
…ter to MongoParameterCompactor base class
Make collectionFilter() and deleteFilter() concrete in the base class
with the V3/V5 implementation (returns {} and {lookup, _id, key}
respectively). Remove the abstract keyword from the base class.
Delete the now-redundant V3 and V5 parameter compactor subclasses:
- v3/MongoParameterCompactorV3.ts
- v5/MongoParameterCompactorV5.ts
Update MongoSyncBucketStorageV3 and V5 to instantiate MongoParameterCompactor
directly, passing the collection lister callback inline.
…acks interface to separate file
- Create common/MongoSyncBucketStorageCallbacks.ts with the full interface
- Replace inline MongoSyncBucketStorageBaseCallbacks in MongoSyncBucketStorageBase.ts
- Type _versionCallbacks as MongoSyncBucketStorageCallbacks in AbstractMongoSyncBucketStorage
- Update v3 and v5 implementations to import from the new file
- Use 'any' for createCompactor's storage parameter to avoid circular imports
Move getParameterSetsShared, getBucketDataBatchSharedWrapper, getDataBucketChangesShared, and getParameterBucketChangesShared from bucket-operations/storage-operations.ts into MongoSyncBucketStorageBase as private method implementations. Eliminate the context object pattern by accessing this.callbacks and this.group_id directly. Flatten the getBucketDataBatchShared -> getBucketDataBatchSharedWrapper chain into a single getBucketDataBatchImpl method. Delete the now-unused bucket-operations/storage-operations.ts.
… MongoCompactorV3 and V5
Extract identical types from v3/models.ts and v5/models.ts into a shared common/models.ts without version suffixes:
- CurrentBucket
- RecordedLookup
- CurrentDataDocument
- BucketParameterDocument
- SourceTableDocument
- BucketStateDocument
- taggedBucketParameterDocumentToTagged
Update v3/models.ts and v5/models.ts to re-export from common/models.ts, keeping only version-specific exports (BucketDataDocumentV3/V5, etc.).
Update all imports across the codebase to use non-suffixed names from common/models.ts or version-specific names where appropriate.
Update storage-index.ts to use explicit exports to avoid naming conflicts with v1/models.ts and models.ts.
2fa58c2 to 037a81e
- Add missing exports (SyncRuleConfigStateV3) to storage-index.ts
- Fix db.ts versioned() to return VersionedPowerSyncMongoV3 for V3
- Fix MongoBucketBatch._db visibility (private -> protected) for subclass access
- Fix SourceRecordStoreV3.ts to use shared serializeParameterLookup
- Fix test file: use VersionedPowerSyncMongoV3, update method names (listSourceRecordCollectionsV3, parameterIndexV3)
- Fix implicit any parameters in MongoPersistedSyncRulesContent and MongoBucketStorage
- Make VersionedPowerSyncMongoV3 extend VersionedPowerSyncMongo for shared generic methods
- Remove unused VersionedPowerSyncMongoClass import from db.ts
All 343 module-mongodb-storage tests passing.
rkistner left a comment:
Leaving some initial comments, mostly around compacting for now.
I'm still reviewing the other query changes.
    const docs = await this.db
      .bucketData(this.group_id, resolvedDefinitionId)
      .find({ '_id.b': bucket })
      .sort({ '_id.o': 1 })
      .toArray();
A single bucket can be way too big to fit into memory: 1-10GB buckets are normal, potentially many millions of operations. Can you keep the "chunking" behavior used previously in queries? I.e. read one batch at a time, and write a chunk as soon as it reaches the threshold. I'd also recommend not re-arranging existing documents too much unless the gains are significant enough. Specifically:
- Merging multiple small documents into one bigger one always makes sense, as long as it stays below the size thresholds.
- Generally avoid splitting up existing documents. For example, say you have documents of (100kb, 1mb, 100kb): In theory you can turn this into (1mb, 200kb), but that reshuffling may not be worth it.
On the other hand, if there are many individual operations being compacted (turned into MOVE operations), you're re-writing the document anyway, so it might make sense to split it.
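A minimal sketch of the "read one batch at a time, write a chunk as soon as it reaches the threshold" shape described above. The names Op, dataSize, and rechunkStreaming are hypothetical and not taken from the PR; batches are modelled as a plain iterable, where the real code would presumably pull them from a database cursor, and string length stands in for byte size.

```typescript
// Hypothetical, simplified op shape; the real documents carry more fields.
interface Op {
  opId: number;
  data: string | null;
}

// Payload size of an op. String length is used as a stand-in for byte size;
// tombstones (data: null) count as 0.
function dataSize(op: Op): number {
  return op.data === null ? 0 : op.data.length;
}

// Consume batches one at a time and yield a chunk as soon as adding the next
// op would push it past the threshold. Only the current chunk is buffered, so
// memory use is bounded by one chunk plus one input batch.
function* rechunkStreaming(batches: Iterable<Op[]>, thresholdBytes: number): Generator<Op[]> {
  let current: Op[] = [];
  let currentBytes = 0;
  for (const batch of batches) {
    for (const op of batch) {
      const size = dataSize(op);
      if (current.length > 0 && currentBytes + size > thresholdBytes) {
        yield current;
        current = [];
        currentBytes = 0;
      }
      // An op larger than the threshold still lands in a chunk of its own.
      current.push(op);
      currentBytes += size;
    }
  }
  if (current.length > 0) {
    yield current;
  }
}
```

Because each chunk is yielded as soon as it is complete, the caller can write it (and delete the input documents it replaces) before the next batch is read, which also keeps each transaction small.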
There was a problem hiding this comment.
True! Thanks for checking this, it's the kind of topic that the tests can't cover. I'm working on the streaming rearchitecture: batched reads with byte-based caps and scoped deletes. Will push for review soon.
    try {
      await session.withTransaction(
        async () => {
          await bucketContext.collection.deleteMany({ '_id.b': bucket }, { session });
There could have been new documents added while compacting, which this would remove. At minimum, this should filter by _id.o. But this section will most likely have to be rewritten to handle writing smaller parts at a time anyway (see comment earlier in this file).
Also note that transactions are great to ensure all these writes happen atomically, but try to limit the amount of work performed in a single transaction. This also relates to the earlier comment.
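One way to scope the delete, sketched under assumptions: scopedDeleteFilter and its parameters are hypothetical names, and the filter uses the same compound _id range form that appears elsewhere in this review. It only covers documents whose _id.o lies between the first and last document actually read by the compactor, so documents written during compaction are left alone.

```typescript
// Shape of the compound _id used by bucket data documents in this PR.
interface BucketDocId {
  b: string; // bucket name
  o: number; // op id (a number here for simplicity)
}

// Hypothetical helper: build a delete filter limited to the _id range that the
// compactor has read. firstReadOp / lastReadOp are the _id.o values of the
// first and last input documents of the current compaction step.
function scopedDeleteFilter(
  bucket: string,
  firstReadOp: number,
  lastReadOp: number
): { _id: { $gte: BucketDocId; $lte: BucketDocId } } {
  return {
    _id: {
      $gte: { b: bucket, o: firstReadOp },
      $lte: { b: bucket, o: lastReadOp }
    }
  };
}
```

The resulting object would be passed to deleteMany in place of `{ '_id.b': bucket }`; a document appended with a higher _id.o while compaction runs falls outside the range.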
    // 3. Filter superseded operations using the same row_id logic as v3.
    // We iterate newest-to-oldest and keep only the latest PUT/REMOVE per row.
    const seen = new Map<string, bigint>();
    const surviving = new Array<BucketDataDoc | null>(allOps.length);
When we "keep" the latest PUT/REMOVE per row, we do still need to keep tombstone MOVE operations around (same op_id, same checksum, but remove the data), to keep the checksums intact.
See /docs/compacting-operations.md for details. It may be slightly outdated, but the concepts should still be relevant.
There was a problem hiding this comment.
Good catch, implemented in 4810a78
The V3 compactor now converts superseded PUT/REMOVE ops to MOVE tombstones instead of dropping them - same op_id, same checksum, op: 'MOVE', target_op pointing to the newer op, all data/identity fields stripped (data, table, row_id, source_table, source_key).
This preserves checksum integrity. The bucket-level checksum (sum(doc.checksum) across all documents) is invariant across compaction - every op keeps its original checksum, just with data: null on tombstones.
A few V3-specific details worth noting:
- Per-op target_op is not stored. serializeBucketData aggregates target_op to the document level (max across all ops) and strips per-op values. The compactor creates per-op target_op on tombstones during dedup, but serialization collapses them. Only the document-level target_op survives. (same detail mentioned in the other comment)
- Document boundaries change on every compaction pass. chunkBucketData sizes by data bytes. Tombstones contribute 0 bytes, so they pack densely - multiple tombstones and surviving PUTs may end up in the same rechunked document. Individual document checksums change, but the bucket's total checksum is preserved.
- Op count never decreases. Ops become tombstones but are never deleted. Storage shrinks because tombstones strip the large JSON payloads (typically ~50 bytes vs kilobytes per PUT), but the ops array stays the same length.
- No CLEAR pass yet. V1 compactor has a CLEAR optimization that collapses leading MOVE/REMOVE sequences. V3 doesn't implement this yet - follow-up work.
Test coverage added in bf33d43b:
- Checksum preservation across compaction (single doc + multi-doc)
- Tombstones have data: null and pack densely after rechunking
- Tombstones and surviving PUTs co-located in same output document
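The tombstone behavior described above can be sketched roughly as follows. This is an illustration, not the PR's code: BucketOp, tombstoneSuperseded, and sumChecksums are hypothetical names, op ids are plain numbers, and checksums are summed without the wrap-around a real implementation may apply. Input ops are assumed to be sorted by opId ascending.

```typescript
type OpType = 'PUT' | 'REMOVE' | 'MOVE';

// Hypothetical decoded op; field names are simplified for the sketch.
interface BucketOp {
  opId: number;
  op: OpType;
  table: string | null;
  rowId: string | null;
  checksum: number;
  data: string | null;
  targetOp: number | null;
}

// Walk newest-to-oldest. The first op seen for a row is kept as-is; every
// older PUT/REMOVE for that row becomes a MOVE tombstone with the same opId
// and checksum, no data, and targetOp pointing at the newer op.
function tombstoneSuperseded(ops: BucketOp[]): BucketOp[] {
  const newestByRow = new Map<string, number>();
  const result = new Array<BucketOp>(ops.length);
  for (let i = ops.length - 1; i >= 0; i--) {
    const op = ops[i];
    if (op.op === 'MOVE' || op.table === null || op.rowId === null) {
      result[i] = op; // already a tombstone, nothing to strip
      continue;
    }
    const key = JSON.stringify([op.table, op.rowId]);
    const newer = newestByRow.get(key);
    if (newer === undefined) {
      newestByRow.set(key, op.opId);
      result[i] = op;
    } else {
      result[i] = {
        opId: op.opId,
        op: 'MOVE',
        table: null,
        rowId: null,
        checksum: op.checksum,
        data: null,
        targetOp: newer
      };
    }
  }
  return result;
}

// Bucket-level checksum as a plain sum (simplified).
function sumChecksums(ops: BucketOp[]): number {
  return ops.reduce((acc, op) => acc + op.checksum, 0);
}
```

Since every op keeps its opId and checksum, `sumChecksums` returns the same value before and after the pass, and the output array has the same length as the input.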
There was a problem hiding this comment.
oh yeah and updated the comment f960baf
    row_id: op.row_id,
    checksum: op.checksum,
    data: op.data,
    target_op: null
This should probably use target_op from the parent doc?
There was a problem hiding this comment.
It is not necessary to inherit target_op from the chunked document into individual decoded ops; this value is not used by any code path. The only path where individual ops need a target_op is compaction's dedup pass, where the compactor sets target_op on MOVE tombstones it creates in-flight (pointing each tombstone to the newer op that superseded it). These values are then aggregated up to the document level by serializeBucketData(). Since dedup always recomputes from scratch, it doesn't need accurate per-op target_op from storage, it only needs the document-level aggregate, which is already available on BucketDataDocument.
There was a problem hiding this comment.
You're right that we don't need it on individual ops, only the aggregate (largest) values per chunk/document-level. But right now I don't see the document-level target_op being used anywhere, or am I missing it?
All I could find is this reference to the row-level target_op, which will always be null:
There was a problem hiding this comment.
But right now I don't see the document-level target_op being used anywhere, or am I missing it?
You are correct.
Even when you reviewed earlier, it was always null because I hadn't written the MOVE tombstones you pointed out in comment #3. Now the compactor sets target_op on MOVE tombstones at op-level:
    // MongoCompactorV3.ts
    surviving[i] = {
      ...op,
      op: 'MOVE',
      target_op: targetOp,
      // ...fields stripped...
    };
serializeBucketData aggregates the max across ops in the chunk to document-level, and strips the per-op values from the stored ops[] array. So we end up with a single target_op per stored document, but no code path currently reads it.
Why store it: for the future CLEAR pass. V3 doesn't have CLEAR yet, but when it does, it'll need target_op at the document level -> same as V1's clearBucket() which tracks the max target_op across MOVE/REMOVE ops to set on the resulting CLEAR op.
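For illustration, the aggregation described in this thread might look like the sketch below. serializeChunk and the DecodedOp / StoredOp / ChunkDocument types are hypothetical simplifications, not the signature of serializeBucketData; ops are assumed non-empty and sorted by opId ascending, and string length stands in for byte size.

```typescript
// Hypothetical in-memory op, as produced by the compactor's dedup pass.
interface DecodedOp {
  opId: number;
  checksum: number;
  data: string | null;
  targetOp: number | null;
}

// Stored op: note that there is no per-op target_op field.
interface StoredOp {
  o: number;
  checksum: number;
  data: string | null;
}

interface ChunkDocument {
  _id: { b: string; o: number };
  min_op: number;
  count: number;
  checksum: number;
  size: number;
  target_op: number | null;
  ops: StoredOp[];
}

// Build one stored document from a chunk of ops, computing the aggregates and
// collapsing per-op targetOp values into a single document-level maximum.
function serializeChunk(bucket: string, ops: DecodedOp[]): ChunkDocument {
  if (ops.length === 0) {
    throw new Error('a chunk needs at least one op');
  }
  let checksum = 0;
  let size = 0;
  let targetOp: number | null = null;
  for (const op of ops) {
    checksum += op.checksum;
    size += op.data === null ? 0 : op.data.length;
    if (op.targetOp !== null && (targetOp === null || op.targetOp > targetOp)) {
      targetOp = op.targetOp;
    }
  }
  return {
    _id: { b: bucket, o: ops[ops.length - 1].opId }, // max op id in the chunk
    min_op: ops[0].opId,
    count: ops.length,
    checksum,
    size,
    target_op: targetOp,
    ops: ops.map((op) => ({ o: op.opId, checksum: op.checksum, data: op.data }))
  };
}
```

With this shape, a future CLEAR pass could read target_op from the document without decoding ops[], which is the use case named above.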
    // $sort by _id
    { $sort: { _id: 1 } },
This $sort doesn't have any effect and can be removed.
In the previous version we used $sort for { $limit: batchLimit }, but that's not being used here anymore.
    return {
      _id: {
        $gt: {
          b: request.bucket,
          o: request.start ?? new bson.MinKey()
        },
        $lte: {
          b: request.bucket,
          o: request.end
        }
      }
    };
With documents now representing ranges of operations, this filter is risky.
The $gt part is fine - if any operation in the document is > start, then o > start will also be true.
The $lte can filter too aggressively: If request.end falls in the middle of a chunk, it will exclude the chunk. As discussed in the design document, this query can work if we can guarantee that a checkpoint will never fall in the middle of a chunk, and that this filter is only used with checkpoints for request.end. And while that should be true in most cases, it is tricky to provide a hard guarantee when compacting, which can merge chunks.
Some options here:
- Expand the query to include the next document. A query in this form could work:
    aggregate([
      { $match: { _id: { $gt: { ... }, $lte: { b: request.bucket, o: request.end } } } },
      {
        $unionWith: {
          coll: ...,
          pipeline: [
            // this matches the next document, which may or may not be part of the requested range
            // the downstream filters on individual operations should filter these out if not part of the requested range
            { $match: { _id: { $gt: { b: request.bucket, o: request.end } } } },
            { $sort: { _id: 1 } },
            { $limit: 1 }
          ]
        }
      }
    ]);
- Completely remove the $lte filter here, and fully rely on the downstream filters to filter out the later operations. This is the simplest approach, and may even be the most performant. This relies on the fact that we're always querying for a recent checkpoint, so there should not be massive amounts of additional data queried here.
There was a problem hiding this comment.
Edit: I misunderstood your comment initially and replied with regards to the MongoSyncBucketStorageV3, but here you are asking about MongoChecksumsV3. I am leaving this comment up because it might be useful, but refer to my next comment for proper reply.
Keen observation, indeed one of the new edge cases!
Phase 1.5 test case 8 in storage_sync.test.ts is exactly this scenario (d80f55bf):
start=30, checkpoint=40 → expected [40]
Document layout:
Doc A: ops [10, 20, 30] → _id.o=30, min_op=10
Doc B: ops [40, 50, 60] → _id.o=60, min_op=40
Doc C: ops [70, 80, 90] → _id.o=90, min_op=70
Checkpoint 40 falls in the middle of Doc B. The query must catch Doc B even though _id.o=60 > 40. With min_op <= 40, Doc B matches (min_op=40 <= 40), then extractRowsFromDocument filters to just [40].
This test case had originally failed, and it was fixed in a follow-up commit (037a81e7).
This commit changed _id.o <= checkpoint to min_op <= checkpoint in bucket-document-format.ts
The invariant min_op <= _id.o holds after compaction too (compaction rechunks via serializeBucketData()@bucket-document-format.ts which always sets min_op to the smallest op), so this fix works regardless of chunk merging.
Does this solve the issue for you?
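The scenario above can be reproduced with a small in-memory model. This is a sketch only: ChunkDoc, selectDocs, extractOps, and readRange are hypothetical stand-ins for the query filter and extractRowsFromDocument, with maxOp playing the role of _id.o and minOp the role of min_op.

```typescript
// Simplified document: just the op ids it contains plus the two bounds.
interface ChunkDoc {
  maxOp: number; // mirrors _id.o
  minOp: number; // mirrors min_op
  ops: number[];
}

// Document-level filter: keep any document that overlaps (start, checkpoint].
// Using minOp for the upper bound keeps documents that straddle the checkpoint.
function selectDocs(docs: ChunkDoc[], start: number, checkpoint: number): ChunkDoc[] {
  return docs.filter((doc) => doc.maxOp > start && doc.minOp <= checkpoint);
}

// Op-level post-filter within a selected document.
function extractOps(doc: ChunkDoc, start: number, checkpoint: number): number[] {
  return doc.ops.filter((o) => o > start && o <= checkpoint);
}

function readRange(docs: ChunkDoc[], start: number, checkpoint: number): number[] {
  const result: number[] = [];
  for (const doc of selectDocs(docs, start, checkpoint)) {
    result.push(...extractOps(doc, start, checkpoint));
  }
  return result;
}

// Document layout from the comment above.
const layout: ChunkDoc[] = [
  { maxOp: 30, minOp: 10, ops: [10, 20, 30] }, // Doc A
  { maxOp: 60, minOp: 40, ops: [40, 50, 60] }, // Doc B
  { maxOp: 90, minOp: 70, ops: [70, 80, 90] } // Doc C
];

console.log(readRange(layout, 30, 40)); // Doc B is selected, then filtered down to op 40
```

With the earlier upper bound (maxOp <= checkpoint), Doc B would not be selected at all for checkpoint 40 and the read would come back empty.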
    const filters = Array.from(bucketMap.entries()).map(([bucket, start]) => ({
      '_id.b': bucket,
      '_id.o': { $gt: start }
      // MongoDB Filter<T> doesn't accept dotted field paths like '_id.o' in its type.
    })) as unknown as lib_mongo.mongo.Filter<BucketDataDocument>[];

    const minStart = Array.from(bucketMap.values()).reduce((min, val) => (val < min ? val : min));

    const collection = this.db.bucketData<BucketDataDocument>(this.group_id, definitionId);
    const formatAdapter = new BucketDocumentFormatAdapter();
    // MongoDB Filter<T> doesn't accept the $or operator in its type.
    const filter = { $or: filters } as unknown as lib_mongo.mongo.Filter<BucketDataDocument>;
    const context = { replicationStreamId: this.group_id, definitionId };
    const startOpId = minStart;
    const endOpId = end;
    const limit = remainingLimit;

    const { filter: rangeFilter, cursorOptions } = formatAdapter.buildBucketDataQuery({
      startOpId,
      endOpId,
      remainingLimit: limit
    });

    const combinedFilter = {
      // MongoDB Filter<T> doesn't accept the $and operator in its type.
      $and: [filter, rangeFilter]
    } as unknown as lib_mongo.mongo.Filter<BucketDataDocument>;
This builds two related filters and $ands them together, but neither of them can use the _id index efficiently in the current form. I'd recommend using a single filter on _id instead, for example:
    const filter = {
      $or: Array.from(bucketMap.entries()).map(([bucket, start]) => ({
        _id: {
          $gt: {
            b: bucket,
            o: start
          },
          $lte: {
            b: bucket,
            o: new MaxKey()
          }
        },
        min_op: { $lte: end } // Not sure whether it is better to have this here, or just filter out app-side
      }))
    }
Note: There is a lot of overlap in how we're dealing with the query here versus in checksum calculations. One difference here is that we may query for a large range, but don't return the entire range to a client at a time. But I don't think that affects the query filters significantly.
There was a problem hiding this comment.
Oh! Keen! This is a new lesson for me on MongoDB query structure. Implemented here, semantically the query grabs the same data:
Resolves 2 content conflicts:
- MongoBucketBatch.ts: accept both hooks and listSourceRecordCollections
- AbstractMongoSyncBucketStorage.ts: combine _db/_checksums naming with upstream's storageConfig field
Fixes auto-merge oversight:
- MongoBucketBatchV3.ts: sourceTablesV3 -> sourceTables in markSnapshotDone()
# Conflicts:
# modules/module-mongodb-storage/src/storage/implementation/AbstractMongoSyncBucketStorage.ts
# modules/module-mongodb-storage/src/storage/implementation/MongoBucketBatch.ts
The $sort between $match and $addFields had no effect on the pipeline result. Subsequent $addFields, $project, and $group stages are order-independent, and $group destroys ordering anyway. The final $sort after $group is kept for deterministic output ordering. Review feedback: rkistner on PR #617, comment #5.
Two serialization fidelity tests verifying loadBucketDataDocument() propagates doc.target_op to individual yielded ops. These should fail until the implementation fix is applied. Review feedback: rkistner on PR #617, comment #4.
Change from hardcoded null to doc.target_op ?? null so downstream consumers receive the document-level target_op value. Makes the target_op propagation tests pass. Review feedback: rkistner on PR #617, comment #4.
9f2c91e to 6c117a7
Merge per-bucket filter and range filter into a single $or with
compound _id range per bucket: { _id: { $gt: {b, o: start}, $lte:
{b, o: MaxKey()} }, min_op: { $lte: end } }. This uses the compound
{b, o} index efficiently — scoped to one bucket from the start instead
of a cross-bucket _id.o scan with separate bucket filtering.
Logically equivalent — all 13 Phase 1.5 read filtering tests pass.
Review feedback: rkistner on PR #617, comment #7.
Previously the V3 compactor dropped superseded ops entirely. This broke checksum integrity — any client synced before compaction would have a checksum that includes superseded ops, but the server's checksum no longer included them. Now superseded PUT/REMOVE ops are converted to MOVE tombstones: same op_id, same checksum, op type set to MOVE, target_op pointing to the newer op, data/identity fields stripped. This preserves the checksum total for all clients. Updated tests to assert MOVE tombstones instead of dropped ops. Updated the sync test to expect MOVE ops in the stream for V3 compacted data. Per-op target_op is not stored in V3 (aggregated to document level by serializeBucketData). Review feedback: rkistner on PR #617, comment #3.
Four new tests in 'V3 MOVE tombstone properties' describe block:
1. Checksum preserved across compaction with superseded ops in single doc
— verifies sum(doc.checksum) before == after when ops become tombstones
2. Checksum preserved across compaction with multiple input documents
— same invariant across doc boundaries
3. Tombstones have null data and pack densely after rechunking
— verifies MOVE ops have data:null, surviving PUTs keep data,
and all ops collapse into a single dense doc since tombstones
contribute 0 bytes to chunking size
4. Tombstones and survivors end up in same document after rechunking
— verifies checksum + co-location of MOVE and PUT ops in the same
output document after rechunking
9f68912 to 5803fdb
6a6bcd6 to a5b1580
Lower-bound (GREEN): compacted_state.op_id=30 falls mid-document (min_op=10, _id.o=60). Pipeline's is_fully_included + $filter correctly sums only ops above the start boundary. Upper-bound (RED): Checkpoint=45 falls between ops in a document with _id.o=60, min_op=40. createBucketFilter uses _id.o <= 45 which excludes the document entirely, but the document contains op 40 which should be included in the checksum. Expected checksum=280 (op 40 only), got 0 (document excluded).
…ries
createBucketFilter used _id.$lte {b, o: end} which excluded multi-op
documents whose _id.o > end but whose min_op <= end. Change upper bound
from _id.o <= end to min_op <= end, and add endpoint filtering to the
checksum aggregation pipeline via bucket_end + $and on $filter
conditions.
buildPartialChecksumPipeline now filters ops by both o > bucket_start
AND o <= bucket_end, matching the lower-bound handling already in place.
is_fully_included updated to check both bounds.
a5b1580 to 03d8bd6
Summary
Replaces MongoDB bucket storage's single-operation-per-document model with chunked multi-operation documents. Operations are now grouped into BSON documents by a ~1MB data-size threshold, reducing document count and index overhead for workloads with many small rows. The change includes range-merging compaction (rebuild from survivors instead of in-place mutation), document-level checksum aggregation, and a comprehensive edge-case test suite verifying data integrity invariants.
This is a breaking change for existing MongoDB storage deployments — databases using the previous single-op document format are not compatible. No migration path is provided.
What Changed
1. Collapse Dual-Version Abstraction
During development, two document formats coexisted behind an abstraction layer. This PR removes the abstraction and all code for the discarded format, leaving a single direct implementation.
Deleted:
- v5/ directory and all adapter files (was the alternate/new format during development)
- document-formats/v3-format.ts: single-op format code
- document-formats/format-interface.ts: dual-format abstraction interface
- common/MongoSyncBucketStorageCallbacks.ts: callback indirection layer
- v3/models.ts and v5/models.ts re-export layers
- VersionedPowerSyncMongo wrappers: storage now uses PowerSyncMongo directly
Renamed:
- document-formats/v5-format.ts → document-formats/bucket-document-format.ts
- BucketDataDocumentV5 → BucketDataDocument
- BucketOperationV5 → BucketOperation
Architecture before:
Architecture after:
2. Chunked Multi-Op Document Format
The previous model stored exactly one operation per MongoDB document. For workloads with many small rows, this created excessive document and index overhead.
New document shape:
BucketDataDocument stores an ops[] array plus aggregated metadata:
- _id.o = maximum op_id in the document (used for range queries)
- min_op = minimum op_id
- count = number of operations
- checksum = sum of operation checksums
- size = total byte size of operation data
- target_op = maximum target_op across operations
Chunking: The write path groups pending operations by bucket, then chunks them into documents by a 1MB data-size threshold. Each chunk becomes one BucketDataDocument. Single-operation chunks remain valid.
Read path: getBucketDataBatch() queries by _id.o range, then post-filters individual operations within partially overlapping documents. Operations outside (start, checkpoint] are skipped.
Compaction: Instead of modifying documents in-place (previously PUT→MOVE, collapse to CLEAR), the compactor now takes a "rebuild from survivors" approach:
table/row_id/source)
Checksums: computePartialChecksumsForCollection() uses the pre-computed document-level checksum aggregate for fully-included documents. Only partially-included documents fall back to iterating individual operations.
Glossary
Fully included document: min_op > start and _id.o <= end. Example: document covers [40, 60], client asks for (30, 60]. Since min_op=40 > 30 and _id.o=60 <= 60, every op in this document is within the client's range. The pipeline uses the pre-computed checksum field on the document, with no need to iterate individual ops.
Partially included document: min_op <= start or _id.o > end. Example: document covers [40, 60], client asks for (45, 55]. Since min_op=40 <= 45, the ops at the beginning of the document (40, 45) are outside the range, and since _id.o=60 > 55, so are any ops above 55. The pipeline can't use the pre-computed checksum; it must filter individual ops in the ops[] array and sum only those with start < o <= end.
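A small in-memory sketch of the fully/partially included distinction, checking both bounds as the later upper-bound commit in this PR describes. ChecksumDoc and partialChecksum are hypothetical names, the real logic runs as an aggregation pipeline rather than in application code, and the checksum values are illustrative apart from 280 for op 40, which matches the failing test described earlier.

```typescript
// Simplified stored document: bounds, pre-computed checksum, and per-op checksums.
interface ChecksumDoc {
  minOp: number; // mirrors min_op
  maxOp: number; // mirrors _id.o
  checksum: number; // sum of all op checksums in the document
  ops: { o: number; checksum: number }[];
}

// Checksum over the half-open range (start, end].
function partialChecksum(docs: ChecksumDoc[], start: number, end: number): number {
  let total = 0;
  for (const doc of docs) {
    if (doc.maxOp <= start || doc.minOp > end) {
      continue; // no overlap with the requested range
    }
    const fullyIncluded = doc.minOp > start && doc.maxOp <= end;
    if (fullyIncluded) {
      // Fast path: trust the document-level aggregate.
      total += doc.checksum;
    } else {
      // Slow path: sum only the ops inside the range.
      for (const op of doc.ops) {
        if (op.o > start && op.o <= end) {
          total += op.checksum;
        }
      }
    }
  }
  return total;
}

// One document covering ops 40..60 (illustrative checksums).
const straddling: ChecksumDoc = {
  minOp: 40,
  maxOp: 60,
  checksum: 292,
  ops: [
    { o: 40, checksum: 280 },
    { o: 50, checksum: 5 },
    { o: 60, checksum: 7 }
  ]
};
```

For a checkpoint of 45 the document is only partially included, so `partialChecksum([straddling], 0, 45)` sums op 40 alone; for a checkpoint of 60 the fast path applies and the pre-computed value is used.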
3. Edge Case Hardening & Invariant Tests
Comprehensive test suite verifying data integrity invariants under boundary conditions:
Read Filtering Boundaries (storage_sync.test.ts): 13 test cases covering all combinations of start and checkpoint positions relative to document boundaries.
Compaction Boundaries (storage_compacting.test.ts): 8 test cases:
row_id spanning document boundaries
Glossary
Rechunking is the process of grouping the surviving ops into new documents using chunkBucketData() — the same function used during normal writes. It groups ops by data size (1MB threshold), creating as many new documents as needed.
Invariant Verification Tests (storage_compacting.test.ts): 19 unit + integration tests:
- ops[] ordering preserved after serialization and compaction
- _id.o = max_op, min_op = min_op, count = ops.length, checksum = sum(op.checksum), size = sum(data.length))
- target_op correctness (max of non-null target_op values)
- _id.o invariant (equals max op in document)
- addChecksums)
- maxOpId filtering (ops above limit excluded)
Breaking Changes
MongoDB storage: Existing deployments using the previous single-operation-per-document format are not compatible with this change. This requires a fresh deployment or manual migration (not provided).
V1 storage is unaffected.
Test Results
All existing parameterized tests continue to pass. New edge-case tests pass with no regressions.
Key Files Changed
Detailed description per file
Files Changed
.changeset/
- service-core and module-mongodb-storage for the chunked multi-op document format.
modules/module-mongodb-storage/src/storage/
- common/models.ts, bucket-operations/*) and consolidated type names.
modules/module-mongodb-storage/src/storage/implementation/
Core storage layer. The abstract base class and shared infrastructure live here; V1 and V3 specifics are in their respective subdirectories.
- MongoSyncBucketStorageV3 for V3 storage.
- versioned() factory returning the appropriate VersionedPowerSyncMongo per storage version.
- common/models.ts.
- target_op during MOVE and CLEAR phases.
- collectionFilter() and deleteFilter() implementations. Used directly by both V1 and V3.
- VersionedPowerSyncMongo collection accessors.
- common/models.ts and document-formats/bucket-document-format.ts.
modules/module-mongodb-storage/src/storage/implementation/bucket-operations/
Shared helpers extracted from the write path, compaction pipeline, and read path. All new files.
- checksum field on BucketDataDocument for fully-included documents; falls back to iterating ops[] for partially-included ones.
- chunkBucketData() groups ops into documents by a 1MB data-size threshold. Single oversized ops get their own chunk. Used by both the write path and compaction rechunking.
- table/row_id (newest-first), and rebuilding survivor documents.
- (start, checkpoint] range query using min_op for the upper bound to catch documents that straddle the range boundary.
- SourceRecordStore implementation using shared collection accessors.
modules/module-mongodb-storage/src/storage/implementation/collection-access/
- VersionedPowerSyncMongo. Provides typed access to bucket data, source records, parameter indexes, and source tables.
modules/module-mongodb-storage/src/storage/implementation/common/
Shared types and base classes used across V1 and V3.
- CurrentBucket, RecordedLookup, CurrentDataDocument, BucketParameterDocument, SourceTableDocument, BucketStateDocument.
- serializeBucketData().
modules/module-mongodb-storage/src/storage/implementation/document-formats/
The chunked multi-op document format.
- BucketDataDocument stores an ops[] array with aggregated metadata (_id.o, min_op, count, checksum, size, target_op). serializeBucketData() groups ops and computes aggregates. buildBucketDataQuery() constructs range queries with min_op upper bound. extractRowsFromDocument() post-filters individual ops within partially overlapping documents.
modules/module-mongodb-storage/src/storage/implementation/v1/
V1 (single-op document format) is structurally updated to inline shared logic but has no functional changes.
modules/module-mongodb-storage/src/storage/implementation/v3/
Primary V3 implementation using chunked multi-op documents.
- buildBucketDataQuery() with min_op upper bound and extractRowsFromDocument() for post-filtering; write path delegates to MongoBucketBatchV3.
- bucket-operations/ helpers.
- table/row_id (newest-first), rechunks survivors by 1MB threshold, replaces old documents in a transaction.
- checksum field; partially-included documents iterate ops[].
- PersistedBatchShared.
- SourceRecordStoreImpl.
- VersionedPowerSyncMongo directly.
- common/models.ts with V3-specific types kept locally.
Deleted:
- MongoParameterCompactor.
- document-formats/parameter-lookup.ts.
modules/module-mongodb-storage/src/utils/
modules/module-mongodb-storage/test/src/
- (start, checkpoint] semantics with pre-inserted documents. Existing V3 tests updated to use shared types.
- compressedBucketStorage flag to V3 test config.
modules/module-postgres-storage/test/src/
- compressedBucketStorage: false to test config.
- compressedBucketStorage flag to shared test registration.
packages/service-core-tests/src/tests/
- compressedBucketStorage flag for conditional assertions on multi-op vs single-op document shapes.
- compressedBucketStorage flag for document format assertions.
packages/service-core/src/storage/
- compressedBucketStorage boolean to TestStorageConfig. Controls whether shared tests assert multi-op document shapes.
Key files:
- createMongoSyncBucketStorage.ts, db.ts
- v3/MongoSyncBucketStorageV3.ts, v3/MongoCompactorV3.ts, v3/MongoChecksumsV3.ts, v3/PersistedBatchV3.ts, v3/MongoBucketBatchV3.ts
- bucket-operations/chunking.ts, bucket-operations/batch-write.ts, bucket-operations/checksum-aggregation.ts, bucket-operations/compaction-scaffolding.ts, bucket-operations/query-builders.ts
- document-formats/bucket-document-format.ts, document-formats/parameter-lookup.ts
- common/models.ts, common/BucketDataDoc.ts
- AbstractMongoSyncBucketStorage.ts, MongoSyncBucketStorage.ts
- test/src/storage_sync.test.ts, test/src/storage_compacting.test.ts
- .changeset/wild-pears-sing.md
Follow-up Work