Reduce manifest rotation for foreground metadata ops #14797
Summary:
Async WAL precreation in facebook#14738 / D105020559 was motivated by slow file creation time on remote storage. MANIFEST does not need the same precreation treatment as WAL because most MANIFEST writes come from background flush and compaction work, but user-facing metadata operations can still pay MANIFEST rotation file creation latency inline. File ingestion performance is a particular concern to some Meta users.

Relax the effective MANIFEST rotation limit by 25% for MANIFEST write batches containing any foreground VersionEdit, while keeping background-only flush/compaction batches on the configured or auto-tuned limit. This covers column family manipulation, external file ingestion and import, and DeleteFilesInRange(s). SetOptions remains expected to avoid MANIFEST writes; the test keeps a regression guard for that behavior.

The relaxation is intentionally bounded. It reduces the chance that foreground metadata operations create a new MANIFEST inline, while still allowing foreground operations to rotate once the current MANIFEST is beyond the relaxed threshold. Heavier blocking operations like manual Flush or CompactRange already trigger additional file creation and do not get this treatment here, though that could be reconsidered later.

This should reduce a potential latency hazard of manifest file size auto-tuning: more frequent MANIFEST rotations. With this change, rotation latency is shifted toward background-only MANIFEST batches when possible.

Test Plan:
Expanded DBEtc3Test.AutoTuneManifestSize to cover the foreground threshold behavior and the original auto-tuning behavior in separate phases:
- verifies foreground-only CreateColumnFamily writes get only bounded 25% headroom by asserting the first four large-CF additions do not rotate and the fifth does;
- verifies auto-tuned background thresholds still prevent excessive rotation;
- verifies foreground operations stay below the relaxed threshold for CreateColumnFamily, IngestExternalFile, CreateColumnFamilyWithImport, and DeleteFilesInRanges;
- verifies SetOptions still does not write to MANIFEST;
- verifies a following background flush still rotates at the normal threshold;
- preserves the persisted compacted manifest size close/reopen coverage.
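
For readers skimming the description, the rotation rule can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `EditSketch`, `BatchHasForegroundEdit`, and `ShouldRotateManifest` are hypothetical names, and the `>=` comparison against the limit is an assumption about how the threshold is checked.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one MANIFEST edit; not the real rocksdb::VersionEdit.
struct EditSketch {
  bool is_foreground;
};

// A batch counts as foreground if any of its edits is foreground.
inline bool BatchHasForegroundEdit(const std::vector<EditSketch>& batch) {
  for (const EditSketch& edit : batch) {
    if (edit.is_foreground) {
      return true;
    }
  }
  return false;
}

// Rotation decision for one write batch. The 25% headroom is added once per
// batch, no matter how many foreground edits the batch contains.
// Assumption: rotation triggers once the current size reaches the limit (>=).
inline bool ShouldRotateManifest(uint64_t current_size, uint64_t base_limit,
                                 const std::vector<EditSketch>& batch) {
  uint64_t limit = base_limit;
  if (BatchHasForegroundEdit(batch)) {
    limit += limit / 4;  // overflow handling is omitted in this sketch
  }
  return current_size >= limit;
}
```

Under these assumptions, a MANIFEST at 1100 bytes with a 1000-byte limit would rotate on a background-only batch but not on a batch containing a foreground edit, which would only rotate once the file reaches 1250 bytes.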
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D106578771.

✅ clang-tidy: No findings on changed lines. Completed in 279.4s.

🟡 Codex Code Review
Auto-triggered after CI passed, reviewing commit a94c122
❌ Codex review failed before producing findings.

✅ Claude Code Review
Auto-triggered after CI passed, reviewing commit a94c122

Summary
Clean, well-scoped change that reduces MANIFEST rotation latency for user-facing metadata operations by relaxing the rotation threshold by 25% for foreground edits. The implementation is correct, backwards compatible, and the

High-severity findings (0):

Findings
🔴 HIGH
No high-severity findings.

🟡 MEDIUM
M1. Missing test coverage for mixed foreground + background batch --

| Context | Does code execute? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | YES (same LogAndApply) | YES | safe |
| ReadOnly DB | NO (no writes) | N/A | safe |
| SecondaryInstance | NO (tails MANIFEST) | N/A | safe |
| CompactionService | Compaction edits on primary | YES (background, not marked) | safe |
| User-defined timestamps | YES | YES (rotation independent of key format) | safe |
| Atomic groups (multi-CF ingest) | YES | YES (25% applied once per batch, not per edit) | safe |
| FIFO/Universal compaction | YES | YES (compaction not marked foreground) | safe |

Assumption stress-test results:
- "Bounded relaxation" claim: CONFIRMED. The 25% applies to the threshold once per `ProcessManifestWrites` call, not per edit.
- "Auto-tuning feedback loops" concern: REFUTED. `TuneMaxManifestFileSize()` is only called from `UpdatedMutableDbOptions()`, not per-write. Self-correcting.
- "Batch CF operations missing foreground" concern: REFUTED. `IsForegroundOperation()` includes `IsColumnFamilyManipulation()`, covering all CF add/drop automatically.
- Integer overflow: Protected by `enforced_limit < (uint64_t{1} << 60)` guard. Safe.
- tuned_max = 0 case: enforced_limit = 0, rotation always happens. Correct.
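
The overflow guard and the zero-limit case noted above can be illustrated with a small sketch. `RelaxedLimitSketch` is a hypothetical helper written for this comment, not a function from the PR; only the `enforced_limit < (uint64_t{1} << 60)` condition and the 25% figure come from the review findings.

```cpp
#include <cstdint>

// Hypothetical helper: returns the limit to compare the MANIFEST size against.
// The 25% headroom is only added when the batch has a foreground edit and the
// limit is small enough that the addition cannot overflow uint64_t.
inline uint64_t RelaxedLimitSketch(uint64_t enforced_limit,
                                   bool batch_has_foreground_edit) {
  if (batch_has_foreground_edit && enforced_limit < (uint64_t{1} << 60)) {
    enforced_limit += enforced_limit / 4;
  }
  return enforced_limit;
}
```

With a limit of 0 the result stays 0 (0 / 4 is 0), which matches the finding that rotation always happens in that case, and a limit at or above 2^60 is returned unchanged.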

Positive Observations
- Clean separation: The `is_foreground_operation_` flag follows the exact same pattern as `is_no_manifest_write_dummy_` -- in-memory only, not serialized, with a setter and const getter.
- `IsForegroundOperation()` design: Clever use of `|| IsColumnFamilyManipulation()` to avoid needing to mark CF add/drop at every call site.
- Test quality: The 5-phase test structure is well-organized. The `include_background_manifest_write` parameter elegantly allows testing foreground-only vs. mixed behavior.
- Bounded by design: The 25% is intentionally small enough that manifest growth is bounded, while providing meaningful latency improvement for remote storage.
- No format changes: No MANIFEST format impact, fully backwards compatible.
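
The flag pattern praised above might look roughly like the following. This is a simplified sketch under stated assumptions: `VersionEditSketch` is a hypothetical class, and the setter names `MarkForegroundOperation`, `MarkColumnFamilyAdd`, and `MarkColumnFamilyDrop` are invented for illustration. Only `IsForegroundOperation()`, `IsColumnFamilyManipulation()`, and the `is_foreground_operation_` member are named in the review.

```cpp
// Hypothetical, simplified model of the in-memory flag; not rocksdb::VersionEdit.
class VersionEditSketch {
 public:
  // Hypothetical setters; call sites for ingestion, import, and range deletes
  // would mark their edits explicitly.
  void MarkForegroundOperation() { is_foreground_operation_ = true; }
  void MarkColumnFamilyAdd() { is_column_family_add_ = true; }
  void MarkColumnFamilyDrop() { is_column_family_drop_ = true; }

  bool IsColumnFamilyManipulation() const {
    return is_column_family_add_ || is_column_family_drop_;
  }

  // CF add/drop edits count as foreground without any explicit marking.
  bool IsForegroundOperation() const {
    return is_foreground_operation_ || IsColumnFamilyManipulation();
  }

 private:
  // In-memory only: none of these flags would be serialized in this sketch.
  bool is_foreground_operation_ = false;
  bool is_column_family_add_ = false;
  bool is_column_family_drop_ = false;
};
```

The point of the `||` is that column family add/drop edits are classified in one place, so a new call site that creates or drops a column family cannot forget to mark the edit.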
Summary:
Async WAL precreation in #14738 / D105020559 was motivated by slow file creation time on remote storage. MANIFEST does not need the same precreation treatment as WAL because most MANIFEST writes come from background flush and compaction work, but user-facing metadata operations can still pay MANIFEST rotation file creation latency inline. File ingestion performance is a particular concern to some Meta users.
Relax the effective MANIFEST rotation limit by 25% for MANIFEST write batches containing any foreground VersionEdit, while keeping background-only flush/compaction batches on the configured or auto-tuned limit. This covers column family manipulation, external file ingestion and import, and DeleteFilesInRange(s). SetOptions remains expected to avoid MANIFEST writes; the test keeps a regression guard for that behavior.
The relaxation is intentionally bounded. It reduces the chance that foreground metadata operations create a new MANIFEST inline, while still allowing foreground operations to rotate once the current MANIFEST is beyond the relaxed threshold. Heavier blocking operations like manual Flush or CompactRange already trigger additional file creation and do not get this treatment here, though that could be reconsidered later.
This should reduce a potential latency hazard of manifest file size auto-tuning: more frequent MANIFEST rotations. With this change, rotation latency is shifted toward background-only MANIFEST batches when possible.
Test Plan:
Expanded DBEtc3Test.AutoTuneManifestSize to cover the foreground threshold behavior and the original auto-tuning behavior in separate phases: