Pin Layout V5 default MLX pair for device evidence #14
Draft
RNT56 wants to merge 43 commits into
Real-device TurboQuant smoke run completed on the physical iPhone target.
Device:
Command:
xcodebuild -project Pines.xcodeproj \
-scheme Pines \
-destination 'platform=iOS,id=00008130-00041C6E2EB8001C' \
-derivedDataPath build/DerivedDataDevice \
-skipMacroValidation \
-skipPackagePluginValidation \
-onlyUsePackageVersionsFromResolvedFile \
-disableAutomaticPackageResolution \
-scmProvider system \
-allowProvisioningUpdates \
ONLY_ACTIVE_ARCH=YES \
'-only-testing:PinesTests/MLXTurboQuantRuntimeSmokeTests' \
'-skip-testing:PinesUITests' \
test
Result:
Follow-up committed in
Scope note: this is a targeted real-device smoke pass, not a full
added 27 commits
May 25, 2026 23:20
Use plain FP16 KV (faster, higher quality) whenever its uncompressed cache fits the live memory budget; fall back to TurboQuant only to reach contexts that would otherwise not fit in RAM. This replaces the static min(ctx, 8192) plain-KV cap with a memory-feasibility decision.
- MLXRuntimeBridge.kvCacheAdmission honors admission.recommendsPlainKVCache: returns plain FP16 at full admitted length (no 8K cap); conversion carries the flag through coreTurboQuantAdmission.
- LocalRuntimeAdmissionService.admit: FP16-first ladder (fp16KVBytesPerToken; FP16-full → [.fastest: shorter FP16] → TurboQuant) + recommendsPlainKVCache on the PinesCore type. +7 ladder tests (206 PinesCoreTests pass).
- Bump mlx-swift (aa4a071: cooperative coalesced QK decode, opt-in) and mlx-swift-lm (002ec99: recommendsPlainKVCache planner) pins.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
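The FP16-first ladder described in this commit can be sketched roughly as below. This is only an illustration under stated assumptions, not the PinesCore implementation: the types `KVCachePlan` and `AdmissionInput`, the function `admitSketch`, and the `minimumUsefulContext` floor are all hypothetical names introduced here, and the real admission service may weigh additional inputs.

```swift
// Hypothetical sketch of an FP16-first admission ladder.
// None of these names are the real PinesCore API.
enum KVCachePlan: Equatable {
    case plainFP16(contextLength: Int)
    case turboQuant(contextLength: Int)
}

struct AdmissionInput {
    var requestedContext: Int      // tokens the caller asked for
    var fp16KVBytesPerToken: Int   // uncompressed KV cost per token
    var memoryBudgetBytes: Int     // live budget available to the KV cache
    var prefersFastest: Bool       // stand-in for the `.fastest` mode
    var minimumUsefulContext: Int  // assumed floor for a shortened FP16 context
}

func admitSketch(_ input: AdmissionInput) -> KVCachePlan {
    precondition(input.fp16KVBytesPerToken > 0, "per-token cost must be positive")

    // How many tokens of plain FP16 KV fit in the budget.
    let fp16Capacity = input.memoryBudgetBytes / input.fp16KVBytesPerToken

    // Rung 1: the full requested context fits uncompressed, so use plain FP16.
    if fp16Capacity >= input.requestedContext {
        return .plainFP16(contextLength: input.requestedContext)
    }

    // Rung 2: in the fastest mode, accept a shorter FP16 context if it is still useful.
    if input.prefersFastest && fp16Capacity >= input.minimumUsefulContext {
        return .plainFP16(contextLength: fp16Capacity)
    }

    // Rung 3: compress to reach a context that would not fit as FP16.
    return .turboQuant(contextLength: input.requestedContext)
}
```

The point of the sketch is that the plain-KV decision is driven by a division of the live budget by the per-token cost rather than by a fixed 8K cap.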
In-place KV-cache donation fix on append (was reallocating full-capacity buffers per token; audit 1.3 OOM suspect). 69 cache tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- 4.1: fallback decode uses fp16 scratch + an OOM guard (recoverable instead of crash).
- 2A: mid-generation FP16->compressed spill under memory pressure (the missing dynamic half) via GenerateParameters.spillMemoryWatermarkBytes, default off (on-device tuning).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
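A default-off memory watermark of the kind mentioned in 2A might behave like the following sketch. The `SpillPolicySketch` type, its `watermarkBytes` field, and the use of `nil` to mean "off" are assumptions made for illustration; the commit does not show the actual type or semantics of `spillMemoryWatermarkBytes`.

```swift
// Hypothetical sketch of a default-off spill watermark.
// The real GenerateParameters field may use a different type or sentinel.
struct SpillPolicySketch {
    // nil models "default off": no mid-generation spill is ever requested.
    var watermarkBytes: Int? = nil

    // Returns true when the resident FP16 cache has reached the watermark
    // and should be converted to the compressed representation.
    func shouldSpill(residentBytes: Int) -> Bool {
        guard let watermark = watermarkBytes else { return false }
        return residentBytes >= watermark
    }
}
```

With the watermark unset the policy never fires, which matches the "default off" behaviour described above; tuning the threshold is left to on-device measurement.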
Picks up the mlx-swift-lm fork tip carrying the turbo3 bit-metadata fix (5.5) and the new TurboQuantBench on-device A-series benchmark harness. Both are behavior-neutral / test-only for the app (pines uses its own scheme enum without turbo3; the app does not depend on the TurboQuantBench product), so this is a sync + manifest-resolution bump.
Verified: full Pines app resolves mlx-swift-lm @ 3118d5b and builds green (xcodebuild, iOS Simulator, -skipPackagePluginValidation -skipMacroValidation).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Captures the working-tree TurboQuant work (control-plane + evidence types, tests, and the turboquant-implementation docs/baselines) as a green checkpoint at the current MLX pin pair (mlx-swift 609e833 + mlx-swift-lm 725add5). PinesCore builds and all 227 PinesCore tests pass, including TurboQuantPinDriftTests.
Note: the mlx-swift-lm pin bump to pick up the N2 self-speculation API (makeGenerationIterator + GenerateParameters.selfSpeculationMode) is intentionally NOT included here: it requires regenerating compatibility-pair.json via the validation harness (the evidence artifact must not be hand-edited), which needs the deferred A-series device run for full evidence. See the mlx-swift-lm overhaul handoff (N2 Pines section) for the exact pin-coordination sequence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Documents that mlx-swift-lm 295e66b now exposes the N2 self-speculation product API (GenerateParameters.selfSpeculationMode + makeGenerationIterator, bit-exact, default-off) and mlx-swift adds the data-free Gaussian payload codec, and that adopting them requires advancing the MLX pin pair + regenerating compatibility-pair.json via the wave0 harness (not hand-edited) + the deferred A-series device run. Self-speculation ships default-off (inert until enabled + device-validated).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
- mlx-swift: 260c8fb16df772b8c20295529fde958fffb66369
- mlx-swift-lm: 13d3b35a9f6207fbf342c40ff7ff77cd6f0b9b5e
Validation
- bash scripts/ci/check-mlx-package-pins.sh
- swift build --disable-automatic-resolution
- swift test --disable-automatic-resolution (189 Swift Testing tests)
- swift run --disable-automatic-resolution PinesCoreTestRunner
- bash scripts/ci/run-xcode-validation.sh all
- git diff --check
Hardware Gate
devicectl sees GBU-12, iPhone 15 Pro Max (iPhone16,2), but it is unavailable; no online iPhone-class hardware was available in this workspace. This PR intentionally does not activate Verified, Certified, Fast mode, snapshot restore, or adaptive precision claims.
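A pin-drift check for the pair listed in the Summary could look something like the sketch below. It is not the contents of check-mlx-package-pins.sh or TurboQuantPinDriftTests, neither of which is shown here; `PinPairSketch` and `pinDrift` are hypothetical names, and the resolved revisions are passed in as a plain dictionary rather than parsed from any real file format.

```swift
// Hypothetical sketch: compare an expected MLX pin pair against resolved revisions.
struct PinPairSketch {
    var mlxSwift: String
    var mlxSwiftLM: String
}

// Returns the package names whose resolved revision differs from the expected pin.
// A missing entry in `resolved` also counts as drift.
func pinDrift(expected: PinPairSketch, resolved: [String: String]) -> [String] {
    var drifted: [String] = []
    if resolved["mlx-swift"] != expected.mlxSwift { drifted.append("mlx-swift") }
    if resolved["mlx-swift-lm"] != expected.mlxSwiftLM { drifted.append("mlx-swift-lm") }
    return drifted
}

// The pin pair from the Summary of this PR.
let summaryPins = PinPairSketch(
    mlxSwift: "260c8fb16df772b8c20295529fde958fffb66369",
    mlxSwiftLM: "13d3b35a9f6207fbf342c40ff7ff77cd6f0b9b5e"
)
```

An empty result means both packages resolve to the pinned revisions; any returned name indicates that the pair and its evidence artifact would need to be regenerated together.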