Pin Layout V5 default MLX pair for device evidence #14

Draft
RNT56 wants to merge 43 commits into tq/integration-pin-mlx-production from tq/real-device-evidence-acceptance

RNT56 commented May 25, 2026

Summary

  • Pins Pines to the Layout V5 default MLX pair:
    • mlx-swift 260c8fb16df772b8c20295529fde958fffb66369
    • mlx-swift-lm 13d3b35a9f6207fbf342c40ff7ff77cd6f0b9b5e
  • Updates the runtime compatibility pair ID, Xcode package lock, generated project, and TurboQuant docs/metadata.
  • Records that Layout V5 is the default device-test layout while Layout V4 remains available for legacy/comparison runs.
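
The pin check described above can be sketched as follows. This is a minimal illustration, not the contents of scripts/ci/check-mlx-package-pins.sh or TurboQuantPinDriftTests: the function name `driftedPins`, the `ResolvedFile` type, and the assumption that the packages appear under the identities `mlx-swift` and `mlx-swift-lm` are all hypothetical. Only the `pins` / `identity` / `state.revision` layout of a version 2 `Package.resolved` file is relied on.

```swift
import Foundation

// Subset of the Package.resolved (format version 2) schema needed for the check.
struct ResolvedFile: Decodable {
    struct Pin: Decodable {
        struct State: Decodable { let revision: String? }
        let identity: String
        let state: State
    }
    let pins: [Pin]
}

/// Returns the package identities whose resolved revision differs from the
/// expected one, or that are missing from the lock file. Hypothetical helper.
func driftedPins(resolvedJSON: Data, expected: [String: String]) throws -> [String] {
    let resolved = try JSONDecoder().decode(ResolvedFile.self, from: resolvedJSON)
    var actual: [String: String] = [:]
    for pin in resolved.pins {
        actual[pin.identity] = pin.state.revision
    }
    // A pin has drifted when the lock file disagrees with the expected revision.
    return expected.keys.sorted().filter { actual[$0] != expected[$0] }
}
```

Called with the two revisions listed in the summary as `expected`, an empty result would mean the lock file matches the pinned pair.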

Validation

  • bash scripts/ci/check-mlx-package-pins.sh
  • swift build --disable-automatic-resolution
  • swift test --disable-automatic-resolution (189 Swift Testing tests)
  • swift run --disable-automatic-resolution PinesCoreTestRunner
  • bash scripts/ci/run-xcode-validation.sh all
    • iOS app build without signing passed
    • iOS test build passed
    • Pines unit smoke tests passed: 29 tests, 3 device-only skips
    • Pines UI smoke tests passed: 7 tests
  • git diff --check

Hardware Gate

devicectl lists GBU-12, an iPhone 15 Pro Max (iPhone16,2), but reports it as unavailable, so no online iPhone-class hardware could be reached from this workspace. This PR intentionally does not activate Verified, Certified, Fast mode, snapshot restore, or adaptive precision claims.

RNT56 commented May 25, 2026

Real-device TurboQuant smoke run completed on the physical iPhone target.

Device:

  • Name: GBU-12
  • Model: iPhone 15 Pro Max (iPhone16,2)
  • Device ID: 00008130-00041C6E2EB8001C
  • iOS: 26.5 (23F77)
  • Architecture: arm64

Command:

xcodebuild -project Pines.xcodeproj \
  -scheme Pines \
  -destination 'platform=iOS,id=00008130-00041C6E2EB8001C' \
  -derivedDataPath build/DerivedDataDevice \
  -skipMacroValidation \
  -skipPackagePluginValidation \
  -onlyUsePackageVersionsFromResolvedFile \
  -disableAutomaticPackageResolution \
  -scmProvider system \
  -allowProvisioningUpdates \
  ONLY_ACTIVE_ARCH=YES \
  '-only-testing:PinesTests/MLXTurboQuantRuntimeSmokeTests' \
  '-skip-testing:PinesUITests' \
  test

Result:

  • MLXTurboQuantRuntimeSmokeTests: 10 tests passed, 0 failed
  • Physical-device Metal codec path covered by testHighBitSeedMetalCodecRoundTripWhenAvailable
  • Fixed high-bit seed device path covered by testTurboQuantCacheUsesFixedHighBitSeedOnDevice
  • Result bundle: build/DerivedDataDevice/Logs/Test/Test-Pines-2026.05.25_22-24-29-+0200.xcresult

Follow-up committed in 32e9ca9: hosted PinesTests on the app target because physical iOS devices cannot run tool-hosted XCTest bundles.

Scope note: this is a targeted real-device smoke pass, not a full BenchmarkReport.v1 acceptance tuple. It should not promote any model to Verified/Certified by itself.

RNT56 and others added 14 commits May 30, 2026 09:43
Use plain FP16 KV (faster, higher quality) whenever its uncompressed cache fits
the live memory budget; fall back to TurboQuant only to reach contexts that would
otherwise not fit in RAM. This replaces the static min(ctx, 8192) plain-KV cap
with a memory-feasibility decision.

- MLXRuntimeBridge.kvCacheAdmission honors admission.recommendsPlainKVCache:
  returns plain FP16 at full admitted length (no 8K cap); conversion carries the
  flag through coreTurboQuantAdmission.
- LocalRuntimeAdmissionService.admit: FP16-first ladder (fp16KVBytesPerToken;
  FP16-full → [.fastest: shorter FP16] → TurboQuant) + recommendsPlainKVCache on
  the PinesCore type. +7 ladder tests (206 PinesCoreTests pass).
- Bump mlx-swift (aa4a071: cooperative coalesced QK decode, opt-in) and
  mlx-swift-lm (002ec99: recommendsPlainKVCache planner) pins.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
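
The FP16-first ladder in the commit above can be sketched roughly as follows. This is a hedged illustration, not the LocalRuntimeAdmissionService implementation: `KVCachePlan`, `AdmissionRequest`, `admit`, `compressedKVBytesPerToken`, and `minimumUsefulContext` are hypothetical names, and reading the bracketed rung as "only for a fastest-style profile" is an assumption. Only `fp16KVBytesPerToken` and the rung order come from the commit message.

```swift
// Hypothetical sketch of an FP16-first KV-cache admission ladder.
enum KVCachePlan: Equatable {
    case plainFP16(contextLength: Int)
    case turboQuant(contextLength: Int)
    case rejected
}

struct AdmissionRequest {
    var requestedContext: Int           // tokens the caller wants
    var fp16KVBytesPerToken: Int        // uncompressed KV cost per token
    var compressedKVBytesPerToken: Int  // assumed compressed cost per token
    var memoryBudgetBytes: Int          // live budget available to the KV cache
    var prefersFastest: Bool            // assumed stand-in for a ".fastest" profile
    var minimumUsefulContext: Int       // shortest context worth admitting (assumed)
}

func admit(_ r: AdmissionRequest) -> KVCachePlan {
    guard r.fp16KVBytesPerToken > 0, r.compressedKVBytesPerToken > 0 else {
        return .rejected
    }
    // Rung 1: plain FP16 at the full requested length, if it fits the budget.
    if r.requestedContext * r.fp16KVBytesPerToken <= r.memoryBudgetBytes {
        return .plainFP16(contextLength: r.requestedContext)
    }
    // Rung 2 (fastest profile only): a shorter plain-FP16 context.
    if r.prefersFastest {
        let fp16Fit = r.memoryBudgetBytes / r.fp16KVBytesPerToken
        if fp16Fit >= r.minimumUsefulContext {
            return .plainFP16(contextLength: fp16Fit)
        }
    }
    // Rung 3: compressed cache, to reach contexts plain FP16 cannot hold.
    let compressedFit = r.memoryBudgetBytes / r.compressedKVBytesPerToken
    let length = min(r.requestedContext, compressedFit)
    return length >= r.minimumUsefulContext ? .turboQuant(contextLength: length) : .rejected
}
```

With made-up numbers (2 GiB budget, 128 KiB per token for FP16, 32 KiB compressed), a 32,768-token request would need 4 GiB as plain FP16, so the sketch falls through to the TurboQuant rung unless the fastest profile accepts a shorter FP16 context.
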

In-place KV-cache donation fix on append (was reallocating full-capacity buffers
per token; audit 1.3 OOM suspect). 69 cache tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

4.1: fallback decode uses FP16 scratch plus an OOM guard (a recoverable error instead of a crash).
2A: mid-generation FP16->compressed spill under memory pressure (the missing dynamic
half), controlled by GenerateParameters.spillMemoryWatermarkBytes; default off pending on-device tuning.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
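
As a rough sketch of the watermark-gated spill in item 2A above: only the name `spillMemoryWatermarkBytes` comes from the commit message. The `SpillPolicy` type, the use of `nil` to model "default off", and the `shouldSpill` helper are assumptions made for illustration and are not the mlx-swift-lm API.

```swift
// Hypothetical sketch of a watermark-gated FP16 -> compressed spill decision.
struct SpillPolicy {
    // nil models "default off": no spill ever happens unless a watermark is set.
    var spillMemoryWatermarkBytes: Int? = nil

    /// Decides, between decode steps, whether the plain FP16 cache should be
    /// converted to the compressed representation.
    func shouldSpill(residentBytes: Int, alreadyCompressed: Bool) -> Bool {
        guard let watermark = spillMemoryWatermarkBytes, !alreadyCompressed else {
            return false
        }
        return residentBytes >= watermark
    }
}
```

Keeping the watermark optional means the default configuration behaves exactly like a build without the spill path, which matches the default-off stance in the commit.
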

Picks up the mlx-swift-lm fork tip carrying the turbo3 bit-metadata fix (5.5)
and the new TurboQuantBench on-device A-series benchmark harness. Both are
behavior-neutral / test-only for the app (Pines uses its own scheme enum without
turbo3; the app does not depend on the TurboQuantBench product), so this is a
sync + manifest-resolution bump.

Verified: full Pines app resolves mlx-swift-lm @ 3118d5b and builds green
(xcodebuild, iOS Simulator, -skipPackagePluginValidation -skipMacroValidation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Captures the working-tree TurboQuant work (control-plane + evidence types, tests,
and the turboquant-implementation docs/baselines) as a green checkpoint at the
current MLX pin pair (mlx-swift 609e833 + mlx-swift-lm 725add5). PinesCore builds
and all 227 PinesCore tests pass, including TurboQuantPinDriftTests.

Note: the mlx-swift-lm pin bump to pick up the N2 self-speculation API
(makeGenerationIterator + GenerateParameters.selfSpeculationMode) is intentionally
NOT included here — it requires regenerating compatibility-pair.json via the
validation harness (the evidence artifact must not be hand-edited), which needs the
deferred A-series device run for full evidence. See the mlx-swift-lm overhaul
handoff (N2 Pines section) for the exact pin-coordination sequence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Documents that mlx-swift-lm 295e66b now exposes the N2 self-speculation product
API (GenerateParameters.selfSpeculationMode + makeGenerationIterator, bit-exact,
default-off) and that mlx-swift adds the data-free Gaussian payload codec.
Adopting them requires advancing the MLX pin pair, regenerating
compatibility-pair.json via the wave0 harness (not hand-edited), and the deferred
A-series device run. Self-speculation ships default-off (inert until enabled + device-validated).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>