Bump dflash-mlx from 0.1.0 to 0.1.6 #69

Closed
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/dflash-mlx-0.1.6
dependabot[bot] commented on behalf of github on May 16, 2026

Bumps dflash-mlx from 0.1.0 to 0.1.6.

Release notes

Sourced from dflash-mlx's releases.

dflash-mlx v0.1.6

Large runtime, server, and agentic-workflow release since v0.1.5, including the v0.1.5.1 fixes.

Highlights

  • Reworked runtime ownership around typed runtime config, RuntimeBundle, ServerRuntime, target adapters, draft loading, cache management, and observability.
  • Default verify policy is now adaptive; fixed DFlash verification is available as --verify-mode dflash.
  • Added explicit verify modes: adaptive, dflash, ddtree, and off.
  • Added DDTree branch verification mode for Qwen target paths.
  • Added internal CopySpec candidate reuse for repeated-token continuation from prompt/generated history.
  • Added target-owned Qwen and Gemma4 backend routing, with unknown model families failing closed instead of falling into generic logic.
  • Added Gemma4 adapter support for cache construction, logits, hidden capture, GQA routing, and guarded prefix snapshots.
  • Added minimal Qwen3-Next fused-GDN projection support in Qwen target verification paths. This is source-level support, not a fully optimized public target claim.
  • Moved long-context attention routing behind target adapters; public split-SDPA switches are gone.
  • Productized verify_qmm through runtime config and target capabilities, with stock MLX fallback for unsupported shapes.
  • Large registered DFlash drafts now default to in-memory w4; use --draft-quant none for bf16/non-quant A/B.
  • Added handling for older Apple chips so that quantized DFlash drafts use fp16 floating tensors on BF16-emulated chips.
  • Prefix cache is now a managed L1+L2 snapshot service with stable-prefix lookup, L2 promotion, validation, budgets, and server metrics.
  • Added explicit target-only fallback when DFlash context limits are exceeded, with fallback state and physical prefill accounting.
  • Hardened the OpenAI-compatible server for OpenCode, aider, Continue, Open WebUI, LM Studio through its OpenAI-compatible adapter, and other OpenAI-compatible clients.
  • Added stricter Chat Completions tool-call handling, including streamed delta.tool_calls, Qwen XML spans, Gemma4 spans, JSON fallback, and fail-fast validation for malformed or undeclared tool calls.
  • Added minimal non-streaming /v1/responses compatibility for text input and function-call tools.
  • Added live /metrics, structured diagnostics, memory reporting, request summaries, and prefix-cache observability.
  • Added agentic trace/replay lab tooling for real OpenAI-compatible client sessions such as OpenCode/pi.
  • Short-output target-only AR fast path is now opt-in with --fastpath-max-tokens N; default serving keeps requests on the DFlash path.
  • Switched license to Apache-2.0.
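
The "fail-fast validation for malformed or undeclared tool calls" highlight can be illustrated with a small sketch. This is not dflash-mlx's implementation; `validate_tool_call` is a hypothetical helper that only assumes the standard Chat Completions shapes for `tools` and `tool_calls` (a function name plus a JSON-encoded `arguments` string).

```python
import json


def validate_tool_call(tool_call: dict, declared_tools: list[dict]) -> dict:
    """Return the parsed arguments, or raise ValueError for a bad tool call.

    Hypothetical helper for illustration; not part of dflash-mlx.
    """
    # Collect the names of the function tools the client declared.
    declared = {
        t["function"]["name"]
        for t in declared_tools
        if t.get("type") == "function"
    }
    fn = tool_call.get("function") or {}
    name = fn.get("name")
    if name not in declared:
        raise ValueError(f"undeclared tool call: {name!r}")

    raw = fn.get("arguments")
    if not isinstance(raw, str):
        raise ValueError(f"missing arguments for {name!r}")
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed arguments for {name!r}") from exc
    if not isinstance(args, dict):
        raise ValueError(f"arguments for {name!r} must be a JSON object")
    return args
```

Raising on the first problem, rather than passing a partially parsed call downstream, is what "fail-fast" is taken to mean here.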

Breaking Changes

  • --verify-mode auto was removed. Use --verify-mode dflash for fixed DFlash verification.
  • Public --split-sdpa and --no-split-sdpa controls were removed; attention routing is now target-owned.
  • dflash profiles and old profile/env resolution behavior were removed.
  • Old top-level generation invocation is rejected; use dflash generate.
  • Removed legacy benchmark modes and old diagnostic aliases; use documented benchmark flags and --diagnostics.
  • Runtime internals moved under dflash_mlx/runtime/; old runtime import paths are gone.
  • /v1/responses is intentionally limited: no streaming, multimodal input, Responses-native reasoning/text/truncation controls, tool_choice, parallel_tool_calls, previous_response_id, or persistent store.
  • Function-specific Chat Completions tool_choice and parallel_tool_calls: false are rejected.
  • target_fa_window > 0 disables prefix cache/L2 by design.
  • Bumping to 0.1.6 invalidates older L2 prefix snapshots through runtime-version validation, so they rebuild.
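
Because /v1/responses is intentionally limited, a client may want to check a request before sending it. The sketch below is a hypothetical pre-flight helper, not part of dflash-mlx. It assumes the limitations listed above map onto the usual Responses API field names (`stream`, `store`, `reasoning`, `text`, `truncation`, `tool_choice`, `parallel_tool_calls`, `previous_response_id`). It does not inspect `input` for multimodal parts, and it does not claim to know whether the server rejects or ignores each field.

```python
# Fields flagged only when set to a truthy value (assumed mapping).
TRUTHY_FLAGS = ("stream", "store")

# Fields flagged whenever they are present and not None (assumed mapping).
PRESENCE_FIELDS = (
    "reasoning",
    "text",
    "truncation",
    "tool_choice",
    "parallel_tool_calls",
    "previous_response_id",
)


def unsupported_responses_fields(payload: dict) -> list[str]:
    """List request fields that the release notes describe as unsupported."""
    found = [key for key in TRUTHY_FLAGS if payload.get(key)]
    found += [key for key in PRESENCE_FIELDS if payload.get(key) is not None]
    return found
```

An empty list means the request uses none of the flagged fields. A caller could log or strip anything that is returned before sending.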

Upgrade Notes

  • Use explicit runtime flags instead of old profiles.
  • Use --fastpath-max-tokens N only when you intentionally want target-only AR for very short server responses.
  • Treat tools/benchmarks/agentic_trace as diagnostic/lab tooling, not as the public benchmark surface.
  • Public benchmark claims should continue to come from dflash benchmark.
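
For scripts that still pass pre-0.1.6 flags, the renames and removals listed under Breaking Changes can be applied mechanically. The following is a minimal sketch: `migrate_args` is a hypothetical helper, it covers only the flags named in these notes, and it assumes that `--verify-mode dflash` (rather than the new adaptive default) is the intended replacement for `auto`.

```python
# Flags removed in 0.1.6 according to the release notes above.
REMOVED_FLAGS = {"--split-sdpa", "--no-split-sdpa"}


def migrate_args(argv: list[str]) -> list[str]:
    """Rewrite an old argument list for 0.1.6 (illustrative only)."""
    out: list[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in REMOVED_FLAGS:
            # Attention routing is now target-owned, so drop the switch.
            i += 1
            continue
        if arg == "--verify-mode" and i + 1 < len(argv) and argv[i + 1] == "auto":
            # "auto" was removed; "dflash" selects fixed DFlash verification.
            out += ["--verify-mode", "dflash"]
            i += 2
            continue
        out.append(arg)
        i += 1
    return out
```

For example, `migrate_args(["--verify-mode", "auto", "--no-split-sdpa", "--fastpath-max-tokens", "64"])` returns `["--verify-mode", "dflash", "--fastpath-max-tokens", "64"]`. Arguments written in `--flag=value` form pass through unchanged.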

v0.1.5.1 — benchmark hotfix

Hotfix

... (truncated)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [dflash-mlx](https://github.com/bstnxbt/dflash-mlx) from 0.1.0 to 0.1.6.
- [Release notes](https://github.com/bstnxbt/dflash-mlx/releases)
- [Commits](https://github.com/bstnxbt/dflash-mlx/commits/v0.1.6)

---
updated-dependencies:
- dependency-name: dflash-mlx
  dependency-version: 0.1.6
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels on May 16, 2026
dependabot[bot] requested a review from youssofal as a code owner on May 16, 2026 10:33
dependabot[bot] commented on behalf of github on May 23, 2026

Superseded by #79.

dependabot[bot] closed this on May 23, 2026
dependabot[bot] deleted the dependabot/pip/dflash-mlx-0.1.6 branch on May 23, 2026 10:33