Releases: leehack/llamadart

v0.7.2

07 Jun 03:27
343574c

Metadata patch release.

  • Added explicit pub.dev platform metadata for Android, iOS, Linux, macOS, web, and Windows so the package listing reflects actual cross-platform support.
  • Hardened the native prompt-reuse CI model download to use authenticated Hugging Face requests that fail on error responses.

v0.7.1

07 Jun 00:19
4c71d8e

  • Apple native runtime packaging:
    • Added Flutter iOS/macOS Swift Package Manager integration so Apple apps link
      the pinned leehack/llamadart-native and leehack/litert-lm-native
      XCFramework artifacts through darwin/llamadart/Package.swift.
    • Disabled the legacy hook-managed Apple bundle path for Flutter iOS/macOS
      builds, avoiding wrapper/framework MinimumOSVersion mismatches in App
      Store uploads. Standalone Dart macOS keeps the native-assets dylib fallback.
    • Raised the Flutter Apple runtime floors to iOS 16.4 and macOS 14.0 to match
      the published XCFramework artifacts.
  • Runtime defaults and release automation:
    • Android native builds still include both llama_cpp and litert_lm by
      default; iOS, macOS, Linux, and Windows now default to llama_cpp only.
    • Added native release pin automation so the maintainer sync workflow updates
      Apple SPM checksums from published native release asset digests.
  • CI reliability:
    • Cached and retried tiny GGUF test-model downloads used by VM integration
      tests so main-branch CI is less exposed to Hugging Face 429 rate limits.
    • Excluded local SwiftPM artifact caches from pub archives; Flutter Apple
      consumers still resolve the published remote XCFramework targets.
  • Compatibility note: no Dart API breaking changes. Flutter Apple apps must
    target iOS 16.4/macOS 14.0 or newer, and non-Android native apps that ship
    .litertlm models should opt in with llamadart_native_runtimes.
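
The cached, retried test-model downloads in the CI reliability item above can be sketched as follows. This is a minimal Python illustration, not the project's actual CI code. `RateLimited`, `fetch_with_retry`, `cached_download`, the in-memory dict cache, and the exponential backoff schedule are all assumptions made for the example.

```python
import time
from typing import Callable, Dict


class RateLimited(Exception):
    """Raised by the (hypothetical) fetch callable on an HTTP 429 response."""


def fetch_with_retry(
    fetch: Callable[[], bytes],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # waits base, 2x base, 4x base, ...
    raise ValueError("max_attempts must be at least 1")


def cached_download(
    cache: Dict[str, bytes],
    key: str,
    fetch: Callable[[], bytes],
    **retry_kwargs,
) -> bytes:
    """Return the cached bytes for key, downloading (with retries) on a miss."""
    if key not in cache:
        cache[key] = fetch_with_retry(fetch, **retry_kwargs)
    return cache[key]
```

Injecting `sleep` keeps the sketch testable without real delays. A real workflow would more likely cache on disk and honor a `Retry-After` header when the server sends one.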

v0.7.0

01 Jun 23:07
a0f6b4c

  • LiteRT-LM backend and runtime selection:
    • Added first-class .litertlm routing through LlamaBackend() on native and
      web targets, with native bundle downloads from leehack/litert-lm-native
      and web loading through @litert-lm/core.
    • Added ModelParams.liteRtLmBackend so callers can select LiteRT-LM CPU,
      GPU, or Android NPU execution where supported. auto chooses GPU on
      Android/macOS and CPU elsewhere on native targets.
    • Added cached Hugging Face .litertlm loading through
      loadModelSource(...), preserving the selected LiteRT-LM backend after the
      cache manager resolves the local file.
    • Added native LiteRT-LM tokenization, detokenization, log-level control,
      runtime metrics, and high-level ChatSession token counting support.
    • Added hooks.user_defines.llamadart.llamadart_native_tag,
      llamadart_native_repository, and llamadart_native_path so apps can test
      a different compatible native runtime source without patching llamadart.
    • Updated the Windows runtime fallback scanner to discover custom GitHub and
      local archive cache namespaces when .dart_tool/lib is unavailable.
  • LiteRT-LM chat, templates, and generation quality:
    • Added GenerationParams.speculativeDecoding for native LiteRT-LM. The
      default remains disabled; llama.cpp, WebGPU, and LiteRT-LM web reject the
      option until their speculative paths are implemented.
    • Fixed Gemma 4 .litertlm thinking and tool calling by replacing the stub
      template with the canonical Gemma 4 chat template, parsing the runtime
      thought channel as reasoning, and suppressing reasoning deltas when callers
      set enableThinking: false.
    • Added a filename-keyed .litertlm chat-template registry seeded with
      Gemma 4/3/3n and Qwen 2.5/3. Pass ModelParams.chatTemplate to override
      detection for other models.
    • Fixed LiteRT-LM tool calling for grammar-using handlers by forwarding
      supportsGrammarConstraints from the active NativeAutoBackend delegate.
    • Stopped structured tool-call streams from leaking raw Hermes/Qwen JSON or
      Gemma <|tool_call> markers as assistant content before the final
      tool_calls chunk.
  • Web and chat-app support:
    • Added ModelParams.preferMemory64 and ModelParams.modelBytesHint so
      large WebGPU GGUF models such as Gemma 4 E2B can choose the 64-bit bridge
      core before hitting the wasm32 address-space limit.
    • Fixed web .litertlm chat-app turns by swallowing unsupported token-count
      refreshes, avoiding unsupported minP/penalty parameters for LiteRT-LM
      web generation, and replacing the stuck "Loading model 0%" label with an
      indeterminate load message.
    • Halved web .litertlm load time by skipping WebGPU CacheStorage prefetch
      for LiteRT-LM models, which are fetched directly by @litert-lm/core.
    • Fixed web GGUF downloads reporting success before the bridge was ready by
      awaiting window.__llamadartBridgeReadyPromise, requiring the bridge
      prefetch API, and surfacing actionable errors for old bridge assets.
    • Allowed benign Hugging Face ?download=true URLs to be prefetched into the
      browser cache while still skipping credentialed or signed URLs.
  • Lifecycle, cancellation, and native stability:
    • Fixed iOS .litertlm loading by resolving embedded LiteRtLm and
      StreamProxy frameworks from the app bundle, matching the macOS runtime
      path behavior.
    • Improved LiteRT-LM diagnostics before model load, including selected
      CPU/GPU/NPU backend reporting, platform availability errors, and complete
      dynamic-library candidate failures.
    • Validated platform-specific LiteRT-LM companion libraries during
      native-asset setup so incomplete runtime bundles fail at build time.
    • Hardened native and LiteRT-LM cancellation/disposal so in-flight generation
      no longer races token release, worker teardown, engine deletion, closed
      response ports, or stream writes after cancellation.
    • Freed multimodal prompt buffers on tokenize/eval error paths, serialized
      multimodal projector load/unload, and closed a leaked native-backend
      handshake reply port.
  • Correctness and download resilience:
    • ChatSession now forwards empty-choices completion chunks instead of
      throwing, strips multiple <think> blocks, and trims history only on
      user-message turn boundaries.
    • LlamaEngine.generate wraps unexpected backend errors in
      LlamaInferenceException so callers catching LlamaException see the
      documented error type.
    • Tool-call parsing now uses stable fallback ids, tolerates code-fence
      language tokens without trailing delimiters, and keeps commas inside quoted
      argument values.
    • JSON-schema-to-GBNF conversion now resolves $refs nested inside other
      $ref targets and fails loudly on unresolvable or external $refs.
    • Array grammar generation validates minItems/maxItems, model downloads
      use connection and idle-read timeouts, and partial-download resume is
      restricted to files with stored validators.
  • Benchmarks, docs, and validation:
    • Added fair Gemma 4 LiteRT-LM versus llama.cpp/GGUF benchmark tooling for
      Android, macOS, and web, with speculative-decoding metrics, Pixel benchmark
      failure detection, and target-specific timeouts.
    • Added tool/gguf_chat_features_smoke.dart and the
      chat-app-web-gemma4-webgpu-smoke E2E scenario for real-model parser and
      WebGPU mem64 validation.
    • Updated README, website docs, and doc/litert_lm_templates.md for backend
      selection, platform/runtime support, package-size controls, benchmark
      results, model templates, and current LiteRT-LM capability limits.
  • Compatibility note: no public API breaking changes for existing GGUF /
    llama.cpp callers. LiteRT-LM support is additive, with deprecated benchmark
    wrappers retained for compatibility; unsupported llama.cpp-only parameters are
    rejected for .litertlm loads instead of being silently ignored.
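
The $ref handling described in the correctness item above (resolving $refs nested inside other $ref targets, and failing loudly on unresolvable or external ones) can be illustrated with a small sketch. It is written in Python for brevity and is not llamadart's JSON-schema-to-GBNF converter. The helper names `lookup` and `expand` are hypothetical, only local `#/...` references into objects are supported, and circular references are rejected as a simplification.

```python
from typing import Any


def lookup(root: Any, ref: str) -> Any:
    """Return the node a local '#/...' reference points at, or raise ValueError."""
    if not ref.startswith("#/"):
        raise ValueError(f"external or unsupported $ref: {ref!r}")
    node = root
    for part in ref[2:].split("/"):
        key = part.replace("~1", "/").replace("~0", "~")  # JSON Pointer unescaping
        if not isinstance(node, dict) or key not in node:
            raise ValueError(f"unresolvable $ref: {ref!r}")
        node = node[key]
    return node


def expand(node: Any, root: Any, seen: frozenset = frozenset()) -> Any:
    """Inline every $ref, including $refs found inside other $ref targets."""
    if isinstance(node, dict):
        if "$ref" in node:
            ref = node["$ref"]
            if ref in seen:
                raise ValueError(f"circular $ref: {ref!r}")
            # Expand the target too, so $refs nested inside it are resolved.
            return expand(lookup(root, ref), root, seen | {ref})
        return {key: expand(value, root, seen) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(value, root, seen) for value in node]
    return node
```

Calling `expand(schema, schema)` on a schema whose `items` reference `#/$defs/person`, which in turn references `#/$defs/name`, returns a copy with both levels inlined. An `https://...` reference or a missing `$defs` entry raises `ValueError` instead of being skipped.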

v0.6.17

28 May 14:30
39a4d23

  • Native runtime sync:
    • Updated native hook pinning and regenerated bindings through
      leehack/llamadart-native@b9371, picking up llama.cpp b9371.
    • Picked up the Apple mobile Metal stability fix that disables Metal
      residency sets on iOS/tvOS/visionOS native bundles, avoiding affected
      device context-creation failures such as MTLLibraryErrorDomain Code=3.
  • Compatibility note: no public API breaking changes in 0.6.17;
    existing 0.6.16 callers remain compatible. The release only refreshes
    the pinned native runtime and generated low-level bindings.

v0.6.16

25 May 17:45
319e6a4

  • Native runtime diagnostics:
    • Fixed native getVramInfo() so it reports free/total VRAM from
      llama.cpp GPU-class backend devices when available, using props-based
      memory reporting first and the legacy memory probe as a fallback.
    • Routed native VRAM probing through the ggml registry fallback path so
      Windows split bundles resolve backend-device symbols from the runtime that
      owns the device registry.
  • WebGPU and chat app fixes:
    • Improved browser recovery for large remote WebGPU model/projector loads by
      retrying wasm32 model-staging aborts with the wasm64 core before surfacing
      memory-pressure failures.
    • Improved the runnable chat app's web remote-model startup path so model
      assets are prefetched into browser cache when available, browser
      CacheStorage failures fall back to direct network loading, and
      credentialed/signed model URLs skip persistent browser cache storage.
  • Model download UX:
    • Improved the runnable chat app's mobile download behavior so lifecycle
      pauses no longer deliberately cancel active foreground downloads; the app
      now lets short screen-lock/background interruptions continue when the OS
      permits and still keeps explicit pause/dispose cancellation paths.
    • Added in-app and docs guidance for mobile large-model downloads, including
      resumable partial files, foreground Dart lifecycle limits, and the need for
      opt-in native background download/model-store integrations for robust
      cross-app GGUF management.
  • Compatibility note: no public API breaking changes in 0.6.16;
    existing 0.6.15 callers remain compatible. The changes improve native VRAM
    diagnostics, WebGPU browser recovery, and chat app download lifecycle
    behavior.
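
The rule that credentialed or signed model URLs skip persistent browser cache storage can be sketched as a simple URL check. This Python snippet is only an illustration of the idea, not the chat app's implementation. The function name `is_cacheable_model_url` and the list of query parameter names treated as signatures are assumptions.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed heuristic: query parameter names that usually carry a signature or token.
SIGNED_PARAMS = {"x-amz-signature", "signature", "sig", "token", "expires"}


def is_cacheable_model_url(url: str) -> bool:
    """Return False for URLs with embedded credentials or signature-like params."""
    parts = urlsplit(url)
    if parts.username or parts.password:
        return False  # credentials embedded in the URL itself
    names = {name.lower() for name in parse_qs(parts.query, keep_blank_values=True)}
    return not (names & SIGNED_PARAMS)
```

Under this heuristic a plain `?download=true` URL stays cacheable, while a URL carrying `user:password@` or an `X-Amz-Signature` parameter is loaded directly from the network.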

v0.6.15

22 May 11:43
81aeaba

What's Changed

  • chore(deps): bump path-to-regexp from 0.1.12 to 0.1.13 in /website in the npm_and_yarn group across 1 directory by @dependabot[bot] in #100
  • chore(website): refresh docs dependencies by @leehack in #154
  • test: add local E2E scenario runner by @leehack in #155
  • Fix GLM-OCR multimodal prompt rendering by @leehack in #157
  • Refactor chat template render context serialization by @leehack in #158
  • Prepare llamadart 0.6.15 release by @leehack in #159

Full Changelog: v0.6.14...v0.6.15

v0.6.14

16 May 14:04
7a55735

What's Changed

  • docs: document WebGPU readiness checks by @leehack in #148
  • chore(native): sync native release b9159 by @github-actions[bot] in #150
  • feat: add model download controller by @leehack in #149
  • ci: update actions for Node 24 by @leehack in #151
  • chore: update WebGPU bridge assets to v0.1.16 by @leehack in #152
  • chore: prepare v0.6.14 release by @leehack in #153

Full Changelog: v0.6.13...v0.6.14

v0.6.13

14 May 14:27
cd8b1d0

What's Changed

  • fix: cascade WebGPU batch defaults by @leehack in #121
  • feat: add model download cache manager by @leehack in #129
  • feat(api): expose state_save_file / state_load_file on LlamaEngine by @thereisnotime in #123
  • fix: serialize same-key model downloads by @leehack in #139
  • fix: recover cache metadata sidecars by @leehack in #140
  • docs: codify agent quality guidelines by @leehack in #142
  • feat(webgpu): wire bridge state persistence by @leehack in #141
  • fix: tolerate missing optional webgpu checksum assets by @leehack in #143
  • docs: add production-readiness PR checklist by @leehack in #144
  • fix(models): reject remote-only options for local sources by @leehack in #145
  • feat(models): improve Hugging Face source ergonomics by @leehack in #146
  • chore: prepare v0.6.13 release by @leehack in #147

Full Changelog: v0.6.12...v0.6.13

v0.6.12

09 May 00:57
745910a

What's Changed

New Contributors

Full Changelog: v0.6.11...v0.6.12

v0.6.11

28 Apr 16:14
3b8e101

What's Changed

  • chore(native): sync native release b8778 by @github-actions[bot] in #106
  • fix(gemma4): stream thought channels as thinking by @leehack in #108
  • chore(native): sync native release b8955 by @github-actions[bot] in #109
  • chore(release): prepare 0.6.11 by @leehack in #110

Full Changelog: v0.6.10...v0.6.11