Releases · leehack/llamadart

07 Jun 03:27

leehack

v0.7.2

343574c

v0.7.2 Latest


Metadata patch release.

- Added explicit pub.dev platform metadata for Android, iOS, Linux, macOS, web, and Windows so the package listing reflects actual cross-platform support.
- Hardened the native prompt-reuse CI model download to use authenticated Hugging Face requests that fail on error responses.


07 Jun 00:19

leehack

v0.7.1

4c71d8e

v0.7.1

Apple native runtime packaging:
- Added Flutter iOS/macOS Swift Package Manager integration so Apple apps link
  the pinned leehack/llamadart-native and leehack/litert-lm-native
  XCFramework artifacts through darwin/llamadart/Package.swift.
- Disabled the legacy hook-managed Apple bundle path for Flutter iOS/macOS
  builds, avoiding wrapper/framework MinimumOSVersion mismatches in App
  Store uploads. Standalone Dart macOS keeps the native-assets dylib fallback.
- Raised the Flutter Apple runtime floors to iOS 16.4 and macOS 14.0 to match
  the published XCFramework artifacts.

Runtime defaults and release automation:
- Android native builds still include both llama_cpp and litert_lm by
  default; iOS, macOS, Linux, and Windows now default to llama_cpp only.
- Added native release pin automation so the maintainer sync workflow updates
  Apple SPM checksums from published native release asset digests.

CI reliability:
- Cached and retried tiny GGUF test-model downloads used by VM integration
  tests so main-branch CI is less exposed to Hugging Face 429 rate limits.
- Excluded local SwiftPM artifact caches from pub archives; Flutter Apple
  consumers still resolve the published remote XCFramework targets.

Compatibility note: no Dart API breaking changes. Flutter Apple apps must
target iOS 16.4/macOS 14.0 or newer, and non-Android native apps that ship
.litertlm models should opt in with llamadart_native_runtimes.
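
The opt-in from the compatibility note would live in the consuming app's pubspec.yaml. The fragment below is only a sketch: it assumes llamadart_native_runtimes sits under the same hooks.user_defines.llamadart path that the 0.7.0 notes give for the native-source overrides, and that it accepts a list of the runtime names used above (llama_cpp, litert_lm). Neither the exact location nor the value format is stated in these notes, so check the package README before relying on it.

```yaml
# pubspec.yaml of the consuming app.
# Sketch only: the key location and the list value shape are assumptions.
hooks:
  user_defines:
    llamadart:
      # Assumed: runtimes to bundle on iOS, macOS, Linux, and Windows.
      llamadart_native_runtimes:
        - llama_cpp
        - litert_lm
```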


01 Jun 23:07

leehack

v0.7.0

a0f6b4c

v0.7.0

LiteRT-LM backend and runtime selection:
- Added first-class .litertlm routing through LlamaBackend() on native and
  web targets, with native bundle downloads from leehack/litert-lm-native
  and web loading through @litert-lm/core.
- Added ModelParams.liteRtLmBackend so callers can select LiteRT-LM CPU,
  GPU, or Android NPU execution where supported. auto chooses GPU on
  Android/macOS and CPU elsewhere on native targets.
- Added cached Hugging Face .litertlm loading through
  loadModelSource(...), preserving the selected LiteRT-LM backend after the
  cache manager resolves the local file.
- Added native LiteRT-LM tokenization, detokenization, log-level control,
  runtime metrics, and high-level ChatSession token counting support.
- Added hooks.user_defines.llamadart.llamadart_native_tag,
  llamadart_native_repository, and llamadart_native_path so apps can test
  a different compatible native runtime source without patching llamadart.
- Updated the Windows runtime fallback scanner to discover custom GitHub and
  local archive cache namespaces when .dart_tool/lib is unavailable.
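
As a rough illustration of the native-source overrides listed above, the three keys would be set in the consuming app's pubspec.yaml under hooks.user_defines.llamadart. Every value below is a placeholder, and whether the keys can be combined or are mutually exclusive is an assumption not covered by these notes.

```yaml
# pubspec.yaml of the consuming app.
# Sketch only: all values are placeholders, not real tags or repositories.
hooks:
  user_defines:
    llamadart:
      # Pin a different compatible native runtime release.
      llamadart_native_tag: "<release-tag>"
      # Fetch native bundles from a different GitHub repository.
      llamadart_native_repository: "<owner>/<repo>"
      # Assumed alternative: use a local archive instead of a download.
      # llamadart_native_path: "/path/to/local/native/archive"
```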

LiteRT-LM chat, templates, and generation quality:
- Added GenerationParams.speculativeDecoding for native LiteRT-LM. The
  default remains disabled; llama.cpp, WebGPU, and LiteRT-LM web reject the
  option until their speculative paths are implemented.
- Fixed Gemma 4 .litertlm thinking and tool calling by replacing the stub
  template with the canonical Gemma 4 chat template, parsing the runtime
  thought channel as reasoning, and suppressing reasoning deltas when callers
  set enableThinking: false.
- Added a filename-keyed .litertlm chat-template registry seeded with
  Gemma 4/3/3n and Qwen 2.5/3. Pass ModelParams.chatTemplate to override
  detection for other models.
- Fixed LiteRT-LM tool calling for grammar-using handlers by forwarding
  supportsGrammarConstraints from the active NativeAutoBackend delegate.
- Stopped structured tool-call streams from leaking raw Hermes/Qwen JSON or
  Gemma <|tool_call> markers as assistant content before the final
  tool_calls chunk.
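
To make the filename-keyed template detection above more concrete, here is a minimal Dart sketch of such a registry with an explicit override. It is not llamadart's implementation: TemplateFamily, detectTemplate, and the filename patterns are all hypothetical, and the real type accepted by ModelParams.chatTemplate is not described in these notes.

```dart
/// Hypothetical template families; not llamadart's actual types.
enum TemplateFamily { gemma4, gemma3n, gemma3, qwen25, qwen3 }

/// Ordered (pattern, family) pairs. More specific patterns come first so
/// that "gemma-3n" is not shadowed by the shorter "gemma-3" pattern.
final List<(RegExp, TemplateFamily)> _registry = [
  (RegExp(r'gemma[-_]?4', caseSensitive: false), TemplateFamily.gemma4),
  (RegExp(r'gemma[-_]?3n', caseSensitive: false), TemplateFamily.gemma3n),
  (RegExp(r'gemma[-_]?3', caseSensitive: false), TemplateFamily.gemma3),
  (RegExp(r'qwen[-_]?2\.5', caseSensitive: false), TemplateFamily.qwen25),
  (RegExp(r'qwen[-_]?3', caseSensitive: false), TemplateFamily.qwen3),
];

/// Returns the explicit override when given, otherwise the first registry
/// match on the model file name, otherwise null (no known template).
TemplateFamily? detectTemplate(String modelPath, {TemplateFamily? override}) {
  if (override != null) return override;
  // Only the file name is inspected, not the directory components.
  final fileName = modelPath.split(RegExp(r'[/\\]')).last;
  for (final (pattern, family) in _registry) {
    if (pattern.hasMatch(fileName)) return family;
  }
  return null;
}
```

A file name with no registry match returns null here, which corresponds to the case where the caller has to supply a template explicitly.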

Web and chat-app support:
- Added ModelParams.preferMemory64 and ModelParams.modelBytesHint so
  large WebGPU GGUF models such as Gemma 4 E2B can choose the 64-bit bridge
  core before hitting the wasm32 address-space limit.
- Fixed web .litertlm chat-app turns by swallowing unsupported token-count
  refreshes, avoiding unsupported minP/penalty parameters for LiteRT-LM
  web generation, and replacing the stuck "Loading model 0%" label with an
  indeterminate load message.
- Halved web .litertlm load time by skipping WebGPU CacheStorage prefetch
  for LiteRT-LM models, which are fetched directly by @litert-lm/core.
- Fixed web GGUF downloads reporting success before the bridge was ready by
  awaiting window.__llamadartBridgeReadyPromise, requiring the bridge
  prefetch API, and surfacing actionable errors for old bridge assets.
- Allowed benign Hugging Face ?download=true URLs to be prefetched into the
  browser cache while still skipping credentialed or signed URLs.
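
The wasm32 limit referenced above is 4 GiB of linear memory. The Dart sketch below shows one way a loader could decide up front between the 32-bit and 64-bit core from a preference flag and a size hint. The function name, the headroom factor, and the decision rule are assumptions for illustration; the notes do not say how llamadart weighs preferMemory64 and modelBytesHint internally.

```dart
/// wasm32 can address at most 4 GiB of linear memory.
const int wasm32AddressLimitBytes = 4 * 1024 * 1024 * 1024;

/// Hypothetical decision rule: pick the 64-bit core when the caller asks for
/// it, or when the hinted model size exceeds a fraction of the wasm32 limit.
/// The default headroom of 0.5 is an assumed value, leaving room for the KV
/// cache, staging copies, and runtime allocations.
bool shouldUseMemory64({
  bool preferMemory64 = false,
  int? modelBytesHint,
  double headroom = 0.5,
}) {
  if (preferMemory64) return true;
  // Without a size hint there is nothing to compare, so stay on wasm32.
  if (modelBytesHint == null) return false;
  return modelBytesHint > wasm32AddressLimitBytes * headroom;
}
```

With these defaults a 3 GiB hint selects the 64-bit core, while a 500 MiB hint stays on wasm32.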

Lifecycle, cancellation, and native stability:
- Fixed iOS .litertlm loading by resolving embedded LiteRtLm and
  StreamProxy frameworks from the app bundle, matching the macOS runtime
  path behavior.
- Improved LiteRT-LM diagnostics before model load, including selected
  CPU/GPU/NPU backend reporting, platform availability errors, and complete
  dynamic-library candidate failures.
- Validated platform-specific LiteRT-LM companion libraries during
  native-asset setup so incomplete runtime bundles fail at build time.
- Hardened native and LiteRT-LM cancellation/disposal so in-flight generation
  no longer races token release, worker teardown, engine deletion, closed
  response ports, or stream writes after cancellation.
- Freed multimodal prompt buffers on tokenize/eval error paths, serialized
  multimodal projector load/unload, and closed a leaked native-backend
  handshake reply port.

Correctness and download resilience:
- ChatSession now forwards empty-choices completion chunks instead of
  throwing, strips multiple <think> blocks, and trims history only on
  user-message turn boundaries.
- LlamaEngine.generate wraps unexpected backend errors in
  LlamaInferenceException so callers catching LlamaException see the
  documented error type.
- Tool-call parsing now uses stable fallback ids, tolerates code-fence
  language tokens without trailing delimiters, and keeps commas inside quoted
  argument values.
- JSON-schema-to-GBNF conversion now resolves $refs nested inside other
  $ref targets and fails loudly on unresolvable or external $refs.
- Array grammar generation validates minItems/maxItems, model downloads
  use connection and idle-read timeouts, and partial-download resume is
  restricted to files with stored validators.
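
Two of the parsing fixes above are easy to picture with small standalone helpers. The Dart sketch below strips every think block from a response and splits an argument string on commas while keeping commas inside quoted values. Both functions are hypothetical illustrations of the behavior described, not the package's parser.

```dart
/// Matches one <think>...</think> block, including newlines, non-greedily so
/// that several blocks in one string are removed independently.
final RegExp _thinkBlock = RegExp(r'<think>.*?</think>\s*', dotAll: true);

/// Removes all think blocks and trims the remaining text.
String stripThinkBlocks(String text) =>
    text.replaceAll(_thinkBlock, '').trim();

/// Splits on commas that are outside single or double quotes.
/// Backslash escapes inside quotes are preserved as written.
List<String> splitArguments(String input) {
  if (input.trim().isEmpty) return <String>[];
  final parts = <String>[];
  final current = StringBuffer();
  String? quote; // Active quote character, or null when outside quotes.
  var escaped = false;
  for (final rune in input.runes) {
    final ch = String.fromCharCode(rune);
    if (escaped) {
      current.write(ch);
      escaped = false;
    } else if (quote != null && ch == '\\') {
      current.write(ch);
      escaped = true;
    } else if (quote != null) {
      current.write(ch);
      if (ch == quote) quote = null; // Closing quote ends the quoted run.
    } else if (ch == '"' || ch == "'") {
      quote = ch;
      current.write(ch);
    } else if (ch == ',') {
      parts.add(current.toString().trim());
      current.clear();
    } else {
      current.write(ch);
    }
  }
  parts.add(current.toString().trim());
  return parts;
}
```

For example, splitting city="Paris, France", days=3 yields two arguments, with the quoted comma left intact in the first one.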

Benchmarks, docs, and validation:
- Added fair Gemma 4 LiteRT-LM versus llama.cpp/GGUF benchmark tooling for
  Android, macOS, and web, with speculative-decoding metrics, Pixel benchmark
  failure detection, and target-specific timeouts.
- Added tool/gguf_chat_features_smoke.dart and the
  chat-app-web-gemma4-webgpu-smoke E2E scenario for real-model parser and
  WebGPU mem64 validation.
- Updated README, website docs, and doc/litert_lm_templates.md for backend
  selection, platform/runtime support, package-size controls, benchmark
  results, model templates, and current LiteRT-LM capability limits.

Compatibility note: no public API breaking changes for existing GGUF /
llama.cpp callers. LiteRT-LM support is additive, with deprecated benchmark
wrappers retained for compatibility; unsupported llama.cpp-only parameters are
rejected for .litertlm loads instead of being silently ignored.


28 May 14:30

leehack

v0.6.17

39a4d23

v0.6.17

Native runtime sync:
- Updated native hook pinning and regenerated bindings through
  leehack/llamadart-native@b9371, picking up llama.cpp b9371.
- Picked up the Apple mobile Metal stability fix that disables Metal
  residency sets on iOS/tvOS/visionOS native bundles, avoiding affected
  device context-creation failures such as MTLLibraryErrorDomain Code=3.

Compatibility note: no public API breaking changes in 0.6.17;
existing 0.6.16 callers remain compatible. The release only refreshes
the pinned native runtime and generated low-level bindings.


25 May 17:45

leehack

v0.6.16

319e6a4

v0.6.16

Native runtime diagnostics:
- Fixed native getVramInfo() so it reports free/total VRAM from
  llama.cpp GPU-class backend devices when available, using props-based
  memory reporting first and the legacy memory probe as a fallback.
- Routed native VRAM probing through the ggml registry fallback path so
  Windows split bundles resolve backend-device symbols from the runtime that
  owns the device registry.

WebGPU and chat app fixes:
- Improved browser recovery for large remote WebGPU model/projector loads by
  retrying wasm32 model-staging aborts with the wasm64 core before surfacing
  memory-pressure failures.
- Improved the runnable chat app's web remote-model startup path so model
  assets are prefetched into browser cache when available, browser
  CacheStorage failures fall back to direct network loading, and
  credentialed/signed model URLs skip persistent browser cache storage.

Model download UX:
- Improved the runnable chat app's mobile download behavior so lifecycle
  pauses no longer deliberately cancel active foreground downloads; the app
  now lets short screen-lock/background interruptions continue when the OS
  permits and still keeps explicit pause/dispose cancellation paths.
- Added in-app and docs guidance for mobile large-model downloads, including
  resumable partial files, foreground Dart lifecycle limits, and the need for
  opt-in native background download/model-store integrations for robust
  cross-app GGUF management.

Compatibility note: no public API breaking changes in 0.6.16;
existing 0.6.15 callers remain compatible. The changes improve native VRAM
diagnostics, WebGPU browser recovery, and chat app download lifecycle
behavior.


22 May 11:43

leehack

v0.6.15

81aeaba

v0.6.15

What's Changed

chore(deps): bump path-to-regexp from 0.1.12 to 0.1.13 in /website in the npm_and_yarn group across 1 directory by @dependabot[bot] in #100
chore(website): refresh docs dependencies by @leehack in #154
test: add local E2E scenario runner by @leehack in #155
Fix GLM-OCR multimodal prompt rendering by @leehack in #157
Refactor chat template render context serialization by @leehack in #158
Prepare llamadart 0.6.15 release by @leehack in #159

Full Changelog: v0.6.14...v0.6.15

Contributors

leehack and dependabot


16 May 14:04

leehack

v0.6.14

7a55735

v0.6.14

What's Changed

docs: document WebGPU readiness checks by @leehack in #148
chore(native): sync native release b9159 by @github-actions[bot] in #150
feat: add model download controller by @leehack in #149
ci: update actions for Node 24 by @leehack in #151
chore: update WebGPU bridge assets to v0.1.16 by @leehack in #152
chore: prepare v0.6.14 release by @leehack in #153

Full Changelog: v0.6.13...v0.6.14

Contributors

leehack


14 May 14:27

leehack

v0.6.13

cd8b1d0

v0.6.13

What's Changed

fix: cascade WebGPU batch defaults by @leehack in #121
feat: add model download cache manager by @leehack in #129
feat(api): expose state_save_file / state_load_file on LlamaEngine by @thereisnotime in #123
fix: serialize same-key model downloads by @leehack in #139
fix: recover cache metadata sidecars by @leehack in #140
docs: codify agent quality guidelines by @leehack in #142
feat(webgpu): wire bridge state persistence by @leehack in #141
fix: tolerate missing optional webgpu checksum assets by @leehack in #143
docs: add production-readiness PR checklist by @leehack in #144
fix(models): reject remote-only options for local sources by @leehack in #145
feat(models): improve Hugging Face source ergonomics by @leehack in #146
chore: prepare v0.6.13 release by @leehack in #147

Full Changelog: v0.6.12...v0.6.13

Contributors

leehack and thereisnotime


09 May 00:57

leehack

v0.6.12

745910a

v0.6.12

What's Changed

fix(native): test b9016 runtime and expose mainGpu by @leehack in #113
Filter backend-owned runtime DLLs by @leehack in #115
feat(api): expose load-time tuning knobs on ModelParams by @thereisnotime in #116
Sync WebGPU bridge assets to llama.cpp b9016 by @leehack in #117
Update WebGPU bridge assets to v0.1.14 by @leehack in #118
Prepare 0.6.12 release by @leehack in #119

New Contributors

@thereisnotime made their first contribution in #116

Full Changelog: v0.6.11...v0.6.12

Contributors

leehack and thereisnotime


28 Apr 16:14

leehack

v0.6.11

3b8e101

v0.6.11

What's Changed

chore(native): sync native release b8778 by @github-actions[bot] in #106
fix(gemma4): stream thought channels as thinking by @leehack in #108
chore(native): sync native release b8955 by @github-actions[bot] in #109
chore(release): prepare 0.6.11 by @leehack in #110

Full Changelog: v0.6.10...v0.6.11

Contributors

leehack

