Add text-generation-v2 demo with dropdown model selector and non-GQA, GPU-resident KV support by arisha07 · Pull Request #4 · Honry/webnn-developer-preview

arisha07 · 2026-06-04T23:31:52Z

No description provided.


arisha07 · 2026-06-05T20:20:56Z

Adds a new demos/text-generation-v2/ demo alongside the existing text-generation demo. It introduces a dropdown model selector with an explicit "Load Model" button (no auto-load) and extends LLM inference to support both GQA (fused) and non-GQA (standard attention) models with a GPU-resident KV cache.

What's new

Dropdown model selector: models are defined in a MODELS config object, so adding a new model requires only a config entry and no HTML changes.
Non-GQA model support: models generated through the Optimum and NNCF flow are supported alongside GQA fused models in the same demo.
GPU-resident KV cache for non-GQA: a standalone WebNN slice graph copies present[1:] → past at each step entirely on the GPU (no GPU→CPU→GPU round trip); only the logits (~2.4 MB) are read back to the CPU per step, versus ~134 MB with a CPU-resident KV cache.
freeDimensionOverrides for non-GQA: all dimensions are fixed statically at session creation, which eliminates dynamic-shape dispatch failures.
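To make the present[1:] → past step concrete, here is a CPU reference sketch of the same slice. It assumes a row-major [batch, numHeads, seqLen, headDim] KV layout and that the slice drops the oldest position along the sequence axis, so past keeps a fixed length that fits the static shapes. The helper name slicePresentToPast is hypothetical, and the demo itself performs this step inside a WebNN graph on the GPU rather than in JavaScript.

```typescript
// Row-major KV tensor shape: [batch, numHeads, seqLen, headDim] (assumed layout).
type KvShape = [number, number, number, number];

// Hypothetical CPU reference for the "present[1:] -> past" step: copies every
// position except the first (oldest) one along the sequence axis.
function slicePresentToPast(present: Float32Array, shape: KvShape): Float32Array {
  const [batch, numHeads, seqLen, headDim] = shape;
  if (seqLen < 1 || present.length !== batch * numHeads * seqLen * headDim) {
    throw new Error("present does not match the given shape");
  }
  const pastLen = seqLen - 1;
  const past = new Float32Array(batch * numHeads * pastLen * headDim);
  for (let b = 0; b < batch; b++) {
    for (let h = 0; h < numHeads; h++) {
      // Start of the (b, h) block in present and in past.
      const srcBase = (b * numHeads + h) * seqLen * headDim;
      const dstBase = (b * numHeads + h) * pastLen * headDim;
      // Positions 1..seqLen-1 of one (b, h) block are contiguous, so one copy suffices.
      past.set(present.subarray(srcBase + headDim, srcBase + seqLen * headDim), dstBase);
    }
  }
  return past;
}

// 1 batch, 2 heads, 3 positions, headDim 2: each head keeps its last 2 positions.
const present = Float32Array.from({ length: 12 }, (_, i) => i);
console.log(Array.from(slicePresentToPast(present, [1, 2, 3, 2])).join(", "));
// 2, 3, 4, 5, 8, 9, 10, 11
```

In WebNN terms, the same copy would roughly correspond to one MLGraphBuilder.slice(present, [0, 0, 1, 0], [batch, numHeads, seqLen - 1, headDim]) call per KV tensor, which is what lets the result stay on the GPU between steps.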

Non-GQA ONNX models are available at https://gfx-assets.fm.intel.com/artifactory/gfx-ort-ovep-assets/genai/models/onnx/
For testing, use models smaller than 2 GB, and update the model path in main.js accordingly.
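Since the model path has to be edited in main.js, here is a minimal sketch of what a config-driven MODELS entry and the matching non-GQA freeDimensionOverrides could look like. Everything named here is an assumption: the ModelConfig fields, the example key and path, the maxSeqLen value, and the symbolic dimension names (batch_size, sequence_length, past_sequence_length, total_sequence_length), which must match whatever the exported ONNX graph actually declares. freeDimensionOverrides itself is an onnxruntime-web session option that maps symbolic dimension names to fixed sizes.

```typescript
// Hypothetical shape of a MODELS entry; field names and the example path are
// placeholders, not the demo's actual schema.
type AttentionKind = "gqa" | "non-gqa";

interface ModelConfig {
  label: string;           // text shown in the dropdown
  path: string;            // model location; edit to point at your local copy
  attention: AttentionKind;
  maxSeqLen: number;       // static total KV length used for non-GQA sessions
}

const MODELS: Record<string, ModelConfig> = {
  "example-non-gqa": {
    label: "Example non-GQA model",
    path: "models/example-non-gqa/model.onnx", // placeholder path
    attention: "non-gqa",
    maxSeqLen: 1024,
  },
};

// Builds a freeDimensionOverrides map so every symbolic dim is fixed at session
// creation. The dim names are assumptions and must match the exported graph.
function buildFreeDimensionOverrides(cfg: ModelConfig): Record<string, number> {
  if (cfg.attention !== "non-gqa") return {}; // GQA models keep dynamic dims here
  return {
    batch_size: 1,
    sequence_length: 1,                      // assumes one new token per decode step
    past_sequence_length: cfg.maxSeqLen - 1, // length kept after the present[1:] slice
    total_sequence_length: cfg.maxSeqLen,    // past + the new token
  };
}

const overrides = buildFreeDimensionOverrides(MODELS["example-non-gqa"]);
console.log(overrides);
```

The resulting map would be passed as the freeDimensionOverrides field of the session options. Deriving past_sequence_length as total_sequence_length - 1 mirrors the present[1:] → past slice, which keeps the past length constant from step to step.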

URL: http://localhost:8080/demos/text-generation-v2/?devicetype=gpu (loads the ORT Wasm library from the local dist folder, /demos/text-generation-v2/dist).
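The devicetype query parameter in the URL above selects the WebNN device. Here is a small sketch of how a page could read it and turn it into an onnxruntime-web WebNN execution-provider entry; the helper names and the fallback to "gpu" for missing or unknown values are assumptions for illustration, not documented behavior of the demo.

```typescript
// WebNN device types accepted by onnxruntime-web's WebNN execution provider.
type WebNNDeviceType = "cpu" | "gpu" | "npu";

// Hypothetical helper: reads ?devicetype=... from a page URL. Falling back to
// "gpu" for a missing or unrecognised value is an assumption of this sketch.
function getDeviceType(href: string): WebNNDeviceType {
  const value = new URL(href).searchParams.get("devicetype")?.toLowerCase();
  return value === "cpu" || value === "gpu" || value === "npu" ? value : "gpu";
}

// Hypothetical helper: builds the execution-provider entry for session options.
function webnnProvider(href: string): { name: "webnn"; deviceType: WebNNDeviceType } {
  return { name: "webnn", deviceType: getDeviceType(href) };
}

const pageUrl = "http://localhost:8080/demos/text-generation-v2/?devicetype=gpu";
console.log(JSON.stringify(webnnProvider(pageUrl))); // {"name":"webnn","deviceType":"gpu"}
```

The returned object would go into the executionProviders list of the session options when the model is loaded.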

Add text-generation-v2 demo with dropdown model selector and non-GQA,GPU-resident KV support (7b4d6d6)

arisha07 marked this pull request as ready for review June 5, 2026 20:22

arisha07 wants to merge 1 commit into Honry:support_qwen3 from arisha07:text-gen-v2