
Add text-generation-v2 demo with dropdown model selector and non-GQA, GPU-resident KV support #4

Open
arisha07 wants to merge 1 commit into Honry:support_qwen3 from arisha07:text-gen-v2



arisha07 commented Jun 4, 2026


No description provided.


arisha07 commented Jun 5, 2026

Author

Adds a new demos/text-generation-v2/ demo alongside the existing text-generation demo. It introduces a dropdown model selector with an explicit "Load Model" button (no auto-load), and extends LLM inference to support both GQA (fused) and non-GQA (standard attention) models with a GPU-resident KV cache.
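
A minimal sketch of how a config-driven selector with an explicit load step could look. The model ids, labels, paths, and the helper names (`modelOptions`, `resolveModel`, `wireSelector`) are hypothetical and are not taken from the PR's main.js:

```javascript
// Hypothetical model registry: the ids, labels, and paths below are
// placeholders, not the entries used in the PR.
const MODELS = {
  'example-gqa': { label: 'Example GQA model', path: 'models/example-gqa/', gqa: true },
  'example-non-gqa': { label: 'Example non-GQA model', path: 'models/example-non-gqa/', gqa: false },
};

// Turn the registry into { value, label } pairs, one per <option> element.
function modelOptions(models) {
  return Object.entries(models).map(([id, cfg]) => ({ value: id, label: cfg.label }));
}

// Look up the selected entry; throw on unknown ids instead of guessing.
function resolveModel(models, id) {
  if (!Object.prototype.hasOwnProperty.call(models, id)) {
    throw new Error(`Unknown model id: ${id}`);
  }
  return { id, ...models[id] };
}

// Browser-only wiring: fill the dropdown and load only on button click.
function wireSelector(selectEl, buttonEl, models, loadModel) {
  for (const { value, label } of modelOptions(models)) {
    const opt = selectEl.ownerDocument.createElement('option');
    opt.value = value;
    opt.textContent = label;
    selectEl.appendChild(opt);
  }
  // No auto-load: nothing is fetched until the user presses the button.
  buttonEl.addEventListener('click', () => {
    loadModel(resolveModel(models, selectEl.value));
  });
}
```

With this shape, adding a model means adding one entry to `MODELS`; the dropdown markup is generated from the object.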

What's new

  1. Dropdown model selector: models are defined in a MODELS config object; adding a new model requires only a config entry, with no HTML changes
  2. Non-GQA model support: models generated through the Optimum and NNCF flow run alongside GQA fused models in the same demo
  3. GPU-resident KV cache for non-GQA: a standalone WebNN slice graph is dispatched each step to copy present[1:] → past entirely on the GPU (no GPU→CPU→GPU round trip); only the logits (~2.4 MB) are read back to the CPU per step, versus ~134 MB with a CPU-resident KV cache
  4. freeDimensionOverrides for non-GQA: all dimensions are fixed statically at session creation, eliminating dynamic-shape dispatch failures
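
As an illustration of item 4, the sketch below builds ONNX Runtime Web session options with every free dimension pinned. `freeDimensionOverrides` and the WebNN `deviceType` option are real ONNX Runtime Web session options, but the dimension names (`batch_size`, `sequence_length`, `past_sequence_length`, `total_sequence_length`), the one-token-per-step layout, and the helper name `buildStaticSessionOptions` are assumptions; the actual names depend on how the model was exported:

```javascript
// Build session options in which no input dimension is left dynamic.
// Dimension names are assumed here; read the real ones from the model inputs.
function buildStaticSessionOptions({ deviceType = 'gpu', maxLength }) {
  if (!Number.isInteger(maxLength) || maxLength < 2) {
    throw new RangeError('maxLength must be an integer >= 2');
  }
  return {
    executionProviders: [{ name: 'webnn', deviceType }],
    freeDimensionOverrides: {
      batch_size: 1,
      sequence_length: 1,                  // one new token per decode step
      past_sequence_length: maxLength - 1, // fixed-size KV window
      total_sequence_length: maxLength,    // past window plus current token
    },
  };
}
```

The returned object would be passed as the second argument to `ort.InferenceSession.create(modelUrl, options)`. Under this assumed layout, the present KV output is one position longer than the past input, which is why dropping its first position yields a tensor of the past shape.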

Non-GQA ONNX models are available at https://gfx-assets.fm.intel.com/artifactory/gfx-ort-ovep-assets/genai/models/onnx/
For testing, use models smaller than 2 GB, and update the model path in main.js accordingly.

URL: http://localhost:8080/demos/text-generation-v2/?devicetype=gpu. This loads the ORT WASM library from the local dist directory at /demos/text-generation-v2/dist.
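
One way the `devicetype` query parameter could be read is sketched below. The helper name `deviceTypeFromUrl`, the allowed values, and the `gpu` fallback are assumptions and may not match what main.js does:

```javascript
// Device types assumed to be accepted; adjust to whatever the demo supports.
const ALLOWED_DEVICE_TYPES = ['cpu', 'gpu', 'npu'];

// Read ?devicetype=... from a URL, falling back when it is missing or unknown.
function deviceTypeFromUrl(href, fallback = 'gpu') {
  const value = new URL(href).searchParams.get('devicetype');
  if (value === null) return fallback;
  const normalized = value.toLowerCase();
  return ALLOWED_DEVICE_TYPES.includes(normalized) ? normalized : fallback;
}
```

In the browser this would be called as `deviceTypeFromUrl(window.location.href)`.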

arisha07 marked this pull request as ready for review June 5, 2026 20:22
