ctx.fork() produces incoherent output on Vulkan/Windows: CopyD2D in aux_server.cpp round-trips through ggml_backend_tensor_get, which doesn't support partial tensor reads on ggml-Vulkan (#418)

## Environment
- OS: Windows 11
- GPU: NVIDIA RTX 2050
- Backend: ggml-Vulkan (portable driver)
- Model: Qwen3-0.6B

## Symptom
Calling `ctx.fork()` in an inferlet causes all subsequent generation from the forked context to produce incoherent/garbage output, even with a single fork and no concurrency. Plain `Context::new()` without fork works correctly with the same model.

## Root Cause
`AuxServer::handle_command_` in `driver/portable/src/aux_server.cpp` (line 294) implements `CopyD2D` by round-tripping each KV page through a host buffer via `ggml_backend_tensor_get` / `ggml_backend_tensor_set` with a non-zero byte offset (`src_off = pair.src * page_bytes`). On ggml-Vulkan, partial tensor reads at non-zero offsets appear to return zeros or garbage, silently corrupting the copied KV pages. The comment in the code already acknowledges this is not universally supported across backends.

This is consistent with `ctx.fork()` working correctly on Metal: `ggml_backend_tensor_get` with a non-zero offset appears to work on Metal/CUDA but not on Vulkan.
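One way to confirm the suspected failure mode in isolation is a small probe: fill a tensor with a single full write at offset 0, then read each page back at `p * page_bytes` and compare. The sketch below is only an illustration. `ReadFn`, `WriteFn`, `partial_read_ok`, and `FakeTensor` are hypothetical names, not part of the driver or of ggml; in a real check the two callbacks would wrap `ggml_backend_tensor_get` / `ggml_backend_tensor_set` on a KV tensor. `FakeTensor` is an in-memory stand-in that can imitate the "zeros at non-zero offset" behaviour described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-ins for ggml_backend_tensor_get / ggml_backend_tensor_set
// bound to one tensor; real code would wrap the ggml calls here.
using ReadFn  = std::function<void(void* dst, std::size_t offset, std::size_t size)>;
using WriteFn = std::function<void(const void* src, std::size_t offset, std::size_t size)>;

// Returns true if every read at a non-zero byte offset returns the bytes that
// were written there. The tensor is filled with one full write at offset 0,
// so only the partial *read* path is exercised.
bool partial_read_ok(const ReadFn& read, const WriteFn& write,
                     std::size_t page_bytes, std::size_t num_pages) {
    assert(page_bytes > 0 && num_pages >= 2);
    std::vector<std::uint8_t> pattern(page_bytes * num_pages);
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        // Non-zero fill derived from the page index, so zeros or a
        // neighbouring page's data both show up as a mismatch.
        pattern[i] = static_cast<std::uint8_t>(1 + (i / page_bytes) % 255);
    }
    write(pattern.data(), 0, pattern.size());

    std::vector<std::uint8_t> page(page_bytes);
    for (std::size_t p = 1; p < num_pages; ++p) {
        const std::size_t src_off = p * page_bytes;  // same arithmetic as src_off above
        std::memset(page.data(), 0, page_bytes);     // clear stale data before each read
        read(page.data(), src_off, page_bytes);
        if (std::memcmp(page.data(), pattern.data() + src_off, page_bytes) != 0) {
            return false;
        }
    }
    return true;
}

// In-memory stand-in for a device tensor, for demonstration only.
struct FakeTensor {
    std::vector<std::uint8_t> bytes;
    explicit FakeTensor(std::size_t n) : bytes(n, 0) {}

    WriteFn writer() {
        return [this](const void* src, std::size_t off, std::size_t n) {
            std::memcpy(bytes.data() + off, src, n);
        };
    }
    // zero_on_offset = true imitates a backend that returns zeros for any
    // read at a non-zero offset.
    ReadFn reader(bool zero_on_offset) {
        return [this, zero_on_offset](void* dst, std::size_t off, std::size_t n) {
            if (zero_on_offset && off != 0) { std::memset(dst, 0, n); return; }
            std::memcpy(dst, bytes.data() + off, n);
        };
    }
};
```

Running the probe once per backend at startup would turn the silent corruption into an explicit, loggable condition.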

## Workaround
Avoid `ctx.fork()` entirely: create a fresh context with `Context::new()` for each branch and replay the prior turns as text. This was verified working at `num_branches=2` (8 concurrent leaves), 512 tokens/step. Branch: `fix/tot-fork-corruption`.

## Suggested Fix
For the Vulkan backend, either:
- (a) Fall back to `CopyD2H` + `CopyH2D` (round-trip through CPU swap pool) instead of the direct `CopyD2D` path
- (b) Fix `ggml_backend_tensor_get` offset support in ggml-Vulkan upstream
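Option (a) could be sketched roughly as follows. This is a minimal illustration and not the driver's actual code: `PagePair`, `CopyPath`, `choose_copy_path`, `copy_pages`, and the three callback types are hypothetical, the callbacks stand in for the existing `CopyD2D`, `CopyD2H`, and `CopyH2D` handlers, and the sketch assumes the backend's name string contains "Vulkan". It also assumes the `CopyD2H` / `CopyH2D` path is not itself affected by the offset-read problem, which this issue does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Page indices for one copy, mirroring pair.src / pair.dst in the description.
struct PagePair { std::size_t src; std::size_t dst; };

enum class CopyPath { DirectD2D, ViaHostSwap };

// Assumption: the ggml-Vulkan backend reports a name containing "Vulkan".
CopyPath choose_copy_path(const std::string& backend_name) {
    return backend_name.find("Vulkan") != std::string::npos
               ? CopyPath::ViaHostSwap
               : CopyPath::DirectD2D;
}

// Hypothetical stand-ins for the existing command handlers.
using D2D = std::function<void(std::size_t src_page, std::size_t dst_page)>;
using D2H = std::function<void(std::size_t src_page, std::size_t host_slot)>;
using H2D = std::function<void(std::size_t host_slot, std::size_t dst_page)>;

// Copies each (src, dst) page pair using the selected path. The host-swap
// path stages every page through one scratch slot in the CPU swap pool.
void copy_pages(const std::vector<PagePair>& pairs, CopyPath path,
                const D2D& d2d, const D2H& d2h, const H2D& h2d) {
    const std::size_t scratch_slot = 0;  // hypothetical: a reserved swap slot
    for (const PagePair& p : pairs) {
        if (path == CopyPath::DirectD2D) {
            d2d(p.src, p.dst);
        } else {
            d2h(p.src, scratch_slot);
            h2d(scratch_slot, p.dst);
        }
    }
}
```

Keying the decision on a capability check rather than the backend name would be more robust if other backends turn out to share the limitation.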
