VLM: wire performance instrumentation and logits output by stikves · Pull Request #72 · apple/coreai-models

stikves · 2026-07-01T05:12:43Z

Wire performance metrics and logits output for the VLM inference path, matching the LLM path behavior.

Changes

Report prompt throughput (prefill t/s) and generation throughput in the performance summary
Wire --print-logits for VLM: shows the top-5 token logits per generated step
Wire --save-logits for VLM: saves top-K logits to JSON (same format as LLM path)
Make TokenLogits and TopLogitEntry properties public (cross-module access)
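
The last change follows from Swift's access-control rules: members of a public type default to internal, so another module (here, the runner) cannot read them unless each one is marked public. A minimal sketch of what that looks like, assuming field names borrowed from the --save-logits JSON shown in this description; the real declarations of TokenLogits and TopLogitEntry are not part of this page and may differ:

```swift
// Hypothetical shapes: the field names mirror the --save-logits JSON,
// they are not copied from the actual source files.
public struct TopLogitEntry {
    public let tokenID: Int
    public let incrementalText: String
    public let logit: Float

    // The implicit memberwise initializer is internal, so a public one
    // has to be written out if other modules should construct the value.
    public init(tokenID: Int, incrementalText: String, logit: Float) {
        self.tokenID = tokenID
        self.incrementalText = incrementalText
        self.logit = logit
    }
}

public struct TokenLogits {
    public let tokenID: Int
    public let incrementalText: String
    public let topLogits: [TopLogitEntry]

    public init(tokenID: Int, incrementalText: String, topLogits: [TopLogitEntry]) {
        self.tokenID = tokenID
        self.incrementalText = incrementalText
        self.topLogits = topLogits
    }
}
```

Without the public modifiers on the properties, code in the defining module compiles as before, but a tool target that imports the module gets an "inaccessible due to 'internal' protection level" error when it reads them.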

Sample output

$ llm-runner --model <vlm-bundle> --image photo.jpg --prompt "What is this?" --max-tokens 10

Generating...
The image features a small kitten sitting on

Performance Summary:
==================================================
Model Load: 15299.1ms
Prompt:     1028.3ms, 590 tokens, 573.8 tokens/sec
Generation: 480.8ms, 9 tokens, 18.7 tokens/sec
Total:      24.479s
==================================================
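
The tokens/sec figures in this summary are consistent with token count divided by elapsed seconds. A small sketch that reproduces the two rates from the numbers above; tokensPerSecond is a hypothetical helper name, not a function from the repository:

```swift
import Foundation

/// Tokens per second for a phase that processed `tokens` tokens in `milliseconds`.
/// Returns 0 for a non-positive duration to avoid dividing by zero.
func tokensPerSecond(tokens: Int, milliseconds: Double) -> Double {
    guard milliseconds > 0 else { return 0 }
    return Double(tokens) / (milliseconds / 1000.0)
}

// Figures taken from the sample summary above.
let promptRate = tokensPerSecond(tokens: 590, milliseconds: 1028.3)
let generationRate = tokensPerSecond(tokens: 9, milliseconds: 480.8)

print(String(format: "Prompt:     %.1f tokens/sec", promptRate))      // 573.8
print(String(format: "Generation: %.1f tokens/sec", generationRate))  // 18.7
```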

With --print-logits:

Generating...
  logits top5: [450]=26.812 [910]=25.406 [512]=24.422 [319]=22.609 [739]=22.109 The
  logits top5: [1967]=25.562 [9088]=19.609 [7623]=18.797 [15373]=18.734 [17739]=17.516 image
  ...
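
Each line pairs a token ID with its raw logit, highest first, followed by the text that was emitted for the step. One way such a line could be produced from a logits vector is sketched below. topK and formatStep are hypothetical names, and the full sort is chosen for brevity; the runner's actual selection code is not shown on this page.

```swift
import Foundation

/// Returns the `k` highest-scoring (tokenID, logit) pairs, best first.
/// The token ID is taken to be the index into the logits vector.
func topK(_ logits: [Float], k: Int) -> [(tokenID: Int, logit: Float)] {
    let ranked = logits.enumerated().sorted { $0.element > $1.element }
    return ranked.prefix(max(0, k)).map { (tokenID: $0.offset, logit: $0.element) }
}

/// Formats one generation step in the style of the sample output.
func formatStep(_ entries: [(tokenID: Int, logit: Float)], text: String) -> String {
    let body = entries
        .map { "[\($0.tokenID)]=" + String(format: "%.3f", $0.logit) }
        .joined(separator: " ")
    return "  logits top\(entries.count): \(body) \(text)"
}

// Toy 7-entry "vocabulary" instead of a real model output.
let toyLogits: [Float] = [0.5, 3.25, -1.0, 2.0, 7.125, 1.5, 0.0]
print(formatStep(topK(toyLogits, k: 3), text: "hi"))
// prints "  logits top3: [4]=7.125 [1]=3.250 [3]=2.000 hi"
```

For a real vocabulary of tens of thousands of entries, a partial selection would avoid sorting the whole vector at every step.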

With --save-logits /tmp/vlm_logits.json:

{
  "tokens": [{
    "token_id": 450,
    "incremental_text": "The",
    "top_logits": [
      {"token_id": 450, "incremental_text": "The", "logit": 26.8125},
      {"token_id": 910, "incremental_text": "This", "logit": 25.406}
    ]
  }]
}
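
A consumer can read this file back with Foundation's JSONDecoder. The sketch below decodes the sample above; the Saved… types are hypothetical mirrors written only from the keys visible in this JSON, not the project's own TokenLogits, TopLogitEntry, or LogitsWriter.

```swift
import Foundation

// Hypothetical mirrors of the JSON keys shown above.
struct SavedTopLogit: Codable {
    let tokenID: Int
    let incrementalText: String
    let logit: Double

    enum CodingKeys: String, CodingKey {
        case tokenID = "token_id"
        case incrementalText = "incremental_text"
        case logit
    }
}

struct SavedToken: Codable {
    let tokenID: Int
    let incrementalText: String
    let topLogits: [SavedTopLogit]

    enum CodingKeys: String, CodingKey {
        case tokenID = "token_id"
        case incrementalText = "incremental_text"
        case topLogits = "top_logits"
    }
}

struct SavedLogitsFile: Codable {
    let tokens: [SavedToken]
}

/// Decodes a --save-logits style document from a JSON string.
func decodeLogits(_ json: String) throws -> SavedLogitsFile {
    try JSONDecoder().decode(SavedLogitsFile.self, from: Data(json.utf8))
}

let sampleJSON = """
{
  "tokens": [{
    "token_id": 450,
    "incremental_text": "The",
    "top_logits": [
      {"token_id": 450, "incremental_text": "The", "logit": 26.8125},
      {"token_id": 910, "incremental_text": "This", "logit": 25.406}
    ]
  }]
}
"""

do {
    let file = try decodeLogits(sampleJSON)
    for token in file.tokens {
        // prints "450 The 2" for the sample
        print(token.tokenID, token.incrementalText, token.topLogits.count)
    }
} catch {
    print("decode failed: \(error)")
}
```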

Test plan

Build passes
VLM shows correct prompt/generation t/s in performance summary
--print-logits displays top-5 logits per token during VLM generation
--save-logits produces valid JSON with top-K entries
Verbose output (--verbose) shows full timing breakdown table
Verify no regression on text-only LLM path

- Report prompt throughput (prefill t/s) and generation throughput for VLM inference, matching the LLM path performance summary
- Wire --print-logits for VLM: shows top-5 token probabilities per step
- Wire --save-logits for VLM: saves top-K logits to JSON file
- Make TokenLogits and TopLogitEntry properties public (needed by runner)

Tested with LLaVA-1.5-7B bundle: prompt 590 tokens at 579 t/s, generation 20 tokens at 19.4 t/s. Logits JSON output verified.

carinapeng · 2026-07-01T22:53:18Z

Thank you for taking this on, @stikves!

We seem to have implemented the runner-level design from #70 (in runVLMInference, call setPromptTokenCount(vlmTokens.count) and wrap the prefill + generation).

I proposed an engine-level design here as well. It seems to me it could be more sustainable: if we instrument CoreAISequentialVLMEngine, metrics work for any caller, which is how we do it for the text engines as well.

I wonder if that would be a better, more generic design?
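
To make the distinction concrete: in a runner-level design the caller times the prefill and generation calls itself, while in an engine-level design the engine records the timings internally and every caller can read them. A rough sketch of the engine-level idea follows. SketchEngine, PhaseMetrics, and the injected clock are all hypothetical stand-ins; none of this is the API of CoreAISequentialVLMEngine.

```swift
import Foundation

/// Hypothetical metrics container, not the project's real type.
struct PhaseMetrics {
    var promptTokens = 0
    var promptSeconds = 0.0
    var generatedTokens = 0
    var generationSeconds = 0.0
}

/// Hypothetical stand-in for an engine that instruments itself.
final class SketchEngine {
    private(set) var metrics = PhaseMetrics()
    private let clock: () -> Double

    /// The clock is injectable so tests can supply deterministic time.
    init(clock: @escaping () -> Double = { ProcessInfo.processInfo.systemUptime }) {
        self.clock = clock
    }

    func prefill(_ tokens: [Int]) {
        let start = clock()
        // ... real prefill work would run here ...
        metrics.promptTokens += tokens.count
        metrics.promptSeconds += clock() - start
    }

    func generateNext() -> Int {
        let start = clock()
        let token = 0 // placeholder for the sampled token ID
        metrics.generatedTokens += 1
        metrics.generationSeconds += clock() - start
        return token
    }
}

/// Fake clock that advances half a second on every reading.
final class FakeClock {
    private var now = 0.0
    func tick() -> Double { now += 0.5; return now }
}

let fake = FakeClock()
let engine = SketchEngine(clock: { fake.tick() })
engine.prefill([1, 2, 3])
_ = engine.generateNext()
print(engine.metrics.promptTokens, engine.metrics.promptSeconds)        // 3 0.5
print(engine.metrics.generatedTokens, engine.metrics.generationSeconds) // 1 0.5
```

The caller here never touches a timer, which is the property the comment argues for: any tool built on the engine gets the same numbers.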

- Report prompt throughput (prefill t/s) and generation throughput for VLM inference, matching the LLM path performance summary
- Wire --print-logits for VLM: shows top-5 token probabilities per step
- Wire --save-logits for VLM: saves top-K logits to JSON via LogitsWriter
- Make TokenLogits and TopLogitEntry properties public (cross-module access)

Tested with VLM bundle: prompt 590 tokens at 579 t/s, generation 20 tokens at 19.4 t/s. Logits JSON output verified.

stikves force-pushed the sukru/vlm-instrumentation branch from 60c0a79 to dbde899 Compare July 1, 2026 05:16

stikves marked this pull request as ready for review July 1, 2026 05:17

stikves force-pushed the sukru/vlm-instrumentation branch from a084bb9 to 5c68d49 Compare July 1, 2026 05:23

stikves requested review from alejandro-isaza, carinapeng, kevchengcodes and tjia1818 July 1, 2026 17:35

stikves self-assigned this Jul 1, 2026

carinapeng mentioned this pull request Jul 1, 2026 in "Populate performance summary in VLM runner / engine" #70 (Open)

carinapeng reviewed Jul 1, 2026

Comment thread swift/Sources/Tools/llm-runner/LLMRunnerMain.swift

carinapeng reviewed Jul 1, 2026

Comment thread swift/Sources/Tools/llm-runner/LLMRunnerMain.swift Outdated

tjia1818 approved these changes Jul 1, 2026

stikves force-pushed the sukru/vlm-instrumentation branch from 20cfdc5 to 4e11890 Compare July 1, 2026 23:15

stikves force-pushed the sukru/vlm-instrumentation branch from 9d33bfa to 91fecb2 Compare July 1, 2026 23:20

Merge branch 'main' into sukru/vlm-instrumentation

9321116

stikves merged commit 1303957 into apple:main Jul 2, 2026
3 checks passed

stikves deleted the sukru/vlm-instrumentation branch July 2, 2026 03:15

stikves merged 3 commits into apple:main from stikves:sukru/vlm-instrumentation

3 participants
