
Populate performance summary in VLM runner / engine #70

Description

@carinapeng

What are you trying to build?

Currently only the text-only models support this, but we should add it to the VLM path as well, i.e. the token counts and timings. In runVLMInference, call setPromptTokenCount(vlmTokens.count) and wrap the prefill + generation loop so that its timings are recorded.
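
A minimal sketch of this caller-side approach, for discussion only. Only runVLMInference, setPromptTokenCount and vlmTokens come from this issue; PerformanceSummary, MetricsRecorder, recordPrefill, recordGeneration and the prefill/decodeNext closures are hypothetical stand-ins, not the project's actual API.

```swift
import Foundation

// Hypothetical summary type; the real fields in the project may differ.
struct PerformanceSummary {
    var promptTokenCount = 0
    var generatedTokenCount = 0
    var prefillSeconds: Double = 0
    var generationSeconds: Double = 0
}

// Hypothetical recorder. Only setPromptTokenCount is named in the issue.
final class MetricsRecorder {
    private(set) var summary = PerformanceSummary()

    func setPromptTokenCount(_ count: Int) {
        summary.promptTokenCount = count
    }

    // Runs the prefill step and stores how long it took.
    func recordPrefill<T>(_ body: () -> T) -> T {
        let start = DispatchTime.now().uptimeNanoseconds
        let result = body()
        let end = DispatchTime.now().uptimeNanoseconds
        summary.prefillSeconds = Double(end - start) / 1_000_000_000
        return result
    }

    // Runs the generation loop and stores its duration and token count.
    func recordGeneration(_ body: () -> [Int]) -> [Int] {
        let start = DispatchTime.now().uptimeNanoseconds
        let tokens = body()
        let end = DispatchTime.now().uptimeNanoseconds
        summary.generationSeconds = Double(end - start) / 1_000_000_000
        summary.generatedTokenCount = tokens.count
        return tokens
    }
}

// Sketch of the VLM path. `prefill` and `decodeNext` stand in for the
// real engine calls, which this issue does not spell out.
func runVLMInference(
    vlmTokens: [Int],
    maxNewTokens: Int,
    recorder: MetricsRecorder,
    prefill: ([Int]) -> Int,
    decodeNext: (Int) -> Int?
) -> [Int] {
    recorder.setPromptTokenCount(vlmTokens.count)

    // Prefill: process the full prompt and obtain the first token.
    let first = recorder.recordPrefill { prefill(vlmTokens) }

    // Generation: decode until the limit is hit or decodeNext returns nil.
    return recorder.recordGeneration {
        var output = [first]
        var last = first
        while output.count < maxNewTokens, let next = decodeNext(last) {
            output.append(next)
            last = next
        }
        return output
    }
}
```

After a call, recorder.summary would hold the prompt token count, the generated token count, and the two durations, which is the information the text-only path already reports.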

Another design is to instrument CoreAISequentialVLMEngine to record .prompt/.extend spans the way the text engines do, so that metrics work for any caller.
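
A rough sketch of the engine-side alternative, under stated assumptions. SpanKind, Span, SpanLog and InstrumentedVLMEngine are hypothetical names invented for illustration; the sketch does not reflect how CoreAISequentialVLMEngine or the text engines are actually implemented, and only the .prompt/.extend span names come from this issue.

```swift
import Foundation

// Hypothetical span kinds mirroring the .prompt/.extend names in the issue.
enum SpanKind { case prompt, extend }

// One timed unit of engine work.
struct Span {
    let kind: SpanKind
    let tokenCount: Int
    let seconds: Double
}

// Hypothetical sink that the engine writes to on every call.
final class SpanLog {
    private(set) var spans: [Span] = []

    // Times `body` and appends a span of the given kind.
    func record<T>(_ kind: SpanKind, tokenCount: Int, _ body: () throws -> T) rethrows -> T {
        let start = DispatchTime.now().uptimeNanoseconds
        let result = try body()
        let end = DispatchTime.now().uptimeNanoseconds
        let elapsed = Double(end - start) / 1_000_000_000
        spans.append(Span(kind: kind, tokenCount: tokenCount, seconds: elapsed))
        return result
    }

    // Sums the token counts of all spans of one kind.
    func totalTokens(_ kind: SpanKind) -> Int {
        return spans.filter { $0.kind == kind }.reduce(0) { $0 + $1.tokenCount }
    }
}

// Hypothetical stand-in for a VLM engine. The model work is faked;
// the point is that spans are recorded inside the engine itself.
final class InstrumentedVLMEngine {
    let log = SpanLog()
    private var position = 0

    // Prefill over the full (image + text) prompt.
    func prompt(_ tokens: [Int]) -> Int {
        return log.record(.prompt, tokenCount: tokens.count) { () -> Int in
            position = tokens.count
            return tokens.last ?? 0
        }
    }

    // One decode step.
    func extend(_ token: Int) -> Int {
        return log.record(.extend, tokenCount: 1) { () -> Int in
            position += 1
            return token + 1
        }
    }
}
```

With this shape, any caller that drives the engine gets prompt and decode metrics without adding its own timing code, at the cost of touching the engine rather than only runVLMInference.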

Where are the current docs or utilities unclear?

N/A

Expected improvement

More readable and informative outputs

Additional context

No response
