What are you trying to build?
Currently only the text-only models report token counts and timings; we should add the same to the VLM path. In runVLMInference, call setPromptTokenCount(vlmTokens.count) and wrap the prefill + generation loop so that its timings are recorded.
Another design is to instrument CoreAISequentialVLMEngine to record .prompt/.extend spans the way the text engines do, so metrics work for any caller.
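A rough sketch of the first option, to show the shape I have in mind. Only setPromptTokenCount, vlmTokens, and the .prompt/.extend span names come from the existing code; InferenceMetrics, SpanKind, record, and runVLMInferenceSketch are hypothetical stand-ins, and the prefill and decode bodies are placeholders rather than real engine calls.

```swift
import Foundation

// Hypothetical span kinds, named after the .prompt/.extend spans mentioned above.
enum SpanKind { case prompt, extend }

// Hypothetical metrics recorder; the real one may look quite different.
final class InferenceMetrics {
    private(set) var promptTokenCount = 0
    private(set) var durations: [SpanKind: TimeInterval] = [:]
    private(set) var spanCounts: [SpanKind: Int] = [:]

    func setPromptTokenCount(_ count: Int) { promptTokenCount = count }

    // Runs `body`, then adds its elapsed wall-clock time to the given span kind.
    func record<T>(_ kind: SpanKind, _ body: () throws -> T) rethrows -> T {
        let start = Date()
        defer {
            durations[kind, default: 0] += Date().timeIntervalSince(start)
            spanCounts[kind, default: 0] += 1
        }
        return try body()
    }
}

// Assumed shape of the VLM path: one prefill over the prompt tokens,
// then one decode step per generated token.
func runVLMInferenceSketch(vlmTokens: [Int], maxNewTokens: Int,
                           metrics: InferenceMetrics) -> [Int] {
    metrics.setPromptTokenCount(vlmTokens.count)

    // Placeholder prefill: a real implementation would run the engine here.
    var last = metrics.record(.prompt) { vlmTokens.reduce(0, +) }

    var generated: [Int] = []
    for _ in 0..<maxNewTokens {
        // Placeholder decode step, timed as one .extend span per token.
        let next = metrics.record(.extend) { last + 1 }
        generated.append(next)
        last = next
    }
    return generated
}
```

With the second design, the same record calls would live inside the engine instead of in runVLMInference, so any caller would get the spans without extra wiring.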
Where are the current docs or utilities unclear?
N/A
Expected improvement
More readable and informative outputs
Additional context
No response