feat: add token usage metrics #1
Open
ashrobertsdragon wants to merge 2 commits into
…ent run
Captures input_tokens, output_tokens, total_tokens, and model from the pydantic-ai RunResult after every successful LLM call in run_agent_async, emitting them via the existing on_observe callback as ObservationType.METRIC. The package remains ignorant of user and run identity.
Reviewer's Guide
Adds token usage metric emission to the async agent execution flow and introduces a unit test to verify that METRIC observations with token counts are produced on successful runs.
Sequence diagram for async agent run with token usage metric emission
sequenceDiagram
actor Caller
participant run_agent_async
participant Model
participant emit_observation
participant Observer
Caller->>run_agent_async: invoke(model, on_observe, ...)
run_agent_async->>Model: execute_async()
Model-->>run_agent_async: res(output, usage)
run_agent_async->>emit_observation: emit_observation(on_observe, EVENT, agent, Agent run completed...)
emit_observation-->>Observer: on_observe(EVENT, meta)
run_agent_async->>run_agent_async: usage = res.usage()
run_agent_async->>emit_observation: emit_observation(on_observe, METRIC, agent, Token usage for model, {model, input_tokens, output_tokens, total_tokens})
emit_observation-->>Observer: on_observe(METRIC, metric_meta)
run_agent_async-->>Caller: res.output
Flow diagram for run_agent_async with token usage metric
flowchart TD
A[Start run_agent_async] --> B[Call model to execute asynchronously]
B --> C[Receive res with output and usage]
C --> D[Emit EVENT observation for successful agent run]
D --> E["Extract usage = res.usage()"]
E --> F[Compute total_tokens = input_tokens + output_tokens]
F --> G[Emit METRIC observation with model and token counts]
G --> H[Return res.output]
H --> I[End]
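Steps E through G of the flowchart amount to one addition and one dictionary. A minimal sketch, assuming a hypothetical helper named `build_token_metric` (the PR builds this dictionary inline rather than through a helper) and a placeholder model name:

```python
def build_token_metric(
    model: str, input_tokens: int, output_tokens: int
) -> dict[str, object]:
    """Build the metadata payload for a token usage METRIC observation.

    Hypothetical helper for illustration only; not part of the PR.
    """
    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        # total_tokens is derived from the two counts, as in step F.
        "total_tokens": input_tokens + output_tokens,
    }


# "example-model" is a placeholder, not a real model identifier.
payload = build_token_metric("example-model", 120, 30)
```

Deriving `total_tokens` from the two counts keeps the invariant that the unit test checks (total equals input plus output) true by construction.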
Hey - I've found 2 issues, and left some high level feedback:
- Consider guarding `res.usage()` in `run_agent_async` (e.g., handle `None` or missing attributes) so that a missing usage payload doesn't cause the whole agent run to fail when emitting metrics.
- The `test_run_agent_async_emits_metric_event` assumes exactly one `METRIC` event, which could become brittle if additional metrics are emitted in the future; filtering by a more specific property (e.g., message or metadata key) would make the assertion more robust.
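One way to read the first point is to make usage collection best-effort. The sketch below is an assumption-laden illustration, not the project's code: `emit_usage_metric`, the `Observer` callback signature, and the `SimpleNamespace` stand-ins for the run result are all hypothetical, and only the `usage()`, `input_tokens`, and `output_tokens` names come from the diff.

```python
import logging
from types import SimpleNamespace
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Hypothetical observer signature: (observation_type, metadata) -> None.
Observer = Callable[[str, dict[str, Any]], None]


def emit_usage_metric(res: Any, model: str, on_observe: Observer) -> bool:
    """Emit a token usage metric; return False if usage data is unavailable."""
    try:
        usage = res.usage()
        input_tokens = usage.input_tokens
        output_tokens = usage.output_tokens
        total_tokens = input_tokens + output_tokens
    except (AttributeError, TypeError) as exc:
        # Covers a missing usage() method, a None usage object, missing
        # attributes, and None token counts. The run itself is unaffected.
        logger.warning("Token usage unavailable for model %s: %s", model, exc)
        return False
    on_observe(
        "METRIC",
        {
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": total_tokens,
        },
    )
    return True


events: list[tuple[str, dict[str, Any]]] = []


def record(kind: str, meta: dict[str, Any]) -> None:
    """Collect observations in memory for the demonstration."""
    events.append((kind, meta))


# Stand-ins for a run result with and without a usage payload.
good = SimpleNamespace(
    usage=lambda: SimpleNamespace(input_tokens=10, output_tokens=5)
)
bad = SimpleNamespace(usage=lambda: None)

emitted_good = emit_usage_metric(good, "example-model", record)
emitted_bad = emit_usage_metric(bad, "example-model", record)
```

In this sketch only the usage extraction is guarded; an exception raised by the observer callback would still propagate, so a caller that wants emission to be fully non-fatal would need to guard that call as well.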
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider guarding `res.usage()` in `run_agent_async` (e.g., handle `None` or missing attributes) so that a missing usage payload doesn't cause the whole agent run to fail when emitting metrics.
- The `test_run_agent_async_emits_metric_event` assumes exactly one `METRIC` event, which could become brittle if additional metrics are emitted in the future; filtering by a more specific property (e.g., message or metadata key) would make the assertion more robust.
## Individual Comments
### Comment 1
<location path="src/lorebinders/agent/factory.py" line_range="128-127" />
<code_context>
                 f"Agent run completed with model {model}",
                 meta,
             )
+            usage = res.usage()
+            emit_observation(
+                on_observe,
+                ObservationType.METRIC,
+                "agent",
+                f"Token usage for model {model}",
+                {
+                    "model": model,
+                    "input_tokens": usage.input_tokens,
+                    "output_tokens": usage.output_tokens,
+                    "total_tokens": usage.input_tokens + usage.output_tokens,
+                },
+            )
             return res.output
     except Exception as e:
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against `res.usage()` failures so they don't turn a successful agent run into an error.
Because `res.usage()` and metric emission are inside the main `try`, any error there will be caught by the outer `except` and incorrectly turn a successful agent run into a failure. To keep `run_agent_async`’s behavior unchanged, please wrap the usage/metric logic in its own `try/except` (ideally catching specific exceptions) and log a warning so usage collection remains best-effort and non-fatal.
</issue_to_address>
### Comment 2
<location path="tests/unit/agents/test_factory.py" line_range="82-86" />
<code_context>
+        o for o in observations if o.type == ObservationType.METRIC
+    ]
+    assert len(metric_events) == 1
+    meta = metric_events[0].metadata
+    assert isinstance(meta["input_tokens"], int)
+    assert isinstance(meta["output_tokens"], int)
+    assert isinstance(meta["total_tokens"], int)
+    assert meta["total_tokens"] == meta["input_tokens"] + meta["output_tokens"]
+    assert "model" in meta
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen assertions on token values by checking for non-negative counts.
Currently the test only checks that token counts are integers and that `total_tokens` equals the sum of input and output tokens. Please also assert that `input_tokens`, `output_tokens`, and `total_tokens` are `>= 0` so tests fail if negative token values are ever emitted.
</issue_to_address>
Comment on lines +82 to +86
    meta = metric_events[0].metadata
    assert isinstance(meta["input_tokens"], int)
    assert isinstance(meta["output_tokens"], int)
    assert isinstance(meta["total_tokens"], int)
    assert meta["total_tokens"] == meta["input_tokens"] + meta["output_tokens"]
suggestion (testing): Strengthen assertions on token values by checking for non-negative counts.
Currently the test only checks that token counts are integers and that total_tokens equals the sum of input and output tokens. Please also assert that input_tokens, output_tokens, and total_tokens are >= 0 so tests fail if negative token values are ever emitted.
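The suggestion, combined with the earlier point about filtering on a metadata key, could look roughly like the following. This is a sketch under stated assumptions: `Observation` is a hypothetical stand-in for the project's observation record (only its `type` and `metadata` attributes appear in the diff), observation types are plain strings here rather than `ObservationType` members, and `token_usage_metrics` and `check_token_metric` are illustrative helper names.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Observation:
    """Hypothetical stand-in for the project's observation record."""

    type: str
    metadata: dict[str, Any] = field(default_factory=dict)


TOKEN_KEYS = ("input_tokens", "output_tokens", "total_tokens")


def token_usage_metrics(observations: list[Observation]) -> list[Observation]:
    """Select METRIC observations that carry token counts, ignoring others."""
    return [
        o
        for o in observations
        if o.type == "METRIC" and all(key in o.metadata for key in TOKEN_KEYS)
    ]


def check_token_metric(meta: dict[str, Any]) -> None:
    """Assert that token counts are non-negative integers that add up."""
    for key in TOKEN_KEYS:
        assert isinstance(meta[key], int), f"{key} is not an int"
        assert meta[key] >= 0, f"{key} is negative"
    assert meta["total_tokens"] == meta["input_tokens"] + meta["output_tokens"]
    assert "model" in meta


observations = [
    Observation("EVENT", {"model": "example-model"}),
    Observation("METRIC", {"latency_ms": 42}),  # unrelated metric, filtered out
    Observation(
        "METRIC",
        {
            "model": "example-model",
            "input_tokens": 10,
            "output_tokens": 5,
            "total_tokens": 15,
        },
    ),
]

matches = token_usage_metrics(observations)
assert len(matches) == 1
check_token_metric(matches[0].metadata)
```

Filtering on the token keys keeps the `len(matches) == 1` assertion stable if other `METRIC` observations are added later, which is the brittleness raised in the overall comments.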
Summary of changes
This PR adds token usage tracking to the asynchronous agent execution flow. When an agent run completes successfully, it now extracts token statistics (input, output, and total) and emits them as a metric observation.
Key features/fixes
- In `run_agent_async`, the model's usage data is now captured and emitted using the `METRIC` observation type.
- Added `test_run_agent_async_emits_metric_event` to verify that metric events are correctly triggered with valid token counts upon successful execution.
Breaking changes
None.
Summary by Sourcery
Track and emit token usage metrics after asynchronous agent runs and validate this behavior with a new unit test.