feat(telemetry): Fix error message telemetry for tool calls#5797
Open
Achuth17 wants to merge 2 commits into
Open
feat(telemetry): Fix error message telemetry for tool calls#5797Achuth17 wants to merge 2 commits into
Achuth17 wants to merge 2 commits into
Conversation
The tool error details are not populated correctly in traces for some some tool call categories like REST Tool or MCP Tool.
Collaborator
|
Response from ADK Triaging Agent Hello @Achuth17, thank you for creating this PR! This PR is a feature/bug fix related to telemetry. To help our reviewers better understand and verify the changes, could you please provide:
This information will help the reviewers review your PR more efficiently. Thanks! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The tool error details are not populated correctly in traces for some some tool call categories like REST Tool or MCP Tool.
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
2. Or, if no issue exists, describe the change:
If applicable, please follow the issue templates to provide as much detail as
possible.
Problem:
A clear and concise description of what the problem is.
Tool spans were missing the
error.typeattribute when a tool returned an error as a dict (e.g.{'error': ...}from aFunctionTool,{'isError': True}from an MCP tool,{'status': 'error'}from a REST/discovery tool) rather than raising an exception. Since most ADK tool categories signal errors via return value rather than exceptions, this meant tool error telemetry was effectively absent for the majority of real-world tool failures, making it hard to detect and diagnose tool errors from traces.Solution:
A clear and concise description of what you want to happen and why you choose
this solution.
What: Detect dict-shaped errors in tool responses and populate
error.typeon the tool span with a category-specific string (e.g.TOOL_ERROR,MCP_TOOL_ERROR,HTTP_ERROR). Implemented via an opt-in private hook_detect_error_in_response(self, response)on individual tool classes (FunctionTool,MCPTool,RestApiTool,BashTool,GoogleTool,DiscoveryEngineSearchTool, environment tools, skill tools); the function-call dispatcher calls it viagetattrafter a successful tool invocation and threads the detected string throughtel_ctx.error_typeintotrace_tool_call. Exception-raised errors continue to take precedence.Why this solution:
BaseToolAPI change) — call sites usegetattr(tool, '_detect_error_in_response', None), so we don't commit to a public API while the design settles; a public redesign can come later without breaking tools.error_typeis only consulted when no exception was raised, preserving existing behavior for tools that already throw.Testing Plan
Please describe the tests that you ran to verify your changes. This is required
for all PRs that are not small documentation or typo fixes.
Unit Tests:
Tests run:
tests/unittests/telemetry/test_spans.py+tests/unittests/flows/llm_flows/test_functions_simple.py(targeted) — 99/99 passed. Covers new tests for each tool's_detect_error_in_responsehook, theerror_typekwarg ontrace_tool_call(including exception-takes-precedence), guard thatBaseTooldoesn't expose the hook, and e2e tests verifying: dict errors recorded, success dicts ignored, exception precedence, detection skipped on auth/confirmation requests, and detector exceptions don't break tool calls.tests/unittests/telemetry/+tests/unittests/flows/llm_flows/(broader regression) — 517/517 passed.tests/unittests/tools/(regression on the 8 modified tool files) — 1394/1394 passed.I have added or updated unit tests for my change.
All unit tests pass locally.
Please include a summary of passed
pytestresults.Manual End-to-End (E2E) Tests:
Please provide instructions on how to manually test your changes, including any
necessary setup or configuration. Please provide logs or screenshots to help
reviewers better understand the fix.
Checklist
Additional context
Add any other context or screenshots about the feature request here.