feat: Add step-length metrics for measuring Claude's autonomy #13
Open
mylee04 wants to merge 1 commit into
Implements step-length tracking, as suggested by Chip Huyen, to measure consecutive tool uses before interruption. This helps users understand how autonomously Claude operates in their projects.

Changes:
- Add step-length calculation in stats.py tracking consecutive tool-using commands
- Create distribution and time-series visualizations in dashboard
- Add comprehensive test suite with 8 tests covering all edge cases
- Display metrics: average, min, max step-length and distribution

The feature shows:
- Step-Length Distribution: Bar chart of sequence frequency
- Step-Length Over Time: Line chart tracking trends
- Tool-specific analysis in backend (for future use)

Tests: All 8 tests passing
Owner
Author
@chiphuyen You are very welcome! Yeah, you're correct. It looks similar to user command.
Owner
Thanks for the follow-up. I'll wait to merge until we have a custom chart feature (to avoid making the dashboard crowded), but I'll keep the PR up here in case folks want to use this chart!


Summary
I saw the discussion in the comments about tracking step-length metrics, so I decided to give it a try! This PR implements the ability to track consecutive tool-using commands before interruption, which helps understand how autonomously Claude operates.
What it does
Implementation
stats.py
Example metrics
{
  "average_step_length": 2.38,
  "max_step_length": 4,
  "step_length_distribution": {"1": 2, "2": 3, "3": 1, "4": 2}
}
The charts appear after the Interruption Rate chart in the dashboard and automatically hide when there's no data.
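For reviewers who want to see how metrics of this shape could be derived, here is a minimal sketch. It is not the code in stats.py: the function names and the `used_tools` / `interrupted` keys are hypothetical, and it assumes a step sequence is a run of consecutive tool-using commands that ends at an interruption or at a command that used no tools.

```python
from collections import Counter
from typing import Iterable


def step_lengths(commands: Iterable[dict]) -> list[int]:
    """Return the length of each run of consecutive tool-using commands.

    Assumption (hypothetical schema): each command is a dict with boolean
    "used_tools" and "interrupted" keys.
    """
    runs: list[int] = []
    current = 0
    for cmd in commands:
        if cmd.get("used_tools"):
            current += 1
            # An interruption closes the run that includes this command.
            if cmd.get("interrupted"):
                runs.append(current)
                current = 0
        elif current:
            # A command without tool use also closes the open run.
            runs.append(current)
            current = 0
    if current:
        runs.append(current)  # flush a run still open at the end
    return runs


def summarize(runs: list[int]) -> dict:
    """Aggregate run lengths into average, min, max and a distribution."""
    if not runs:
        return {}  # no data: the caller can hide the charts
    dist = Counter(runs)
    return {
        "average_step_length": round(sum(runs) / len(runs), 2),
        "min_step_length": min(runs),
        "max_step_length": max(runs),
        "step_length_distribution": {str(k): dist[k] for k in sorted(dist)},
    }
```

Returning an empty dict for an empty input is one way to let the dashboard hide the charts when there is no data; the actual PR may signal this differently.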
Let me know if you'd like any changes or have feedback!