feat: Add step-length metrics for measuring Claude's autonomy #13

Open: mylee04 wants to merge 1 commit into chiphuyen:main from mylee04:feature/step-length-metrics

mylee04 commented Jul 26, 2025

Summary

I saw the discussion in the comments about tracking step-length metrics, so I decided to give it a try! This PR tracks how many consecutive tool-using commands run before an interruption, which helps show how autonomously Claude operates.

What it does

  • Tracks how many consecutive commands use tools before being interrupted
  • Shows distribution of step lengths (e.g., how often Claude takes 1, 2, 3+ steps)
  • Displays trends over time to see if Claude is becoming more autonomous
  • Calculates average, min, and max step lengths
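As a rough illustration of the counting described above, here is a minimal sketch. It is not the code from stats.py: the `Command` record, its `used_tools` and `interrupted` fields, and the rule that an interrupted tool-using command still counts toward its run are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Command:
    # Hypothetical record shape, not the PR's actual data model.
    used_tools: bool
    interrupted: bool = False


def step_lengths(commands):
    """Return the length of each run of consecutive tool-using commands."""
    runs = []
    current = 0
    for cmd in commands:
        if cmd.used_tools:
            current += 1  # assumption: an interrupted tool step still counts
        if cmd.interrupted or not cmd.used_tools:
            # The run ends here; record it only if it contains any steps.
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)  # trailing run that was never interrupted
    return runs


def step_length_distribution(commands):
    """Map each step length to the number of runs with that length."""
    return dict(Counter(step_lengths(commands)))
```

For example, three tool-using commands where the third is interrupted, followed by a non-tool command and one more tool-using command, would give two runs of lengths 3 and 1 under these assumptions.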

Implementation

  • Backend: Added step-length calculation in stats.py
  • Frontend: Added two new charts in the dashboard
    • Step-Length Distribution (bar chart)
    • Step-Length Metrics Over Time (line chart)
  • Tests: Added comprehensive test suite (8 tests, all passing)

Example metrics

{
  "average_step_length": 2.38,
  "max_step_length": 4,
  "step_length_distribution": {"1": 2, "2": 3, "3": 1, "4": 2}
}
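To show how the summary values relate to the distribution, here is a small sketch that derives them from the example above. The `summarize` helper and the `min_step_length` key are hypothetical names used for illustration; only `average_step_length`, `max_step_length`, and `step_length_distribution` appear in the PR's example.

```python
def summarize(distribution):
    """Derive average, min, and max step length from a length -> count map."""
    # JSON keys are strings, so convert them to ints; skip empty buckets.
    counts = {int(length): n for length, n in distribution.items() if n > 0}
    total_runs = sum(counts.values())
    if total_runs == 0:
        return None  # no data, nothing to summarize
    total_steps = sum(length * n for length, n in counts.items())
    return {
        "average_step_length": total_steps / total_runs,
        "min_step_length": min(counts),  # hypothetical key name
        "max_step_length": max(counts),
    }


example = {"1": 2, "2": 3, "3": 1, "4": 2}
summary = summarize(example)
# 8 runs covering 19 steps in total, so the mean is 19 / 8 = 2.375,
# which the example above reports as 2.38.
```

The distribution alone is enough to recover all three summary values, so the rounded average in the example is consistent with the counts shown.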

The charts appear after the Interruption Rate chart in the dashboard and automatically hide when there's no data.

Let me know if you'd like any changes or have feedback!

Implements step-length tracking as suggested by Chip Huyen to measure
consecutive tool uses before interruption. This helps users understand
how autonomously Claude operates in their projects.

Changes:
- Add step-length calculation in stats.py tracking consecutive tool-using commands
- Create distribution and time-series visualizations in dashboard
- Add comprehensive test suite with 8 tests covering all edge cases
- Display metrics: average, min, max step-length and distribution

The feature shows:
- Step-Length Distribution: Bar chart of sequence frequency
- Step-Length Over Time: Line chart tracking trends
- Tool-specific analysis in backend (for future use)

Tests: All 8 tests passing

chiphuyen (Owner) commented

Thanks for the contribution!

Can you share a screenshot of what it might look like?

I did consider adding a similar chart, but then I realized it would make the chart page a bit crowded. I think a good feature would be to let users customize which set of charts they want for each project.

Currently, I think it might also look similar to the User Commands chart, which shows the distribution of steps per command.

[Image: Screenshot 2025-08-04 at 6 20 52 PM]

mylee04 (Author) commented Aug 5, 2025

@chiphuyen You're very welcome! Yes, you're right: it looks similar to the User Commands chart.
[Image: Screenshot 2025-08-04 at 10 09 06 PM]

chiphuyen (Owner) commented

Thanks for the follow-up. I'll wait to merge until we have a custom chart feature (to avoid making the dashboard crowded), but I'll keep the PR up here in case folks want to use this chart!
