This was generated by AI during triage.
Problem
During flaky Android replay/test runs, snapshot capture latency can jump from the expected ~400-600ms range to ~1.5-2.3s. That is a strong signal that the run environment is unhealthy or that the snapshot path has degraded, but Agent Device currently does not summarize that signal in a way humans or agents can act on.
This affected React Navigation Android suite investigation: slow snapshots correlated with noisy/flaky runs, stale daemon suspicion, emulator load, Metro/app stuck states, or Android snapshot helper slowdown/fallback.
Proposal
Collect snapshot timing stats for each open session and include them in relevant command/run output.
Suggested stats:
- count
- p50
- p95
- max
- platform/backend detail when available, for example Android helper vs stock fallback
- helper fallback/error count when available
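As a rough illustration of the stats above, aggregation over recorded capture durations could look like the sketch below. The names `SnapshotStats`, `percentile`, and `summarizeSnapshots` are hypothetical and not part of Agent Device. The nearest-rank percentile method is an assumption, since the proposal does not prescribe one.

```typescript
// Hypothetical shape; field names are illustrative, not Agent Device's actual schema.
interface SnapshotStats {
  count: number;
  p50: number;
  p95: number;
  max: number;
  fallbackCount: number;
}

// Nearest-rank percentile over an ascending-sorted array (assumed method).
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Summarize capture durations (in milliseconds) for one session or run.
function summarizeSnapshots(durationsMs: number[], fallbackCount = 0): SnapshotStats {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...durationsMs].sort((a, b) => a - b);
  return {
    count: sorted.length,
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    max: sorted.length > 0 ? sorted[sorted.length - 1] : 0,
    fallbackCount,
  };
}
```

With few captures, p95 collapses to the maximum under nearest-rank, so a real implementation may want a minimum capture count before treating p95 as meaningful.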
For agent-device test and agent-device replay, include aggregate snapshot stats in the final structured result.
For individual commands that capture snapshots, expose a small diagnostic payload in JSON and optionally print a warning to stderr in non-JSON mode when the current command or session crosses a threshold.
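One possible shape for that diagnostic payload is sketched below. The `diagnostics.snapshot` key, the field names, the backend labels, and the 1500ms default threshold are all assumptions made for illustration. The proposal only asks for a conservative threshold and machine-readable fields.

```typescript
// Hypothetical payload; none of these names come from Agent Device itself.
interface SnapshotDiagnostics {
  count: number;
  p50Ms: number;
  p95Ms: number;
  maxMs: number;
  backend?: "android-helper" | "stock-fallback";
  slow: boolean;
}

type SnapshotTimings = Omit<SnapshotDiagnostics, "slow">;

// Attach snapshot diagnostics to an existing command result without
// changing any of the result's own fields.
function withSnapshotDiagnostics<T extends object>(
  result: T,
  timings: SnapshotTimings,
  thresholdMs = 1500, // assumed default, not a value from the proposal
): T & { diagnostics: { snapshot: SnapshotDiagnostics } } {
  const slow = timings.count > 0 && timings.p95Ms >= thresholdMs;
  return { ...result, diagnostics: { snapshot: { ...timings, slow } } };
}
```

Keeping the diagnostics under a separate key means existing consumers that read only the original result fields are unaffected.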
Warning Behavior
If snapshot p95 is high for the current run/session, print a scoped warning such as:
Warning: Android snapshots are slow in this run: p95 2180ms over 34 captures. Possible causes: emulator load, app/Metro stuck, helper fallback, stale daemon.
Keep stdout stable for normal command results. Prefer stderr for non-JSON warnings.
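A minimal sketch of the warning path, assuming hypothetical helpers `formatSnapshotWarning` and `reportSnapshotWarning` and an illustrative 1500ms threshold. The warning goes through an injectable writer that defaults to `console.error`, which writes to stderr in Node, so stdout is never touched.

```typescript
// Returns the warning text, or null when the run looks healthy.
// The 1500ms default threshold is an assumption for illustration.
function formatSnapshotWarning(p95Ms: number, count: number, thresholdMs = 1500): string | null {
  if (count === 0 || p95Ms < thresholdMs) return null;
  return (
    `Warning: Android snapshots are slow in this run: p95 ${Math.round(p95Ms)}ms over ${count} captures. ` +
    "Possible causes: emulator load, app/Metro stuck, helper fallback, stale daemon."
  );
}

// Emits the warning only in non-JSON mode. Returns true if a line was written.
function reportSnapshotWarning(
  p95Ms: number,
  count: number,
  jsonMode: boolean,
  write: (line: string) => void = (line) => console.error(line),
): boolean {
  if (jsonMode) return false; // JSON consumers read the diagnostic payload instead
  const warning = formatSnapshotWarning(p95Ms, count);
  if (warning === null) return false;
  write(warning);
  return true;
}
```

Injecting the writer keeps the rendering testable without capturing the process's real stderr.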
Agent-Actionable Guidance
The diagnostic should distinguish reliable actions from guesses.
Reliable checks/actions:
- report the slowdown and avoid trusting perf comparisons from that run
- retry a flaky test once when retries are enabled
- clean up daemons owned by the current test/replay run
- report Android helper fallback counts/errors
- check Metro reachability when the run is clearly RN/Expo/dev-client based
Potential but not always safe:
- refresh adb reverse for known Metro ports
- collect screenshot/log artifacts
- wait for app/device idle
Avoid automatic broad recovery:
- killing arbitrary stale daemons
- rebooting emulators
- restarting Metro
- assuming app stuck vs host/device load without supporting evidence
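The three tiers above could be encoded as data so that an agent only picks actions from the tiers it is allowed to use. This is a sketch under assumptions: the action ids, the `Tier` labels, and the `automaticActions` helper are hypothetical names, not an existing Agent Device API.

```typescript
type Tier = "reliable" | "potential" | "avoid";

interface RecoveryAction {
  id: string;
  tier: Tier;
}

// Hypothetical ids mirroring the bullets above.
const ACTIONS: RecoveryAction[] = [
  { id: "report-slowdown", tier: "reliable" },
  { id: "retry-flaky-test-once", tier: "reliable" },
  { id: "cleanup-owned-daemons", tier: "reliable" },
  { id: "report-helper-fallback", tier: "reliable" },
  { id: "check-metro-reachability", tier: "reliable" },
  { id: "refresh-adb-reverse", tier: "potential" },
  { id: "collect-artifacts", tier: "potential" },
  { id: "wait-for-idle", tier: "potential" },
  { id: "kill-arbitrary-daemons", tier: "avoid" },
  { id: "reboot-emulator", tier: "avoid" },
  { id: "restart-metro", tier: "avoid" },
];

interface RunContext {
  retriesEnabled: boolean;
  isReactNativeRun: boolean; // RN/Expo/dev-client based run
  allowPotential: boolean; // caller explicitly opted in to "potential" actions
}

// Returns the ids an agent may perform without further confirmation.
function automaticActions(ctx: RunContext): string[] {
  return ACTIONS.filter((action) => {
    if (action.tier === "avoid") return false;
    if (action.tier === "potential") return ctx.allowPotential;
    if (action.id === "retry-flaky-test-once") return ctx.retriesEnabled;
    if (action.id === "check-metro-reachability") return ctx.isReactNativeRun;
    return true;
  }).map((action) => action.id);
}
```

Actions in the "avoid" tier are never returned, whatever the context, which matches the intent that broad recovery stays a human decision.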
Acceptance Criteria
- Snapshot capture durations are recorded per session/run.
- test/replay final results include aggregate snapshot stats.
- JSON output includes machine-readable fields that an agent can inspect.
- Non-JSON output prints a warning when snapshot p95 crosses a conservative threshold.
- Warnings are scoped and actionable, not noisy for normal runs.
- Existing command stdout contracts remain stable.
- Unit/integration coverage exercises aggregation and warning rendering.