
perf(tui): fix CPU lockup and O(N) scans during waiting/running states#494

Open
mattimustang wants to merge 4 commits into usestrix:main from mattimustang:fix/tui-cpu-lockup-clean

mattimustang

Summary

Fixes 100% CPU utilisation that occurred during the waiting state between agent runs. It was caused by redundant re-renders on every timer tick (the 60ms dot animation, roughly 16fps, and the 350ms tracer refresh) and by O(N) scans over all tool executions on every tick.

  • Fix O(N) scans across all tool executions by indexing lookups through agent_data["tool_executions"] instead of scanning the full tracer.tool_executions dict
  • Eliminate redundant status display updates: _animate_dots no longer updates the display while the selected agent is waiting, and _update_ui_from_tracer skips its duplicate call while the dot animation timer is active
  • Cache completed event renders to avoid re-rendering unchanged content on every refresh during the running state

Details

During the waiting state the TUI kept refreshing on every timer tick (the 60ms dot animation and the 350ms tracer refresh) even though nothing had changed, and several helper methods (_agent_has_real_activity, _agent_vulnerability_count, _get_agent_name_for_vulnerability, _gather_agent_events) each did a full O(N) scan over tracer.tool_executions to find the executions belonging to a given agent. Combined, this drove CPU to 100% while the app appeared idle. These methods now do an O(1) lookup of the per-agent tool_executions index and iterate only that agent's executions, and the animation timer no longer updates the status display while the agent is waiting, since that text is static.
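The difference between the two lookup strategies can be sketched with a simplified tracer. MiniTracer, log_tool_execution, and the report_vulnerability tool name are hypothetical stand-ins, not strix's real API, and the sketch assumes agents[agent_id]["tool_executions"] holds execution IDs that key into the global tool_executions dict.

```python
from typing import Any


# Hypothetical, simplified stand-in for the Tracer (not its real API).
# Assumption: agents[agent_id]["tool_executions"] stores execution IDs that
# key into the global tool_executions dict.
class MiniTracer:
    def __init__(self) -> None:
        self.tool_executions: dict[int, dict[str, Any]] = {}
        self.agents: dict[str, dict[str, Any]] = {}
        self._next_id = 0

    def log_tool_execution(self, agent_id: str, tool_name: str) -> int:
        exec_id = self._next_id
        self._next_id += 1
        self.tool_executions[exec_id] = {"agent_id": agent_id, "tool_name": tool_name}
        # The per-agent index is maintained at write time.
        agent = self.agents.setdefault(agent_id, {"tool_executions": []})
        agent["tool_executions"].append(exec_id)
        return exec_id


# Placeholder tool name, for illustration only.
VULN_TOOL = "report_vulnerability"


def vuln_count_full_scan(tracer: MiniTracer, agent_id: str) -> int:
    # Before: walks every execution from every agent, O(N) on every tick.
    return sum(
        1
        for ex in tracer.tool_executions.values()
        if ex["agent_id"] == agent_id and ex["tool_name"] == VULN_TOOL
    )


def vuln_count_indexed(tracer: MiniTracer, agent_id: str) -> int:
    # After: O(1) dict lookup, then iterate only this agent's own executions.
    exec_ids = tracer.agents.get(agent_id, {}).get("tool_executions", [])
    return sum(1 for i in exec_ids if tracer.tool_executions[i]["tool_name"] == VULN_TOOL)


tracer = MiniTracer()
tracer.log_tool_execution("agent-1", "terminal")
tracer.log_tool_execution("agent-1", VULN_TOOL)
tracer.log_tool_execution("agent-2", VULN_TOOL)
print(vuln_count_full_scan(tracer, "agent-1"), vuln_count_indexed(tracer, "agent-1"))  # 1 1
```

Both functions return the same count; the indexed version's cost grows with one agent's executions rather than with every execution recorded across the whole scan.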

Test plan

  • Run a scan with multiple agents and verify CPU usage drops to near-zero during the waiting state
  • Confirm agent events display correctly in the chat view during and after a run
  • Confirm vulnerability counts display correctly

sandiyochristan and others added 3 commits May 20, 2026 21:45
* feat: add HTTP request smuggling skill

Add a new vulnerability skill covering HTTP request smuggling (HRS)
across CL.TE, TE.CL, H2.CL, and H2.TE desync variants. HRS is absent
from the existing skill set despite being a distinct, high-impact
vulnerability class frequently present in any architecture using a
reverse proxy or CDN in front of an application server.

Coverage:
- CL.TE: front-end uses Content-Length, back-end uses Transfer-Encoding
- TE.CL: front-end uses Transfer-Encoding, back-end uses Content-Length
- H2.CL: HTTP/2 front-end downgrades to HTTP/1.1 with injected Content-Length
- H2.TE: Transfer-Encoding header injection through HTTP/2 desync
- Transfer-Encoding obfuscation techniques (tab, space, duplicate, xchunked)
- Front-end security control bypass via smuggled prefix
- Cross-user request capture for session token theft
- Response queue poisoning and WebSocket handshake hijacking
- Timing-based and differential response detection methodology
- HTTP/2 specific probing techniques

Includes raw HTTP examples for each variant, step-by-step testing
methodology, exploitation PoCs, false-positive conditions, and
infrastructure topology guidance.

* fix: correct TE.CL probe, pseudo-header terminology, PoC Content-Length values, \x20 representation

Four reviewer findings addressed:

P1 — TE.CL timing-probe description inverted: previous text said
'Content-Length set to fewer bytes than the chunk content' which
describes socket-poisoning behavior (differential response), not a
timeout. Corrected to: send a complete chunked body with CL set to MORE
bytes than provided so the back-end waits for data that never arrives.
Also corrected Testing Methodology step 3 to match.

P2 — pseudo-header terminology: 'content-length' is a regular HTTP/2
header, not a pseudo-header (pseudo-headers are exclusively :method,
:path, :authority, :scheme). Fixed the H2.CL explanation (line 75),
HTTP/2-specific detection bullet, and Pro Tip #4 which referred to
':content-length pseudo-header'.

P2 — PoC Content-Length values: outer Content-Length in the bypass PoC
corrected from 116 to 100 (actual byte count of the body shown); capture
PoC corrected from 129 to 120.
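Because these values are byte counts, they are easy to get wrong by hand when a body contains CRLF line endings. A minimal sketch for checking such values computes Content-Length from the encoded body. build_request is a hypothetical helper, and the 100 and 120 figures above are not reproduced because the PoC bodies are not shown here.

```python
def build_request(method: str, path: str, host: str, body: str) -> bytes:
    """Build a raw HTTP/1.1 request whose Content-Length matches the body's byte count."""
    body_bytes = body.encode("utf-8")
    head = (
        f"{method} {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        # Content-Length counts bytes after encoding; every CRLF contributes two.
        f"Content-Length: {len(body_bytes)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body_bytes


req = build_request("POST", "/", "example.com", "0\r\n\r\n")
print(b"Content-Length: 5\r\n" in req)  # True
```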

P2 — \x20 representation: replaced the \x20 escape sequence in the code
block (which renders as a literal four-character string, not a space byte)
with an explanatory comment and actual whitespace characters so the intent
is unambiguous.
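The distinction this fix relies on is easy to see in Python, where escape processing happens only in ordinary string literals. In this sketch a raw string stands in for text copied verbatim out of a Markdown code block.

```python
# In a normal Python string literal the escape is processed: one space byte.
escaped = "\x20"
# In a raw string, as in text inside a Markdown code block, nothing is processed.
literal = r"\x20"

print(escaped == " ", len(escaped))  # True 1
print(len(literal), repr(literal))   # 4 '\\x20'
```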

* Update strix/skills/vulnerabilities/http_request_smuggling.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…state

Three hot methods were scanning the entire tool_executions dict on every
tick instead of using the per-agent index already maintained by the Tracer.
This made CPU cost proportional to total accumulated tool executions, which
is worst exactly when agents finish and enter waiting/stopped state.

- _agent_has_real_activity: was O(all_tool_executions) at 60ms; now uses
  agents[agent_id]["tool_executions"] index
- _agent_vulnerability_count: same full scan per agent per 350ms tick;
  now scoped to the agent's own executions
- _gather_agent_events: same full scan on every 350ms tick, even before
  the cache check that would discard the result; now scoped per agent

Also stop calling _update_agent_status_display from _animate_dots when the
selected agent is in "waiting" state. The waiting display is static text
("Send message to resume") that never changes until the user acts, but the
60ms timer was pushing Textual widget updates for it at 16fps anyway. The
350ms _update_ui_from_tracer call is sufficient to render the waiting state.
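A minimal sketch of the waiting-state guard follows, assuming the animation callback can read the selected agent's status. StatusPanelSketch and its counters are hypothetical. Only the method names come from the commit message, and the real app drives them from Textual timers and updates real widgets.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the TUI's status panel.
@dataclass
class StatusPanelSketch:
    agent_status: str = "running"
    dot_frame: int = 0
    widget_updates: int = 0

    def _update_agent_status_display(self) -> None:
        self.widget_updates += 1  # stands in for a Textual widget update

    def _animate_dots(self) -> None:
        # Fired every 60 ms (~16 fps).
        if self.agent_status == "waiting":
            # "Send message to resume" is static; the 350 ms refresh renders it.
            return
        if self.agent_status == "running":
            self.dot_frame += 1
            self._update_agent_status_display()


panel = StatusPanelSketch(agent_status="waiting")
for _ in range(100):  # roughly 6 seconds of 60 ms ticks
    panel._animate_dots()
print(panel.widget_updates)  # 0
```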
…nning state

Three more performance issues in the running state hot path:

Per-event render cache in _get_rendered_events_content: every 350ms tick
during active streaming caused a full re-render of all events in the
conversation — every chat message through AgentMessageRenderer (including
Pygments syntax highlighting for code blocks) and every tool event. Chat
messages and completed/failed tool events are now cached by (event_id,
status) and only re-rendered when their status changes. Running tool events
are re-rendered each tick as their content may still update.
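Assuming each event is a dict with an id and, for tool events, a status, the caching rule above can be sketched as follows. EventRenderCache and the event shape are hypothetical. The PR's own cache is _event_render_cache, which is fed by AgentMessageRenderer and the tool renderers, and this sketch treats chat messages as having a fixed "final" status.

```python
from typing import Any, Callable


class EventRenderCache:
    """Sketch of a per-event render cache keyed by (event_id, status)."""

    def __init__(self, render: Callable[[dict[str, Any]], str]) -> None:
        self._render = render
        self._cache: dict[tuple[str, str], str] = {}
        self.render_calls = 0

    def _do_render(self, event: dict[str, Any]) -> str:
        self.render_calls += 1
        return self._render(event)

    def get(self, event: dict[str, Any]) -> str:
        status = event.get("status", "final")  # assumption: chat messages have no status
        if status == "running":
            # Still streaming: re-render every tick and never cache.
            return self._do_render(event)
        key = (event["id"], status)
        if key not in self._cache:
            self._cache[key] = self._do_render(event)
        return self._cache[key]

    def clear(self) -> None:
        # Drop all entries, e.g. when the selected agent changes.
        self._cache.clear()


cache = EventRenderCache(lambda e: f"[{e.get('status', 'msg')}] {e['text']}")
events = [
    {"id": "chat-1", "text": "hello"},
    {"id": "tool-1", "status": "completed", "text": "nmap finished"},
    {"id": "tool-2", "status": "running", "text": "fuzzing"},
]
for _ in range(10):  # ten 350 ms refresh ticks
    rendered = [cache.get(e) for e in events]
print(cache.render_calls)  # 12
```

Over ten ticks the chat message and the completed tool event are each rendered once, while the running tool event is rendered on every tick. Because the status is part of the key, a running event that later completes gets exactly one fresh render.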

Skip duplicate _update_agent_status_display in _update_ui_from_tracer when
the dot animation timer is active: _animate_dots (60ms) already calls it
for "running" agents, so the unconditional call from _update_ui_from_tracer
(350ms) was redundant, doubling the widget update rate during active scans.
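The guard reduces to a single condition in the refresh path. This minimal sketch assumes the TUI can tell whether its dot animation timer is running, which it models with a hypothetical dot_timer_active flag. RefreshSketch is illustrative only.

```python
class RefreshSketch:
    """Hypothetical stand-in showing the guard in the 350 ms refresh path."""

    def __init__(self, dot_timer_active: bool) -> None:
        self.dot_timer_active = dot_timer_active
        self.status_updates = 0

    def _update_agent_status_display(self) -> None:
        self.status_updates += 1

    def _update_ui_from_tracer(self) -> None:
        # Other per-tick refresh work would run here.
        # While the dot timer is active, _animate_dots already updates the
        # status display every 60 ms, so a second update here is redundant.
        if not self.dot_timer_active:
            self._update_agent_status_display()


running = RefreshSketch(dot_timer_active=True)
idle = RefreshSketch(dot_timer_active=False)
for ui in (running, idle):
    ui._update_ui_from_tracer()
print(running.status_updates, idle.status_updates)  # 0 1
```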

Fix _get_agent_name_for_vulnerability to use per-agent tool execution index
instead of scanning all tool_executions, consistent with the other O(N)
scan fixes from the previous commit.
greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR fixes excessive CPU usage in the TUI during idle/waiting states by replacing four O(N) linear scans over tracer.tool_executions with O(1) per-agent indexed lookups, and adding a completed-event render cache to avoid re-rendering unchanged tool/chat content on every tick.

  • _agent_has_real_activity, _agent_vulnerability_count, _get_agent_name_for_vulnerability, and _gather_agent_events now iterate only the per-agent tool_executions list instead of the entire global dict.
  • _update_agent_status_display() is skipped in the 0.35 s refresh tick when the dot animation timer is inactive, and the sweep-frame counter advances only for "running" (not "waiting") agents.
  • A new _event_render_cache stores completed/failed/error tool renders and finalized chat renders, keyed by event_id (and event_id + status for tools), cleared on agent switch.

Confidence Score: 4/5

Safe to merge; the core logic changes are correct and well-scoped.

The tool-execution indexing is correct and consistent across all four refactored methods, and the render-cache invalidation on agent switch is properly wired. A single leftover O(N) scan over chat_messages in _gather_agent_events means the fix is incomplete for long scans with heavy chat traffic, but it does not introduce any regression.

strix/interface/tui.py — specifically _gather_agent_events where the chat_messages linear scan was not optimized alongside tool_executions.

Important Files Changed

| Filename | Overview |
|---|---|
| strix/interface/tui.py | Four helper methods converted from O(N) global scans to O(1) indexed lookups via agent_data["tool_executions"]; event render cache added for completed tool/chat events; _update_agent_status_display() call gated behind animation-timer check. One residual O(N) scan over chat_messages remains in _gather_agent_events. |
| strix/skills/vulnerabilities/http_request_smuggling.md | New skill knowledge file documenting HTTP request smuggling detection/exploitation techniques; no code changes, documentation only. |
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
strix/interface/tui.py:1472-1482
**O(N) chat_messages scan left unoptimized**

`_gather_agent_events` now uses O(1) indexed lookups for tool executions, but `chat_messages` is still filtered with a linear scan (`for msg in self.tracer.chat_messages if msg.get("agent_id") == agent_id`). This function is called on every refresh tick, so a long-running scan with many messages across multiple agents will still exhibit the same per-frame O(N) scan cost, just for a different collection. Consider adding a per-agent index to `chat_messages` in the tracer (similar to `tool_executions`) to make this O(1) as well.

Reviews (1): Last reviewed commit: "perf(tui): cache event renders and elimi..."

Comment thread strix/interface/tui.py
mattimustang (Author)

Fixed in d0cbaec.

Added chat_messages_by_agent: dict[str, list[dict[str, Any]]] to Tracer and populated it at write time via setdefault(agent_id, []).append(message_data). _gather_agent_events now does an O(1) dict lookup instead of a full scan.
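The write-time index described in this reply can be sketched as follows. TracerSketch and log_chat_message are hypothetical names; only chat_messages, chat_messages_by_agent, and the setdefault pattern come from the reply.

```python
from typing import Any


class TracerSketch:
    """Hypothetical stand-in for the Tracer's chat-message storage."""

    def __init__(self) -> None:
        self.chat_messages: list[dict[str, Any]] = []
        self.chat_messages_by_agent: dict[str, list[dict[str, Any]]] = {}

    def log_chat_message(self, agent_id: str, role: str, content: str) -> dict[str, Any]:
        message_data = {"agent_id": agent_id, "role": role, "content": content}
        self.chat_messages.append(message_data)
        # Index at write time; both containers share the same dict object.
        self.chat_messages_by_agent.setdefault(agent_id, []).append(message_data)
        return message_data


tracer = TracerSketch()
tracer.log_chat_message("agent-1", "user", "start the scan")
tracer.log_chat_message("agent-2", "assistant", "recon finished")
tracer.log_chat_message("agent-1", "assistant", "scanning")

# Reader side, as in _gather_agent_events: one dict lookup, no global filter.
msgs = tracer.chat_messages_by_agent.get("agent-1", [])
print([m["content"] for m in msgs])  # ['start the scan', 'scanning']
```

Storing the same dict object in both containers keeps in-place edits to a message visible through either one. The index costs one extra list reference per message.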
