Skip to content

feat(trace): SequenceDetector primitive - session-window multi-step attack detection#522

Open
stealthwhizz wants to merge 10 commits into
GenAI-Security-Project:mainfrom
stealthwhizz:feature/trace-sequence-detector
Open

feat(trace): SequenceDetector primitive - session-window multi-step attack detection#522
stealthwhizz wants to merge 10 commits into
GenAI-Security-Project:mainfrom
stealthwhizz:feature/trace-sequence-detector

Conversation

@stealthwhizz

Copy link
Copy Markdown
Contributor

Summary

  • Adds SequenceDetector to finbot/ctf/detectors/primitives/ — the first detector in FinBot that can match attack patterns spanning multiple events across a session or workflow window
  • Adds StepSpec TypedDict matching the interface approved during community bonding
  • Adds composite DB index (session_id, timestamp, event_type) on ctf_events to keep session-window history queries fast
  • 17 unit tests + 1 p95 benchmark test (8.22ms p95 on 1,000 rows, limit 10ms)

This is the Week 1 deliverable for TRACE (GSoC 2026).

What it does

All 14 existing detectors fire on a single event. SequenceDetector queries CTFEvent history to match an ordered sequence of steps within a configurable window. Challenge authors configure it from YAML — no Python required:

detector_class: SequenceDetector
detector_config:
  steps:
    - event_type: "agent.*.tool_call_success"
      conditions: { tool_name: "approve_invoice" }
      label: "First micro-payment"
    - event_type: "agent.*.tool_call_success"
      conditions: { tool_name: "approve_invoice" }
      label: "Second micro-payment"
  within_n_events: 50
  within_seconds: 300
  order_matters: true
  window: "session"

Files changed

File Change
finbot/ctf/detectors/primitives/sequence_detector.py New primitive
finbot/ctf/detectors/primitives/__init__.py Export SequenceDetector, StepSpec
migrations/versions/2026_06_03_add_ctf_event_session_index.py Composite index
tests/unit/ctf/test_sequence_detector.py 17 unit tests
tests/unit/ctf/test_sequence_detector_benchmark.py p95 benchmark

Test plan

  • 17 unit tests pass: full sequence, partial, ordering enforcement, session/workflow windows, all condition operators, glob event_type matching
  • p95 benchmark: 8.22ms on 1,000-row session with composite index (limit: 10ms)
  • Regression: existing detector suite unaffected (no changes to base, registry, or existing detectors)

Notes for reviewers

  • The within_n_events window uses ORDER BY timestamp DESC LIMIT n then reverses — this keeps the query bounded while preserving chronological order for step matching
  • Conditions check details JSON first, then fall back to named CTFEvent columns (tool_name, agent_name, etc.) — this avoids ambiguity between payload fields and model attributes
  • The composite index covers the session-window query shape: WHERE session_id = ? ORDER BY timestamp ASC; event_type is included for potential index-only scans

Adds StepSpec dataclass and SequenceDetector base structure to
finbot/ctf/detectors/primitives/. Includes config validation,
get_relevant_event_types(), and stubbed private helpers for
history querying, step matching, and time-window checks.
check_event() and all helpers are NotImplementedError stubs
pending implementation.
…gration

- Add SequenceDetector to finbot/ctf/detectors/primitives/
  Detects multi-step attack patterns across a session or workflow window.
  Supports ordered step matching, glob event_type patterns, within_n_events
  and within_seconds windows, and all ToolCallDetector field operators.
  Challenge authors configure it from YAML with no Python required.

- Add composite index idx_ctf_event_session_ts_type on (session_id, timestamp,
  event_type) to keep session-window history queries below 10ms p95.

- Export SequenceDetector from finbot/ctf/detectors/primitives/__init__.py

- Add 17 unit tests covering full sequence detection, partial sequences,
  order enforcement, session/workflow windows, condition operators, and
  glob event_type matching.
- Add StepSpec TypedDict to sequence_detector.py matching the approved
  interface spec; export it from primitives __init__

- Add benchmark test: seeds 1,000 CTFEvent rows with composite index,
  runs check_event 100 times, asserts p95 < 10ms
  Current result: p50 ~7ms, p95 ~8ms on SQLite
Copilot AI review requested due to automatic review settings June 2, 2026 19:49

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new SequenceDetector primitive for multi-step pattern detection, plus database/index support and accompanying tests/benchmarking to validate correctness and query performance.

Changes:

  • Introduce SequenceDetector with configurable step matching over session/workflow windows.
  • Add Alembic migration for a composite ctf_events index used by session-window queries.
  • Add unit tests and a SQLite benchmark to catch functional and query-latency regressions; refactor vendor DB session acquisition to use get_db().

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
finbot/ctf/detectors/primitives/sequence_detector.py New detector implementation: config validation, history query, step/condition matching.
finbot/ctf/detectors/primitives/__init__.py Exports SequenceDetector and StepSpec.
migrations/versions/2026_06_03_add_ctf_event_session_index.py Adds composite index intended to speed up session-window queries.
tests/unit/ctf/test_sequence_detector.py Unit coverage for config validation, ordering, windows, and some condition operators.
tests/unit/ctf/test_sequence_detector_benchmark.py Benchmark test for p95 latency of check_event query path.
finbot/tools/data/vendor.py Switches vendor tools from db_session() context manager to get_db() generator pattern.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread finbot/tools/data/vendor.py
Comment thread tests/unit/ctf/test_sequence_detector_benchmark.py
Comment thread tests/unit/ctf/test_sequence_detector_benchmark.py Outdated
Comment thread migrations/versions/2026_06_03_add_ctf_event_session_index.py
Comment thread finbot/ctf/detectors/primitives/sequence_detector.py
Comment thread finbot/ctf/detectors/primitives/sequence_detector.py Outdated
Comment thread tests/unit/ctf/test_sequence_detector.py Outdated
- _check_condition: AND all operators instead of returning on the first;
  conditions like {gte: 10, lte: 20} now work correctly
- within_seconds: return detected=False with clear message when event has
  no timestamp, instead of silently falling back to datetime.now()
- Migration: add namespace as leading index column to match actual query
  filter shape (WHERE namespace=? AND session_id=?)
- Benchmark: use StaticPool so all connections share the same in-memory DB;
  switch to @pytest.mark.asyncio + await instead of get_event_loop();
  adjust SQLite p95 limit to 50ms with note that Postgres target is 10ms
- Remove unused DetectionResult import from test_sequence_detector.py
@stealthwhizz stealthwhizz changed the title feat(trace): SequenceDetector primitive — session-window multi-step attack detection feat(trace): SequenceDetector primitive - session-window multi-step attack detection Jun 4, 2026
- Fix within_n_events docstring: describes history window size (latest N
  events loaded for session), not max events between steps
- Gate p95 benchmark behind RUN_BENCHMARKS env var so it does not cause
  flaky failures on CI runners with variable CPU load;
  run with: RUN_BENCHMARKS=1 pytest

@steadhac steadhac left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this — one blocking issue plus two smaller fixes. Requesting changes on the first.
Blocking (lines 132–162): With order_matters=False, search_from stays at 0, so every step scans history from the start independently. A single event matching multiple steps can satisfy step 1 and step 2 on its own — a 2-step sequence fires from one event, which is exploitable in a CTF. Track consumed history indices and skip them for later steps so no two steps claim the same event.
Should fix (line 255): Unknown/typo operators ("eqals", "startswith") match no elif branch and fall through as "always passes" → false positives. Add an else that raises at config-validation time, or logs a warning and returns False.
Minor (lines 193–197): The 10-string frozenset({...}) is rebuilt on every _matches_step call. Move it to a module-level _CTF_COLUMNS constant.

elif op == "matches":
if not re.search(expected, str(actual), re.IGNORECASE):
return False

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 255 - If someone writes a typo ("eqals") or an unsupported operator ("startswith"), none of the elif branches match, so the loop just moves on without returning False. The condition is silently treated as "always passes," which can produce false-positive detections.
How to fix it: Add an else clause at the end of the operator chain that either raises a ValueError at config-validation time, or at minimum logs a warning and returns False at match time:

else:
logger.warning("Unknown condition operator %r — treating as no-match", op)
return False

"event_type", "event_category", "event_subtype",
"session_id", "workflow_id", "namespace", "user_id",
"vendor_id", "agent_name", "tool_name", "severity",
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 193 - 197 The frozenset({...}) literal with 10 strings is created fresh every single time _matches_step is called. Since this runs inside the step-matching loop (once per event × once per step in history), it adds unnecessary allocation overhead. It's not a bug, but it's wasteful.

Move it to a module-level constant:

At module level, near the top of the file
_CTF_COLUMNS: frozenset[str] = frozenset({
"event_type", "event_category", "event_subtype",
"session_id", "workflow_id", "namespace", "user_id",
"vendor_id", "agent_name", "tool_name", "severity",
})

)
if order_matters:
search_from = found_at + 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines: 132–162 When order_matters=False, search_from stays at 0 for every step. This means each step scans the full history independently from the beginning. If a single event matches two different steps (e.g. event_type = "agent.*.tool_call_success" with no conditions), it will satisfy both step 1 and step 2 on its own, making a 2-step sequence trigger from just 1 event. In a CTF, this is exploitable.

Track which history indices have already been consumed and skip them for subsequent steps. Once an event is matched to a step, mark it as consumed so no other step can claim it

- Track consumed history indices in step matching loop so a single event
  cannot satisfy two steps when order_matters=False; prevents a 2-step
  sequence from firing on one event

- Add else clause in _check_condition for unknown operators: logs a
  warning and returns False instead of silently treating as always-passes

- Move _ctf_columns frozenset to module-level _CTF_COLUMNS constant to
  avoid rebuilding it on every _matches_step call
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants