feat(trace): SequenceDetector primitive - session-window multi-step attack detection by stealthwhizz · Pull Request #522 · GenAI-Security-Project/finbot-ctf

stealthwhizz · 2026-06-02T19:49:44Z

Summary

Adds SequenceDetector to finbot/ctf/detectors/primitives/ — the first detector in FinBot that can match attack patterns spanning multiple events across a session or workflow window
Adds StepSpec TypedDict matching the interface approved during community bonding
Adds composite DB index (session_id, timestamp, event_type) on ctf_events to keep session-window history queries fast
17 unit tests + 1 p95 benchmark test (8.22ms p95 on 1,000 rows, limit 10ms)

This is the Week 1 deliverable for TRACE (GSoC 2026).

What it does

All 14 existing detectors fire on a single event. SequenceDetector queries CTFEvent history to match an ordered sequence of steps within a configurable window. Challenge authors configure it from YAML — no Python required:

detector_class: SequenceDetector
detector_config:
  steps:
    - event_type: "agent.*.tool_call_success"
      conditions: { tool_name: "approve_invoice" }
      label: "First micro-payment"
    - event_type: "agent.*.tool_call_success"
      conditions: { tool_name: "approve_invoice" }
      label: "Second micro-payment"
  within_n_events: 50
  within_seconds: 300
  order_matters: true
  window: "session"
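As a reading aid for the config above, here is a minimal sketch of how ordered step matching over a chronological history might work. The function name `match_ordered_steps` and the flat-dict event shape are illustrative assumptions, not the PR's code, and conditions are reduced to plain equality.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional


def match_ordered_steps(
    history: List[Dict[str, Any]], steps: List[Dict[str, Any]]
) -> Optional[List[int]]:
    """Return the history index matched by each step, or None if incomplete.

    Hypothetical helper: events are plain dicts here, not CTFEvent rows.
    """
    matched: List[int] = []
    search_from = 0
    for step in steps:
        found_at = None
        for i in range(search_from, len(history)):
            event = history[i]
            # Glob match, e.g. "agent.*.tool_call_success"
            if not fnmatchcase(event["event_type"], step["event_type"]):
                continue
            conditions = step.get("conditions", {})
            # Simplification: equality only, no operators
            if all(event.get(k) == v for k, v in conditions.items()):
                found_at = i
                break
        if found_at is None:
            return None
        matched.append(found_at)
        # Ordered matching: the next step may only use later events
        search_from = found_at + 1
    return matched


history = [
    {"event_type": "agent.payments.tool_call_success", "tool_name": "approve_invoice"},
    {"event_type": "agent.payments.tool_call_success", "tool_name": "get_vendor"},
    {"event_type": "agent.payments.tool_call_success", "tool_name": "approve_invoice"},
]
steps = [
    {"event_type": "agent.*.tool_call_success", "conditions": {"tool_name": "approve_invoice"}},
    {"event_type": "agent.*.tool_call_success", "conditions": {"tool_name": "approve_invoice"}},
]
result = match_ordered_steps(history, steps)
```

With this history the two steps bind to the first and third events; with only the first two events the sequence is incomplete and the function returns `None`.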

Files changed

File	Change
`finbot/ctf/detectors/primitives/sequence_detector.py`	New primitive
`finbot/ctf/detectors/primitives/__init__.py`	Export `SequenceDetector`, `StepSpec`
`migrations/versions/2026_06_03_add_ctf_event_session_index.py`	Composite index
`tests/unit/ctf/test_sequence_detector.py`	17 unit tests
`tests/unit/ctf/test_sequence_detector_benchmark.py`	p95 benchmark

Test plan

17 unit tests pass: full sequence, partial, ordering enforcement, session/workflow windows, all condition operators, glob event_type matching
p95 benchmark: 8.22ms on 1,000-row session with composite index (limit: 10ms)
Regression: existing detector suite unaffected (no changes to base, registry, or existing detectors)

Notes for reviewers

The within_n_events window uses ORDER BY timestamp DESC LIMIT n then reverses — this keeps the query bounded while preserving chronological order for step matching
Conditions check details JSON first, then fall back to named CTFEvent columns (tool_name, agent_name, etc.) — this avoids ambiguity between payload fields and model attributes
The composite index covers the session-window query shape: WHERE session_id = ? ORDER BY timestamp ASC; event_type is included for potential index-only scans
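The bounded-window query from the first note can be sketched with the standard-library `sqlite3` module. This is an illustration under assumptions: the table is cut down to three columns, `load_window` is a hypothetical helper, and the real detector presumably goes through the CTFEvent model rather than raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for ctf_events; the real table has more columns.
conn.execute(
    "CREATE TABLE ctf_events ("
    " id INTEGER PRIMARY KEY, session_id TEXT, timestamp REAL, event_type TEXT)"
)
conn.execute(
    "CREATE INDEX idx_ctf_event_session_ts_type"
    " ON ctf_events (session_id, timestamp, event_type)"
)
conn.executemany(
    "INSERT INTO ctf_events (session_id, timestamp, event_type) VALUES (?, ?, ?)",
    [("s1", float(t), f"agent.demo.event_{t}") for t in range(10)],
)


def load_window(conn, session_id, n):
    """Load the latest n events for a session, returned oldest-first."""
    rows = conn.execute(
        "SELECT timestamp, event_type FROM ctf_events"
        " WHERE session_id = ? ORDER BY timestamp DESC LIMIT ?",
        (session_id, n),
    ).fetchall()
    rows.reverse()  # restore chronological order for step matching
    return rows


window = load_window(conn, "s1", 3)
```

Sorting descending with a limit keeps the read bounded to the newest `n` rows regardless of session length; the in-memory reverse is then O(n).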


Adds StepSpec dataclass and SequenceDetector base structure to finbot/ctf/detectors/primitives/. Includes config validation, get_relevant_event_types(), and stubbed private helpers for history querying, step matching, and time-window checks. check_event() and all helpers are NotImplementedError stubs pending implementation.

…gration
- Add SequenceDetector to finbot/ctf/detectors/primitives/. It detects multi-step attack patterns across a session or workflow window. Supports ordered step matching, glob event_type patterns, within_n_events and within_seconds windows, and all ToolCallDetector field operators. Challenge authors configure it from YAML with no Python required.
- Add composite index idx_ctf_event_session_ts_type on (session_id, timestamp, event_type) to keep session-window history queries below 10ms p95.
- Export SequenceDetector from finbot/ctf/detectors/primitives/__init__.py
- Add 17 unit tests covering full sequence detection, partial sequences, order enforcement, session/workflow windows, condition operators, and glob event_type matching.

- Add StepSpec TypedDict to sequence_detector.py matching the approved interface spec; export it from primitives __init__
- Add benchmark test: seeds 1,000 CTFEvent rows with composite index, runs check_event 100 times, asserts p95 < 10ms. Current result: p50 ~7ms, p95 ~8ms on SQLite

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new SequenceDetector primitive for multi-step pattern detection, plus database/index support and accompanying tests/benchmarking to validate correctness and query performance.

Changes:

Introduce SequenceDetector with configurable step matching over session/workflow windows.
Add Alembic migration for a composite ctf_events index used by session-window queries.
Add unit tests and a SQLite benchmark to catch functional and query-latency regressions; refactor vendor DB session acquisition to use get_db().

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.


File	Description
`finbot/ctf/detectors/primitives/sequence_detector.py`	New detector implementation: config validation, history query, step/condition matching.
`finbot/ctf/detectors/primitives/__init__.py`	Exports `SequenceDetector` and `StepSpec`.
`migrations/versions/2026_06_03_add_ctf_event_session_index.py`	Adds composite index intended to speed up session-window queries.
`tests/unit/ctf/test_sequence_detector.py`	Unit coverage for config validation, ordering, windows, and some condition operators.
`tests/unit/ctf/test_sequence_detector_benchmark.py`	Benchmark test for p95 latency of `check_event` query path.
`finbot/tools/data/vendor.py`	Switches vendor tools from `db_session()` context manager to `get_db()` generator pattern.


- _check_condition: AND all operators instead of returning on the first; conditions like {gte: 10, lte: 20} now work correctly
- within_seconds: return detected=False with clear message when event has no timestamp, instead of silently falling back to datetime.now()
- Migration: add namespace as leading index column to match actual query filter shape (WHERE namespace=? AND session_id=?)
- Benchmark: use StaticPool so all connections share the same in-memory DB; switch to @pytest.mark.asyncio + await instead of get_event_loop(); adjust SQLite p95 limit to 50ms with note that Postgres target is 10ms
- Remove unused DetectionResult import from test_sequence_detector.py
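The AND semantics in the first item can be illustrated with a small sketch. It assumes a condition is either a bare value (equality) or a dict of operators; only `gte`, `lte` and `matches` appear on this page, and the standalone `check_condition` signature is hypothetical rather than the PR's `_check_condition`.

```python
import re
from typing import Any


def check_condition(actual: Any, expected: Any) -> bool:
    """Return True only if every operator in `expected` holds for `actual`."""
    if not isinstance(expected, dict):
        return actual == expected  # bare value: plain equality
    if actual is None:
        return False  # a missing field cannot satisfy any operator
    for op, operand in expected.items():
        # Each branch only ever returns False; success means falling
        # through to the next operator, so all of them are AND-ed.
        if op == "gte":
            if not actual >= operand:
                return False
        elif op == "lte":
            if not actual <= operand:
                return False
        elif op == "matches":
            if not re.search(operand, str(actual), re.IGNORECASE):
                return False
        else:
            return False  # unknown operator: treat as no-match
    return True


in_range = check_condition(15, {"gte": 10, "lte": 20})
too_high = check_condition(25, {"gte": 10, "lte": 20})
```

Here `25` satisfies `gte: 10` but fails `lte: 20`, so a version that returned on the first operator would have wrongly reported a match.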

- Fix within_n_events docstring: describes history window size (latest N events loaded for session), not max events between steps
- Gate p95 benchmark behind RUN_BENCHMARKS env var so it does not cause flaky failures on CI runners with variable CPU load; run with: RUN_BENCHMARKS=1 pytest
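A rough stdlib-only sketch of the two pieces involved in the benchmark gate: the environment check and the p95 measurement. The helper names `benchmarks_enabled` and `p95_ms` are hypothetical, and accepting only the value `"1"` is an assumption; in a pytest suite the check would typically feed a `pytest.mark.skipif` decorator.

```python
import os
import statistics
import time


def benchmarks_enabled(environ=os.environ) -> bool:
    """True when the opt-in flag is set (assumes "1" is the enabling value)."""
    return environ.get("RUN_BENCHMARKS") == "1"


def p95_ms(fn, runs: int = 100) -> float:
    """Call fn `runs` times and return the 95th-percentile latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
    return statistics.quantiles(samples, n=100)[94]


if benchmarks_enabled():
    print(f"p95: {p95_ms(lambda: sum(range(1000))):.3f} ms")
```

Keeping the timing assertion opt-in means ordinary CI runs exercise correctness only, while latency limits are checked on hardware where the numbers are meaningful.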

steadhac

Thanks for this — one blocking issue plus two smaller fixes. Requesting changes on the first.
Blocking (lines 132–162): With order_matters=False, search_from stays at 0, so every step scans history from the start independently. A single event matching multiple steps can satisfy step 1 and step 2 on its own — a 2-step sequence fires from one event, which is exploitable in a CTF. Track consumed history indices and skip them for later steps so no two steps claim the same event.
Should fix (line 255): Unknown/typo operators ("eqals", "startswith") match no elif branch and fall through as "always passes" → false positives. Add an else that raises at config-validation time, or logs a warning and returns False.
Minor (lines 193–197): The 10-string frozenset({...}) is rebuilt on every _matches_step call. Move it to a module-level _CTF_COLUMNS constant.

steadhac · 2026-06-05T21:46:01Z

+            elif op == "matches":
+                if not re.search(expected, str(actual), re.IGNORECASE):
+                    return False
+


Line 255: If someone writes a typo ("eqals") or an unsupported operator ("startswith"), none of the elif branches match, so the loop just moves on without returning False. The condition is silently treated as "always passes," which can produce false-positive detections.
How to fix it: Add an else clause at the end of the operator chain that either raises a ValueError at config-validation time, or at minimum logs a warning and returns False at match time:

else:
    logger.warning("Unknown condition operator %r - treating as no-match", op)
    return False
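The other option mentioned above, rejecting unknown operators at config-validation time, might look like the following sketch. The operator set is an assumption (only the operators visible on this page are listed), and `validate_conditions` is a hypothetical helper name.

```python
# Assumption: the detector's real operator list is longer than this.
_SUPPORTED_OPERATORS = frozenset({"gte", "lte", "matches"})


def validate_conditions(conditions: dict) -> None:
    """Raise ValueError if any condition uses an operator we don't know."""
    for field, expected in conditions.items():
        if not isinstance(expected, dict):
            continue  # bare value means equality; nothing to validate
        unknown = sorted(set(expected) - _SUPPORTED_OPERATORS)
        if unknown:
            raise ValueError(
                f"Unknown condition operator(s) for {field!r}: {unknown}"
            )


# A valid config passes silently.
validate_conditions({"tool_name": "approve_invoice", "amount": {"gte": 10, "lte": 20}})

try:
    validate_conditions({"amount": {"eqals": 10}})
    typo_rejected = False
except ValueError:
    typo_rejected = True
```

Failing at load time surfaces a typo to the challenge author immediately, whereas the warn-and-return-False path only shows up in logs once events start flowing.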

steadhac · 2026-06-05T22:09:49Z

+            "event_type", "event_category", "event_subtype",
+            "session_id", "workflow_id", "namespace", "user_id",
+            "vendor_id", "agent_name", "tool_name", "severity",
+        })


Lines 193–197: The frozenset({...}) literal of column names is created fresh every time _matches_step is called. Since this runs inside the step-matching loop (once per event in history × once per step), it adds unnecessary allocation overhead. It's not a bug, but it's wasteful.

Move it to a module-level constant:

# At module level, near the top of the file
_CTF_COLUMNS: frozenset[str] = frozenset({
    "event_type", "event_category", "event_subtype",
    "session_id", "workflow_id", "namespace", "user_id",
    "vendor_id", "agent_name", "tool_name", "severity",
})

steadhac · 2026-06-06T01:08:53Z

+            )
+            if order_matters:
+                search_from = found_at + 1
+


Lines 132–162: When order_matters=False, search_from stays at 0 for every step. This means each step scans the full history independently from the beginning. If a single event matches two different steps (e.g. event_type = "agent.*.tool_call_success" with no conditions), it will satisfy both step 1 and step 2 on its own, making a 2-step sequence trigger from just 1 event. In a CTF, this is exploitable.

Track which history indices have already been consumed and skip them for subsequent steps. Once an event is matched to a step, mark it as consumed so no other step can claim it.
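A minimal sketch of the consumed-index idea, assuming events are plain dicts and steps match on event_type only. The names `match_steps` and `consumed` are illustrative, not the PR's actual code.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Set


def match_steps(
    history: List[Dict[str, str]],
    steps: List[Dict[str, str]],
    order_matters: bool,
) -> Optional[List[int]]:
    """Bind each step to a distinct history index, or return None."""
    consumed: Set[int] = set()
    matched: List[int] = []
    search_from = 0
    for step in steps:
        found_at = None
        for i in range(search_from, len(history)):
            if i in consumed:
                continue  # already claimed by an earlier step
            if fnmatchcase(history[i]["event_type"], step["event_type"]):
                found_at = i
                break
        if found_at is None:
            return None
        consumed.add(found_at)
        matched.append(found_at)
        if order_matters:
            search_from = found_at + 1
    return matched


two_steps = [
    {"event_type": "agent.*.tool_call_success"},
    {"event_type": "agent.*.tool_call_success"},
]
one_event = [{"event_type": "agent.payments.tool_call_success"}]

single = match_steps(one_event, two_steps, order_matters=False)
double = match_steps(one_event * 2, two_steps, order_matters=False)
```

With one event the second step finds nothing unclaimed and the match fails; with two events each step binds to its own index. Note that this greedy first-fit assignment can still miss a valid unordered match when a broad step claims the only event a narrower step could use; whether that matters depends on how challenge configs are written.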

- Track consumed history indices in step matching loop so a single event cannot satisfy two steps when order_matters=False; prevents a 2-step sequence from firing on one event
- Add else clause in _check_condition for unknown operators: logs a warning and returns False instead of silently treating as always-passes
- Move _ctf_columns frozenset to module-level _CTF_COLUMNS constant to avoid rebuilding it on every _matches_step call

stealthwhizz added 7 commits March 19, 2026 00:15

merge: sync fork with upstream main (e9a575d)

Merge branch 'main' of https://github.com/GenAI-Security-Project/finbot-ctf (de48021)

Merge branch 'main' of https://github.com/GenAI-Security-Project/finbot-ctf (9b2b476)

Merge branch 'main' of https://github.com/GenAI-Security-Project/finbot-ctf (ac36d6a)

Copilot AI review requested due to automatic review settings June 2, 2026 19:49

Copilot AI reviewed Jun 2, 2026


stealthwhizz changed the title ~~feat(trace): SequenceDetector primitive — session-window multi-step attack detection~~ feat(trace): SequenceDetector primitive - session-window multi-step attack detection Jun 4, 2026

steadhac reviewed Jun 6, 2026


stealthwhizz wants to merge 10 commits into GenAI-Security-Project:main from stealthwhizz:feature/trace-sequence-detector

3 participants
