Design persistent memory structure for MongoDB that accumulates across runs #77

@andthattoo

Parent: #76

Summary

Instead of introducing a new OrgProfile model, design a memory structure that fits into the existing user/team collections in MongoDB. This structure should accumulate knowledge across runs and be directly usable for guiding future runs.

Motivation

We already have user and team collections in MongoDB. Rather than creating a separate persistence layer, the memory structure should:

  • Live alongside existing user/team documents (or as a linked collection)
  • Grow incrementally after each run without requiring batch reprocessing
  • Be queryable at run start to inject relevant context into agent prompts

Design Considerations

What to store

  • Tech stack & patterns: detected frameworks, libraries, contract patterns (proxy/upgradeable, governor/timelock, etc.)
  • Vulnerability history: per vulnerability class — count, average severity, FP rate, user verdicts
  • Suppression rules: derived from repeated false-positive verdicts
  • Fix preferences: explicit guidelines + inferred style from accepted patches
  • Run metadata: timestamps, repos scanned, findings count, precision metrics
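As a starting point for discussion, the items above could map onto a document shape like the one below. This is only a sketch: every field name (team_id, repo, severity_sum, inferred_fix_style, and so on) is hypothetical, and it assumes one memory document per team and repo, which is itself one of the open schema questions.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class VulnClassStats:
    """Rolling counters for one vulnerability class (hypothetical field names)."""
    count: int = 0
    severity_sum: float = 0.0  # keep the sum so the average can be updated incrementally
    fp_count: int = 0
    verdicts: dict[str, int] = field(default_factory=dict)  # e.g. {"confirmed": 3}

    @property
    def avg_severity(self) -> float:
        return self.severity_sum / self.count if self.count else 0.0

    @property
    def fp_rate(self) -> float:
        return self.fp_count / self.count if self.count else 0.0


@dataclass
class TeamRepoMemory:
    """One memory document per (team, repo); an assumption, not a decision."""
    team_id: str
    repo: str
    tech_stack: list[str] = field(default_factory=list)
    contract_patterns: list[str] = field(default_factory=list)
    vuln_history: dict[str, VulnClassStats] = field(default_factory=dict)
    suppression_rules: list[dict] = field(default_factory=list)
    fix_guidelines: list[str] = field(default_factory=list)          # explicit guidelines
    inferred_fix_style: dict[str, str] = field(default_factory=dict)  # keyed by vulnerability class
    runs: list[dict] = field(default_factory=list)                    # run metadata entries


memory = TeamRepoMemory(team_id="team-x", repo="org/repo-y", tech_stack=["foundry"])
memory.vuln_history["reentrancy"] = VulnClassStats(count=4, severity_sum=14.0, fp_count=1)
doc = asdict(memory)  # plain nested dict, ready to be stored as a MongoDB document
```

Storing severity_sum and fp_count rather than the derived average and rate keeps each update a simple increment; the derived values are computed on read.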

Schema design questions

  • Embedded subdocuments vs separate collection with team reference?
  • How to handle per-repo vs per-team granularity (a team may have multiple repos with different patterns)?
  • Append-only log vs rolling aggregation (or both — log for auditability, aggregation for fast reads)?
  • TTL / decay — should old signal lose weight over time?
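To make the log-plus-aggregation and decay questions concrete, here is a minimal sketch that derives a time-weighted FP rate from an append-only verdict log. It assumes exponential decay with a 90-day half-life and log entries shaped like {"at": datetime, "verdict": str}; the half-life, the entry shape, and the "false_positive" label are all illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def decay_weight(event_time: datetime, now: datetime, half_life_days: float = 90.0) -> float:
    """Weight of an event: 1.0 when fresh, 0.5 after one half-life, and so on."""
    age_days = max((now - event_time).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def decayed_fp_rate(events: list[dict], now: datetime, half_life_days: float = 90.0) -> float:
    """Fold an append-only verdict log into a single time-weighted FP rate."""
    total = 0.0
    fp = 0.0
    for event in events:
        weight = decay_weight(event["at"], now, half_life_days)
        total += weight
        if event["verdict"] == "false_positive":
            fp += weight
    return fp / total if total else 0.0


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
log = [
    {"at": now - timedelta(days=90), "verdict": "false_positive"},  # weight 0.5
    {"at": now, "verdict": "confirmed"},                            # weight 1.0
]
rate = decayed_fp_rate(log, now)  # 0.5 / 1.5
```

With this split, the log stays the auditable source of truth and the aggregate can be recomputed with a different half-life at any time.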

Query patterns at run time

The structure needs to support fast reads at run_exploit() time:

  1. "Give me the full memory for team X, repo Y" (context injection)
  2. "What vulnerability classes have >50% FP rate for this team?" (suppression)
  3. "What fix style does this team prefer for reentrancy issues?" (fixer guidance)
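One way to serve all three queries with a single round trip is to load the memory document once and answer the other two questions from it in process. The sketch below assumes a document keyed by team_id and repo with per-class count and fp_count counters and an inferred_fix_style map; all of these names are hypothetical.

```python
from typing import Any, Optional


def memory_filter(team_id: str, repo: str) -> dict[str, str]:
    """Query 1: filter that selects the single memory document for a team and repo."""
    return {"team_id": team_id, "repo": repo}


def noisy_classes(memory: dict[str, Any], threshold: float = 0.5, min_count: int = 3) -> list[str]:
    """Query 2: vulnerability classes whose FP rate exceeds the threshold.

    min_count avoids suppressing a class based on one or two verdicts.
    """
    noisy = []
    for vuln_class, stats in memory.get("vuln_history", {}).items():
        count = stats.get("count", 0)
        if count == 0 or count < min_count:
            continue
        if stats.get("fp_count", 0) / count > threshold:
            noisy.append(vuln_class)
    return sorted(noisy)


def fix_style_for(memory: dict[str, Any], vuln_class: str) -> Optional[str]:
    """Query 3: preferred fix style for one vulnerability class, if any is recorded."""
    return memory.get("inferred_fix_style", {}).get(vuln_class)


memory = {
    "team_id": "team-x",
    "repo": "org/repo-y",
    "vuln_history": {
        "reentrancy": {"count": 10, "fp_count": 6},
        "overflow": {"count": 10, "fp_count": 5},
        "tx-origin": {"count": 1, "fp_count": 1},
    },
    "inferred_fix_style": {"reentrancy": "checks-effects-interactions"},
}
suppress = noisy_classes(memory)               # only "reentrancy" passes both checks
style = fix_style_for(memory, "reentrancy")
```

If memory documents grow too large to load whole, queries 2 and 3 would need their own indexed fields or a separate per-class collection instead.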

Integration with run pipeline

  • Write path: after each run completes, upsert memory with new findings/verdicts
  • Read path: at run_exploit() start, load memory and inject into agent prompts
  • The read path should be lightweight enough that it adds no noticeable latency to run startup
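The write path can stay incremental by expressing each run as a single update document built from MongoDB update operators ($inc, $set, $setOnInsert, $addToSet). The helper below is a hypothetical sketch: the function name, the finding shape ({"vuln_class", "severity", "verdict"}), and the field paths are assumptions, and it presumes vulnerability class names contain no "." or leading "$", since they are used in dotted paths.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any


def build_run_update(findings: list[dict], detected_stack: list[str], finished_at: datetime) -> dict[str, Any]:
    """Turn one finished run into a MongoDB update document suitable for an upsert."""
    inc: dict[str, float] = defaultdict(int)
    inc["run_count"] += 1
    for finding in findings:
        prefix = f"vuln_history.{finding['vuln_class']}"
        inc[f"{prefix}.count"] += 1
        inc[f"{prefix}.severity_sum"] += finding["severity"]
        if finding.get("verdict") == "false_positive":
            inc[f"{prefix}.fp_count"] += 1

    update: dict[str, Any] = {
        "$inc": dict(inc),
        "$set": {"last_run_at": finished_at},
        "$setOnInsert": {"created_at": finished_at},  # only applied when the upsert inserts
    }
    if detected_stack:
        # $addToSet with $each adds only values not already present in the array
        update["$addToSet"] = {"tech_stack": {"$each": sorted(set(detected_stack))}}
    return update


update = build_run_update(
    findings=[
        {"vuln_class": "reentrancy", "severity": 3.0, "verdict": "false_positive"},
        {"vuln_class": "reentrancy", "severity": 4.0, "verdict": "confirmed"},
    ],
    detected_stack=["openzeppelin", "foundry"],
    finished_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
# With pymongo this could be applied as:
# collection.update_one({"team_id": team_id, "repo": repo}, update, upsert=True)
```

Because the update only increments counters and adds to sets, no earlier run has to be reprocessed.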

Outcome

A MongoDB schema design (with indexes) and a thin data access layer that the dispatcher can use to read memory at run start and write memory at run end.
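For a sense of how thin the data access layer could be, here is a sketch that wraps any collection object exposing create_index, find_one, and update_one (a pymongo Collection provides all three). The class name MemoryStore, its methods, and the unique (team_id, repo) index are assumptions for illustration, not a settled design.

```python
from typing import Any, Optional, Protocol


class CollectionLike(Protocol):
    """The subset of the collection API this layer relies on."""
    def create_index(self, keys: Any, **kwargs: Any) -> Any: ...
    def find_one(self, filter: dict) -> Optional[dict]: ...
    def update_one(self, filter: dict, update: dict, upsert: bool = False) -> Any: ...


class MemoryStore:
    """Hypothetical thin wrapper the dispatcher could call at run start and run end."""

    def __init__(self, collection: CollectionLike) -> None:
        self._col = collection

    def ensure_indexes(self) -> None:
        # One document per (team, repo); the unique index also serves the run-start read.
        self._col.create_index([("team_id", 1), ("repo", 1)], unique=True)

    def load(self, team_id: str, repo: str) -> dict:
        """Read path: return the memory document, or an empty dict on the first run."""
        return self._col.find_one({"team_id": team_id, "repo": repo}) or {}

    def record_run(self, team_id: str, repo: str, update: dict) -> None:
        """Write path: apply an update document, creating the memory document if missing."""
        self._col.update_one({"team_id": team_id, "repo": repo}, update, upsert=True)
```

Depending only on these three methods keeps the layer easy to test with an in-memory fake and leaves the embedded-versus-separate-collection choice open.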
