Skip to content

DEV-1464: graph-backed Cypher pre-filter for search#164

Merged
ZmeiGorynych merged 20 commits into
mainfrom
egor/dev-1464-unify-memories-search-on-sqlite-fts5-sqlite-vec-graphqlite
Jun 7, 2026
Merged

DEV-1464: graph-backed Cypher pre-filter for search#164
ZmeiGorynych merged 20 commits into
mainfrom
egor/dev-1464-unify-memories-search-on-sqlite-fts5-sqlite-vec-graphqlite

Conversation

@ZmeiGorynych
Copy link
Copy Markdown
Member

@ZmeiGorynych ZmeiGorynych commented Jun 3, 2026

Summary

  • Adds slayer/search/graph.py: ephemeral in-memory LadybugDB property graph built from StorageBackend state on demand; graph schema covers Memory, Datasource, Model, Column, Measure, Aggregation nodes and MENTIONS / CONTAINS / JOINS edges; hidden models and columns excluded
  • Adds cypher_filter: Optional[str] to SearchService.search() — runs a read-only openCypher MATCH … RETURN … AS id query and pre-filters all three RRF channels (BM25, tantivy, embeddings) to the returned canonical ID set
  • Adds StorageBackend.graph_fingerprint() (default "0"); concrete overrides on YAMLStorage (max YAML file mtime) and SQLiteStorage (db file mtime); double-checked-locking per-storage-path cache in graph.py
  • Wires cypher_filter through all four surfaces: REST SearchRequest, MCP search() tool, SlayerClient.search(), CLI --cypher-filter
  • Renames embedding_search pip extra → advanced_search (adds kuzu alongside litellm + numpy); updates all source, docs, skills, and CLAUDE.md references
  • Fixes pre-existing test_embeddings_client.py failures when litellm is not installed (pytest.importorskip guards)

Test plan

  • poetry run pytest tests/test_search_graph.py — 20 pass, 38 skip (skip = ladybug not installed)
  • poetry run pytest tests/test_search_three_channel.py tests/test_embeddings_client.py — all pass
  • poetry run pytest --ignore=tests/flight --ignore=tests/integration — 3383 pass, 47 skip
  • poetry run ruff check slayer/ tests/ — no issues
  • With pip install motley-slayer[advanced_search] (kuzu present): graph tests go green; slayer search --cypher-filter "MATCH (m:Memory) RETURN m.id AS id" returns memories

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added optional OpenCypher cypher_filter pre-filter for searches (with graceful fallback when graph support isn’t installed)
    • CLI/search surfaces now accept --cypher-filter and --max-results
  • Breaking Changes

    • Search responses now return a unified flat results list (replacing separate memories/entities/example_queries)
    • Per-bucket caps consolidated into a single max_results
  • Documentation

    • Renamed install/feature gate: embedding_searchadvanced_search across docs and examples

Adds an ephemeral in-memory LadybugDB property graph built from
StorageBackend state. An optional cypher_filter parameter on search()
runs an openCypher MATCH query and narrows all three RRF channels to
the returned canonical IDs.

- slayer/search/graph.py: new module — schema creation, node/edge
  insertion, double-checked-locking cache keyed by storage fingerprint,
  Cypher safety validation, get_filtered_ids() entry point
- StorageBackend.graph_fingerprint(): new method (file mtime); concrete
  overrides on YAMLStorage and SQLiteStorage
- SearchService.search(): cypher_filter kwarg; candidate_ids propagated
  to _run_channel_1, _build_channel_1_entity_ranking, _run_channel_2,
  _run_channel_3
- API surfaces: SearchRequest, MCP search(), SlayerClient.search(),
  CLI --cypher-filter all wired through
- extra renamed embedding_search → advanced_search (adds kuzu); all
  references updated in source, docs, skills, CLAUDE.md
- tests: test_search_graph.py (new), test_embeddings_client.py and
  test_search_three_channel.py updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@linear
Copy link
Copy Markdown

linear Bot commented Jun 3, 2026

DEV-1464

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR renames the embedding extra to advanced_search, adds an optional cypher_filter (LadybugDB-backed with a naive fallback), consolidates search outputs to a flat results: List[SearchHit] with a single max_results cap, implements storage graph-fingerprint caching, and updates all surfaces, clients, tests, and docs.

Changes

Advanced Search Feature Rollout

Layer / File(s) Summary
Docs and naming migration
.claude/skills/slayer-overview.md, CLAUDE.md, docs/concepts/*, docs/getting-started/mcp.md, docs/examples/09_lightning_talk/lightning_talk_nb.ipynb, slayer/embeddings/*, slayer/engine/ingestion.py, slayer/search/retrievers/embeddings.py
Renamed embedding_searchadvanced_search in docs, notebook examples, and docstrings; updated CLI/help text and examples to use max_results/results and document cypher_filter.
Dependency & fingerprint contract
pyproject.toml, slayer/storage/base.py, slayer/storage/sqlite_storage.py, slayer/storage/yaml_storage.py, slayer/storage/join_sync.py
Added optional ladybug and new advanced_search extra; introduced async graph_fingerprint() on StorageBackend and implementations for SQLite/YAML/JoinSync for graph cache staleness checks.
Cypher fallback & graph engine
slayer/search/cypher_naive.py, slayer/search/graph.py
Added naive label-only Cypher parser fallback and a LadybugDB-backed ephemeral graph builder with schema, population, Cypher safety validation, fingerprinted caching, async rebuild locking, and get_filtered_ids.
SearchService core refactor
slayer/search/service.py
Unified response model: SearchHit + SearchResponse.results; replaced per-bucket caps with max_results; added cypher_filter allowlist flow (graph or naive), early-empty short-circuit, RRF fusion into a flat ranked list, stale-query warning updates, and in-place column-hit refresh adaptation.
Surfaces & client wiring
slayer/api/server.py, slayer/cli.py, slayer/mcp/server.py, slayer/client/slayer_client.py, slayer/search/__init__.py
REST/MCP/CLI/client accept max_results and cypher_filter, forward them to SearchService, broaden error handling to SlayerError where used, update CLI output to unified results, and export SearchHit (remove legacy hit classes).
Tests: graph, surfaces, unified results
tests/test_search_graph.py, tests/test_cypher_naive.py, tests/test_search_unified.py, tests/*search_*.py, tests/integration/*, tests/search_helpers.py
Added graph and Cypher-validator tests; migrated many existing tests to use unified resp.results and max_results; added helpers seed_warehouse_models and call_mcp_tool; updated integration, surface, and notebook tests accordingly.

Sequence Diagram

sequenceDiagram
  participant Client
  participant SearchService
  participant Graph
  participant Retriever
  participant Storage
  Client->>SearchService: search(question, entities, cypher_filter, max_results)
  SearchService->>Graph: get_filtered_ids(cypher_filter, storage)
  Graph->>Storage: graph_fingerprint()
  Graph-->>SearchService: frozenset(candidate_ids)
  SearchService->>Retriever: retrieve(all_memories, candidate_ids)
  Retriever-->>SearchService: memory/entity rankings
  SearchService->>SearchService: RRF fuse + apply max_results cap
  SearchService-->>Client: SearchResponse(results=[SearchHit, ...])
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • MotleyAI/slayer#166: Overlapping DEV-1532-style refactor implementing unified flat results and cypher_filter with naive fallback.
  • MotleyAI/slayer#145: Related updates to the lightning-talk notebook and its tests to the unified resp.results/max_results contract.
  • MotleyAI/slayer#160: Overlaps on core SearchService orchestration and unified SearchHit flow.

Suggested reviewers

  • whimo

Poem

🐰 I hopped through docs and graph nodes bright,

Advanced search now guides the query's flight.
Cypher whispers ids through Ladybug's hall,
One flat results list answers every call.
The rabbit cheers: unified, tidy, and light.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch egor/dev-1464-unify-memories-search-on-sqlite-fts5-sqlite-vec-graphqlite

ZmeiGorynych and others added 5 commits June 3, 2026 15:24
- graph.py: strip quoted string literals before mutation-keyword scan so
  queries like WHERE m.learning CONTAINS 'call me' are not rejected
- graph.py: comment that Database() no-args is in-memory in kuzu ≥ 0.3
- graph.py: close evicted cache entries (conn + db) to release handles;
  same in clear_cache()
- service.py: pass candidate_ids to _recency_fallback so cypher_filter is
  respected on the empty-input path
- service.py: warn when a named entity is excluded by cypher_filter
- tests: coverage for literal-in-string false-positive fix and recency
  fallback + cypher_filter interaction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
LadybugDB is the active successor to KuzuDB — same codebase, new name
after the original repo was archived post-acquisition. The pip package
is now `ladybug`; the old `kuzu` name is retained as a legacy fallback
in the import probe.

- pyproject.toml: dep `kuzu` → `ladybug` in both `advanced_search` and `all`
- graph.py: try `ladybug` before `kuzu` in is_available() / _import_graph_module()
- graph.py: update module docstring and comments to reflect the name
- CLAUDE.md, docs/concepts/search.md: drop "LadybugDB/Kuzu" → "LadybugDB"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove all kuzu fallback references from graph.py and pyproject.toml.
LadybugDB (pip: ladybug) is the sole graph backend; kuzu is no longer
mentioned anywhere in the codebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract _insert_model_child_nodes and _connect_entity_mention helpers to
  bring _insert_nodes (21→10) and _insert_mentions_edges (24→9) under
  Sonar's 15-point cognitive complexity ceiling (S3776)
- Replace unused ds variable with _ in _insert_nodes (S1481)
- Add _YAML_EXTS tuple constant; use endswith(_YAML_EXTS) in
  graph_fingerprint to eliminate duplicate literals (S1192/S8513)
- Replace NamedTemporaryFile context managers in async test functions with
  tempfile.mkstemp() to avoid flagging synchronous file I/O in async
  context (S7493)
- Use result.get_column_names() in get_filtered_ids so the id column is
  read by name rather than by position; RETURN n.name, n.id AS id no
  longer silently returns names (Codex)
- Run poetry lock to add ladybug to the lock file (Codex)
- Clean remaining kuzu references from docstrings/comments

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract _build_valid_entity_sets to bring _insert_mentions_edges
  cognitive complexity from 16 to 3 (Sonar S3776)
- Add NOSONAR(S1192) on _LEGACY_RENAMES — full filenames, not the
  extension constant; the three .yaml occurrences are semantically
  distinct (Sonar S1192)
- Include file_count in YAMLStorage.graph_fingerprint so deleting a
  YAML file invalidates the cache even when it was not the newest file
  (Codex); add test_yaml_storage_fingerprint_changes_after_delete
- Add embedding_search = ["litellm", "numpy"] backward-compat alias
  so installs pinned to motley-slayer[embedding_search] keep working
  (Codex); embedding_search lacks ladybug — advanced_search is the
  full-featured replacement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
slayer/search/graph.py (1)

344-365: ⚡ Quick win

Extract the valid-id set construction out of _insert_mentions_edges().

This helper is already tripping the Sonar cognitive-complexity gate, and the set-building block can move cleanly into a tiny helper without changing behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/search/graph.py` around lines 344 - 365, Extract the block that builds
ds_set, valid_models, valid_columns, valid_measures, valid_aggs, and
valid_memory_canonicals out of _insert_mentions_edges into a small helper (e.g.,
_build_valid_id_sets or similar) that accepts visible_models, datasource_names,
and memories and returns those five sets; replace the in-function code in
_insert_mentions_edges with a call to that helper and use the returned sets
exactly as before to preserve behavior and reduce cognitive complexity.
slayer/storage/yaml_storage.py (1)

51-51: ⚡ Quick win

Reuse _YAML_EXTS in the other filename checks too.

The new constant only fixes the new fingerprint path; list_models() and list_datasources() still hard-code (".yaml", ".yml"), so the current duplicate-literal Sonar finding stays open and the extension list can drift again.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/storage/yaml_storage.py` at line 51, Replace the hard-coded extension
tuples in list_models() and list_datasources() with the new _YAML_EXTS constant
to avoid duplicated literals and keep extensions consistent; locate the
occurrences in the methods list_models and list_datasources and change their
filename/extension checks to use _YAML_EXTS (the same constant already used for
the fingerprint path) so all YAML checks reference the single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/search/graph.py`:
- Around line 470-473: The code currently swallows all exceptions when calling
storage.graph_fingerprint(), which hides real backend bugs; change the error
handling so only OSError is treated as a cache-miss and sets current_fp = None
(e.g., catch OSError), and let any other exceptions raised by
storage.graph_fingerprint() propagate (do not catch Exception). Update the
try/except around storage.graph_fingerprint() accordingly so only OSError is
handled.
- Around line 112-116: The current check using _AS_ID_RE on the cypher string
only ensures "AS id" appears anywhere, which allows queries that don't actually
RETURN a single aliased id column; update validation to either parse/regex-check
that the FINAL RETURN clause exposes exactly one column aliased "id" (e.g.
ensure RETURN ... AS id is in the trailing RETURN, not just anywhere) or defer
to runtime by executing the query and inspecting the returned column names in
get_filtered_ids() and raise the same ValueError if the result set does not
contain exactly one "id" column; update the logic around _AS_ID_RE and
get_filtered_ids() accordingly so the user-facing ValueError is raised based on
actual returned columns (reference variables/functions: _AS_ID_RE, cypher,
get_filtered_ids()).

In `@slayer/search/service.py`:
- Around line 1007-1012: The filter step silently drops explicitly named memory
refs; detect memory refs in candidate_ids (entries starting with
_MEMORY_PREFIX), compute which of those IDs are missing after applying the scope
filter (scope derived from all_memories), and emit a warning listing the
excluded memory:<id> entries so callers know their explicit memory references
were removed by the cypher_filter; use the same logger/mechanism used elsewhere
(see _build_channel_1_entity_ranking) to produce a clear message that these
memory:<id> items were filtered out by the cypher_filter.

In `@slayer/storage/base.py`:
- Around line 704-719: The graph_fingerprint method must be converted to an
async API: change the StorageBackend.graph_fingerprint function to "async def
graph_fingerprint(self) -> str" and return the fingerprint from an awaitable;
update all concrete implementations (e.g., YAMLStorage and SQLiteStorage) to
make their graph_fingerprint async too (YAMLStorage can keep synchronous file
I/O inside the async method, SQLiteStorage should use asyncio.to_thread for
blocking calls) and ensure callers (notably slayer.search.graph) await
storage.graph_fingerprint() instead of calling it synchronously so the storage
backend contract remains uniformly async.

In `@slayer/storage/yaml_storage.py`:
- Around line 80-98: The graph_fingerprint function currently swallows OSError
when os.path.getmtime() fails, which can return a stale max_mtime; change the
error handling so that OSError is not ignored but instead is propagated to the
caller to force a rebuild. Locate graph_fingerprint (and the loop using
self.base_dir, _YAML_EXTS and os.path.getmtime) and replace the try/except that
does "except OSError: pass" with behavior that re-raises the exception (or
remove the try/except entirely) so any stat failure surfaces to the caller.

---

Nitpick comments:
In `@slayer/search/graph.py`:
- Around line 344-365: Extract the block that builds ds_set, valid_models,
valid_columns, valid_measures, valid_aggs, and valid_memory_canonicals out of
_insert_mentions_edges into a small helper (e.g., _build_valid_id_sets or
similar) that accepts visible_models, datasource_names, and memories and returns
those five sets; replace the in-function code in _insert_mentions_edges with a
call to that helper and use the returned sets exactly as before to preserve
behavior and reduce cognitive complexity.

In `@slayer/storage/yaml_storage.py`:
- Line 51: Replace the hard-coded extension tuples in list_models() and
list_datasources() with the new _YAML_EXTS constant to avoid duplicated literals
and keep extensions consistent; locate the occurrences in the methods
list_models and list_datasources and change their filename/extension checks to
use _YAML_EXTS (the same constant already used for the fingerprint path) so all
YAML checks reference the single source of truth.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 098aee68-4c93-498f-ad03-3602c539c34b

📥 Commits

Reviewing files that changed from the base of the PR and between 05ee330 and 9730bed.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (25)
  • .claude/skills/slayer-overview.md
  • CLAUDE.md
  • docs/concepts/ingestion.md
  • docs/concepts/memories.md
  • docs/concepts/search.md
  • docs/examples/09_lightning_talk/lightning_talk_nb.ipynb
  • docs/getting-started/mcp.md
  • pyproject.toml
  • slayer/api/server.py
  • slayer/cli.py
  • slayer/client/slayer_client.py
  • slayer/embeddings/client.py
  • slayer/embeddings/ranker.py
  • slayer/embeddings/service.py
  • slayer/engine/ingestion.py
  • slayer/mcp/server.py
  • slayer/search/__init__.py
  • slayer/search/graph.py
  • slayer/search/service.py
  • slayer/storage/base.py
  • slayer/storage/sqlite_storage.py
  • slayer/storage/yaml_storage.py
  • tests/test_embeddings_client.py
  • tests/test_search_graph.py
  • tests/test_search_three_channel.py

Comment thread slayer/search/graph.py
Comment thread slayer/search/graph.py
Comment thread slayer/search/service.py Outdated
Comment thread slayer/storage/base.py
Comment thread slayer/storage/yaml_storage.py Outdated
ZmeiGorynych and others added 8 commits June 3, 2026 17:53
The previous lock was written before embedding_search was added to
pyproject.toml extras; litellm and numpy were not mapped to that extra.
Re-running poetry lock adds the embedding_search → litellm/numpy mapping.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Make graph_fingerprint() async in StorageBackend, YAMLStorage, SQLiteStorage
- YAML fingerprint now includes file_count to detect deletions; OSError propagates
- graph.py: await fingerprint, narrow except to OSError, wrap cypher execution errors
- service.py: add _memory_id_cypher_filter_warnings() and emit on cypher_filter exclusions
- tests: add await to all graph_fingerprint() calls

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- base.py: add asyncio import; base graph_fingerprint adds await asyncio.sleep(0)
  to satisfy Sonar S7503 (async def must use await)
- join_sync.py: JoinSyncStorage.graph_fingerprint() delegates to inner,
  fixing the stale-forever cache bug on CLI/API/MCP paths using from_path()
- graph.py: _storage_key() unwraps JoinSyncStorage to get the stable path key;
  get_filtered_ids() raises a friendly ValueError when id column is absent
- tests: add JoinSyncStorage fingerprint delegation test and
  friendly-error test for missing id column

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sults cap, naive Cypher fallback

Replace the three-bucket search response (memories/example_queries/entities) with
a single ranked results: List[SearchHit] list capped by max_results. Remove the
per-bucket max_memories/max_example_queries/max_entities params. Add a naive Cypher
parser (cypher_naive.py) that supports MATCH (n:Label1:Label2) RETURN n.id AS id
without the advanced_search extra, raising SlayerError for complex Cypher. Fix
LadybugDB 0.15.x incompatibilities: Column→ModelColumn node table rename, $desc→$descr
parameter rename, Connection() API change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add ModelColumn alias to naive parser label map (CR thread 2)
- Fix ReDoS in _NAIVE_PATTERN: structured (w+(:w+)*) avoids overlapping
  quantifiers (Sonar S5852 hotspot)
- Add SlayerError to except clauses in REST, CLI, and MCP search handlers
  so invalid cypher_filter expressions return 400/clean error, not 500/crash
  (Codex finding)
- Convert intentional TypeError tests to **{} kwarg form to avoid Sonar
  S930 BLOCKER false positives
- Extract _build_hit_from_fused_key and _apply_cypher_filter helpers,
  _query_bearing_memory_hits, and simplify _call_mcp_tool to reduce
  cognitive complexity below S3776 threshold (CC 21/18 -> <15)
- Move shared test model seed and MCP tool helper to tests/search_helpers.py
  to reduce cross-file duplicate-lines density below gate threshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Simplify search handler except tuples to (SlayerError, ValueError):
  EntityResolutionError and AmbiguousModelError are SlayerError subclasses
  and redundant in the same tuple (Sonar S5713)
- Suppress S5843 on _NAIVE_PATTERN: each \s* and \w token maps to a
  distinct syntactic Cypher element; complexity cannot be reduced without
  losing expressiveness
- Update naive parser docs to mention Column/ModelColumn alias

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix naive regex to backreference MATCH variable name in RETURN clause
  so MATCH (m:Memory) RETURN n.id AS id is correctly rejected (Codex)
- Add warning when named memory:<id> refs are excluded by naive kind_filter,
  for parity with the graph path's candidate_ids exclusion warnings (Codex)
- Move NOSONAR(S5843) to the flagged pattern string line
- Use keyword args (pattern=, flags=) for re.compile calls per repo
  convention (CR nitpick)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ost-cypher

DEV-1532: unify search() — flat results list, single max_results cap, naive Cypher fallback
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tests/test_search_named_entity_surfacing.py (1)

715-734: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This test no longer exercises the low-cap path it describes.

The docstring says this is a capped scenario (max_results=1), but the call uses max_results=20. That weakens coverage for “warning still emitted under tight cap”.

Suggested fix
 async def test_stale_query_warning_for_named_memory_id_cap_zero(
     storage: YAMLStorage, service: SearchService,
 ) -> None:
@@
     resp = await service.search(
         entities=[f"memory:{seed.id}"],
-        max_results=20,
+        max_results=1,
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_named_entity_surfacing.py` around lines 715 - 734, The test
test_stale_query_warning_for_named_memory_id_cap_zero claims to exercise the
low-cap path but calls service.search with max_results=20; change that call to
use max_results=1 so the test actually enforces the tight-cap scenario described
in the docstring (update the service.search invocation where
entities=[f"memory:{seed.id}"] to pass max_results=1).
slayer/search/service.py (2)

947-961: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Explain when cypher_filter removes every recency candidate.

In the empty-input path, candidate_ids or kind_filter can reduce recency_memories to [], but the response still warns that it's “returning newest memories by recency.” That message is false in this branch and makes an entity-only filter look like a no-op instead of an intentional empty result.

💡 Suggested adjustment
         if candidate_ids is not None:
             recency_memories = [
                 m for m in recency_memories
                 if f"memory:{m.id}" in candidate_ids
             ]
         if kind_filter is not None and "memory" not in kind_filter:
             recency_memories = []
+        if not recency_memories and (
+            candidate_ids is not None or kind_filter is not None
+        ):
+            warnings.append(
+                "cypher_filter excluded all memory candidates for the "
+                "empty-input recency fallback."
+            )
         recency_memories.sort(key=lambda m: m.created_at, reverse=True)

Based on learnings, Search cypher_filter graph pre-filter (DEV-1464): all surfaces accept optional cypher_filter ... returned IDs become hard allowlist for all three channels.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/search/service.py` around lines 947 - 961, The warning "returning
newest memories by recency" is misleading when post-filtering (e.g.,
candidate_ids, kind_filter or cypher_filter pre-filter) leaves recency_memories
empty; update the logic in the empty-input path (around
_filter_memories_by_datasource, self._storage.list_memories) to compute
recency_memories first, then check if candidate_ids or kind_filter (and any
cypher_filter-derived allowlist) have reduced it to [] and either (a) change the
warnings.append text to explicitly say "no recency candidates after filters;
returning no memories" or (b) suppress the recency warning and instead append a
specific warning about filters eliminating all recency candidates; ensure you
reference candidate_ids, kind_filter, and any cypher_filter-derived allowlist
when constructing the new message.

1247-1254: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't blame missing embeddings when the allowlist filtered them out.

After the new candidate_ids filter runs, rows can become empty even though embeddings exist and ingestion is fine. The current warning tells users to rerun slayer ingest, which is misleading for this path.

💡 Suggested adjustment
         live_canonicals = corpus.canonical_to_kind.keys()
         rows = [r for r in rows if r.canonical_id in live_canonicals]
         # DEV-1464: apply Cypher pre-filter to the embedding corpus.
+        rows_before_candidate_filter = len(rows)
         if candidate_ids is not None:
             rows = [r for r in rows if r.canonical_id in candidate_ids]
         if not rows:
+            if candidate_ids is not None and rows_before_candidate_filter > 0:
+                return [], [], [
+                    "embedding channel skipped: cypher_filter excluded all "
+                    "embedding candidates."
+                ]
             return [], [], [
                 f"embedding channel skipped: no embedding rows for model "
                 f"{service.model_name!r}. Run `slayer ingest` to populate.",
             ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/search/service.py` around lines 1247 - 1254, The current check that
returns the "no embedding rows" error after applying the candidate_ids filter
can mislead users; update the conditional that checks rows after filtering (the
block using candidate_ids, rows, and service.model_name) to return a different,
accurate message when candidate_ids is provided—e.g., "embedding channel
skipped: no embedding rows for model 'X' after candidate_ids filter; embeddings
may exist but were filtered by your allowlist"—and keep the original
ingest-suggestion only when candidate_ids is None so the user is not told to
rerun ingestion when rows were filtered out by candidate_ids.
🧹 Nitpick comments (3)
tests/test_search_service.py (1)

413-435: ⚡ Quick win

Legacy max_example_queries naming is now misleading in these tests.

Both tests assert flat max_results behavior, not an independent example-queries cap. Renaming them would better reflect the contract being tested.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_service.py` around lines 413 - 435, Rename the two tests
that misleadingly reference "max_example_queries" to reflect they are testing
flat max_results behavior: update test_max_example_queries_default_is_two and
test_max_example_queries_caps_independently to names like
test_search_respects_max_results_default and
test_search_respects_max_results_cap, and adjust their inline comments to state
they assert SearchService.search's max_results limits (referencing
service_with_query_memories.search and response.results) so the test names and
descriptions match the contract being validated.
tests/test_search_named_entity_surfacing.py (1)

406-424: ⚡ Quick win

Rename this test to match what it now verifies.

The function name says “suppresses” output, but the assertion verifies the named entity does surface. This is misleading and makes failures harder to interpret.

Suggested cleanup
-async def test_max_entities_zero_suppresses_channel_1_named_output(
+async def test_named_entity_surfaces_with_positive_max_results(
     service: SearchService,
 ) -> None:
-    # With max_results=1, a single memory hit (if any) would fill the slot,
-    # but since no memories are tagged on this entity and no question is provided,
-    # only the named entity hit would appear. Use max_results=20 and filter.
-    # The original test checked max_entities=0 suppressed entity output.
-    # With the flat API, we just verify that entity hits are present when max_results>0.
-    # To mirror the old behavior of max_entities=0, we verify the named entity
-    # surfaces when max_results is large enough.
     resp = await service.search(
         entities=["warehouse.orders.amount"],
         max_results=20,
     )
     entity_hits = [h for h in resp.results if h.kind != "memory"]
-    # Named entity should surface (unlike old max_entities=0 which suppressed it).
-    # This test is now a positive assertion: the entity IS present.
     assert any(h.id == "warehouse.orders.amount" for h in entity_hits)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_named_entity_surfacing.py` around lines 406 - 424, The test
function test_max_entities_zero_suppresses_channel_1_named_output is misnamed
because it asserts the named entity DOES surface; rename it to a positive,
accurate name (e.g., test_named_entity_surfaces_when_max_results_large) and
update any references/imports or test markers accordingly; keep the test body
(calls service.search with entities=["warehouse.orders.amount"], max_results=20
and the assertion on resp.results) unchanged so the behavior remains the same
while the name matches the verified outcome.
slayer/search/service.py (1)

914-916: ⚡ Quick win

Use keyword arguments for the graph lookup call.

This is the only new multi-parameter call in the cypher path that still relies on argument order, which is out of step with the repo convention.

As per coding guidelines, **/*.py: Use keyword arguments for functions with more than 1 parameter.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@slayer/search/service.py` around lines 914 - 916, The call to
_search_graph.get_filtered_ids currently relies on positional arguments
(cypher_filter, self._storage); change it to use keyword arguments to match repo
convention for multi-parameter functions—call get_filtered_ids with named
parameters (e.g., cypher_filter=..., storage=...) using the existing symbols
cypher_filter and self._storage so the call is explicit and order-independent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@slayer/api/server.py`:
- Around line 145-152: SearchRequest.max_results is an unconstrained int which
lets invalid values through to SearchService.search; change its declaration to
use Pydantic's Field with a minimum of 1 (e.g., Field(default=10, ge=1)) so the
model self-validates and OpenAPI shows the correct constraint, and add the
necessary import (Field) at the top of the file; keep SearchService.search's
existing runtime check for <1 for defense in depth.

In `@slayer/cli.py`:
- Around line 595-603: Update the "--cypher-filter" help text in slayer/cli.py
(the cypher_filter option) to document the multi-label fallback: mention that
MATCH patterns like "MATCH (n:Label1:Label2) RETURN n.id AS id" are supported
and are treated as a union of labels (not a single conjunctive label), matching
the behavior implemented in slayer/search/cypher_naive.py; keep the note about
advanced_search limitations and, for consistency, mirror the wording used in
docs/concepts/search.md so the CLI advertises the actual parser behavior.
- Around line 774-778: The loop over response.results currently prints every
memory hit the same (using hit.kind == "memory"), making learning-only and
query-bearing memories indistinguishable; update the formatter in the loop that
references response.results / hit.kind / hit.id / hit.score / hit.text to detect
whether a memory hit contains a query (check for the query-bearing field your
model uses, e.g., hit.query, hit.query_text, or hit.example_query) and adjust
the prefix accordingly (e.g., "M" for plain memory and "M[Q]" or "M(Q)" for
query-bearing memories) so the CLI output clearly distinguishes the two types.

In `@slayer/mcp/server.py`:
- Around line 2884-2888: Update the cypher_filter help text in server.py to
document that the naive fallback parser (slayer/search/cypher_naive.py) accepts
multi-label MATCH clauses with union semantics (e.g., MATCH (n:LabelA) RETURN
n.id AS id UNION MATCH (n:LabelB) RETURN n.id AS id) and that the public-facing
label set for the non-advanced_search path includes Memory, Datasource, Model,
Column, Measure, Aggregation; ensure the wording replaces the single-label
implication and explicitly notes the union-style multi-label support and the
exact allowed labels so agents see the correct contract (reference:
cypher_filter docstring and the naive parser in cypher_naive.py).

In `@slayer/search/__init__.py`:
- Around line 11-19: The package removed public aliases EntityHit and MemoryHit
causing import-time failures; restore them as deprecated re-exports in
slayer.search.__init__ by importing or assigning EntityHit = SearchHit (or the
actual types from slayer.search.service) and MemoryHit = SearchHit (or their
real service symbols) and include "EntityHit" and "MemoryHit" in the __all__
list so existing "from slayer.search import EntityHit, MemoryHit" calls keep
working while signaling deprecation.

---

Outside diff comments:
In `@slayer/search/service.py`:
- Around line 947-961: The warning "returning newest memories by recency" is
misleading when post-filtering (e.g., candidate_ids, kind_filter or
cypher_filter pre-filter) leaves recency_memories empty; update the logic in the
empty-input path (around _filter_memories_by_datasource,
self._storage.list_memories) to compute recency_memories first, then check if
candidate_ids or kind_filter (and any cypher_filter-derived allowlist) have
reduced it to [] and either (a) change the warnings.append text to explicitly
say "no recency candidates after filters; returning no memories" or (b) suppress
the recency warning and instead append a specific warning about filters
eliminating all recency candidates; ensure you reference candidate_ids,
kind_filter, and any cypher_filter-derived allowlist when constructing the new
message.
- Around line 1247-1254: The current check that returns the "no embedding rows"
error after applying the candidate_ids filter can mislead users; update the
conditional that checks rows after filtering (the block using candidate_ids,
rows, and service.model_name) to return a different, accurate message when
candidate_ids is provided—e.g., "embedding channel skipped: no embedding rows
for model 'X' after candidate_ids filter; embeddings may exist but were filtered
by your allowlist"—and keep the original ingest-suggestion only when
candidate_ids is None so the user is not told to rerun ingestion when rows were
filtered out by candidate_ids.

In `@tests/test_search_named_entity_surfacing.py`:
- Around line 715-734: The test
test_stale_query_warning_for_named_memory_id_cap_zero claims to exercise the
low-cap path but calls service.search with max_results=20; change that call to
use max_results=1 so the test actually enforces the tight-cap scenario described
in the docstring (update the service.search invocation where
entities=[f"memory:{seed.id}"] to pass max_results=1).

---

Nitpick comments:
In `@slayer/search/service.py`:
- Around line 914-916: The call to _search_graph.get_filtered_ids currently
relies on positional arguments (cypher_filter, self._storage); change it to use
keyword arguments to match repo convention for multi-parameter functions—call
get_filtered_ids with named parameters (e.g., cypher_filter=..., storage=...)
using the existing symbols cypher_filter and self._storage so the call is
explicit and order-independent.

In `@tests/test_search_named_entity_surfacing.py`:
- Around line 406-424: The test function
test_max_entities_zero_suppresses_channel_1_named_output is misnamed because it
asserts the named entity DOES surface; rename it to a positive, accurate name
(e.g., test_named_entity_surfaces_when_max_results_large) and update any
references/imports or test markers accordingly; keep the test body (calls
service.search with entities=["warehouse.orders.amount"], max_results=20 and the
assertion on resp.results) unchanged so the behavior remains the same while the
name matches the verified outcome.

In `@tests/test_search_service.py`:
- Around line 413-435: Rename the two tests that misleadingly reference
"max_example_queries" to reflect they are testing flat max_results behavior:
update test_max_example_queries_default_is_two and
test_max_example_queries_caps_independently to names like
test_search_respects_max_results_default and
test_search_respects_max_results_cap, and adjust their inline comments to state
they assert SearchService.search's max_results limits (referencing
service_with_query_memories.search and response.results) so the test names and
descriptions match the contract being validated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d30e421c-b137-4165-96e6-d8fa46081235

📥 Commits

Reviewing files that changed from the base of the PR and between 58036dc and b04a0dd.

📒 Files selected for processing (27)
  • CLAUDE.md
  • docs/concepts/search.md
  • docs/examples/09_lightning_talk/lightning_talk_nb.ipynb
  • slayer/api/server.py
  • slayer/cli.py
  • slayer/client/slayer_client.py
  • slayer/mcp/server.py
  • slayer/search/__init__.py
  • slayer/search/cypher_naive.py
  • slayer/search/graph.py
  • slayer/search/service.py
  • tests/integration/test_integration.py
  • tests/search_helpers.py
  • tests/test_cypher_naive.py
  • tests/test_lightning_talk_notebook.py
  • tests/test_memory_id_in_responses.py
  • tests/test_search_datasource_filter.py
  • tests/test_search_example_query_stale_warning.py
  • tests/test_search_graph.py
  • tests/test_search_invariance.py
  • tests/test_search_lazy_gc_in_memory.py
  • tests/test_search_lenient_validation.py
  • tests/test_search_named_entity_surfacing.py
  • tests/test_search_service.py
  • tests/test_search_surfaces.py
  • tests/test_search_three_channel.py
  • tests/test_search_unified.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • slayer/search/graph.py

Comment thread slayer/api/server.py
Comment thread slayer/cli.py
Comment thread slayer/cli.py
Comment thread slayer/mcp/server.py Outdated
Comment thread slayer/search/__init__.py
…emories-search-on-sqlite-fts5-sqlite-vec-graphqlite

# Conflicts:
#	poetry.lock
#	pyproject.toml
#	slayer/embeddings/service.py
#	slayer/engine/ingestion.py
#	slayer/search/service.py
#	tests/test_search_service.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_search_service_orchestration.py`:
- Around line 188-198: The test uses wall-clock elapsed thresholds to infer
parallelism for service.search which is flaky; instead, instrument the mocked
retrievers used in the test to record per-retriever start and end timestamps
(e.g., add start/stop markers inside the async mock retriever coroutine used by
the test), run the search, then assert that the recorded intervals overlap (for
example assert max(start_times) < min(end_times) or that at least two retriever
intervals overlap) rather than relying on elapsed < delay * 2.5; update the
assertions that reference elapsed and remove the timing threshold checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e7c370eb-ed35-44f9-9b9f-d585776ace27

📥 Commits

Reviewing files that changed from the base of the PR and between b04a0dd and 19b7a95.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • CLAUDE.md
  • docs/concepts/search.md
  • pyproject.toml
  • slayer/api/server.py
  • slayer/engine/ingestion.py
  • slayer/mcp/server.py
  • slayer/search/retrievers/embeddings.py
  • slayer/search/service.py
  • slayer/storage/sqlite_storage.py
  • tests/test_search_graph.py
  • tests/test_search_invariance.py
  • tests/test_search_service.py
  • tests/test_search_service_orchestration.py
  • tests/test_search_surfaces.py
  • tests/test_search_three_channel.py
✅ Files skipped from review due to trivial changes (3)
  • slayer/search/retrievers/embeddings.py
  • CLAUDE.md
  • slayer/engine/ingestion.py
🚧 Files skipped from review as they are similar to previous changes (10)
  • pyproject.toml
  • slayer/api/server.py
  • docs/concepts/search.md
  • tests/test_search_three_channel.py
  • tests/test_search_invariance.py
  • slayer/mcp/server.py
  • tests/test_search_surfaces.py
  • slayer/storage/sqlite_storage.py
  • slayer/search/service.py
  • tests/test_search_graph.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_search_service_orchestration.py`:
- Around line 188-198: The test uses wall-clock elapsed thresholds to infer
parallelism for service.search which is flaky; instead, instrument the mocked
retrievers used in the test to record per-retriever start and end timestamps
(e.g., add start/stop markers inside the async mock retriever coroutine used by
the test), run the search, then assert that the recorded intervals overlap (for
example assert max(start_times) < min(end_times) or that at least two retriever
intervals overlap) rather than relying on elapsed < delay * 2.5; update the
assertions that reference elapsed and remove the timing threshold checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e7c370eb-ed35-44f9-9b9f-d585776ace27

📥 Commits

Reviewing files that changed from the base of the PR and between b04a0dd and 19b7a95.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • CLAUDE.md
  • docs/concepts/search.md
  • pyproject.toml
  • slayer/api/server.py
  • slayer/engine/ingestion.py
  • slayer/mcp/server.py
  • slayer/search/retrievers/embeddings.py
  • slayer/search/service.py
  • slayer/storage/sqlite_storage.py
  • tests/test_search_graph.py
  • tests/test_search_invariance.py
  • tests/test_search_service.py
  • tests/test_search_service_orchestration.py
  • tests/test_search_surfaces.py
  • tests/test_search_three_channel.py
✅ Files skipped from review due to trivial changes (3)
  • slayer/search/retrievers/embeddings.py
  • CLAUDE.md
  • slayer/engine/ingestion.py
🚧 Files skipped from review as they are similar to previous changes (10)
  • pyproject.toml
  • slayer/api/server.py
  • docs/concepts/search.md
  • tests/test_search_three_channel.py
  • tests/test_search_invariance.py
  • slayer/mcp/server.py
  • tests/test_search_surfaces.py
  • slayer/storage/sqlite_storage.py
  • slayer/search/service.py
  • tests/test_search_graph.py
🛑 Comments failed to post (1)
tests/test_search_service_orchestration.py (1)

188-198: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid wall-clock timing assertions for parallelism checks.

Line 188 and Line 193 rely on elapsed-time thresholds; this can be flaky on loaded CI even when asyncio.gather is correct. Prefer overlap/event-based assertions (e.g., record start/stop markers per retriever and assert overlap directly).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_search_service_orchestration.py` around lines 188 - 198, The test
uses wall-clock elapsed thresholds to infer parallelism for service.search which
is flaky; instead, instrument the mocked retrievers used in the test to record
per-retriever start and end timestamps (e.g., add start/stop markers inside the
async mock retriever coroutine used by the test), run the search, then assert
that the recorded intervals overlap (for example assert max(start_times) <
min(end_times) or that at least two retriever intervals overlap) rather than
relying on elapsed < delay * 2.5; update the assertions that reference elapsed
and remove the timing threshold checks.

ZmeiGorynych and others added 5 commits June 5, 2026 16:05
Service correctness:
- service.py: run the off-datasource warning against the pre-cypher-filter
  memory set so a memory rooted at the requested datasource but excluded
  by cypher_filter doesn't get a spurious 'not rooted at datasource'
  warning on top of the legitimate cypher_filter warning.
- service.py: in the empty-input recency fallback, emit a dedicated
  'cypher_filter excluded all memory candidates' warning when the filter
  zeroed an otherwise-populated pool — the generic 'returning newest'
  message otherwise reads as a healthy empty corpus.

API / docs:
- api/server.py: SearchRequest.max_results now Field(default=10, ge=1)
  so REST self-validates the constraint and OpenAPI surfaces it.
- cli.py / mcp/server.py: --cypher-filter / cypher_filter help text
  documents the multi-label MATCH (n:Label1:Label2) union semantics
  supported by the naive fallback, plus the allowed label set.
- cli.py: text-mode results distinguish query-bearing memory hits with
  a 'Q' prefix vs 'M' for learning-only.

Tests:
- new test pinning that cypher_filter-excluded memories rooted at the
  requested datasource do NOT emit the off-datasource warning.
- new test pinning the recency-fallback warning fires when filters clear
  the pool.
- renamed two test_search_service.py tests that referenced the removed
  max_example_queries kwarg.
- renamed test_max_entities_zero_suppresses_channel_1_named_output (the
  body asserts the named entity surfaces — the name said the opposite).
- test_search_named_entity_surfacing.py 'cap_zero' stale-query test now
  uses max_results=1 instead of 20 so it actually exercises tight-cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
openCypher's REMOVE clause strips properties / labels and is a mutating
clause, but _MUTATION_RE was only checking for CREATE | MERGE | DELETE |
SET | DROP | CALL. A query like `MATCH (n) REMOVE n.name RETURN n.id AS
id` was passing validation and could mutate the cached in-memory
LadybugDB graph, corrupting subsequent cypher_filter results until the
cache rebuilt from a fingerprint bump.

Adds REMOVE to the regex and two parametrised rejection cases (property
removal and label removal). The cypher_filter advertised read-only
contract is now actually enforced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Brings the user-reachable interface, reference, and getting-started
pages plus the orphan interfaces/ files in line with the DEV-1532
unified-flat-results shape and the DEV-1464 graph-backed cypher_filter.
Also strips residual typed-bucket ("memories / example_queries /
entities") language from memories.md, the lightning-talk page, and the
slayer-overview skill, and replaces the lightning talk's "Coming next:
Cypher" line with a here-today description.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- slayer/search/service.py: switch the get_filtered_ids call inside
  _apply_cypher_filter to keyword arguments, per the repo convention
  for multi-parameter functions (CodeRabbit).
- tests/test_search_graph.py: each candidate-ID test that patches
  slayer.search.graph.get_filtered_ids now also stacks a
  patch("slayer.search.graph.is_available", return_value=True) so the
  full-graph branch of _apply_cypher_filter is forced on regardless of
  whether the optional ladybug package is installed locally (Codex).
  Affects test_candidate_ids_narrow_memory_corpus_seen_by_retrievers,
  test_stale_canonical_ids_in_candidate_set_cause_no_error,
  test_cypher_filter_applied_to_recency_fallback,
  test_cypher_filter_exclusion_does_not_emit_off_datasource_warning,
  and test_recency_fallback_warns_when_filters_excluded_all.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…dex)

The previous _validate_cypher was a denylist over a small mutation-keyword
set plus a semicolon ban and an "AS id" anywhere-check. That left
non-mutating but still-dangerous Kuzu clauses — LOAD FROM '/etc/passwd',
UNWIND scans, IMPORT/EXPORT DATABASE, INSTALL/LOAD extension — able to
reach conn.execute() from the public search surface so long as they
contained "AS id" somewhere. The documented contract has always been
read-only MATCH-only traversal; the validator now enforces it.

slayer/search/graph.py: add _STARTS_WITH_MATCH_RE and check it before
the mutation-keyword scan. The new check is whitespace- and
case-insensitive so legitimate inputs ("  match (m:Memory) RETURN m.id
AS id") still pass.

tests/test_search_graph.py: extend the rejection parametrize block with
six new attack-vector strings (LOAD FROM, UNWIND, IMPORT DATABASE,
EXPORT DATABASE, INSTALL httpfs, bare RETURN) plus empty/whitespace-only
inputs; add a positive-case test for whitespace + case variants of the
leading MATCH; update test_validate_cypher_still_rejects_bare_mutation_keyword
so the bare-CALL case asserts the new "must start with MATCH" message
(it's now caught one layer earlier than the keyword denylist).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 7, 2026

@ZmeiGorynych ZmeiGorynych merged commit 95cc40e into main Jun 7, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant