perf: replace B-tree CONTAINS scan with fulltext index for search() #17

Closed
aksOps wants to merge 2 commits into feat/fix-nestjs-detector-guards from feat/fulltext-search-index

@aksOps aksOps commented Apr 1, 2026

Contributor

Summary

Fixes RAN-66.

B-tree indexes on label_lower/fqn_lower cannot serve CONTAINS predicates in Neo4j — every call to search() caused a full graph scan regardless of the index. This replaces the CONTAINS queries with a fulltext index backed by Lucene.

  • Fulltext index (search_index) created on (n.label_lower, n.fqn_lower) using the keyword analyzer — preserves whole-property tokens so FQNs with dots aren't split by the standard tokeniser
  • GraphStore.bulkSave() — creates search_index alongside the existing B-tree indexes (used by the serve-path bootstrap)
  • EnrichCommand — creates search_index in the secondary-index block and adds CALL db.awaitIndexes(300) so the index is fully ready before the first query
  • search(String, int) and search(String) — use db.index.fulltext.queryNodes() with *text* wildcard wrapping for substring matching
  • toLuceneQuery() helper — lowercases input and escapes Lucene special characters before wrapping
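For reference, the Cypher behind the bullets above could look roughly like the following. This is a sketch rather than the code in this PR: the node label `CodeNode`, the class name, and the `searchParams` helper are hypothetical, and the `CREATE FULLTEXT INDEX ... OPTIONS` form assumes a Neo4j version that supports that syntax (4.3 or later).

```java
import java.util.Map;

// Sketch only: holds the Cypher statements as constants.
// The label CodeNode is an assumption; the PR does not name the label.
final class SearchIndexCypherSketch {

    // Fulltext index over both lowercased properties, using the keyword
    // analyzer so each property value is indexed as a single token.
    static final String CREATE_SEARCH_INDEX =
        "CREATE FULLTEXT INDEX search_index IF NOT EXISTS "
        + "FOR (n:CodeNode) ON EACH [n.label_lower, n.fqn_lower] "
        + "OPTIONS { indexConfig: { `fulltext.analyzer`: 'keyword' } }";

    // Blocks until indexes are online, or fails after 300 seconds.
    static final String AWAIT_INDEXES = "CALL db.awaitIndexes(300)";

    // Index-backed lookup; $query is expected to be an already escaped
    // and wildcard-wrapped Lucene query string.
    static final String SEARCH =
        "CALL db.index.fulltext.queryNodes('search_index', $query) "
        + "YIELD node, score RETURN node LIMIT $limit";

    // Parameter map for SEARCH (hypothetical helper).
    static Map<String, Object> searchParams(String luceneQuery, int limit) {
        return Map.of("query", luceneQuery, "limit", limit);
    }
}
```

Passing the query text as a parameter rather than concatenating it into the Cypher string keeps user input out of the statement itself; the Lucene-level escaping is still needed separately.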

Test plan

  • GraphStoreTest#shouldSearch — passes (mocked, verifies plumbing)
  • GraphStoreExtendedTest#shouldSearchWithLimit — passes
  • EnrichCommandTest — all 3 tests pass
  • Full suite: mvn test — 1473 tests, 0 failures

🤖 Generated with Claude Code

aksOps and others added 2 commits April 1, 2026 17:23
…STENS edges

Tibco EMS, Azure Service Bus/Event Hub, and Spring Events emit different
edge kinds than Kafka/RabbitMQ. TopicLinker previously only matched
PRODUCES/CONSUMES, silently dropping cross-service CALLS edges for
all three messaging patterns.

- Add SENDS_TO and RECEIVES_FROM (Tibco/Azure) as producer/consumer edges
- Add PUBLISHES and LISTENS (Spring Events) as producer/consumer edges
- Add EVENT and MESSAGE_QUEUE node kinds to topic matching (alongside TOPIC/QUEUE)
- Add 4 new test cases: SENDS_TO/RECEIVES_FROM, PUBLISHES/LISTENS, MESSAGE_QUEUE, determinism

Co-Authored-By: Paperclip <noreply@paperclip.ing>
B-tree indexes on label_lower/fqn_lower cannot serve CONTAINS queries in
Neo4j — every search caused a full graph scan. Replace with a fulltext
index using the keyword analyzer so wildcard (*text*) queries are backed
by an index.

- Add FULLTEXT INDEX search_index on (n.label_lower, n.fqn_lower) in
  both GraphStore.bulkSave() and EnrichCommand secondary-index block
- Use keyword analyzer to preserve whole-property tokens (avoids Lucene
  tokenisation splitting FQNs on dots)
- Replace search() CONTAINS queries with
  db.index.fulltext.queryNodes() + *text* wildcard wrapping
- Escape Lucene special characters before wrapping in toLuceneQuery()
- Add CALL db.awaitIndexes(300) after secondary index creation in
  EnrichCommand so the first search request hits the index

Fixes RAN-66

Co-Authored-By: Paperclip <noreply@paperclip.ing>

aksOps commented Apr 3, 2026

Contributor Author

Code review

Found 2 issues:

  1. EnrichCommand does not write label_lower/fqn_lower, so search_index will index empty fields, breaking all search results

The fulltext index is created on [n.label_lower, n.fqn_lower], but the UNWIND bulk-load path in EnrichCommand.enrichFromCache() only writes label and fqn to the node property map — the lowercased variants are never populated. Every search via the standard index → enrich → serve pipeline will return zero results because the indexed fields are empty on all nodes. The GraphStore.bulkSave() path correctly writes label_lower/fqn_lower (lines 191-192), but EnrichCommand bypasses bulkSave() and its UNWIND batch does not include these properties.

https://github.com/RandomCodeSpace/code-iq/blob/4c3310a08c44c303361c52a432087d065b51b9ec/src/main/java/io/github/randomcodespace/iq/cli/EnrichCommand.java#L194-L200

Fix: add label_lower and fqn_lower to the node props map in EnrichCommand before the UNWIND batch, mirroring what GraphStore.nodeToProps() does at lines 191-192.

  2. toLuceneQuery() does not escape * and ?, so user input containing wildcards is misinterpreted

The escape list covers 15 Lucene special characters but omits * and ? (the wildcard operators). A search for foo*bar or name? will be passed through to Lucene as multi-wildcard and single-character-wildcard queries respectively instead of matching the literal characters. Both should be escaped before the *...* wrapping is applied.

https://github.com/RandomCodeSpace/code-iq/blob/4c3310a08c44c303361c52a432087d065b51b9ec/src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java#L272-L291

Fix: add .replace("*", "\\*").replace("?", "\\?") to the escape chain in toLuceneQuery().
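Both fixes can be sketched together as below. This is an illustration under assumptions, not the code at the linked lines: the class name and the `withLowercaseProps` helper are hypothetical, the property map shape is guessed, and the escape set is the commonly documented list of Lucene classic query syntax characters, which may differ from the 15 characters the PR already handles. It escapes character by character instead of chaining `.replace()` calls, which is a swapped-in approach with the same intent.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: names and map shape are assumptions, not the PR's code.
final class SearchFixesSketch {

    // Characters treated as special by Lucene's classic query syntax,
    // including the wildcard operators * and ? flagged in issue 2.
    private static final String LUCENE_SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Issue 1: return a copy of the node props with label_lower and
    // fqn_lower filled in, so the fulltext index has values to index.
    static Map<String, Object> withLowercaseProps(Map<String, ?> props) {
        Map<String, Object> out = new HashMap<>(props);
        putLower(out, "label", "label_lower");
        putLower(out, "fqn", "fqn_lower");
        return out;
    }

    private static void putLower(Map<String, Object> props, String from, String to) {
        Object value = props.get(from);
        if (value != null) {
            props.put(to, value.toString().toLowerCase(Locale.ROOT));
        }
    }

    // Issue 2: lowercase the input, backslash-escape every special
    // character (so * and ? match literally), then wrap in *...* for
    // substring matching.
    static String toLuceneQuery(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(lower.length() + 2);
        sb.append('*');
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (LUCENE_SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('*').toString();
    }
}
```

With this sketch, an input of `Foo*Bar` becomes `*foo\*bar*`, so only the outer wildcards added by the wrapper remain active. Null input and whitespace handling are left out for brevity.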

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

aksOps commented Apr 3, 2026

Contributor Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

aksOps commented Apr 3, 2026

Contributor Author

Closing as chain/review PR — superseded by #16 which targets main directly. All changes from this branch are now in main.

@aksOps aksOps closed this Apr 3, 2026
@aksOps aksOps deleted the feat/fulltext-search-index branch April 3, 2026 15:58