
feat: graph-augmented search (index_graph, search_graph, graph_neighbors, hybrid_search)#47

Merged: iamvirul merged 12 commits into main from feat/graph-search on Jun 9, 2026



iamvirul (Member) commented Jun 8, 2026

Pull Request

Type of Change

  • New feature
  • Documentation update
  • Chore (build process, CI/CD, dependency updates)
  • Test improvement

Description

Integrates a graphify-inspired knowledge graph alongside VecGrep's existing vector search, giving AI assistants both semantic similarity and structural code relationships in one plugin.

Vector search is strong for behavioural queries ("find code that does X") but blind to structure ("what calls this function?", "what does this class inherit?"). This PR adds a graph layer built purely from tree-sitter ASTs, with no LLM required, and exposes it through 4 new MCP tools.
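The PR builds its graph from tree-sitter ASTs, and that code is not reproduced here. As a rough illustration of what "structural relationships" means in practice, the sketch below substitutes Python's standard-library ast module, which handles Python only, unlike tree-sitter. It emits (source, relation, target) tuples using the relation names listed under Changes Made (contains, inherits, calls, imports). The extract_edges helper and the dotted node naming are assumptions; the PR's own node ids look like vecgrep_store_vectorstore.

```python
import ast


def extract_edges(source: str, module: str) -> list[tuple[str, str, str]]:
    """Return (source, relation, target) edges for one Python module.

    Stand-in for a tree-sitter extractor: uses the stdlib ast module.
    Node names are dotted paths such as "store.VectorStore.search".
    """
    edges = []

    def visit(node, owner):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                name = f"{owner}.{child.name}"
                edges.append((owner, "contains", name))
                if isinstance(child, ast.ClassDef):
                    # Only simple base names are recorded in this sketch.
                    for base in child.bases:
                        if isinstance(base, ast.Name):
                            edges.append((name, "inherits", base.id))
                visit(child, name)
            elif isinstance(child, ast.Call) and isinstance(child.func, (ast.Name, ast.Attribute)):
                callee = child.func.id if isinstance(child.func, ast.Name) else child.func.attr
                edges.append((owner, "calls", callee))
                visit(child, owner)  # nested calls in arguments are attributed to the same owner
            elif isinstance(child, ast.Import):
                edges.extend((module, "imports", alias.name) for alias in child.names)
            elif isinstance(child, ast.ImportFrom) and child.module:
                edges.append((module, "imports", child.module))
            else:
                visit(child, owner)

    visit(ast.parse(source), module)
    return edges


SAMPLE = '''
import os

class Base:
    pass

class VectorStore(Base):
    def search(self, query):
        return self._rank(os.path.join(query, "x"))
'''

for edge in extract_edges(SAMPLE, "store"):
    print(edge)
```

Callee names are resolved only to their final attribute (join, _rank). A real extractor would also need to resolve them to concrete node ids across files, which is the harder part of building such a graph.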

Token usage benchmarks (measured on VecGrep itself):

| Mode | Avg tokens | Savings vs raw read |
|---|---|---|
| Raw file read (baseline) | 26,009 | - |
| search_code (top_k=8) | ~3,007 | 88% |
| hybrid_search (top_k=8) | ~3,324 | 87% |
| search_graph (limit=8) | ~47 | >99% |
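The savings column is consistent with savings = 1 - tokens / baseline. The following snippet checks that arithmetic against the table's own numbers. savings_pct is a hypothetical helper and is not part of VecGrep.

```python
def savings_pct(tokens: float, baseline: float) -> float:
    """Percentage of tokens saved relative to the raw-file-read baseline."""
    return 100.0 * (1.0 - tokens / baseline)


BASELINE = 26_009  # raw file read, from the table above

# Average tokens per mode, from the table above.
MODES = {"search_code": 3_007, "hybrid_search": 3_324, "search_graph": 47}

for mode, tokens in MODES.items():
    print(f"{mode}: {savings_pct(tokens, BASELINE):.1f}% saved")
```

This prints 88.4%, 87.2% and 99.8%, which match the rounded 88%, 87% and >99% in the table.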

Latency (median, 5 runs):

| Mode | Latency |
|---|---|
| search_graph | ~3 ms (~30x faster than vector search) |
| hybrid_search | ~76 ms |
| search_code | ~83 ms |
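The PR reports medians over 5 runs but does not include its timing script. The following is a minimal harness in that spirit, assuming wall-clock timing with time.perf_counter. median_latency_ms is a hypothetical helper. fake_graph_query and fake_vector_query are hypothetical stand-ins that sleep for roughly the reported medians; they are not real VecGrep calls.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], runs: int = 5) -> float:
    """Call fn `runs` times and return the median wall-clock latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Hypothetical stand-ins: replace with real search_graph / search_code calls.
def fake_graph_query() -> None:
    time.sleep(0.003)


def fake_vector_query() -> None:
    time.sleep(0.083)


graph_ms = median_latency_ms(fake_graph_query)
vector_ms = median_latency_ms(fake_vector_query)
print(f"graph {graph_ms:.1f} ms, vector {vector_ms:.1f} ms, ratio {vector_ms / graph_ms:.0f}x")
```

The median is less sensitive than the mean to a single slow run. For real measurements, a warm-up call is usually discarded so that model loading or index opening is not counted. The PR does not say whether its benchmark did this. With the reported medians, 83 / 3 is about 28, in line with the "~30x" figure.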

Related Issues / PRs

Closes #46

Changes Made

  • Added src/vecgrep/graph.py with the GraphStore class: tree-sitter AST extraction, build(), search(), neighbors(), chunk_graph_scores(), and JSON persistence via networkx
  • Added 4 MCP tools to src/vecgrep/server.py:
    • index_graph(path, force) — builds the knowledge graph (496 nodes, 1251 edges on VecGrep itself)
    • search_graph(query, path, limit) — keyword search over node labels, ~47 tokens avg, ~3ms
    • graph_neighbors(node_id, path, depth) — callers/callees/imports/contains/inherits up to N hops
    • hybrid_search(query, path, top_k, alpha) — blends alpha * vector_score + (1-alpha) * graph_score
  • Added networkx>=3.2 to core deps
  • Pinned tree-sitter==0.21.3 — compatible with tree-sitter-languages 1.10.x (0.22+ broke the API)
  • Added tests/test_graph.py — 23 tests covering extraction, search, neighbors, scores, disk reload
  • Updated tests/conftest.py — preload vecgrep.graph before test_chunker_ast.py patches sys.modules
  • Updated CHANGELOG.md — Unreleased section
  • Updated README.md — Benchmarks section, graph tools docs, no emojis
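The only specification of hybrid_search given here is its blend formula. The sketch below shows one way to apply that formula to per-chunk scores. hybrid_rank, Hit and the dictionary inputs are hypothetical. The rule that a chunk with no graph match gets a graph score of 0 is also an assumption. In VecGrep the graph side presumably comes from GraphStore.chunk_graph_scores(), but the PR does not show that method's return shape.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    chunk_id: str
    vec: float
    graph: float
    blended: float


def hybrid_rank(
    vector_scores: dict[str, float],
    graph_scores: dict[str, float],
    alpha: float,
    top_k: int = 8,
) -> list[Hit]:
    """Rank chunks by alpha * vector_score + (1 - alpha) * graph_score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    hits = []
    for chunk_id, vec in vector_scores.items():
        # Assumption: a chunk with no matching graph node gets graph_score 0.
        graph = graph_scores.get(chunk_id, 0.0)
        blended = alpha * vec + (1.0 - alpha) * graph
        hits.append(Hit(chunk_id, vec, graph, blended))
    hits.sort(key=lambda h: h.blended, reverse=True)
    return hits[:top_k]


# Vector scores from the "VectorStore search method" example in this PR;
# the CHANGELOG.md graph score (0) and alpha=0.6 are assumptions.
hits = hybrid_rank(
    {"CHANGELOG.md": 0.53, "store.py": 0.49},
    {"store.py": 1.00},
    alpha=0.6,
)
for h in hits:
    print(f"{h.chunk_id}: blended={h.blended:.2f} vec={h.vec:.2f} graph={h.graph:.2f}")
```

With alpha=0.6, store.py blends to about 0.69, which is close to the reported 0.70 given two-decimal rounding. CHANGELOG.md drops to about 0.32, so the graph signal is enough to reorder the two results.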

Testing

  • Unit tests — 23 new tests in tests/test_graph.py, all pass
  • Integration tests — full suite: 218 passed, 0 failed
  • Manual testing — all 4 MCP tools tested live via Claude Code against VecGrep codebase

Manual test results:

  • index_graph — built graph of VecGrep: 496 nodes, 1251 edges, 35 files
  • search_graph("VectorStore") — exact match, score 1.00, degree 39
  • graph_neighbors("VectorStore", depth=1) — 18 callers, 18 methods, correct parent class
  • hybrid_search("VectorStore search method"): top result corrected from CHANGELOG.md (vector-only ranking) to store.py, graph_score 1.00
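graph_neighbors walks the graph outward from a single node. Below is a breadth-first sketch of that traversal that uses only the standard library. The edge list, the node ids and the neighbors() helper are all hypothetical. Walking edges in both directions is an assumption, based on the tool returning both callers and callees. With networkx, which this PR adds as a dependency, nx.ego_graph(G, node, radius=depth, undirected=True) returns a comparable subgraph.

```python
from collections import deque

# Hypothetical edge list of (source, relation, target); real VecGrep node ids
# and relation names may differ.
EDGES = [
    ("server_search_code", "calls", "vectorstore_search"),
    ("store_vectorstore", "contains", "vectorstore_search"),
    ("store_vectorstore", "contains", "vectorstore_add"),
    ("vectorstore_search", "calls", "numpy_dot"),
]


def neighbors(edges, start, depth=1):
    """Breadth-first walk over edges in both directions, up to `depth` hops.

    Returns {node_id: (hops, relation, direction)}, where relation and direction
    describe the edge through which the node was first reached: "out" means the
    edge points away from the node we came from, "in" means it points towards it.
    """
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((dst, rel, "out"))
        adjacency.setdefault(dst, []).append((src, rel, "in"))

    seen = {start: (0, None, None)}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        hops = seen[node][0]
        if hops == depth:
            continue  # do not expand past the requested radius
        for nxt, rel, direction in adjacency.get(node, []):
            if nxt not in seen:
                seen[nxt] = (hops + 1, rel, direction)
                queue.append(nxt)
    del seen[start]
    return seen


# At depth 1, an incoming "calls" edge is a caller, an outgoing one a callee,
# and an incoming "contains" edge is the parent class.
for node, (hops, rel, direction) in neighbors(EDGES, "vectorstore_search").items():
    print(hops, direction, rel, node)
```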

Checklist

  • My code follows the project's style guidelines (ruff passes)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Screenshots

index_graph output:

Graph built for /Users/.../VecGrep: 496 nodes, 1251 edges, 35 files processed.

search_graph("VectorStore") output:

[1] CLASS  VectorStore  (score: 1.00, degree: 39)
    src/vecgrep/store.py:49-352
    id: vecgrep_store_vectorstore
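search_graph is described as a keyword search over node labels, but the PR does not give its scoring rule. The sketch below assumes a simple tiered score: exact match 1.0, prefix 0.8, substring 0.6. Ties are broken by degree. It reuses the fields that the output above prints. Node, label_score, search_labels, render and the second example node are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: str
    label: str
    kind: str
    file: str
    start: int
    end: int
    degree: int


def label_score(query: str, label: str) -> float:
    """Hypothetical keyword score: exact match > prefix > substring > no match."""
    q, lab = query.lower(), label.lower()
    if q == lab:
        return 1.0
    if lab.startswith(q):
        return 0.8
    if q in lab:
        return 0.6
    return 0.0


def search_labels(nodes, query, limit=8):
    """Return up to `limit` (score, node) pairs, best first."""
    scored = [(label_score(query, n.label), n) for n in nodes]
    scored = [(s, n) for s, n in scored if s > 0]
    # Higher score first; ties go to better-connected (higher-degree) nodes.
    scored.sort(key=lambda sn: (sn[0], sn[1].degree), reverse=True)
    return scored[:limit]


def render(results):
    """Format results in the same layout as the search_graph output above."""
    lines = []
    for rank, (score, n) in enumerate(results, start=1):
        header = f"[{rank}] {n.kind.upper()}  {n.label}"
        lines.append(f"{header}  (score: {score:.2f}, degree: {n.degree})")
        lines.append(f"    {n.file}:{n.start}-{n.end}")
        lines.append(f"    id: {n.id}")
    return "\n".join(lines)


nodes = [
    Node("vecgrep_store_vectorstore", "VectorStore", "class", "src/vecgrep/store.py", 49, 352, 39),
    # Hypothetical second node, to show a partial match ranking lower.
    Node("hypothetical_vectorstore_error", "VectorStoreError", "class", "example.py", 1, 5, 2),
]
print(render(search_labels(nodes, "VectorStore")))
```

Returning only a label, a location and an id per hit is what keeps this mode's output so small (about 47 tokens on average, per the benchmarks) compared with returning code chunks.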

hybrid_search result quality fix:

Query: "VectorStore search method"
search_code  #1 -> [WRONG] CHANGELOG.md  (vec: 0.53)
hybrid       #1 -> [OK]    store.py       (blended: 0.70, vec: 0.49, graph: 1.00)
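The PR does not state which alpha produced this result, but the hybrid_search blend formula lets the displayed scores constrain it. The following solves for alpha using the rounded values. It is a back-of-envelope check, not a documented default:

```latex
s_{\text{blend}} = \alpha\, s_{\text{vec}} + (1-\alpha)\, s_{\text{graph}}
\quad\Longrightarrow\quad
\alpha = \frac{s_{\text{graph}} - s_{\text{blend}}}{s_{\text{graph}} - s_{\text{vec}}}
= \frac{1.00 - 0.70}{1.00 - 0.49} \approx 0.59
```

All three scores are rounded to two decimals, so any alpha between roughly 0.57 and 0.60 reproduces them, assuming graph_score is exactly 1.00. Across that range the graph term contributes about 0.4 of the 0.70 blended score. That contribution is what lifts store.py (vec 0.49) above CHANGELOG.md (vec 0.53).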

PairReviewer (Collaborator) left a comment

This PR introduces a graph-augmented search layer to VecGrep, blending semantic vector search with structural code relationships via a tree-sitter-based knowledge graph. The implementation is robust, with comprehensive error handling, input validation, and thorough testing. No bugs, security issues, or architectural violations were found; dependencies are correctly pinned and all new features are well-documented. The code is production-ready and can be merged as is.

Verdict: APPROVE (0 inline comments)

iamvirul self-assigned this Jun 8, 2026

codecov (Bot) commented Jun 8, 2026

Codecov Report

❌ Patch coverage is 98.76543% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/vecgrep/graph.py | 98.33% | 6 Missing ⚠️ |


iamvirul merged commit 1911453 into main on Jun 9, 2026
1 check passed


Development: successfully merging this pull request may close this issue:

[Feature]: Graph-augmented search — integrate graphify-style knowledge graph alongside vector search

2 participants