Add semantic search with embedding-based retrieval #7

@jeremiepas

Problem

All query matching is substring-based (T.isInfixOf). An agent asking "how does authentication work" won't match a node labeled "LoginService" because the two strings share no text. Memory agents need semantic similarity search.
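The mismatch can be reproduced with a few lines. This is a minimal sketch, assuming the current matcher behaves like a case-insensitive T.isInfixOf; the function name substringMatch is hypothetical and is not taken from the codebase.

```haskell
import qualified Data.Text as T

-- | Case-insensitive substring match (hypothetical stand-in for the
-- current matcher). T.isInfixOf takes the needle first, then the haystack.
substringMatch :: T.Text -> T.Text -> Bool
substringMatch query label = T.toLower query `T.isInfixOf` T.toLower label

-- substringMatch (T.pack "login") (T.pack "LoginService")
--   evaluates to True
-- substringMatch (T.pack "how does authentication work") (T.pack "LoginService")
--   evaluates to False, although the node is conceptually relevant
```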

Solution

Implement hybrid search combining:

  1. Text matching (current): substring score
  2. Embedding similarity: cosine similarity between query and node embeddings
  3. Community proximity: bonus for nodes in same community as top matches

New modules:

  • Domain.Embedding — EmbeddingVector type, cosineSimilarity, euclideanDistance
  • UseCase.SemanticSearch — hybridSearch, semanticSearchNodes
  • Infrastructure.Embedding.OpenAI — OpenAI embeddings API client
  • Infrastructure.Embedding.Local — Local sentence-transformers (future)
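A rough sketch of what Domain.Embedding might contain. The list-based representation and the Maybe return types are assumptions made for illustration (an unboxed vector would likely be faster); only the names EmbeddingVector, cosineSimilarity and euclideanDistance come from this issue.

```haskell
-- Hypothetical representation: a plain list of Doubles.
newtype EmbeddingVector = EmbeddingVector [Double]
  deriving (Show, Eq)

dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)

norm :: [Double] -> Double
norm xs = sqrt (dot xs xs)

-- | Cosine similarity in [-1, 1]. Returns Nothing when the dimensions
-- differ or when either vector has zero length (the ratio is undefined).
cosineSimilarity :: EmbeddingVector -> EmbeddingVector -> Maybe Double
cosineSimilarity (EmbeddingVector a) (EmbeddingVector b)
  | length a /= length b = Nothing
  | na == 0 || nb == 0   = Nothing
  | otherwise            = Just (dot a b / (na * nb))
  where
    na = norm a
    nb = norm b

-- | Euclidean (L2) distance. Returns Nothing when the dimensions differ.
euclideanDistance :: EmbeddingVector -> EmbeddingVector -> Maybe Double
euclideanDistance (EmbeddingVector a) (EmbeddingVector b)
  | length a /= length b = Nothing
  | otherwise            = Just (sqrt (sum [(x - y) ^ (2 :: Int) | (x, y) <- zip a b]))
```

Returning Nothing on a dimension mismatch keeps vectors from different embedding models from being compared silently.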

Scoring formula:

totalScore = textScore * 1.0 + embScore * 0.7 + commScore * 0.3
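As an illustration, the formula could be applied to rank candidates as follows. The Scores record, the rankByTotal helper and the use of String node ids are hypothetical; the weights are the ones given above, and each component score is assumed to be precomputed.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical container for the three per-node component scores.
data Scores = Scores
  { textScore :: Double
  , embScore  :: Double
  , commScore :: Double
  }

-- | Weighted sum using the weights from the scoring formula.
totalScore :: Scores -> Double
totalScore s = textScore s * 1.0 + embScore s * 0.7 + commScore s * 0.3

-- | Score each (nodeId, scores) pair and keep the best `limit` results,
-- highest total first.
rankByTotal :: Int -> [(String, Scores)] -> [(String, Double)]
rankByTotal limit =
  take limit . sortBy (comparing (Down . snd)) . map (fmap totalScore)
```

With these weights an exact text hit (textScore 1.0) outranks a node that has no text match even if that node has a perfect embedding and community score (0.7 + 0.3 = 1.0 at most), so the weights may need tuning for conceptual queries.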

New MCP tool:

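{
  "name": "search_semantic",
  "description": "Search nodes using hybrid text + semantic similarity",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "Search query" },
      "mode": { "type": "string", "enum": ["hybrid", "text", "embedding"] },
      "limit": { "type": "integer", "default": 10 }
    },
    "required": ["query"]
  }
}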

Acceptance Criteria

  • Domain.Embedding module with EmbeddingVector type and similarity functions
  • Hybrid search outperforms substring-only on conceptual queries
  • Fallback to text-only when no embeddings available (NoEmbeddings provider)
  • OpenAI embeddings API integration works
  • Performance: < 100 ms per query over 10k nodes
  • Embeddings stored separately in embeddings.json (not in graph.json)
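The text-only fallback could be decided before any scoring happens. A minimal sketch, assuming the tool's mode strings map onto a SearchMode type and that the provider is a simple sum type; every name except NoEmbeddings is hypothetical.

```haskell
-- Hypothetical mirror of the tool's "hybrid|text|embedding" modes.
data SearchMode = Hybrid | TextOnly | EmbeddingOnly
  deriving (Show, Eq)

-- NoEmbeddings is named in the criteria above; OpenAI is a placeholder
-- for any provider that can actually produce vectors.
data EmbeddingProvider = NoEmbeddings | OpenAI
  deriving (Show, Eq)

-- | Parse the mode string sent by the client; unknown values are rejected.
parseMode :: String -> Maybe SearchMode
parseMode "hybrid"    = Just Hybrid
parseMode "text"      = Just TextOnly
parseMode "embedding" = Just EmbeddingOnly
parseMode _           = Nothing

-- | Downgrade any requested mode to text-only when no embeddings exist.
effectiveMode :: EmbeddingProvider -> SearchMode -> SearchMode
effectiveMode NoEmbeddings _    = TextOnly
effectiveMode _            mode = mode
```

Whether an explicit "embedding" request should fall back silently or return an error when no provider is configured is left open here; the sketch falls back.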

Files to Create/Modify

  • src/Graphos/Domain/Embedding.hs (NEW)
  • src/Graphos/Domain/Types/Node.hs — add nodeEmbedding field
  • src/Graphos/UseCase/SemanticSearch.hs (NEW)
  • src/Graphos/Infrastructure/Embedding/OpenAI.hs (NEW)
  • src/Graphos/Infrastructure/Server/MCP.hs — add search_semantic tool

Effort: 3-5 days

Priority: Critical

See: docs/proposals/memory-agent/05-technical-specifications.md (Spec 2, 3)

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request), memory-agent (Memory agent capabilities), phase-2 (Phase 2: Semantic Search)
