[CodeGraph] Add optional local structural code-map provider for MAP research #310

Description

@azalio

Source

Local source note: /Users/azalio/Downloads/Telegram Desktop/codegraph_the_open_source_knowledge_graph_that_makes_ai_coding_t.md, extracted from Medium article "CodeGraph: The Open-Source Knowledge Graph That Makes AI Coding Tools Dramatically Cheaper" (https://medium.com/kd-agentic/codegraph-the-open-source-knowledge-graph-that-makes-ai-coding-tools-dramatically-cheaper-190f8b89f8a7).

Source-specific idea used here: CodeGraph reduces agent exploration cost by prebuilding a local structural code graph from parsed source, then answering "what calls this?", "what imports this?", and "where is the handler/entrypoint?" style questions in one tool call instead of file-by-file agent search.

Relevant source takeaways

  • CodeGraph indexes source into a local SQLite/FTS-backed graph of symbols and edges such as imports, calls, inheritance, implementations, and framework routes.
  • The useful mechanism for MAP is not the exact product claim or star count; it is the architecture: separate codebase discovery from solving, return compact structural evidence, and avoid repeated broad searches inside the main Actor context.
  • Tree-sitter is called out because it can parse incomplete code and supports multiple languages; that matters for agent workflows where the tree may be temporarily uncompilable.
  • The article distinguishes semantic/vector search from deterministic structural relationships. MAP's research artifacts already prefer exact file/line evidence; a structural provider would make the upstream localization less probabilistic.
  • Local-first/no-cloud is relevant to MAP's current local workflow posture and avoids requiring user code to leave the machine.
  • Route recognition is useful, but it should be treated as one query family on top of a structural map rather than the first implementation slice.
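
To make the "one tool call instead of file-by-file search" idea concrete, here is a minimal sketch of a symbols-and-edges store in SQLite. The table layout, column names, and the `callers_of` helper are assumptions made for illustration; they are not CodeGraph's actual schema. The FTS index mentioned in the article is left out because FTS5 availability depends on how the local SQLite was built.

```python
import sqlite3

# Hypothetical schema: one row per symbol, one row per directed relationship.
SCHEMA = """
CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT, kind TEXT, path TEXT, line INTEGER);
CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT);
"""


def callers_of(conn, name):
    """Answer "what calls this?" with a single deterministic query."""
    return conn.execute(
        """
        SELECT caller.name, caller.path, caller.line
        FROM edges
        JOIN symbols AS callee ON callee.id = edges.dst
        JOIN symbols AS caller ON caller.id = edges.src
        WHERE edges.kind = 'calls' AND callee.name = ?
        ORDER BY caller.path, caller.line
        """,
        (name,),
    ).fetchall()


# Tiny in-memory example graph (made-up symbols, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?, ?)",
    [
        (1, "handle_request", "function", "app/server.py", 10),
        (2, "load_config", "function", "app/config.py", 3),
        (3, "main", "function", "app/cli.py", 20),
    ],
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, "calls"), (3, 2, "calls"), (3, 1, "calls")],
)
rows = callers_of(conn, "load_config")
```

In this toy graph, `rows` lists the two callers of `load_config` with their file and line, which is the compact structural evidence the takeaways describe.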

Repo evidence

  • src/mapify_cli/repo_insight.py:1-5 says repo insight analyzes project structure for language detection, suggested checks, and key directories. src/mapify_cli/repo_insight.py:13-43, :46-82, :85-119, and :122-162 implement exactly that shallow artifact. It does not index symbols, imports, callers, routes, or line-level relationships.
  • src/mapify_cli/dependency_graph.py:1-6 and :66-85 implement a graph, but it is a workflow subtask DAG for cascade invalidation. It is not a repository code graph.
  • src/mapify_cli/templates_src/agents/research-agent.md.jinja:14-22 and :143-150 still describe research as Glob/Grep/Read. src/mapify_cli/templates_src/codex/agents/researcher.toml.jinja:70-83 likewise instructs provider-neutral file discovery + grep + narrow reads.
  • docs/USAGE.md:1719-1731 documents the current RESEARCH path: persisted ResearchEvidence is mandatory, delegation is conditional, and cold-start/high-risk work uses research-agent/researcher plus save_research/validate_research.
  • docs/ARCHITECTURE.md:55-58 says MCP is optional and provider runtimes can call configured MCP servers. docs/ARCHITECTURE.md:32-36 says MAP does not ship or maintain third-party MCP servers, so this should be an optional integration/detection surface, not vendoring CodeGraph.
  • docs/ARCHITECTURE.md:20-30 confirms MAP already owns local generated provider scaffolding, branch artifacts, token accounting, and optional MCP wiring; a local structural map fits that surface if it is optional and artifact-backed.

Existing issue search

Commands/searches used:

Close issues checked:

Why this is not already covered

MAP has strict research artifacts and some advisory ROI telemetry, but the discovery engine is still prompt-instructed Glob/Grep/Read. The repo has a dependency_graph, but it models MAP subtasks, not code symbols. The remaining gap is a local, optional structural code-map provider that can answer targeted relationship queries and emit normal ResearchEvidence, reducing cold-start search loops without changing Actor semantics.

Problem

Cold-start multi-file MAP tasks still pay the mechanical exploration cost that the article describes: researcher/decomposer/Actor use broad file discovery and text search to infer symbol relationships. That increases token/tool-call spend and makes localization quality depend on prompt discipline rather than an explicit code map.

Proposed slice

Add an optional mapify code-map / structural-discovery provider surface that can populate or query a local repository map and feed MAP's existing ResearchEvidence contract.

Concrete first slice:

  • Add a provider-neutral abstraction such as mapify_cli.code_map with a minimal query model: symbols by name, imports/exports, callers/references where available, and file-level dependency edges.
  • Prefer existing local tooling when present, e.g. detect a CodeGraph MCP/server/config and document it as an optional provider. Do not vendor or maintain CodeGraph as a required dependency.
  • Provide a deterministic fallback for Python projects using stdlib ast plus import scanning, so the feature can be tested without network, MCP credentials, or external binaries.
  • Add a runner command that emits compact JSON compatible with or directly convertible to ResearchEvidence: confidence, status, search_method, search_stats, and <=5 relevant_locations with path/lines/signature/relevance.
  • Teach research-agent/researcher prompts to prefer the structural map for locate/impact/pattern queries when available, then fall back to Glob/Grep/Read with an explicit reason.
  • Keep generated templates single-source: edit src/mapify_cli/templates_src/**.jinja, then render templates.
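
The deterministic Python fallback could look roughly like the sketch below, which uses only the stdlib `ast` module. The function name `index_python_source` and the returned dictionary shape are hypothetical, not an existing `mapify_cli` API. Unlike tree-sitter, `ast.parse` raises `SyntaxError` on incomplete code, so a caller would need to treat that case as "no structural evidence" and fall back to Glob/Grep/Read.

```python
import ast


def index_python_source(source, path):
    """Collect classes, functions, and imports with line information.

    Hypothetical helper: the name and the result shape are illustrative only.
    """
    tree = ast.parse(source, filename=path)  # raises SyntaxError on invalid code
    symbols, imports = [], []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            symbols.append(
                {"name": node.name, "kind": kind, "path": path,
                 "lines": (node.lineno, node.end_lineno)}
            )
        elif isinstance(node, ast.Import):
            for alias in node.names:
                imports.append({"module": alias.name, "path": path, "line": node.lineno})
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports stay distinguishable.
            module = "." * node.level + (node.module or "")
            imports.append({"module": module, "path": path, "line": node.lineno})
    # ast.walk is breadth-first; sort by line so output is stable and readable.
    symbols.sort(key=lambda s: s["lines"][0])
    imports.sort(key=lambda i: i["line"])
    return {"symbols": symbols, "imports": imports}


SAMPLE = '''import os
from pkg.util import helper

class Service:
    def run(self):
        return helper(os.getcwd())

def main():
    Service().run()
'''

result = index_python_source(SAMPLE, "svc.py")
```

Because this needs no network, MCP credentials, or external binaries, a fixture repo can exercise it directly in tests.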

Out of scope for this slice:

  • Full multi-language tree-sitter implementation.
  • Route recognition for all frameworks.
  • Mandatory MCP installation.
  • Cloud indexing or transmitting source code outside the local machine.

Acceptance criteria

  • mapify code-map query or equivalent deterministic helper returns structural evidence for a fixture repo with at least Python imports/classes/functions and line ranges.
  • ResearchEvidence emitted from the code-map path passes the existing validate_research/research eval expectations.
  • Claude and Codex researcher templates mention the structural-map-first path only when a map provider is available and preserve the current Glob/Grep/Read fallback.
  • Tests cover: no provider available, Python fallback success, stale/missing index fallback, unsafe path rejection, and conversion to <=5 relevant locations.
  • Docs explain optional CodeGraph/MCP integration without making MAP responsible for installing/maintaining third-party MCP servers.
  • make render-templates, make check-render, and the relevant pytest suites are the expected validation gates for the implementation.
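
Two of the criteria above, unsafe path rejection and conversion to <=5 relevant locations, can be sketched together. The field names follow the list in the proposed slice (confidence, status, search_method, search_stats, relevant_locations with path/lines/signature/relevance), but the concrete value types, the status strings, and the helper names `is_safe_path` and `to_evidence` are assumptions; the real ResearchEvidence schema in MAP may differ.

```python
import json
from pathlib import Path

MAX_LOCATIONS = 5  # cap taken from the "<=5 relevant_locations" requirement


def is_safe_path(repo_root, candidate):
    """Reject paths that resolve outside the repository root."""
    root = Path(repo_root).resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents


def to_evidence(repo_root, hits, search_method="code-map"):
    """Convert raw structural hits into a compact evidence dictionary.

    Hypothetical converter: status strings and the confidence rule are
    placeholders, not MAP's actual validation contract.
    """
    safe = [h for h in hits if is_safe_path(repo_root, h["path"])]
    ranked = sorted(safe, key=lambda h: h["relevance"], reverse=True)[:MAX_LOCATIONS]
    return {
        "status": "found" if ranked else "not_found",
        "confidence": ranked[0]["relevance"] if ranked else 0.0,
        "search_method": search_method,
        "search_stats": {
            "candidates": len(hits),
            "rejected_unsafe": len(hits) - len(safe),
        },
        "relevant_locations": [
            {"path": h["path"], "lines": h["lines"],
             "signature": h["signature"], "relevance": h["relevance"]}
            for h in ranked
        ],
    }


# Six in-repo hits plus one path that escapes the repository root.
hits = [
    {"path": f"src/m{i}.py", "lines": [1, 5], "signature": f"def f{i}()", "relevance": i / 10}
    for i in range(6)
]
hits.append({"path": "../secret.py", "lines": [1, 2], "signature": "def leak()", "relevance": 0.99})

evidence = to_evidence(".", hits)
payload = json.dumps(evidence)  # compact JSON for the runner command to emit
```

The escaping path is dropped before ranking, so a high relevance score cannot carry an out-of-repo location into the evidence.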

Guardrails

  • Do not make CodeGraph a mandatory runtime dependency for mapify init.
  • Do not add cloud indexing, API keys, or source upload.
  • Do not bypass existing ResearchEvidence validation; structural-map output must be normal evidence, not a privileged side channel.
  • Do not treat semantic/vector similarity as a substitute for deterministic relationship evidence.
  • Do not use shadow-mode rollout; gate behind explicit optional availability/config and validate directly.
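
The "explicit optional availability/config" gate, together with the fallback-with-reason behaviour from the proposed slice, might be expressed as below. `Provider`, `choose_provider`, the provider names, and the reason strings are all hypothetical; how MAP would actually detect a CodeGraph MCP server or read its config is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Provider:
    """Hypothetical provider record: a name plus a cheap availability probe."""
    name: str
    is_available: Callable[[], bool]


def choose_provider(providers: Sequence[Provider], enabled: bool) -> Tuple[Optional[str], str]:
    """Return (provider name or None, human-readable reason).

    None means the researcher keeps using Glob/Grep/Read, and the reason
    string records why, so the fallback is always explicit.
    """
    if not enabled:
        return None, "code-map disabled in config; using Glob/Grep/Read"
    for provider in providers:
        if provider.is_available():
            return provider.name, f"using structural provider {provider.name}"
    return None, "no structural provider available; using Glob/Grep/Read"


# Example: an external provider that is not installed, then the local fallback.
providers = [
    Provider("codegraph-mcp", lambda: False),
    Provider("python-ast", lambda: True),
]
selected, reason = choose_provider(providers, enabled=True)
```

Ordering the list puts the optional external provider first and the stdlib fallback last, while nothing runs at all unless the feature is enabled.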

Labels

enhancement (New feature or request)
