[CodeGraph] Add optional local structural code-map provider for MAP research #310

## Source

Local source note: `/Users/azalio/Downloads/Telegram Desktop/codegraph_the_open_source_knowledge_graph_that_makes_ai_coding_t.md`, extracted from Medium article "CodeGraph: The Open-Source Knowledge Graph That Makes AI Coding Tools Dramatically Cheaper" (`https://medium.com/kd-agentic/codegraph-the-open-source-knowledge-graph-that-makes-ai-coding-tools-dramatically-cheaper-190f8b89f8a7`).

Source-specific idea used here: CodeGraph reduces agent exploration cost by prebuilding a local structural code graph from parsed source, then answering "what calls this?", "what imports this?", and "where is the handler/entrypoint?" style questions in one tool call instead of file-by-file agent search.

## Relevant source takeaways

- CodeGraph indexes source into a local SQLite/FTS-backed graph of symbols and edges such as imports, calls, inheritance, implementations, and framework routes.
- The useful mechanism for MAP is not the exact product claim or star count; it is the architecture: separate codebase discovery from solving, return compact structural evidence, and avoid repeated broad searches inside the main Actor context.
- Tree-sitter is called out because it can parse incomplete code and supports multiple languages; that matters for agent workflows where the tree may be temporarily uncompilable.
- The article distinguishes semantic/vector search from deterministic structural relationships. MAP's research artifacts already prefer exact file/line evidence; a structural provider would make the upstream localization less probabilistic.
- Local-first/no-cloud is relevant to MAP's current local workflow posture and avoids requiring user code to leave the machine.
- Route recognition is useful, but it should be treated as one query family on top of a structural map rather than the first implementation slice.
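
To make the "symbols and edges" idea concrete, here is a minimal sketch of a local SQLite-backed graph that answers "what calls this?" in one query. The table layout, column names, and the `build_demo_graph`/`callers_of` helpers are assumptions made up for illustration; they are not CodeGraph's actual schema or API, and the FTS layer is omitted.

```python
import sqlite3

# Hypothetical schema for illustration only; not CodeGraph's real tables.
SCHEMA = """
CREATE TABLE symbols (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,          -- e.g. 'function', 'class', 'module'
    path TEXT NOT NULL,
    start_line INTEGER NOT NULL,
    end_line INTEGER NOT NULL
);
CREATE TABLE edges (
    src INTEGER NOT NULL REFERENCES symbols(id),
    dst INTEGER NOT NULL REFERENCES symbols(id),
    kind TEXT NOT NULL           -- e.g. 'calls', 'imports', 'inherits'
);
CREATE INDEX edges_by_dst ON edges(dst, kind);
"""


def build_demo_graph() -> sqlite3.Connection:
    """Build a tiny in-memory graph with three functions and three call edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO symbols VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "load_config", "function", "app/config.py", 10, 24),
            (2, "main", "function", "app/cli.py", 5, 30),
            (3, "run_server", "function", "app/server.py", 40, 88),
        ],
    )
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [(2, 1, "calls"), (3, 1, "calls"), (2, 3, "calls")],
    )
    return conn


def callers_of(conn: sqlite3.Connection, name: str) -> list:
    """Return (caller name, path, start line) for every 'calls' edge into `name`."""
    rows = conn.execute(
        """
        SELECT caller.name, caller.path, caller.start_line
        FROM edges
        JOIN symbols AS callee ON callee.id = edges.dst
        JOIN symbols AS caller ON caller.id = edges.src
        WHERE callee.name = ? AND edges.kind = 'calls'
        ORDER BY caller.path, caller.start_line
        """,
        (name,),
    )
    return rows.fetchall()
```

With this layout a caller lookup is a single indexed join, which is the property the article contrasts with file-by-file agent search.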

## Repo evidence

- `src/mapify_cli/repo_insight.py:1-5` says repo insight analyzes project structure for language detection, suggested checks, and key directories. `src/mapify_cli/repo_insight.py:13-43`, `:46-82`, `:85-119`, and `:122-162` implement exactly that shallow artifact. It does not index symbols, imports, callers, routes, or line-level relationships.
- `src/mapify_cli/dependency_graph.py:1-6` and `:66-85` implement a graph, but it is a workflow subtask DAG for cascade invalidation. It is not a repository code graph.
- `src/mapify_cli/templates_src/agents/research-agent.md.jinja:14-22` and `:143-150` still describe research as Glob/Grep/Read. `src/mapify_cli/templates_src/codex/agents/researcher.toml.jinja:70-83` likewise instructs provider-neutral file discovery + grep + narrow reads.
- `docs/USAGE.md:1719-1731` documents the current RESEARCH path: persisted ResearchEvidence is mandatory, delegation is conditional, and cold-start/high-risk work uses research-agent/researcher plus `save_research`/`validate_research`.
- `docs/ARCHITECTURE.md:55-58` says MCP is optional and provider runtimes can call configured MCP servers. `docs/ARCHITECTURE.md:32-36` says MAP does not ship or maintain third-party MCP servers, so this should be an optional integration/detection surface, not vendoring CodeGraph.
- `docs/ARCHITECTURE.md:20-30` confirms MAP already owns local generated provider scaffolding, branch artifacts, token accounting, and optional MCP wiring; a local structural map fits that surface if it is optional and artifact-backed.

## Existing issue search

Commands/searches used:

- `gh issue list --state all --limit 100 --search "CodeGraph OR \"knowledge graph\" OR \"call graph\" OR tree-sitter OR symbols OR \"repo insight\" OR \"repository map\" OR \"token reduction\" OR \"research ROI\""` returned no direct matches.
- `gh issue list --state all --limit 100 --search "repo insight"` returned no matches.
- `gh issue list --state all --limit 100 --search "tree-sitter"` returned no matches.
- `gh issue list --state all --limit 100 --search "symbol graph"` returned no matches.
- `gh issue list --state all --limit 100 --search "parallel wave context map"` returned #303 and #284, but those cover parallel execution/worktree isolation, not structural code localization.

Closely related issues checked:

- #197 "Add a strict machine-checkable research artifact contract" is closed and validates saved research shape. It does not add a structural discovery provider.
- #200 "Add localization-quality evaluation for research-agent outputs" is closed and scores whether evidence locations are good. It does not change how locations are found.
- #202 "Report research ROI in token accounting and run-health artifacts" is closed and surfaces research cost/quality. It does not introduce a code graph or symbol/caller query path.
- #203 "Teach Actor to consume research evidence without repeating broad exploration" is closed and changes consumption discipline. It assumes research evidence already exists; it does not provide deterministic structural discovery.
- #289 "Token accounting dashboard" is closed and visualizes token/cost telemetry, not localization.

## Why this is not already covered

MAP has strict research artifacts and some advisory ROI telemetry, but the discovery engine is still prompt-instructed Glob/Grep/Read. The repo has a `dependency_graph`, but it models MAP subtasks, not code symbols. The remaining gap is a local, optional structural code-map provider that can answer targeted relationship queries and emit normal ResearchEvidence, reducing cold-start search loops without changing Actor semantics.

## Problem

Cold-start multi-file MAP tasks still pay the mechanical exploration cost that the article describes: researcher/decomposer/Actor use broad file discovery and text search to infer symbol relationships. That increases token/tool-call spend and makes localization quality depend on prompt discipline rather than an explicit code map.

## Proposed slice

Add an optional `mapify code-map` / structural-discovery provider surface that can populate or query a local repository map and feed MAP's existing ResearchEvidence contract.

Concrete first slice:

- Add a provider-neutral abstraction such as `mapify_cli.code_map` with a minimal query model: symbols by name, imports/exports, callers/references where available, and file-level dependency edges.
- Prefer existing local tooling when present, e.g. detect a CodeGraph MCP/server/config and document it as an optional provider. Do not vendor or maintain CodeGraph as a required dependency.
- Provide a deterministic fallback for Python projects using stdlib `ast` plus import scanning, so the feature can be tested without network, MCP credentials, or external binaries.
- Add a runner command that emits compact JSON compatible with or directly convertible to ResearchEvidence: `confidence`, `status`, `search_method`, `search_stats`, and <=5 `relevant_locations` with path/lines/signature/relevance.
- Teach `research-agent`/`researcher` prompts to prefer the structural map for locate/impact/pattern queries when available, then fall back to Glob/Grep/Read with an explicit reason.
- Keep generated templates single-source: edit `src/mapify_cli/templates_src/**.jinja`, then render templates.
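
The Python fallback in the list above could start from something like the following sketch, which uses only stdlib `ast` to collect classes, functions, and imports with line ranges. `SymbolRecord` and `index_python_source` are hypothetical names, not existing `mapify_cli` APIs. Unlike tree-sitter, `ast.parse` raises `SyntaxError` on code that does not parse, so a real provider would need to catch that and fall back.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolRecord:
    """Hypothetical record shape for one indexed symbol or import."""
    name: str
    kind: str        # 'class', 'function', or 'import'
    path: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


def index_python_source(source: str, path: str) -> list[SymbolRecord]:
    """Collect classes, functions, and imports from one Python source string."""
    tree = ast.parse(source, filename=path)
    records: list[SymbolRecord] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            kind, names = "class", [node.name]
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind, names = "function", [node.name]
        elif isinstance(node, ast.Import):
            kind, names = "import", [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports stay distinguishable.
            base = "." * node.level + (node.module or "")
            sep = "." if node.module else ""
            kind, names = "import", [f"{base}{sep}{a.name}" for a in node.names]
        else:
            continue
        end_line = node.end_lineno or node.lineno
        for name in names:
            records.append(SymbolRecord(name, kind, path, node.lineno, end_line))
    # ast.walk is breadth-first; sort so output is stable and reads top to bottom.
    return sorted(records, key=lambda r: (r.start_line, r.name))
```

This sketch covers definitions and imports only. Callers and references would need a second pass over `ast.Call` nodes plus name resolution, which is why the query model above says "where available".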

Out of scope for this slice:

- Full multi-language tree-sitter implementation.
- Route recognition for all frameworks.
- Mandatory MCP installation.
- Cloud indexing or transmitting source code outside the local machine.

## Acceptance criteria

- `mapify code-map query` or equivalent deterministic helper returns structural evidence for a fixture repo with at least Python imports/classes/functions and line ranges.
- ResearchEvidence emitted from the code-map path passes the existing `validate_research`/research eval expectations.
- Claude and Codex researcher templates mention the structural-map-first path only when a map provider is available and preserve the current Glob/Grep/Read fallback.
- Tests cover: no provider available, Python fallback success, stale/missing index fallback, unsafe path rejection, and conversion to <=5 relevant locations.
- Docs explain optional CodeGraph/MCP integration without making MAP responsible for installing/maintaining third-party MCP servers.
- `make render-templates`, `make check-render`, and the relevant pytest suites pass; these are the expected validation gates for the implementation.
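
Two of the test cases above, unsafe path rejection and conversion to <=5 relevant locations, can be sketched as follows. The field names come from the proposed slice; the concrete values (`"found"`/`"not_found"`, `"code_map"`, a float `confidence`), the `score` key on each hit, and both function names are assumptions, since the real ResearchEvidence schema is defined by `validate_research` and is not reproduced here.

```python
from __future__ import annotations

import json
from pathlib import Path

MAX_LOCATIONS = 5  # cap from the "<=5 relevant_locations" requirement


def safe_relative_path(repo_root: Path, candidate: str) -> str:
    """Return `candidate` as a repo-relative POSIX path, or raise ValueError."""
    root = repo_root.resolve()
    resolved = (root / candidate).resolve()
    try:
        return resolved.relative_to(root).as_posix()
    except ValueError:
        raise ValueError(f"path escapes repository root: {candidate!r}") from None


def to_research_evidence(repo_root: Path, hits: list[dict], files_scanned: int) -> dict:
    """Convert ranked code-map hits into an evidence-shaped dict (assumed schema)."""
    # Keep only the highest-scoring hits so the artifact stays compact.
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)[:MAX_LOCATIONS]
    locations = [
        {
            "path": safe_relative_path(repo_root, h["path"]),
            "lines": [h["start_line"], h["end_line"]],
            "signature": h["signature"],
            "relevance": h["relevance"],
        }
        for h in ranked
    ]
    return {
        "status": "found" if locations else "not_found",        # assumed values
        "confidence": max((h["score"] for h in ranked), default=0.0),
        "search_method": "code_map",                             # assumed value
        "search_stats": {"files_scanned": files_scanned, "hits": len(hits)},
        "relevant_locations": locations,
    }


def to_json(evidence: dict) -> str:
    """Serialize compactly with sorted keys for deterministic output."""
    return json.dumps(evidence, sort_keys=True, separators=(",", ":"))
```

Resolving paths before the `relative_to` check means both `../` traversal and absolute paths outside the repo are rejected by the same code path.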

## Guardrails

- Do not make CodeGraph a mandatory runtime dependency for `mapify init`.
- Do not add cloud indexing, API keys, or source upload.
- Do not bypass existing ResearchEvidence validation; structural-map output must be normal evidence, not a privileged side channel.
- Do not treat semantic/vector similarity as a substitute for deterministic relationship evidence.
- Do not use shadow-mode rollout; gate behind explicit optional availability/config and validate directly.
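
The gating guardrail, together with the "fall back with an explicit reason" behavior from the proposed slice, might look like the sketch below. `Provider`, `ProviderChoice`, `choose_provider`, the `enabled` flag, and the `"glob_grep_read"` label are hypothetical names introduced for illustration, not existing MAP configuration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Sequence

FALLBACK_METHOD = "glob_grep_read"  # hypothetical label for the current path


@dataclass(frozen=True)
class Provider:
    """A structural map provider plus a cheap availability probe."""
    name: str
    is_available: Callable[[], bool]


@dataclass(frozen=True)
class ProviderChoice:
    search_method: str
    fallback_reason: Optional[str]  # None when a structural provider was chosen


def choose_provider(enabled: bool, providers: Sequence[Provider]) -> ProviderChoice:
    """Pick the first available provider, but only when explicitly enabled."""
    if not enabled:
        return ProviderChoice(FALLBACK_METHOD, "code map not enabled in config")
    for provider in providers:
        if provider.is_available():
            return ProviderChoice(provider.name, None)
    return ProviderChoice(FALLBACK_METHOD, "no structural provider available")
```

Recording the reason alongside the chosen method lets the fallback show up in the saved evidence instead of being a silent behavior change.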

