Agent Navigability Index (ANI)

A probe-backed scanner that scores how navigable a repository is for AI coding agents.

ANI turns a GitHub repository URL into a 0-100 Agent Navigability Index score, dimension scores, concrete evidence, prioritized recommendations, and machine-readable JSON. It is built for developers and AI tooling teams who believe a serious agent-readiness product should measure whether agents can find the right files, tests, configs, and boundaries with low context waste.

ANI PRD2 is a CLI-first developer product. The public product is simple:

GitHub repository URL -> ANI score + evidence-backed recommendations

Users do not need to connect issue trackers, run agent experiments, install graph services, or provide prior PRs. ANI may build temporary local indexes and probes internally, but it deletes temporary cloned repositories by default and requires no hosted graph or memory infrastructure.

Why This Matters

AI coding agents fail less often because they cannot write code and more often because they cannot reliably answer repository-level questions:

Where is the relevant code?
What files are safe to change?
Which tests or checks prove the change works?
Which generated, vendored, deprecated, or noisy files should be ignored?
What architecture boundaries and ownership rules matter?

ANI measures those navigability signals with static analysis plus repo-local navigation probes, then explains every recommendation in terms of the agent failure mode it should reduce.

Research-Backed Thesis

ANI's narrow thesis is:

Better agent navigation in code leads to better agent performance.

This is grounded in repository-level agent and software-engineering research. ContextBench studies code-context recall, precision, retrieval efficiency, redundancy, and downstream task success. SWE-bench, SWE-agent, Agentless, AutoCodeRover, RepoBench, CrossCodeEval, and RepoCoder all reinforce the same practical point: realistic software tasks depend on localization, context selection, repository relationships, and verification behavior, not only code generation. Classic program-comprehension and modularity research adds the older lesson that boundaries, coupling, and information hiding affect how quickly a maintainer can find the right place to change.

ANI does not claim that its current formula is universally validated. It claims that navigability is a measurable component of agent performance, and it includes a paired before/after validation harness to test whether ANI recommendations move the metrics they target.

What ANI Does Today

Scores public GitHub repositories at exact refs or commits.
Supports deterministic local fixture scans through --local-path.
Performs safe static analysis: file inventory, repo classification, docs, CI/tests, ownership, generated/vendor detection, JS/TS and Python symbol extraction, best-effort dependency graph, and bounded git history metrics.
Generates repo-local navigation probes from imports, symbols, tests, configs, routes, commands, and error strings.
Runs a fixed lexical/path/symbol/graph retriever and measures target recall, context precision, noise in top-k, test discoverability, command confidence, and documentation utility.
Computes a versioned ANI score, confidence, navigation dimension scores, static support dimensions, evidence cards, penalties, bonuses, and measurable recommendation contracts.
Emits JSON and Markdown reports.
Includes an internal recommendation-validation harness with positive/negative controls, headroom and adoption gates, metadata-only traces, and automatic deletion of cloned repositories after each case.
Reads CSV or JSONL benchmark manifests and optional outcome files.
Joins scans to outcomes by outcome_id.
Produces grouped splits/folds, correlations, confidence intervals, quartile lift, baseline-vs-ANI comparison, dimension importance, and failure analysis.
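
The retrieval metrics named above (target recall, context precision, noise in top-k) can be pictured with the standard information-retrieval definitions. The sketch below is illustrative only: the function names and the example paths are assumptions, and ANI's own definitions may differ in detail.

```python
from typing import Optional, Sequence, Set


def recall_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """Fraction of target files that appear in the top-k ranked results."""
    if not targets:
        return 0.0
    return len(set(ranked[:k]) & targets) / len(targets)


def precision_at_k(ranked: Sequence[str], targets: Set[str], k: int) -> float:
    """Fraction of the top-k ranked results that are target files."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    return sum(1 for path in top_k if path in targets) / len(top_k)


def noise_in_top_k(ranked: Sequence[str], noise: Set[str], k: int) -> float:
    """Fraction of the top-k results that are known noise (generated, vendored)."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    return sum(1 for path in top_k if path in noise) / len(top_k)


def first_relevant_rank(ranked: Sequence[str], targets: Set[str]) -> Optional[int]:
    """1-based rank of the first target file, or None if none was retrieved."""
    for rank, path in enumerate(ranked, start=1):
        if path in targets:
            return rank
    return None


# Hypothetical retriever output for a source-to-test probe.
ranked = ["src/index.ts", "src/generated.ts", "README.md",
          "tests/index.test.ts", "package.json"]
targets = {"tests/index.test.ts"}
noise = {"src/generated.ts"}

print(recall_at_k(ranked, targets, 5))       # target found within top 5
print(precision_at_k(ranked, targets, 5))    # one relevant file out of five
print(noise_in_top_k(ranked, noise, 5))      # one generated file out of five
print(first_relevant_rank(ranked, targets))  # target sits at rank 4
```

A low precision with a high recall, as in this example, is the pattern the sample report flags: the agent can reach the right file but reads irrelevant context on the way.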

Quickstart

git clone https://github.com/<your-org>/agent-navigability-index.git
cd agent-navigability-index
python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"
python -m unittest discover -v

Score a public repository at an exact ref:

ani score https://github.com/org/repo \
  --ref 0123abcd4567ef \
  --format markdown \
  --output ani-report.md

Emit machine-readable JSON:

ani scan https://github.com/org/repo \
  --ref 0123abcd4567ef \
  --format json \
  --output ani-report.json

Run a benchmark validation manifest with public GitHub repos and exact refs:

ani validate path/to/benchmark_manifest.jsonl \
  --outcomes path/to/outcomes.jsonl \
  --out-dir ani_validation_artifacts
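
As a rough picture of how scans are joined to outcomes by outcome_id, the sketch below joins two JSONL inputs in memory. Only the `outcome_id` key comes from this README; the other field names (`ani_score`, `resolved`) and the helper names are hypothetical, and the real manifest and outcome schemas may differ.

```python
import json
from typing import Dict, List


def parse_jsonl(text: str) -> List[dict]:
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def join_by_outcome_id(scans: List[dict], outcomes: List[dict]) -> List[dict]:
    """Inner-join scan rows to outcome rows on the shared outcome_id key."""
    outcomes_by_id: Dict[str, dict] = {row["outcome_id"]: row for row in outcomes}
    joined = []
    for scan in scans:
        outcome = outcomes_by_id.get(scan["outcome_id"])
        if outcome is not None:  # scans without an outcome row are dropped
            joined.append({**scan, **outcome})
    return joined


# Hypothetical rows; field names other than outcome_id are assumptions.
scans_text = """
{"outcome_id": "a", "ani_score": 66}
{"outcome_id": "b", "ani_score": 81}
{"outcome_id": "c", "ani_score": 40}
"""
outcomes_text = """
{"outcome_id": "a", "resolved": 0}
{"outcome_id": "b", "resolved": 1}
"""

joined = join_by_outcome_id(parse_jsonl(scans_text), parse_jsonl(outcomes_text))
for row in joined:
    print(row)
```

Row "c" has no matching outcome and is excluded, which is why the outcomes file can be optional: without it there is nothing to correlate against.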

Run calibration before any public paired before/after recommendation validation:

ani validate-calibration validation/calibration_repos/manifest.jsonl \
  --out-dir validation/runs \
  --repeats 3 \
  --real-agent-backend codex

Run internal paired before/after recommendation validation only after calibration reports calibration_ready_for_public_a_b:

ani validate-recommendations validation_manifest.jsonl \
  --out-dir validation/runs \
  --workers 3 \
  --real-agent-backend codex \
  --calibration-report validation/runs/<calibration-run>/report.json

Recommendation validation is fail-closed. Deterministic navigation-agent runs are calibration controls; public before/after A/B mode requires a passing calibration report. ANI does not claim public recommendation efficacy unless a real LLM coding-agent trace adapter such as Codex CLI is explicitly enabled and the calibration, controls, headroom, adoption, and paired significance gates all pass.

Render or explain an existing scan:

ani report examples/sample_scan.json
ani explain examples/sample_scan.json

Sample JSON Output

{
  "schema_version": "2.0",
  "model_version": "ani-probe-v2",
  "repo": {
    "url": "https://github.com/example/ani-sample-repo",
    "ref": "62a9ec...",
    "commit_sha": "62a9ec..."
  },
  "scores": {
    "ani_score": 66,
    "distance_from_ideal": 34,
    "confidence": 96,
    "grade": "C",
    "dimension_scores": {
      "entry_point_clarity": 0.90,
      "verification_affordance": 0.85
    },
    "navigation_dimension_scores": {
      "target_recall": 0.82,
      "context_precision": 0.41,
      "verification_discoverability": 0.70
    },
    "navigation_metrics": {
      "file_recall_at_10": 0.83,
      "precision_at_10": 0.30,
      "noise_in_top_k": 0.12
    }
  },
  "probe_results": [
    {
      "family": "source_to_test",
      "query": "greet relevant tests",
      "target_artifacts": ["examples/sample_repo/tests/index.test.ts"],
      "file_recall_at_5": 1.0
    }
  ],
  "evidence": [
    {
      "dimension": "context_retrievability",
      "polarity": "negative",
      "claim": "Probe retrieval precision@10 is 30%, meaning agents would read substantial irrelevant context."
    }
  ],
  "recommendations": [
    {
      "title": "Add package-local source-to-test anchors for failed queries",
      "failure_mode": "For query `greet relevant tests`, the relevant test target ranks outside the top 5, so agents may skip the right regression check.",
      "observed_failures": [
        {
          "query": "greet relevant tests",
          "target_artifacts": ["examples/sample_repo/tests/index.test.ts"],
          "first_relevant_rank": 12,
          "top_irrelevant_results": [{"path": "examples/sample_repo/src/generated.ts", "rank": 2, "noise": true}]
        }
      ],
      "why_agents_waste_context": "The target test is buried behind generated and unrelated files, so an agent is likely to spend reads and tokens before reaching it.",
      "suggested_change": "Add `examples/sample_repo/tests/AGENTS.md` with the failed query terms and direct anchors to `examples/sample_repo/tests/index.test.ts`.",
      "expected_metric_movement": {
        "metric": "source_to_test_recall_at_5",
        "direction": "up",
        "target": "raise to at least 0.75 or improve by 20%"
      },
      "expected_real_agent_behavior_movement": {
        "target_file_recall": "up >= 10 percentage points",
        "files_read": "down >= 10%"
      },
      "validation_status": "probe_backed",
      "evidence_ids": ["ev_governance_safety_missing_codeowners_..."]
    }
  ]
}

See examples/sample_scan.json for a complete generated fixture output. The checked-in fixture artifacts use --local-path; see examples/README.md to regenerate them.
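
A consumer of the JSON report might summarize it as sketched below. The field names are taken from the sample output above; the `summarize_report` helper is hypothetical and not part of ANI, and the embedded report is a trimmed copy of the sample rather than real scan output.

```python
import json
from typing import List

# Trimmed copy of the sample report shown above.
SAMPLE_REPORT = """
{
  "scores": {"ani_score": 66, "confidence": 96, "grade": "C"},
  "recommendations": [
    {
      "title": "Add package-local source-to-test anchors for failed queries",
      "expected_metric_movement": {
        "metric": "source_to_test_recall_at_5",
        "direction": "up"
      },
      "validation_status": "probe_backed"
    }
  ]
}
"""


def summarize_report(report: dict) -> List[str]:
    """Return one headline line plus one line per recommendation."""
    scores = report.get("scores", {})
    lines = [
        f"ANI {scores.get('ani_score')}/100 (confidence {scores.get('confidence')})"
    ]
    for rec in report.get("recommendations", []):
        movement = rec.get("expected_metric_movement", {})
        lines.append(
            f"- {rec.get('title')}: {movement.get('metric')} "
            f"{movement.get('direction')} [{rec.get('validation_status')}]"
        )
    return lines


summary = summarize_report(json.loads(SAMPLE_REPORT))
print("\n".join(summary))
```

In practice the report would be loaded from the file written by `--output`, for example with `json.load(open("ani-report.json"))`.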

Sample Markdown Report

# Agent Navigability Index: example/ani-sample-repo

## Executive Summary

ANI scanned `example/ani-sample-repo` at commit `...` and produced an evidence-backed score of **66/100**.

## Score Card

| Metric | Value |
|---|---:|
| ANI score | 66 |
| Distance from ideal | 34 |
| Confidence | 96 |

## What To Fix First

Start with **Expose runtime and configuration contracts**...

See examples/sample_report.md for the complete sample report.

Validation Status

ANI includes two validation paths:

Benchmark correlation validation asks whether ANI is associated with external agent outcomes. The checked-in 10-row pilot is small and inconclusive.
Recommendation validation asks whether applying one ANI recommendation improves before/after agent behavior. It first uses static probes to select a causal candidate, then runs the same instrumented navigation agent on the original and patched checkout under fixed budgets.

The current checked-in public pilots support harness reproducibility and score sensitivity. They do not prove statistically significant external LLM-agent performance improvement. Until a paired recommendation run passes the real-agent significance gate, ANI should claim probe-backed navigability diagnosis, not validated agent outcome improvement.
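
To see why a small paired pilot can remain inconclusive, consider an exact two-sided sign test over before/after metric pairs. This README does not specify which statistic the real-agent significance gate uses, so the sign test here is an illustrative stand-in, and the function name and data are hypothetical.

```python
from math import comb
from typing import Sequence


def sign_test_p_value(before: Sequence[float], after: Sequence[float]) -> float:
    """Exact two-sided sign test on paired values; ties are discarded."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0  # no informative pairs, so no evidence of a change
    wins = sum(1 for d in diffs if d > 0)
    k = min(wins, n - wins)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical target-file recall for the same cases before and after a patch.
before = [0.50, 0.40, 0.60, 0.55, 0.30, 0.45, 0.70, 0.65]
after = [0.60, 0.55, 0.65, 0.70, 0.50, 0.50, 0.80, 0.75]

print(sign_test_p_value(before, after))          # 8 of 8 cases improved
print(sign_test_p_value(before[:5], after[:5]))  # 5 of 5 cases improved
```

With five pairs, even a clean sweep gives a two-sided p-value of 0.0625, which does not clear a conventional 0.05 threshold. Eight improved pairs out of eight gives 0.0078125.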

See Validation Methodology, Recommendation Validation, and Benchmark Validation Pilot.

Navigation Dimensions

ANI PRD2 scores eight navigation-first dimensions, each on a 0-1 scale:

Target Recall
Context Precision
Relationship Recoverability
Verification Discoverability
Boundary Clarity
Noise Resistance
Command Confidence
Documentation Utility

The scanner also keeps ten static support dimensions for explainability and confidence. The headline score is probe-first when enough probes exist, then blended with static support signals. Weights are defined in ani/scoring_weights.json.
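
One way to picture a probe-first blend is a weighted mean of navigation dimensions mixed with a weighted mean of static support dimensions. Everything numeric in this sketch is an assumption: the `min_probes` cutoff, the `probe_share` mix, the equal weights, and the fallback to static signals alone are hypothetical, and the actual formula and weights live in ani/scoring_weights.json.

```python
from typing import Mapping


def weighted_mean(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of 0-1 scores; dimensions without a weight are ignored."""
    total_weight = sum(weights.get(name, 0.0) for name in scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(value * weights.get(name, 0.0) for name, value in scores.items())
    return weighted_sum / total_weight


def headline_score(
    navigation: Mapping[str, float],
    static: Mapping[str, float],
    nav_weights: Mapping[str, float],
    static_weights: Mapping[str, float],
    probe_count: int,
    min_probes: int = 10,      # hypothetical cutoff for "enough probes"
    probe_share: float = 0.7,  # hypothetical share given to probe results
) -> int:
    """Blend probe-backed and static scores into a 0-100 headline score."""
    static_part = weighted_mean(static, static_weights)
    if probe_count < min_probes:
        blended = static_part  # assumed fallback when probes are too sparse
    else:
        nav_part = weighted_mean(navigation, nav_weights)
        blended = probe_share * nav_part + (1 - probe_share) * static_part
    return round(100 * blended)


navigation = {"target_recall": 0.8, "context_precision": 0.4}
static = {"entry_point_clarity": 0.9, "verification_affordance": 0.7}
equal_nav = {name: 1.0 for name in navigation}
equal_static = {name: 1.0 for name in static}

print(headline_score(navigation, static, equal_nav, equal_static, probe_count=25))
print(headline_score(navigation, static, equal_nav, equal_static, probe_count=3))
```

Keeping the weights in a data file rather than in code is what makes the score versionable: a change to the weights is a reviewable diff.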

Confidence Model

ANI separates score from confidence. Confidence answers: "How much of the repository did the scanner understand well enough for this scan to be usable?"

Inputs include clone/ref success, file coverage, parser coverage, dependency graph completeness, probe count/diversity, test/CI detection, generated/vendor exclusion, git history availability, and framework classifier confidence. If confidence drops below 50, ANI suppresses the grade and marks the scan as directional only.
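
The confidence gate described above can be sketched as follows. The threshold of 50 comes from this section; the `directional_only` field name and the `apply_confidence_gate` helper are hypothetical and may not match ANI's actual report schema.

```python
from typing import Any, Dict


def apply_confidence_gate(scores: Dict[str, Any], threshold: int = 50) -> Dict[str, Any]:
    """Suppress the grade and flag the scan when confidence is below threshold."""
    gated = dict(scores)  # copy so the caller's dict is left untouched
    if gated["confidence"] < threshold:
        gated["grade"] = None
        gated["directional_only"] = True
    else:
        gated["directional_only"] = False
    return gated


confident = apply_confidence_gate({"ani_score": 66, "confidence": 96, "grade": "C"})
uncertain = apply_confidence_gate({"ani_score": 58, "confidence": 42, "grade": "D"})

print(confident)
print(uncertain)
```

The numeric score is kept in both cases; only the grade is withheld, so a low-confidence scan can still be compared against a later rescan.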

Safety Model

ANI is safe by default:

It clones/fetches repositories and reads files.
It parses static metadata, source text, manifests, docs, and git history.
It creates and runs local navigation probes over a temporary index.
Internal recommendation validation runs an instrumented navigation agent that searches and reads files under fixed budgets.
Real-agent recommendation validation, when explicitly enabled, runs Codex CLI in temporary checkouts and stores metadata-only traces.
It does not install dependencies.
It does not run package scripts.
Public scoring and deterministic calibration do not run tests from the target repository; explicit real-agent validation may capture test commands in temporary checkouts.
It does not import target Python modules.
It does not execute target binaries.
It does not call external LLM APIs with repository source.
It deletes temporary cloned repositories by default.
Validation artifacts store metadata, metrics, changed paths, and summarized tool/file-read traces, not source snapshots.

Current Limitations

ANI supports public GitHub repositories and local fixture/debug scans. Private repo auth is not implemented.
JS/TS and Python have the best static symbol extraction today. Other languages reduce confidence and fall back to file/package-level signals.
Probe quality depends on the repository exposing enough self-supervised signals. Sparse repositories may receive directional scores.
The validation harness is included, but ANI is not globally benchmark-validated until representative paired validation passes the declared significance gate.
GitHub stars and some hosted metadata are represented as nullable baseline fields, not fetched by default.
The model is deterministic and transparent, but still heuristic until calibrated against substantial outcomes.
No web UI, SSO, RBAC, audit logs, retention controls, remediation PRs, or hosted scan history are included in v0.

Documentation

Roadmap

Larger benchmark runs across public corpora.
Weight calibration from held-out benchmark outcomes.
Richer language support and dependency graph coverage.
GitHub metadata enrichment with rate-limit-aware caching.
Hosted report viewer and comparison workflow.
Private repo support with explicit auth, retention, audit, and enterprise controls.
Remediation planning and optional PR workflows after validation.

License

MIT. See LICENSE.
