From 88ebf049f04b02d95dc7e8080d7eb62bff6d40a7 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Sun, 26 Apr 2026 05:51:41 +0000 Subject: [PATCH 01/23] checkpoint: pre-yolo 2026-04-26T05:51:41 From ef45032561b39af2bc85b832d7a63cc81ef1617b Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 14:42:04 +0000 Subject: [PATCH 02/23] checkpoint: pre-yolo 2026-04-27T14:42:04 From aa51d2df218f05fccdb217c27f37c213a15315d2 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 14:42:09 +0000 Subject: [PATCH 03/23] checkpoint: pre-yolo 2026-04-27T14:42:09 From c3f87442d3f010419f476e2b33f14d3d007097c6 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 15:15:03 +0000 Subject: [PATCH 04/23] checkpoint: pre-yolo 2026-04-27T15:15:03 --- PROJECT_SUMMARY.md | 159 ++++++++++++++++++++++++++++++++++ docs/project/architecture.md | 132 ++++++++++++++++++++++++++++ docs/project/build-and-run.md | 152 ++++++++++++++++++++++++++++++++ docs/project/conventions.md | 126 +++++++++++++++++++++++++++ docs/project/data-model.md | 128 +++++++++++++++++++++++++++ docs/project/flows.md | 127 +++++++++++++++++++++++++++ docs/project/ui.md | 136 +++++++++++++++++++++++++++++ 7 files changed, 960 insertions(+) create mode 100644 PROJECT_SUMMARY.md create mode 100644 docs/project/architecture.md create mode 100644 docs/project/build-and-run.md create mode 100644 docs/project/conventions.md create mode 100644 docs/project/data-model.md create mode 100644 docs/project/flows.md create mode 100644 docs/project/ui.md diff --git a/PROJECT_SUMMARY.md b/PROJECT_SUMMARY.md new file mode 100644 index 00000000..cd22a271 --- /dev/null +++ b/PROJECT_SUMMARY.md @@ -0,0 +1,159 @@ +# Project Summary: codeiq + +> Generated by `project-summarizer` on 2026-04-27. Audience: AI agents (and humans) who need to understand and modify this codebase. Every claim should be checkable; items marked `[inferred]` were not directly verified. 
+> +> **Canonical depth lives in [`CLAUDE.md`](CLAUDE.md)** (~28 KB, agent-oriented, hand-maintained). This file is a thin entry point that summarizes and links into [`CLAUDE.md`](CLAUDE.md), the runbooks under [`shared/runbooks/`](shared/runbooks/), and the deep-dives under [`docs/project/`](docs/project/). Treat `CLAUDE.md` as the source of truth where they overlap. + +## Identity + +- **What it is:** CLI tool + read-only server that scans codebases and builds a deterministic code knowledge graph (no AI, no external APIs — pure static analysis) with a Spring AI MCP server, REST API, and React UI on top of an embedded Neo4j graph. See [`README.md`](README.md), [`CLAUDE.md`](CLAUDE.md) §"What This Project Is". +- **Type:** monorepo (Java backend + React SPA bundled into one JAR) — combined CLI + library + read-only web service. +- **Status:** **active** — 30+ commits in the last 7 days on `main` (mostly RAN-46/52/57 supply-chain work). Last non-checkpoint commit `92c6e00` on 2026-04-26. Several `checkpoint: pre-yolo` auto-commits are noise from a session hook, not real activity. +- **Maven coordinates:** `io.github.randomcodespace.iq:code-iq` (see `` / `` in `pom.xml`). CLI command: `codeiq` (via `java -jar code-iq-*-cli.jar`). +- **Primary languages:** Java 25 (server, CLI, all detectors); TypeScript 5.7 + React 18 (SPA at `src/main/frontend/`). + +## Tech stack + +Read directly from the `pom.xml` `` block and `src/main/frontend/package.json`. 
+ +| Layer | Tech | Source | +|-------|------|--------| +| Runtime | Java 25 | `pom.xml` `25` | +| Web/DI | Spring Boot 4.0.5 | `pom.xml` (parent `spring-boot-starter-parent`) | +| Graph DB | Neo4j Embedded 2026.02.3 (Community) | `pom.xml` `` | +| MCP | Spring AI 2.0.0-M3 (`spring-ai-starter-mcp-server-webmvc`) | `pom.xml` `` | +| CLI | Picocli 4.7.7 (`picocli-spring-boot-starter`) | `pom.xml` `` | +| AST (Java) | JavaParser 3.28.0 | `[CLAUDE.md]` — `pom.xml` references via dep | +| Parsers (35+ langs) | ANTLR 4.13.2 (TS/JS, Python, Go, C#, Rust, C++) | `[CLAUDE.md]` | +| Cache | H2 in embedded mode (incremental analysis cache) | `src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java` | +| Frontend | React 18.3 + AntD 5.24 + ECharts 5.6 + react-router 7 | `src/main/frontend/package.json` | +| Frontend build | Vite 6.4 + TS 5.7 → bundled into `src/main/resources/static/` | `src/main/frontend/vite.config.ts` | +| Tests | JUnit (236 test files), Playwright for SPA E2E | `find src/test/java -name '*.java' \| wc -l` = 236 | +| Static analysis | SpotBugs 4.9.8.3, Jacoco 0.8.14, Checkstyle 3.6.0 | `pom.xml` `` / `` / `` | +| Security gates | OSV-Scanner, Trivy, Semgrep, Gitleaks, jscpd, SBOM | `.github/workflows/security.yml` | +| Supply chain | OpenSSF Scorecard + Best Practices (project_id 12650) | `.github/workflows/scorecard.yml`, `.bestpractices.json` | + +**Pinned security overrides** (bumps inside Spring Boot 4.0.5's BOM): Tomcat 11.0.21 (CVE-2026-34483/34487/34500), Jackson 3.1.1 (GHSA-2m67-wjpj-xhg9). Revert when Spring Boot 4.0.6+ catches up. See the `` and `` properties + comments in `pom.xml`. 
+ +## Entry points + +| Entrypoint | File | Purpose | +|---|---|---| +| CLI / Spring Boot main | `src/main/java/io/github/randomcodespace/iq/CodeIqApplication.java` | Boots Spring, picks `serving` vs `indexing` profile from the first arg, hands control to Picocli | +| CLI dispatcher | `src/main/java/io/github/randomcodespace/iq/cli/CodeIqCli.java` | Top-level Picocli `@Command` with 14 subcommands | +| 14 subcommands (plus `version` and `config`) | `src/main/java/io/github/randomcodespace/iq/cli/{Index,Enrich,Serve,Analyze,Stats,Graph,Query,Find,Cypher,Topology,Flow,Bundle,Cache,Plugins,Version,Config}Command.java` | One file per CLI command (20 files including subcommands and helpers) | +| REST API (4 controllers + 1 helper) | `src/main/java/io/github/randomcodespace/iq/api/{Graph,Flow,Topology,Intelligence}Controller.java` + `SafeFileReader.java` (helper) | 37 read-only endpoints on `/api/**`, `@Profile("serving")` | +| MCP tools (34 tools) | `src/main/java/io/github/randomcodespace/iq/mcp/McpTools.java` | `@McpTool` methods, auto-registered by Spring AI starter | +| SPA entry | `src/main/frontend/src/main.tsx` → `App.tsx` | React 18 + react-router 7, 4 pages | + +## Directory map + +``` +codeiq/ +├── pom.xml — Maven build (single module, JAR) +├── CLAUDE.md — canonical agent-oriented internals doc +├── README.md — human-facing intro + quick start +├── AGENTS.md — repo-root agent entry pointer +├── CHANGELOG.md — Keep-a-Changelog +├── SECURITY.md — vuln disclosure policy +├── LICENSE — Apache-2.0 +├── .bestpractices.json — OpenSSF Best Practices manifest +├── spotbugs-exclude.xml — SpotBugs suppressions +├── codeiq.yml — (optional, per-project config) +├── .github/ +│ ├── workflows/ — 5 workflows: beta-java, ci-java, +│ │ release-java, scorecard, security +│ └── dependabot.yml — Maven + GHA + npm, weekly grouped +├── src/ +│ ├── main/ +│ │ ├── java/io/github/randomcodespace/iq/ — Java sources (see CLAUDE.md "Package Structure") +│ │ ├── frontend/ — React SPA (Vite, builds into resources/static/) +│ │ 
└── resources/ +│ │ ├── application.yml — Spring config (profile-conditional) +│ │ └── static/ — Vite-built SPA assets (gitignored) +│ └── test/java/ — 236 test files (unit + E2E quality) +├── docs/ +│ ├── codeiq.yml.example — full unified-config schema +│ └── superpowers/baselines/ — phase exit-gate snapshots +├── shared/runbooks/ — engineering-standards, release, rollback, +│ first-time-setup, test-strategy +├── scripts/ — repo-local helpers (e.g. signing setup) +└── .codeiq/ — created at runtime: cache/ (H2) + graph/ (Neo4j) +``` + +Skipped from the map: `target/`, `.git/`, `.classpath`, `.factorypath`, `.project`, `.settings/`, `node_modules/`, `.dockerignore` — generated, IDE, or noise. + +## Run, build, test + +Verified against `.github/workflows/ci-java.yml` (the actual CI gate) and `pom.xml`. + +```bash +# Build (skipping tests, fastest) +mvn clean package -DskipTests + +# Build + test + spotbugs + dependency-check (the CI gate) +mvn verify + +# Build skipping the npm/Vite frontend (backend-only contributors) +mvn test -Dfrontend.skip=true + +# Skip the OWASP NVD download (~1 GB) on first local run +mvn verify -Ddependency-check.skip=true + +# Run a specific test class +mvn test -Dtest=SpringRestDetectorTest + +# Run the pipeline against your code +java -jar target/code-iq-*-cli.jar index /path/to/repo +java -jar target/code-iq-*-cli.jar enrich /path/to/repo +java -jar target/code-iq-*-cli.jar serve /path/to/repo # → http://localhost:8080 +``` + +CI gate is `mvn verify` — runs unit + integration tests **plus** SpotBugs and OWASP dependency-check executions bound to the `verify` phase (`pom.xml`). `mvn test` alone skips the security gate. See `.github/workflows/ci-java.yml`. + +**Required env / external services:** none. codeiq is offline-first by design — Neo4j and H2 are embedded; no external server, no network calls at runtime. Air-gapped install: `git clone` + Maven mirror + `mvn package`. 
See [`shared/runbooks/first-time-setup.md`](shared/runbooks/first-time-setup.md). + +**Cache + graph dirs at runtime** (created in your scanned repo): +- `.codeiq/cache/` — H2 incremental analysis cache (`CACHE_VERSION=4` constant near the top of `cache/AnalysisCache.java`) +- `.codeiq/graph/graph.db/` — Neo4j Embedded data dir + +## Conventions an agent must respect + +(Top 7. Full list in [`docs/project/conventions.md`](docs/project/conventions.md) and [`CLAUDE.md`](CLAUDE.md) §"Critical Rules" / §"Code Conventions".) + +1. **Serving layer is read-only.** No POST/PUT/DELETE on `/api`, no MCP tool that mutates state. All ingestion happens via CLI (`index`, `enrich`). See `api/GraphController.java` (only `@GetMapping`s) and `mcp/McpTools.java`. +2. **Determinism is non-negotiable.** Same input → byte-identical graph. Sort `Set` iterations (`TreeSet` or `stream().sorted()`); detectors must be stateless `@Component` beans; `GraphBuilder` flushes nodes before edges. See `analyzer/GraphBuilder.java`. +3. **Generic detection, not example-specific.** Every detector must work for all languages/frameworks in its scope. Framework detectors (Quarkus, Fastify, etc.) **must** carry discriminator guards requiring framework-specific imports. +4. **Detectors are auto-discovered Spring `@Component` beans** — no registry edits needed. Drop a class in `detector//`, implement `Detector` (or extend an `Abstract*Detector` base class), add a unit test + a determinism test. +5. **Property keys ≥ 3 occurrences become constants.** `private static final String PROP_FRAMEWORK = "framework";` etc. — see existing detectors. +6. **Configuration hierarchy:** built-in defaults → `~/.codeiq/config.yml` → `./codeiq.yml` → `CODEIQ_
_` env → CLI flags. Single source of truth: `codeiq.yml`. Spring-owned keys (e.g. `codeiq.neo4j.enabled`) stay in `application.yml`. See [`docs/codeiq.yml.example`](docs/codeiq.yml.example) and `CLAUDE.md` §"Configuration". +7. **Air-gapped build target.** No public-internet calls at runtime, all assets bundled locally, vendored where possible. Per-org rule in [`shared/runbooks/engineering-standards.md`](shared/runbooks/engineering-standards.md) §7 and [`~/.claude/rules/build.md`](~/.claude/rules/build.md). + +## Gotchas + +(Top items. Full list in [`CLAUDE.md`](CLAUDE.md) §"Gotchas & Lessons Learned"; that section is canonical and longer, so consult it when in doubt.) + +- **Pipeline order is `index → enrich → serve`.** Don't put analysis in `serve`; it's read-only. `serve` requires a prior `enrich` for a populated Neo4j directory. +- **Neo4j property round-trip uses `prop_*` keys.** Properties are written by `bulkSave` (UNWIND Cypher) with a `prop_` prefix and restored by `nodeFromNeo4j()` in `graph/GraphStore.java`. If you add a new property, verify it survives write→read. +- **Edges must be attached to source nodes before `bulkSave()`.** Cypher `MATCH` silently returns 0 rows for missing source IDs — pre-validate. +- **`@ActiveProfiles("test")` is required on every `@SpringBootTest`** to avoid Neo4j auto-startup conflicts. +- **`AnalysisCache` uses a `ReentrantReadWriteLock`** (not `synchronized`). JEP 491 (delivered in JDK 24, so in effect on Java 25) means `synchronized` no longer pins virtual-thread carriers, so the lock is not a pinning workaround; the read/write lock is what prevents `ClosedChannelException` on H2's MVStore under concurrent virtual-thread access. Don't "simplify" to `synchronized`. +- **Bump `CACHE_VERSION` in `cache/AnalysisCache.java`** (top of file) when you change the file-hash algorithm or H2 schema. Stale caches auto-clear on next run. +- **SnakeYAML parses bare `on` as `Boolean.TRUE`.** Compare YAML keys with `String.valueOf(key)`, not `Boolean.TRUE.equals(key)` (SonarCloud S2159). 
+- **Determinism gate:** every new detector needs a determinism test (run twice, assert equal output) — see existing `*DetectorTest.java` for the pattern. +- **First `mvn verify` downloads ~1 GB NVD database** for OWASP dependency-check. Override locally with `-Ddependency-check.skip=true`. +- **Counts drift between `CLAUDE.md` and code:** `CLAUDE.md` says "97 detectors" / "32 NodeKinds"; the live count is **99 concrete detectors** (excluding `Abstract*` and `*Helper*`) and **34 `NodeKind` values** (`model/NodeKind.java`). The `NodeKind` javadoc itself still says "32" — stale. Update both `CLAUDE.md` and the javadoc next time someone touches them. +- **Don't merge anything that fails `mvn verify`.** SpotBugs + dependency-check + tests are bound to `verify`, not `test`. + +## Where to look next + +- **Architecture & components** → [`docs/project/architecture.md`](docs/project/architecture.md) +- **Data model (Node/Edge kinds, Neo4j schema, H2 cache)** → [`docs/project/data-model.md`](docs/project/data-model.md) +- **UI (React SPA, Vite, page hierarchy)** → [`docs/project/ui.md`](docs/project/ui.md) +- **Key flows (index→enrich→serve, MCP tool lifecycle)** → [`docs/project/flows.md`](docs/project/flows.md) +- **Conventions (full)** → [`docs/project/conventions.md`](docs/project/conventions.md) +- **Build & run details (Maven phases, ANTLR codegen, frontend embed)** → [`docs/project/build-and-run.md`](docs/project/build-and-run.md) +- **Internal canonical reference (32-section deep doc, hand-maintained)** → [`CLAUDE.md`](CLAUDE.md) +- **Engineering standards / release / rollback** → [`shared/runbooks/`](shared/runbooks/) + +(Skipped: `docs/project/integrations.md` — codeiq makes no runtime calls to external APIs / queues. The `docs/codeiq.yml.example` schema and `shared/runbooks/release.md` cover what little external surface exists at build/release time.) 
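The determinism rule and the "run twice, assert equal output" gate described in the conventions and gotchas above can be sketched roughly as follows. This is a toy illustration only: `DeterminismSketch` and its `detect` method are hypothetical names, not codeiq's real `Detector` API, and the regex stands in for whatever a real detector extracts.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy, not a real codeiq detector: shows the "sort, then run twice" pattern. */
public final class DeterminismSketch {

    // Assumed toy rule: a line starting with "class <Name>" declares a node.
    private static final Pattern CLASS_DECL = Pattern.compile("^class\\s+(\\w+)");

    private DeterminismSketch() {}

    /** Collects declared class names and returns them in sorted order. */
    public static List<String> detect(String source) {
        Set<String> found = new HashSet<>(); // HashSet iteration order is unspecified
        for (String line : source.split("\n")) {
            Matcher m = CLASS_DECL.matcher(line.trim());
            if (m.find()) {
                found.add(m.group(1));
            }
        }
        // Sort before emitting so the output never depends on hash ordering.
        return new ArrayList<>(new TreeSet<>(found));
    }

    public static void main(String[] args) {
        String source = "class Zeta {\nclass Alpha {\nclass Mid{\n";
        List<String> first = detect(source);
        List<String> second = detect(source);
        // The determinism check: two runs over the same input must agree exactly.
        if (!first.equals(second)) {
            throw new AssertionError("non-deterministic output");
        }
        System.out.println(first);
    }
}
```

A real determinism test would presumably compare full node and edge lists rather than names, but the shape (stateless method, sorted emission, two runs compared for equality) is the same.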
diff --git a/docs/project/architecture.md b/docs/project/architecture.md new file mode 100644 index 00000000..43186093 --- /dev/null +++ b/docs/project/architecture.md @@ -0,0 +1,132 @@ +# Architecture + +## High-level shape + +codeiq is a **two-mode Spring Boot application** that ships as one JAR with the React SPA bundled inside: + +- **Indexing mode** (`index`, `enrich`, and most other CLI commands): Spring profile `indexing`, no web server, virtual-thread-driven file scanning + detector pipeline writing to H2 (cache) then Neo4j Embedded (graph). +- **Serving mode** (`serve` only): Spring profile `serving`, web server up, REST API + Spring AI MCP server + React SPA reading from the already-populated Neo4j directory. Strictly read-only — no detector code runs in this profile. + +``` + ┌──────────────────────────┐ + filesystem ───► │ index (FileDiscovery + │ + (any repo) │ Detectors + GraphBuilder)│ ──► H2 cache (.codeiq/cache/) + └──────────────────────────┘ + │ + ┌──────────────────────────┐ │ + │ enrich (Linkers + │ ◄───────┘ + │ LayerClassifier + │ + │ ServiceDetector + │ + │ LanguageEnricher + │ + │ LexicalEnricher) │ ──► Neo4j (.codeiq/graph/graph.db) + └──────────────────────────┘ │ + │ + developer / agent ◄── REST + MCP + React SPA ◄──── serve ◄─────┘ + (read-only) (Spring profile = serving) +``` + +Profile selection happens in `CodeIqApplication.java`'s `main` (around the `boolean isServe = "serve".equalsIgnoreCase(command)` block): the first CLI arg is matched against `serve` → `serving`; everything else → `indexing`. `indexing` sets `WebApplicationType.NONE`. + +## Components + +### Pipeline orchestrator (`analyzer/`) +- **Lives in:** `src/main/java/io/github/randomcodespace/iq/analyzer/` +- **Responsibility:** Discover files, route to parsers, fan out to detectors on virtual threads, fold results into a single graph buffer, then run cross-file linkers and the layer classifier. 
+- **Key files:** + - `Analyzer.java` — top-level pipeline (in-memory mode for `analyze` command). + - `FileDiscovery.java` — `git ls-files` first, falls back to directory walk; maps extensions → languages via `FileClassifier.java`. + - `StructuredParser.java` — routes Java to JavaParser, ANTLR-supported langs to `grammar/AntlrParserFactory.java`, others to raw text. + - `GraphBuilder.java` — buffered build (nodes-first, then edges) — determinism guarantee. + - `LayerClassifier.java` — sets `layer ∈ {frontend|backend|infra|shared|unknown}` on every node. + - `ServiceDetector.java` — filesystem walk for build files (30+ build systems) → SERVICE nodes with `CONTAINS` edges. + - `linker/` — 4 linkers run after detectors: `EntityLinker`, `GuardLinker`, `ModuleContainmentLinker`, `TopicLinker` (`Linker.java` is the interface; `LinkResult.java` is the return type). + - `ConfigScanner.java`, `InfrastructureRegistry.java`, `ArchitectureKeywordFilter.java` — supporting passes. +- **Talks to:** `detector/` (fan-out), `cache/AnalysisCache.java` (write), `graph/GraphStore.java` (write — only during `enrich`). +- **Owns:** in-memory graph buffer during a single run. + +### Detector layer (`detector/`) +- **Lives in:** `src/main/java/io/github/randomcodespace/iq/detector/` +- **Responsibility:** 99 concrete detectors that turn parsed files into nodes + edges. Auto-discovered as Spring `@Component`s; no registry to maintain. +- **Categories (one subdir each):** `auth/`, `csharp/`, `frontend/`, `generic/`, `go/`, `iac/`, `jvm/{java,kotlin,scala}/`, `markup/`, `proto/`, `python/`, `script/{shell,...}/`, `sql/`, `structured/`, `systems/{cpp,rust}/`, `typescript/`. +- **Base classes:** `Detector` (interface), `AbstractRegexDetector`, `AbstractJavaParserDetector`, `AbstractAntlrDetector`, `AbstractStructuredDetector`, `AbstractPythonAntlrDetector`, `AbstractPythonDbDetector`, `AbstractTypeScriptDetector`, `AbstractJavaMessagingDetector`. 
Plus three static helpers: `DetectorDbHelper`, `FrontendDetectorHelper`, `StructuresDetectorHelper`. Full table: see [`conventions.md`](conventions.md) §"Detector base classes". +- **Talks to:** parsed AST input (JavaParser CompilationUnit, ANTLR ParseTree, or raw text) via `DetectorContext`. Writes to a thread-local `DetectorResult`. +- **Owns:** nothing — must be stateless. Spring beans are singletons. + +### Graph store (`graph/`) +- **Lives in:** `src/main/java/io/github/randomcodespace/iq/graph/` +- **Responsibility:** Facade over Neo4j Embedded — UNWIND-batched bulk save for writes, raw Cypher for reads (no Spring Data Neo4j hydration on the read path for performance). +- **Key files:** + - `GraphStore.java` — `bulkSave(List, List)`, `queryNodes(...)`, fulltext search via `db.index.fulltext.queryNodes`. Creates 5 indexes on first save (3 b-tree + 2 fulltext — see [`data-model.md`](data-model.md)). + - `GraphRepository.java` — Spring Data Neo4j repository, used **only on the write path** (legacy). +- **Talks to:** Neo4j Embedded via `org.neo4j.graphdb` API (no Bolt for in-process reads). +- **Owns:** the Neo4j directory at `.codeiq/graph/graph.db/`. + +### Analysis cache (`cache/`) +- **Lives in:** `src/main/java/io/github/randomcodespace/iq/cache/` +- **Responsibility:** Per-file content-hash cache so re-running `index` only re-detects changed files. +- **Key files:** `AnalysisCache.java` (H2 schema + read/write API, `ReentrantReadWriteLock`-guarded, `CACHE_VERSION = 4`), `FileHasher.java` (SHA-256, 64-hex output). + +### REST API (`api/`) +- **Lives in:** `src/main/java/io/github/randomcodespace/iq/api/` +- **Files:** `GraphController.java` (`/api/**`), `FlowController.java` (`/api/flow/**`), `TopologyController.java` (`/api/topology/**`), `IntelligenceController.java` (`/api/intelligence/**`), `SafeFileReader.java` (helper, path-traversal guard). +- All controllers carry `@Profile("serving")` — they aren't loaded in indexing mode. 
+- 37 endpoints, all read-only. Full enumeration in [`CLAUDE.md`](../../CLAUDE.md) §"Server Endpoints". + +### MCP server (`mcp/`) +- **File:** `src/main/java/io/github/randomcodespace/iq/mcp/McpTools.java` — 34 `@McpTool`-annotated methods. Spring AI's `spring-ai-starter-mcp-server-webmvc` auto-registers them on a streamable HTTP transport at `/mcp`. Read-only. + +### Intelligence enrichment (`intelligence/`) +- **Lives in:** `src/main/java/io/github/randomcodespace/iq/intelligence/` +- **Sub-packages:** `lexical/` (doc-comment + snippet enrichment), `extractor/` (per-language extractors: `java/`, `typescript/`, `python/`, `go/`), `evidence/` (evidence-pack assembly for retrieval), `query/` (`QueryPlanner` for intelligent routing). +- Runs during `enrich` after structural data is in Neo4j; produces `prop_lex_*` properties indexed by the `lexical_index` fulltext index. + +### CLI (`cli/`) +- **Lives in:** `src/main/java/io/github/randomcodespace/iq/cli/` +- **Files:** 20 — `CodeIqCli.java` (top-level), 14 commands (`Index`, `Enrich`, `Serve`, `Analyze`, `Stats`, `Graph`, `Query`, `Find`, `Cypher`, `Topology`, `Flow`, `Bundle`, `Cache`, `Plugins`), config subcommands (`ConfigCommand`, `ConfigExplainSubcommand`, `ConfigValidateSubcommand`), `VersionCommand`, helper `CliOutput`. +- All commands are `@Component`s; Picocli + Spring integration via `picocli-spring-boot-starter`. + +### React SPA (`src/main/frontend/`) +- See [`ui.md`](ui.md). Vite builds into `src/main/resources/static/` — Spring Boot's static handler serves it from inside the JAR when `codeiq.ui.enabled=true`. 
+ +## Layering / dependency rules + +The package graph enforces a one-way flow: + +``` +cli/ ──► analyzer/ ──► detector/ ─► model/ + │ │ + └► linker/ └► grammar/ (ANTLR factory) + │ + ├► cache/ (H2) + └► graph/ (Neo4j) ──► api/ ──► query/ (read path) + │ + └► mcp/ (same QueryService) +``` + +- `model/` (CodeNode, CodeEdge, NodeKind, EdgeKind) is the dependency floor — depends on nothing in this codebase. +- `detector/` may import `model/` and `grammar/` — never `analyzer/`, `cli/`, or `api/`. +- `api/` and `mcp/` may import `query/` and `model/` — never `detector/` or `analyzer/` (read-only at serving time). +- `analyzer/` may import everything below it — it's the orchestrator. + +The `@Profile("serving")` annotation on every controller and on Neo4j-only beans (see `config/Neo4jConfig.java`) is what enforces "no writes during serving" at runtime; the package layering is convention, not a lint rule. + +## Cross-cutting concerns + +- **Logging:** SLF4J + Spring Boot's default Logback. `application.yml` quiets noisy `org.springframework.ai.mcp` and `PostProcessorRegistrationDelegate` to WARN. +- **Error handling:** Pipeline errors are logged + counted, never abort a whole run. Detector exceptions are caught per-file (the run continues with a logged warning); see `Analyzer.java` task wrapping. CLI commands return `int` exit codes via Picocli. +- **Auth / authz:** None — codeiq runs on the developer's machine. The serving layer trusts the loopback caller. CORS is configurable via `codeiq.cors.allowed-origin-patterns` (`application.yml` / `CorsConfig.java`). +- **Observability:** Spring Boot Actuator (`/actuator/health` with liveness + readiness probes per `application.yml`); `health/GraphHealthIndicator.java` reports Neo4j status. No metrics export — by design (offline tool). +- **Config:** Hierarchical, last-wins: built-in defaults → `~/.codeiq/config.yml` → `./codeiq.yml` → `CODEIQ_*` env → CLI flags. 
`UnifiedConfigBeans` bridges the unified config to the legacy `CodeIqConfig` bean. Spring-owned keys (`codeiq.neo4j.enabled`, `codeiq.neo4j.bolt.port`, `codeiq.cors.allowed-origin-patterns`, `codeiq.ui.enabled`) live in `application.yml` because they drive `@ConditionalOnProperty` / `@Value` wiring. Full schema: [`docs/codeiq.yml.example`](../codeiq.yml.example). + +## Concurrency model + +- Detector fan-out runs on **virtual threads** (`Executors.newVirtualThreadPerTaskExecutor()` in `Analyzer.java`). Java 25 includes JEP 491 (delivered in JDK 24), so `synchronized` no longer pins carrier threads (`j.u.c.locks` never did); the cache's `ReentrantReadWriteLock` is therefore purely a logical concurrency primitive, not a pinning workaround. +- Detectors are stateless `@Component` singletons (Spring's default scope). Per-file mutable state lives in method-local `DetectorContext` / `DetectorResult` instances. +- `GraphBuilder` collects results into indexed slots (one per file) so iteration order is independent of thread completion order — this is the determinism guarantee. + +## Why it's shaped this way + +- **Three-stage pipeline (`index`/`enrich`/`serve`) instead of one all-in-one `analyze`:** large codebases (44 K+ files in the original target) exhaust the heap if scanning + Neo4j ingestion happen in the same JVM run. `index` writes to H2 in batches (default 500), `enrich` reads from H2 and bulk-loads with UNWIND. `analyze` is kept as a legacy in-memory shortcut for small repos. See `CLAUDE.md` §"Pipeline". +- **Embedded Neo4j (not a server):** zero-ops deployment for an offline tool; bundle model means the serving host doesn't even need source code, just the `.codeiq/graph/` directory. +- **Read-only serving layer:** lets the server be deployed to a "remote" environment where source code is forbidden, while analysis still happens on the developer's box. See [`CLAUDE.md`](../../CLAUDE.md) §"Critical Rules / Read-Only Serving Layer". 
+- **Auto-discovery of detectors via `@Component`:** detectors are added by dropping a class — no registry edits, no plugin manifest. The trade-off is that mistakes (forgetting `@Component`) silently disable a detector; the `plugins` CLI command exists to introspect what's actually live. diff --git a/docs/project/build-and-run.md b/docs/project/build-and-run.md new file mode 100644 index 00000000..ba1d5916 --- /dev/null +++ b/docs/project/build-and-run.md @@ -0,0 +1,152 @@ +# Build & Run + +## Prerequisites + +- **Java 25** (Temurin recommended — pinned in CI: `.github/workflows/ci-java.yml` sets `distribution: 'temurin'` and `java-version: '25'` on `actions/setup-java`). +- **Maven 3.9+** (Maven Wrapper not committed; `mvn` from system path is expected). +- **Node.js + npm** for the frontend build. The `frontend-maven-plugin` (configured in `pom.xml`) downloads its own Node automatically — you don't need a system Node unless you run `npm` directly inside `src/main/frontend/`. +- **No Docker, no Postgres, no Redis** — codeiq is offline-first. Neo4j and H2 are embedded. + +## First-time setup + +```bash +git clone https://github.com/RandomCodeSpace/codeiq.git +cd codeiq + +# Quickest validation — skip tests, skip the security gate +mvn clean package -DskipTests -Ddependency-check.skip=true + +# Resulting JAR +ls target/code-iq-*-cli.jar +``` + +The first `mvn verify` (the full CI gate) downloads ~1 GB of NVD data for OWASP dependency-check. Use `-Ddependency-check.skip=true` while iterating locally; CI runs the full check on every push. + +Source for these steps: `pom.xml` (the `` block + plugin executions further down) and [`shared/runbooks/first-time-setup.md`](../../shared/runbooks/first-time-setup.md). + +## Local development loop + +There's no hot-reload story for the Java side — codeiq is a CLI/server, not a long-running dev server. 
The typical loop: + +```bash +# Edit Java source, then +mvn test -Dtest=YourDetectorTest -Dfrontend.skip=true # fastest single-test cycle +mvn package -DskipTests -Ddependency-check.skip=true # repackage the JAR +java -jar target/code-iq-*-cli.jar index /path/to/scan-target +java -jar target/code-iq-*-cli.jar enrich /path/to/scan-target +java -jar target/code-iq-*-cli.jar serve /path/to/scan-target +``` + +For the **frontend** (live HMR against a running backend): + +```bash +# Terminal 1 — run the Java backend +java -jar target/code-iq-*-cli.jar serve /path/to/scan-target + +# Terminal 2 — run Vite dev server (proxies /api and /mcp to localhost:8080) +cd src/main/frontend +npm install +npm run dev +``` + +Vite proxy config: `src/main/frontend/vite.config.ts` (`server.proxy` at the bottom of the file) — `/api` and `/mcp` go to `http://localhost:8080`. + +## Test layers + +- **Unit + integration (JUnit, ~236 test files):** + ```bash + mvn test # all tests + mvn test -Dtest=SpringRestDetectorTest # one class + mvn test -Dsurefire.useFile=false # verbose stderr to console + ``` + Tests live in `src/test/java/**` mirroring the source-tree package layout. **Detector tests must include positive, negative, and determinism cases** — see existing `*DetectorTest.java`. + +- **E2E quality tests (Context7-grounded ground truth):** + ```bash + E2E_PETCLINIC_DIR=/path/to/spring-petclinic mvn test -Dtest=E2EQualityTest + ``` + Ground-truth JSON lives under `src/test/resources/e2e/ground-truth-*.json`. Skipped automatically when the env var isn't set. + +- **Frontend E2E (Playwright):** + ```bash + cd src/main/frontend + npm run test:e2e # headless + npm run test:e2e:headed # with browser visible + npm run test:e2e:report # open last report + ``` + +- **CI gate:** + ```bash + mvn verify + ``` + Includes everything above (`mvn test` plus `spotbugs:check` and `dependency-check:check` bound to the `verify` phase). Failing any of those breaks the build. 
See `pom.xml` plugin executions and `.github/workflows/ci-java.yml`. + +## Build artifacts + +- **What:** a single fat JAR — `target/code-iq-*-cli.jar` (Spring Boot repackaged executable JAR). +- **Bundles:** all Java deps + the React SPA built into `src/main/resources/static/` by the `frontend-maven-plugin` during `mvn package`. +- **Maven coordinates:** `io.github.randomcodespace.iq:code-iq` (see `` / `` in `pom.xml`). The artifactId stays `code-iq` for historical reasons; the binary command is `codeiq`. +- **Releases:** + - Beta: `.github/workflows/beta-java.yml` — `workflow_dispatch` only → Sonatype Central beta + GitHub pre-release. + - GA: `.github/workflows/release-java.yml` — `workflow_dispatch` with a `version` input → builds a GPG-signed release commit on a detached HEAD, deploys to Sonatype Central, then pushes a GPG-signed annotated `vX.Y.Z` tag + GitHub Release. **No tag-push trigger; no auto-release on merge.** See [`shared/runbooks/release.md`](../../shared/runbooks/release.md). + +## Deploy + +There is no SaaS surface, no container image, no VPS. codeiq runs on the developer's machine. The deploy flow: + +1. User adds the dep / downloads the JAR from Maven Central or GitHub Releases. +2. User runs `codeiq index → enrich → serve` against their own repo. +3. The `serve` mode binds `0.0.0.0:8080` by default. Note that `0.0.0.0` listens on every network interface, so the server is reachable beyond the local machine unless the bind address is changed or the port is firewalled. + +For codeiq's own release (publishing to Maven Central): see [`shared/runbooks/release.md`](../../shared/runbooks/release.md). Rollback: [`shared/runbooks/rollback.md`](../../shared/runbooks/rollback.md). + +## CLI reference + +20 files under `src/main/java/io/github/randomcodespace/iq/cli/` define 14 user-facing commands. 
Authoritative table is in [`CLAUDE.md`](../../CLAUDE.md) §"CLI Commands"; condensed here: + +| Command | Purpose | Profile | +|---|---|---| +| `index ` | Memory-efficient batched scan → H2 cache | `indexing` | +| `enrich ` | Load H2 → Neo4j; run linkers, classifier, services | `indexing` | +| `serve ` | Read-only REST + MCP + UI on `http://localhost:8080` | **`serving`** | +| `analyze ` | Legacy in-memory all-in-one (small repos only) | `indexing` | +| `stats ` | 7-category statistics from Neo4j | `indexing` | +| `graph ` | Export graph (JSON / YAML / Mermaid / DOT) | `indexing` | +| `query ` | Preset relationship queries (consumers, producers, ...) | `indexing` | +| `find ` | Preset finds (endpoints, guards, entities, topics) | `indexing` | +| `cypher ` | Raw Cypher against Neo4j | `indexing` | +| `topology ` | Service topology (blast radius, cycles, bottlenecks) | `indexing` | +| `flow ` | Architecture flow diagrams | `indexing` | +| `bundle ` | Pack graph + source snapshot into ZIP | `indexing` | +| `cache ` | Inspect / clear / stats H2 cache | `indexing` | +| `plugins ` | List / inspect detectors | `indexing` | +| `config validate` / `config explain` | Unified-config tooling | `indexing` | +| `version` | Show version info | `indexing` | + +Profile selection happens in `CodeIqApplication.java`'s `main` (the `boolean isServe = "serve".equalsIgnoreCase(command)` block) — `serve` activates `serving` (web server on); everything else activates `indexing` (`WebApplicationType.NONE`). 
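The profile selection described above can be sketched in isolation. This is a minimal sketch, not the actual `CodeIqApplication` code: the class `ProfileSelector` and both method names are hypothetical, and treating an empty argument list as `indexing` is an assumption (the text only says "everything else").

```java
/** Hypothetical sketch of the serve-vs-everything-else profile choice; not the real CodeIqApplication. */
public final class ProfileSelector {

    private ProfileSelector() {}

    /** Returns "serving" when the first CLI arg is "serve" (case-insensitive), otherwise "indexing". */
    public static String profileFor(String[] args) {
        // Assumption: no arguments at all falls through to the indexing profile.
        String command = args.length > 0 ? args[0] : "";
        boolean isServe = "serve".equalsIgnoreCase(command);
        return isServe ? "serving" : "indexing";
    }

    /** Only the serving profile starts a web server; indexing corresponds to WebApplicationType.NONE. */
    public static boolean webServerEnabled(String profile) {
        return "serving".equals(profile);
    }

    public static void main(String[] args) {
        String profile = profileFor(args);
        System.out.println(profile + " (web server: " + webServerEnabled(profile) + ")");
    }
}
```

In the real application the chosen profile would then be handed to Spring before the context starts; that wiring is omitted here because its exact form is not shown in this document.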
+ +## Build phases — what runs when + +| Phase | What runs | Source | +|---|---|---| +| `generate-sources` | ANTLR codegen from `*.g4` files | `pom.xml` `antlr4-maven-plugin` | +| `process-resources` | `frontend-maven-plugin`: install Node, `npm ci`, `npm run build` → `src/main/resources/static/` | `pom.xml`, `src/main/frontend/vite.config.ts` (`build.outDir: '../resources/static'`) | +| `compile` / `test-compile` | javac for Java 25 | standard | +| `test` | Surefire — JUnit | standard | +| `verify` | `spotbugs:check`, `dependency-check:check` | `pom.xml` plugin executions; **this is the CI gate** | +| `package` | Spring Boot repackage → executable JAR with embedded SPA | `spring-boot-maven-plugin` | + +## Gotchas + +- **`mvn test` does NOT run the security gate.** SpotBugs and OWASP dependency-check are bound to `verify`. CI runs `mvn verify`. Locally, `mvn verify` is what actually mirrors CI. +- **OWASP NVD download is ~1 GB** and very slow on first run. `-Ddependency-check.skip=true` for fast local cycles; let CI run the full check. +- **`-Dfrontend.skip=true`** skips the frontend-maven-plugin entirely. The default `false` (in the `pom.xml` `` block) means `mvn package` always tries to build the SPA. Backend-only contributors should pass `-Dfrontend.skip=true` to avoid pulling Node. +- **Vite output path is relative-up:** `src/main/frontend/vite.config.ts` writes to `'../resources/static'` (= `src/main/resources/static/`) and uses `emptyOutDir: false` so a stale dir won't be wiped — if you see leftover assets, delete `src/main/resources/static/` manually. +- **ANTLR generated sources go under `target/generated-sources/antlr4/`** (per `antlr4-maven-plugin` defaults). Don't edit them; regenerate via `mvn generate-sources`. Modifying the `.g4` files in `src/main/antlr4/` is the supported edit point. +- **Spring Boot startup overhead is 8–16 s** for the embedded Neo4j + Spring context. Expected; not a perf bug. 
+- **Default index batch size is 500** (`Indexing batch tuning, see CLAUDE.md`). Larger isn't better; 500 outperformed 1000 in the tuning runs that set the default. +- **Tomcat 11.0.21 + Jackson 3.1.1 are pinned overrides** of Spring Boot 4.0.5's BOM (see `` / `` in `pom.xml`'s ``). Both are security bumps. Revert when Spring Boot 4.0.6+ catches up — keep the rationale comments. +- **`@ActiveProfiles("test")` is required on every `@SpringBootTest`** to avoid Neo4j auto-startup conflicts in integration tests. +- **First-run cache version mismatch wipes `.codeiq/cache/`.** Bump `CACHE_VERSION` (constant near the top of `cache/AnalysisCache.java`) whenever you change the hash algorithm or H2 schema. Existing users will lose their cache on the next run; that's intentional: a cold, slow cache is better than an incorrect one. +- **`SECURITY.md`, `CHANGELOG.md`, `.bestpractices.json`, `LICENSE`** are part of the OpenSSF Best Practices gate (project_id 12650). Do not delete or rename without coordinating — they are referenced by `.bestpractices.json` and the Scorecard workflow. +- **CI workflow pins all third-party actions by 40-char SHA** (see `.github/workflows/scorecard.yml`, `.github/workflows/codeql.yml` if present). When adding a new action, pin by SHA — Scorecard's `Pinned-Dependencies` check will downgrade us otherwise. diff --git a/docs/project/conventions.md b/docs/project/conventions.md new file mode 100644 index 00000000..4e4d6552 --- /dev/null +++ b/docs/project/conventions.md @@ -0,0 +1,126 @@ +# Conventions + +Rules to follow when modifying codeiq. Each item is grounded in an existing file. The 7 most important ones are summarized in [`PROJECT_SUMMARY.md`](../../PROJECT_SUMMARY.md) §"Conventions an agent must respect"; this file is the long form. + +## Code style + +- **Java 25 idioms encouraged** — records, sealed classes, pattern matching, virtual threads. Don't down-port to older idioms; this codebase is on the latest LTS-track. 
+- **Constructor injection only.** No field injection (`@Autowired` on fields), no setter injection. See any `@Component` / `@Service` in the codebase, e.g. `api/GraphController.java`. +- **Property-key constants** — when a string literal appears 3+ times in a file, extract: `private static final String PROP_FRAMEWORK = "framework";`. Saves typo bugs and makes refactors greppable. +- **Spring AI MCP annotations:** use `@McpTool` and `@McpToolParam` (Spring AI 2.x), not `@Tool`/`@ToolParam` (older form). See `mcp/McpTools.java`. +- **UTF-8 explicit:** `StandardCharsets.UTF_8` everywhere — never rely on platform default. `Analyzer.java` shows the import. + +## Error handling + +- **Pipeline errors don't abort the run.** Per-file detector exceptions are caught and logged; the file is skipped, the run continues. See task wrapping in `analyzer/Analyzer.java`. +- **CLI commands return `int` exit codes** via Picocli's `Callable` pattern. See any `cli/*Command.java` (e.g. `cli/EnrichCommand.java`). +- **No `System.exit()` from non-CLI code.** `CodeIqApplication.main` is the only place that calls `SpringApplication.exit(...)` and `System.exit(...)`. +- **No silent fallbacks.** If a detector can't parse a file, log it; don't return an empty result that looks indistinguishable from "nothing matched". + +## Naming + +- **Java packages:** `io.github.randomcodespace.iq.` (lowercase, no plurals). Detector subpackages match the language family: `detector/jvm/{java,kotlin,scala}/`, `detector/typescript/`, `detector/python/`, `detector/systems/{cpp,rust}/`. +- **Detector class:** `Detector` — `SpringSecurityDetector`, `FastifyDetector`, `GoStructuresDetector`. Always ends in `Detector`. +- **Detector test class:** `DetectorTest` — colocated under `src/test/java/` with the same package. +- **CLI commands:** `Command` — `IndexCommand`, `EnrichCommand`, `ServeCommand`. Picocli `@Command(name = "")` annotation gives the user-facing name. 
+- **Node ID format:** `"{prefix}:{filepath}:{type}:{identifier}"` — e.g. `"node:src/main/java/Foo.java:class:Foo"`. The full file path is part of the key — that's how cross-file uniqueness works. +- **Property keys:** snake_case (`auth_type`, `framework`, `roles`). Stored in Neo4j with a `prop_` prefix (`prop_auth_type`, `prop_framework`). +- **Frontend imports:** `@/...` resolves to `src/main/frontend/src/...` (Vite alias in `vite.config.ts`, mirrored in `tsconfig.json`'s `paths`). Always use the alias, never `../../../`. + +## Tests + +- **Location:** `src/test/java//`. ~236 test files total. +- **Layers:** + - **Unit:** plain JUnit, no Spring context. Most detector tests are unit. + - **Integration:** `@SpringBootTest` with `@ActiveProfiles("test")` — required to suppress Neo4j auto-startup. Standalone MockMvc for controller tests (no full context). + - **MCP tools:** test by calling `McpTools` methods directly — no protocol round-trip needed. + - **E2E quality:** `E2EQualityTest` validates against Context7-sourced ground truth (`src/test/resources/e2e/ground-truth-*.json`). Requires the env var `E2E_PETCLINIC_DIR` (or similar) to point at a cloned reference repo. +- **Run a single test:** `mvn test -Dtest=ClassName#methodName`. +- **Every detector needs:** + 1. Positive match — input that should fire, output asserted. + 2. Negative match — input that *looks similar* but shouldn't fire (especially for framework detectors). + 3. **Determinism test** — run the detector twice on the same input, assert output is byte-identical. + +## Logging + +- **SLF4J** via Spring Boot's default Logback. Pattern across the codebase: `private static final Logger log = LoggerFactory.getLogger(MyClass.class);`. +- `application.yml` already silences known-noisy loggers (`org.springframework.ai.mcp` → WARN, `PostProcessorRegistrationDelegate` → WARN). Don't add more bare `org.springframework.*` loggers without good cause. 
+- **No PII concerns** — codeiq scans the user's own code; logs go to the user's terminal. + +## Adding a new detector + +(Authoritative recipe — slightly expanded from [`CLAUDE.md`](../../CLAUDE.md) §"Adding a New Detector".) + +1. **Pick the right base class** (table below) and create `src/main/java/io/github/randomcodespace/iq/detector//Detector.java`. +2. **Annotate with `@Component`** (Spring auto-discovery) **and `@DetectorInfo(name=..., category=..., parser=ParserType.X, languages={...}, nodeKinds={...}, edgeKinds={...}, properties={...})`** (used by the `plugins` CLI command for introspection). Live examples: `detector/jvm/java/SpringSecurityDetector.java`, `detector/go/GoStructuresDetector.java`. +3. **Implement `detect(DetectorContext ctx)`** — return a `DetectorResult` populated with `CodeNode`s and `CodeEdge`s. Detectors are stateless; the `DetectorContext` is your scratch space. +4. **Framework detectors require a discriminator guard** — e.g. Quarkus must require `import io.quarkus.*`, Fastify must require `import 'fastify'`. Otherwise you'll match Spring controllers as Quarkus or Express as Fastify. **No exceptions** — this rule is enforced by review. +5. **Property-key constants** for any string literal repeated 3+ times. +6. **Add tests** in `src/test/java/.../detector//DetectorTest.java`: positive, negative, determinism. +7. **Run `mvn test`** — all 236+ tests must still pass. +8. **No registry edit needed** — Spring classpath scan picks up the `@Component`. The `plugins list` CLI command will introspect via `@DetectorInfo`. + +### Detector base classes + +| Class | Use when | +|---|---| +| `Detector` (interface) | You need full control; rare | +| `AbstractRegexDetector` | Pattern-only detection (most detectors) | +| `AbstractJavaParserDetector` | Java AST via JavaParser (Spring, JPA, etc.) 
| +| `AbstractAntlrDetector` | ANTLR grammar-based (TS, Python, Go, C#, Rust, C++) | +| `AbstractStructuredDetector` | Structured config files (YAML, JSON, TOML, INI, properties) | +| `AbstractPythonAntlrDetector` | Python ANTLR detectors (shared parse, getBaseClassesText, extractClassBody) | +| `AbstractPythonDbDetector` | Python ORM detectors (adds ensureDbNode/addDbEdge via DetectorDbHelper) | +| `AbstractTypeScriptDetector` | TS regex detectors (shared getSupportedLanguages, detect→detectWithRegex) | +| `AbstractJavaMessagingDetector` | Java messaging detectors (shared CLASS_RE, extractClassName, addMessagingEdge) | + +### Shared static helpers (don't subclass — call them) + +| Class | Purpose | +|---|---| +| `DetectorDbHelper` | `ensureDbNode` / `addDbEdge` for any detector emitting `DATABASE_CONNECTION` nodes | +| `FrontendDetectorHelper` | `createComponentNode` / `lineAt` for Angular, React, Vue detectors | +| `StructuresDetectorHelper` | `addImportEdge` / `createStructureNode` for Scala/Kotlin structure detectors | + +## Adding a new CLI command + +1. Create `src/main/java/io/github/randomcodespace/iq/cli/Command.java`. +2. Annotate `@Component` and `@picocli.CommandLine.Command(name="", description="...")`. +3. Implement `Callable` returning the exit code. +4. Wire as a subcommand of `CodeIqCli` in `cli/CodeIqCli.java` (it lists subcommands explicitly). +5. If the command needs a Spring profile other than `indexing` (only `serve` does this), update the `if (isServe) ...` block in `CodeIqApplication.main` — note this is **not** generic, so adding another `serving`-profile command means rethinking that conditional. + +## Adding a new REST endpoint + +1. Add a `@GetMapping` method (read-only — no `@PostMapping`/`@PutMapping`/`@DeleteMapping`) to the appropriate controller in `src/main/java/io/github/randomcodespace/iq/api/`. +2. Delegate to `query/QueryService.java` (or one of its peers — `StatsService`, `TopologyService`) — controllers stay thin. +3. 
**Mirror it in `mcp/McpTools.java`** as a new `@McpTool`. The MCP tool description must explain when an LLM should call it; copy the wording style of existing tools. +4. Add a controller test using standalone MockMvc (no `@SpringBootTest`). + +## Adding a new MCP tool + +1. Add a method on `mcp/McpTools.java` annotated `@McpTool(name="...", description="...")`. +2. Parameters: annotate with `@McpToolParam(description="...")`. +3. Return type: anything Jackson can serialize (typically a `Map` or a record). Jackson's `FAIL_ON_UNKNOWN_PROPERTIES` is globally disabled for MCP-protocol compatibility (`config/JacksonConfig.java`). +4. Test by calling the method directly in a unit test — no protocol round-trip needed. + +## Things to avoid (anti-patterns) + +- **`Set` iteration without sorting** — kills determinism. Use `TreeSet`, `stream().sorted(...)`, or sort the resulting list. +- **Mutable instance state on detectors** — they're Spring singletons; concurrent calls will collide. Per-call state goes in method-local variables / `DetectorContext`. +- **Coarse `synchronized` on `AnalysisCache`** — the `ReentrantReadWriteLock` is deliberate. Don't "simplify" to `synchronized` blocks; that serializes reads unnecessarily. +- **Direct `Boolean.TRUE.equals(yamlKey)`** — SnakeYAML parses bare `on` as `Boolean.TRUE`. Use `String.valueOf(key)` for YAML key comparisons (SonarCloud S2159). +- **Regex with nested non-possessive quantifiers** — use `*+` instead of `*` for nested patterns. `([^"\\]*+(?:\\.[^"\\]*+)*+)` not `([^"\\]*(?:\\.[^"\\]*)*)`. Stack-overflow risk (SonarCloud S5998). +- **Adding a new property to `CodeNode` without round-trip-testing** — Neo4j stores properties as `prop_`; `nodeFromNeo4j()` must restore them. A new property that survives `bulkSave` but not `nodeFromNeo4j` will silently disappear when read back. +- **Edges referencing nodes that don't exist yet** — `bulkSave`'s edge UNWIND silently drops rows whose source/target IDs don't match any node. 
Pre-validate IDs. +- **Generic patterns in framework detectors** — `router.get(...)` matches Express, Fastify, NestJS, Vue Router, Hono, and probably ten others. Always require a framework-specific import. + +## Don't refactor (intentional non-standard choices) + +- **Single-file `NodeKind` and `EdgeKind` enums.** They're long (32+/27 values) and could be split, but they're load-bearing for cross-file uniqueness and detector readability. Don't split — keeps the type surface in one diff-friendly file. See `model/NodeKind.java`, `model/EdgeKind.java`. +- **No SDN hydration on the read path.** `graph/GraphStore.java` uses raw Cypher + `nodeFromNeo4j()` for reads; `graph/GraphRepository.java` (Spring Data Neo4j) is used **only for writes**. This is deliberate — SDN's hydration overhead was measured and rejected for the read path. Don't unify them. +- **Auto-discovery via Spring `@Component` on detectors, no explicit registry.** Drop in a class, it's live. The `DetectorRegistry` exists to *introspect* the discovered set, not to register them. Don't replace with a manual registry. +- **CLI profile selection in `CodeIqApplication.main` (not via Picocli's mechanism).** It's a string `if/else` on the first arg, and it pre-empts Picocli to set the Spring profile *before* the context starts. Looks ugly; works correctly. SpotBugs flagged the original duplicate branches; the current version was deliberately collapsed. +- **`indexing` profile sets `WebApplicationType.NONE`** — meaning `mvn test` from the IDE without `@ActiveProfiles("test")` will try to start the web server and pin to ports. Always use `@ActiveProfiles("test")` on `@SpringBootTest`. +- **Frontend assets bundled into the JAR (`src/main/resources/static/`)** — no separate frontend deploy. Vite's `outDir: '../resources/static'` is the embed seam; don't move the SPA out of the JAR without re-architecting the deploy story. 
+- **`prop_*` Neo4j property prefix.** It's a deliberate namespacing scheme to separate domain properties from top-level node attributes (`id`, `kind`, `layer`, etc.). Don't rename. diff --git a/docs/project/data-model.md b/docs/project/data-model.md new file mode 100644 index 00000000..acc540c5 --- /dev/null +++ b/docs/project/data-model.md @@ -0,0 +1,128 @@ +# Data Model + +codeiq's data model has **three storage layers**, each with its own schema and lifetime: + +| Layer | Backing | Purpose | Lifetime | +|---|---|---|---| +| Domain types | Java records / enums | In-memory shape of nodes/edges, single source of truth | Per JVM run | +| Analysis cache | H2 (file-backed, embedded) | Per-file detection results keyed by content hash; enables incremental re-indexing | `.codeiq/cache/` until manually cleared or `CACHE_VERSION` bump | +| Graph | Neo4j Embedded (Community Edition 2026.02.3) | Final enriched graph for queries, MCP tools, REST API | `.codeiq/graph/graph.db/` until manually cleared | + +## Storage + +### Primary datastore — Neo4j Embedded +- **Defined in:** `pom.xml` `2026.02.3`, bootstrapped in `config/Neo4jConfig.java` (only loaded under the `serving` profile via `@ConditionalOnProperty(value="codeiq.neo4j.enabled", havingValue="true")`). +- **Data dir:** `.codeiq/graph/graph.db/` inside the scanned repo. +- **Migration tool:** none — Neo4j is schemaless; indexes/constraints are created idempotently by `GraphStore.bulkSave()`. + +### Secondary datastore — H2 (analysis cache) +- **Defined in:** `cache/AnalysisCache.java`. H2 is a transitive Spring Boot dependency (no explicit version pin in `pom.xml`). +- **Data dir:** `.codeiq/cache/` inside the scanned repo. +- **Schema versioning:** `CACHE_VERSION = 4` constant near the top of `AnalysisCache.java` (currently line 43; grep the symbol if drifted). On startup, cache reads the stored version; if it doesn't match, the H2 file is wiped and recreated. 
**Bump `CACHE_VERSION` whenever you change the file-hash algorithm or the schema.** + +## Domain types + +### `CodeNode` and `CodeEdge` +- **Defined in:** `model/CodeNode.java`, `model/CodeEdge.java`. +- **Plain Java records / classes** (not JPA entities — Spring Data Neo4j is used only on the write path). Properties live in a `Map`. +- **ID format:** `"{prefix}:{filepath}:{type}:{identifier}"` (e.g. `"node:src/main/java/Foo.java:class:Foo"`). Cross-file uniqueness is enforced by including the full file path. See existing detectors for the prefix convention. + +### `NodeKind` (enum) +- **Defined in:** `model/NodeKind.java`. +- **34 concrete values** (the file's javadoc still claims "32" — known stale, see `PROJECT_SUMMARY.md` §"Gotchas"): + +``` +MODULE, PACKAGE, CLASS, METHOD, ENDPOINT, ENTITY, REPOSITORY, QUERY, +MIGRATION, TOPIC, QUEUE, EVENT, RMI_INTERFACE, CONFIG_FILE, CONFIG_KEY, +WEBSOCKET_ENDPOINT, INTERFACE, ABSTRACT_CLASS, ENUM, ANNOTATION_TYPE, +PROTOCOL_MESSAGE, CONFIG_DEFINITION, DATABASE_CONNECTION, AZURE_RESOURCE, +AZURE_FUNCTION, MESSAGE_QUEUE, INFRA_RESOURCE, COMPONENT, GUARD, +MIDDLEWARE, HOOK, SERVICE, EXTERNAL, SQL_ENTITY +``` + +Each enum constant carries a lowercase `value` (e.g. `CLASS("class")`) used as the string representation in Cypher / JSON / MCP-tool responses. `NodeKind.fromValue(...)` does the reverse lookup via a static `BY_VALUE` map. + +### `EdgeKind` (enum) +- **Defined in:** `model/EdgeKind.java`. +- **27 values** per the file's javadoc (verified count). Includes: + +``` +DEPENDS_ON, IMPORTS, EXTENDS, IMPLEMENTS, CALLS, INJECTS, EXPOSES, +QUERIES, MAPS_TO, PRODUCES, CONSUMES, PUBLISHES, SUBSCRIBES, INVOKES_RMI, +DEFINES, CONTAINS, OVERRIDES, CONNECTS_TO, TRIGGERS, PROVISIONS, +SENDS_TO, RECEIVES_FROM, PROTECTS, RENDERS, REFERENCES_TABLE, ... +``` + +(Some values from the middle of the enum truncated in this summary — read `model/EdgeKind.java` for the authoritative list.) 
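The `value` / `fromValue` / `BY_VALUE` pattern described for `NodeKind` above can be sketched as follows. This is a trimmed, hypothetical stand-in (`SketchNodeKind`) with only two constants, not the contents of `model/NodeKind.java`; throwing `IllegalArgumentException` for an unknown value is an assumption about how a reverse lookup might fail, since the source does not say what the real method does.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, trimmed stand-in for the real enum (two constants only).
enum SketchNodeKind {
    CLASS("class"),
    SERVICE("service");

    // Reverse-lookup table, built once after the constants above are initialized.
    private static final Map<String, SketchNodeKind> BY_VALUE =
            Arrays.stream(values())
                  .collect(Collectors.toUnmodifiableMap(SketchNodeKind::value, Function.identity()));

    private final String value;

    SketchNodeKind(String value) {
        this.value = value;
    }

    /** Lowercase string form, as it would appear in Cypher / JSON output. */
    String value() {
        return value;
    }

    /** Reverse lookup; failing loudly on unknown input is an assumption of this sketch. */
    static SketchNodeKind fromValue(String value) {
        SketchNodeKind kind = BY_VALUE.get(value);
        if (kind == null) {
            throw new IllegalArgumentException("Unknown node kind: " + value);
        }
        return kind;
    }
}
```

Keeping the lookup map next to the constants means a new constant is picked up by `fromValue` automatically, with no second list to maintain.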
+ +### `layer` (string property, not an enum) +Every node carries a `layer` property set by `analyzer/LayerClassifier.java` to one of: `frontend`, `backend`, `infra`, `shared`, `unknown`. Classification is deterministic — based on `kind`, `framework`, and path heuristics. + +## H2 cache schema + +Defined in the `SCHEMA_SQL` text block near the top of `cache/AnalysisCache.java` (grep `SCHEMA_SQL`). Tables (verified from the file): + +| Table | Purpose | +|---|---| +| `cache_meta` | `meta_key` (PK) → `meta_value` — stores the `version` row matching `CACHE_VERSION` | +| `files` | `content_hash` (PK) → file path, language, size, parse timestamp; the unit of cache lookup | +| `nodes` | per-file detected nodes; `row_id` AUTO_INCREMENT PK; FK to `files.content_hash` | +| `edges` | per-file detected edges; FK to `files.content_hash` | +| `analysis_runs` | `run_id` (PK), wall-clock metadata for one `index`/`analyze` invocation | + +**Reserved-word note:** H2 reserves `key`, `value`, `order`. The schema uses `meta_key` / `meta_value` etc. — keep that pattern when extending. + +**Concurrency:** the cache uses a `ReentrantReadWriteLock` (`AnalysisCache.java`). Many virtual-thread readers can run in parallel; writers serialize. This is what avoids `ClosedChannelException` against H2's MVStore file channel under concurrent virtual-thread access. 
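The read/write locking discipline described above can be sketched in isolation. This is a minimal illustration under stated assumptions: `SketchCache` is a hypothetical class backed by an in-memory map rather than H2, and it only shows how a `ReentrantReadWriteLock` lets readers proceed in parallel while writers get exclusive access. It is not the `AnalysisCache` implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: an in-memory map stands in for the H2-backed store.
final class SketchCache {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> store = new HashMap<>();

    /** Many readers may hold the read lock at the same time. */
    Optional<String> get(String contentHash) {
        lock.readLock().lock();
        try {
            return Optional.ofNullable(store.get(contentHash));
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The write lock is exclusive: it waits for readers and blocks new ones. */
    void put(String contentHash, String payload) {
        lock.writeLock().lock();
        try {
            store.put(contentHash, payload);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Replacing both methods with `synchronized` would still be correct, but every read would then wait for every other read, which is the serialization the conventions warn against.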
+ +## Neo4j schema (created by `GraphStore.bulkSave`) + +Indexes created idempotently (`CREATE … IF NOT EXISTS`) inside `GraphStore.bulkSave()` (`graph/GraphStore.java`, around lines 112–122 at time of writing — grep `CREATE INDEX` to relocate): + +| Index | Type | Property | +|---|---|---| +| (unnamed) | b-tree | `(:CodeNode {id})` | +| (unnamed) | b-tree | `(:CodeNode {label_lower})` | +| (unnamed) | b-tree | `(:CodeNode {fqn_lower})` | +| `search_index` | fulltext | `[label_lower, fqn_lower]` over `:CodeNode` | +| `lexical_index` | fulltext | `[prop_lex_comment, prop_lex_config_keys]` over `:CodeNode` | + +The `CLAUDE.md` "Gotchas" section additionally references b-tree indexes on `kind`, `layer`, `module`, `filePath`. **Cross-check before relying on those** — `grep "CREATE INDEX" graph/GraphStore.java` shows only the 3 above plus the 2 fulltext indexes. The CLAUDE.md claim may be aspirational or stale. + +### Property round-trip convention + +Domain `properties` Map → Neo4j stored as `prop_` properties. Domain ID, layer, kind, etc. become top-level node properties (`id`, `layer`, `kind`, `label_lower`, `fqn_lower`, `module`, `filePath`). The reverse mapping is in `nodeFromNeo4j()` inside `graph/GraphStore.java`. **Whenever you add a domain property, verify the round-trip survives** — silent property loss is the most common bug class on this seam. + +### Bulk-save batching + +`bulkSave` uses `UNWIND $batch AS props CREATE (n:CodeNode) SET n = props` for nodes (default batch 500) and a similar UNWIND-MATCH-MATCH-CREATE pattern for edges. Edge UNWIND **silently drops rows whose source/target node IDs are missing** — pre-validate before passing in. See [`CLAUDE.md`](../../CLAUDE.md) §"Gotchas". + +## Lifecycle / state machines + +There are no state machines on entities themselves. 
The closest thing is the **pipeline lifecycle** that produces them: + +``` +file on disk + ─► hashed (SHA-256, FileHasher.java) + ─► H2 cache lookup + ├─ hit → reuse cached nodes/edges + └─ miss → run detectors, write nodes+edges keyed by content_hash + ─► H2 cache populated + +(later, on `enrich`:) + ─► H2 read + ─► UNWIND bulk-load to Neo4j + ─► linkers (Topic, Entity, ModuleContainment, Guard) add cross-file edges + ─► LayerClassifier sets layer property on every node + ─► ServiceDetector adds SERVICE nodes + CONTAINS edges + ─► LanguageEnricher (per-language extractors) adds extractor results + ─► LexicalEnricher adds prop_lex_* + the lexical_index + ─► graph ready for `serve` +``` + +## Schema source of truth + +- **Neo4j shape:** `graph/GraphStore.java` is canonical (it creates the indexes; there are no other DDL sources). Property names like `label_lower` / `fqn_lower` / `prop_*` are decided here. +- **H2 shape:** `cache/AnalysisCache.java`'s `SCHEMA_SQL` constant is canonical. There is no separate migration directory — `CACHE_VERSION` is the migration mechanism. +- **Domain shape:** `model/{CodeNode,CodeEdge,NodeKind,EdgeKind}.java` are canonical. Detectors reference these enums by symbol; never use the lowercase string forms in detector code. + +If you change any of the three, **update the other two seams** (or document why you didn't). diff --git a/docs/project/flows.md b/docs/project/flows.md new file mode 100644 index 00000000..24ef4ba1 --- /dev/null +++ b/docs/project/flows.md @@ -0,0 +1,127 @@ +# Key Flows + +Four flows worth tracing — they cover the main code paths an agent will need to modify or debug. Each lists the file:line entry and the chain of calls. **Line numbers are accurate at the time of writing (2026-04-27)** but rot — `grep` for the symbol if a line drifts. + +--- + +## Flow: `codeiq index ` — file scan → H2 cache + +**Trigger:** `java -jar code-iq-*-cli.jar index /path/to/repo` from a shell. + +**Path through code:** + +1. 
`CodeIqApplication.java` `main(...)` — Spring Boot starts. The first arg (`index`) is *not* `serve`, so the app sets profile `indexing` and `WebApplicationType.NONE` (the `if (isServe) ... else ...` block). No web server spins up. +2. `CodeIqApplication.run(args)` — Picocli takes over: `new CommandLine(codeIqCli, factory).execute(args)`. +3. `cli/CodeIqCli.java` — top-level Picocli `@Command`. Subcommand dispatch routes to `cli/IndexCommand.java`. +4. `cli/IndexCommand.call()` — opens `cache/AnalysisCache` (creates the H2 file at `.codeiq/cache/` if missing; checks `CACHE_VERSION`). +5. `analyzer/FileDiscovery.discover(rootPath)` — runs `git ls-files` if the path is a git repo, else walks the filesystem. Returns a list of `DiscoveredFile`s with language tagged via `analyzer/FileClassifier.java`. +6. For each file, in batches (default 500): hash via `cache/FileHasher.hash(...)` (SHA-256), check the cache. + - **Cache hit** → reuse existing nodes/edges from H2. + - **Cache miss** → continue. +7. `analyzer/StructuredParser.parse(file)` — routes to JavaParser (Java), `grammar/AntlrParserFactory` (TS/Py/Go/C#/Rust/C++), or raw text. +8. **Detector fan-out** on virtual threads: every `@Component`-annotated `Detector` whose `getSupportedLanguages()` matches gets called with a `DetectorContext`. Results are collected per file. (Auto-discovery via Spring classpath scan; no manual list.) +9. `analyzer/GraphBuilder.addNodes(...) / addEdges(...)` — buffer to indexed slots so order is independent of thread completion. +10. `cache/AnalysisCache.write(contentHash, nodes, edges, runId)` — persist via UNWIND-friendly batches. +11. CLI prints summary; exit code 0. + +**Side effects:** `.codeiq/cache/` H2 file populated/updated. **No Neo4j writes**. No network calls. + +**Failure modes:** +- Per-file detector exceptions: caught + logged in `Analyzer.java`'s task wrapper; the file is skipped, the run continues. 
+- `CACHE_VERSION` mismatch: H2 file is wiped + recreated automatically on startup. +- Disk-full / permission errors: bubble up, run aborts with non-zero exit. + +--- + +## Flow: `codeiq enrich ` — H2 → Neo4j with linkers + classifiers + +**Trigger:** `java -jar code-iq-*-cli.jar enrich /path/to/repo` (after `index`). + +**Path through code:** + +1. `CodeIqApplication.main(...)` — same profile-selection logic; `enrich` → `indexing` profile, no web server. +2. `cli/EnrichCommand.call()` — opens `cache/AnalysisCache` (read), opens Neo4j Embedded directly via `DatabaseManagementServiceBuilder` (programmatic — Spring's `@Profile("serving")` Neo4j config is *not* loaded here). +3. `EnrichCommand` reads all nodes + edges from H2 in batches. +4. `graph/GraphStore.bulkSave(nodes, edges)` (line numbers approximate at time of writing — grep the Cypher fragment if drifted): + - `MATCH (n) WITH n LIMIT 5000 DETACH DELETE n RETURN count(*)` — clear in chunks if a previous graph existed. + - `CREATE INDEX IF NOT EXISTS` for `id`, `label_lower`, `fqn_lower` + `CREATE FULLTEXT INDEX` for `search_index` and `lexical_index`. + - `UNWIND $batch AS props CREATE (n:CodeNode) SET n = props` — nodes, batched (default 500). + - `UNWIND $batch AS e MATCH (a {id: e.src}) MATCH (b {id: e.tgt}) CREATE (a)-[r:EDGE_KIND]->(b)` — edges, batched. **Silently drops rows where source/target IDs miss.** +5. `analyzer/linker/*` — runs in order: `TopicLinker`, `EntityLinker`, `ModuleContainmentLinker`, `GuardLinker`. Each adds cross-file edges (e.g. `PRODUCES`/`CONSUMES` from a topic name appearing in two services). +6. `analyzer/LayerClassifier.classify(...)` — sets `n.layer` on every node based on `kind`, `framework`, and path heuristics. +7. `analyzer/ServiceDetector.detect(rootPath)` — walks the filesystem (not the Neo4j graph) for build files (Maven, Gradle, npm, Cargo, go.mod, etc. — 30+). 
Creates `:CodeNode {kind: 'service'}` nodes and `CONTAINS` edges to every module/file inside the service boundary. +8. `intelligence/extractor/LanguageEnricher` — runs per-language extractors (`JavaLanguageExtractor`, `TypeScriptLanguageExtractor`, `PythonLanguageExtractor`, `GoLanguageExtractor`) to add language-specific properties. +9. `intelligence/lexical/LexicalEnricher` — extracts doc comments (`DocCommentExtractor`) and persists to `prop_lex_comment`; populates the `lexical_index` fulltext index. +10. CLI prints summary; exit 0. + +**Side effects:** `.codeiq/graph/graph.db/` populated. H2 cache untouched. + +**Failure modes:** +- Edge with missing source/target ID: silently dropped by Cypher MATCH. Mitigation: pre-validate IDs before passing to `bulkSave`. **Most common cause of "missing relationships" bugs.** +- Property round-trip failure: a domain property survives `bulkSave` but `nodeFromNeo4j()` doesn't know to restore it → silent property loss. Verify by reading back any node you just wrote. + +--- + +## Flow: `codeiq serve ` — REST + MCP + UI request lifecycle + +**Trigger:** `java -jar code-iq-*-cli.jar serve /path/to/repo` (after `enrich`). Then a browser hits `http://localhost:8080/explorer` or an MCP client calls a tool. + +**Path through code (cold start):** + +1. `CodeIqApplication.main(...)` — first arg is `serve` → profile `serving` activated; web server starts. +2. Spring loads beans gated by `@Profile("serving")`: all 4 controllers in `api/`, `mcp/McpTools` (via Spring AI starter), the Neo4j `@Configuration` in `config/Neo4jConfig.java` (only when `codeiq.neo4j.enabled=true`). +3. Neo4j Embedded starts; `health/GraphHealthIndicator` reports status to `/actuator/health`. +4. Spring Boot's static-resource handler binds `src/main/resources/static/` (the bundled SPA) to `/`. +5. Server bound — `http://localhost:8080` ready. + +**Path through code (REST request, e.g. `GET /api/stats`):** + +1. Browser hits `/api/stats`. +2. 
`api/GraphController.getStats(...)` (`@GetMapping("/stats")`) is dispatched (carries `@Profile("serving")`). +3. Controller delegates to `query/StatsService.getStats()`. +4. `StatsService` runs Cypher queries via `graph/GraphStore.queryNodes(...)` (raw Cypher, not SDN). +5. Results aggregated into a `Map` and serialized by Jackson. +6. HTTP response returned. + +**Path through code (MCP tool call, e.g. `find_dead_code`):** + +1. MCP client (Claude Desktop, an LLM agent, the SPA's `McpConsole`) sends a JSON-RPC call to `/mcp` (mounted by Spring AI's `spring-ai-starter-mcp-server-webmvc`). +2. Spring AI dispatches to the matching `@McpTool`-annotated method on `mcp/McpTools.java`. +3. The MCP tool delegates to `query/QueryService.findDeadCode()` (or similar). +4. `QueryService` runs Cypher (filters by semantic edges only — `calls`, `imports`, `depends_on`; excludes structural `contains`, `defines`, and entry points like endpoints / config files — see [`CLAUDE.md`](../../CLAUDE.md) "Gotchas"). +5. Result returned as JSON-RPC response. + +**Side effects:** None — strictly read-only. + +**Failure modes:** +- Calling `serve` before `enrich` → `health/GraphHealthIndicator` reports DOWN; queries return empty results. Fix: run `enrich` first. +- CORS rejection if the SPA is being served from a different origin in dev: configure `codeiq.cors.allowed-origin-patterns` in `application.yml` (or env: `CODEIQ_CORS_ALLOWED_ORIGIN_PATTERNS`). +- `FAIL_ON_UNKNOWN_PROPERTIES` is globally disabled (`config/JacksonConfig.java`) — MCP protocol clients won't break on field additions, but it also hides typos in JSON inputs. Validate at the controller boundary. + +--- + +## Flow: Adding a new detector and seeing it run + +**Trigger:** developer adds `MyDetector.java` and rebuilds. + +**Path through code (compile-time + first run):** + +1. 
`src/main/java/io/github/randomcodespace/iq/detector//MyDetector.java` — new file, `@Component`-annotated, `@DetectorInfo(...)`-annotated, extending one of the `Abstract*Detector` base classes. +2. `mvn package` — compiles the class. +3. On the next `codeiq index `: + - Spring Boot starts under `indexing` profile, classpath-scans `io.github.randomcodespace.iq` for `@Component`s. + - `MyDetector` is instantiated as a singleton bean. + - `analyzer/Analyzer` (or `cli/IndexCommand`) iterates Spring's `Map` of all bean instances. +4. For every file whose language matches `getSupportedLanguages()`, `MyDetector.detect(ctx)` is called on a virtual thread. +5. Returned `DetectorResult` is folded into `GraphBuilder` (nodes-first, then edges). +6. From there: identical to the `index` flow — H2 cache write, then `enrich`, then visible via `serve`. + +**Verification:** +- `codeiq plugins list` introspects via `@DetectorInfo` and confirms the detector is live. +- `codeiq stats ` — node-kind counts should change after re-indexing. +- Unit test `MyDetectorTest` (positive + negative + determinism) must pass via `mvn test`. + +**Failure modes:** +- Forgot `@Component` → silently disabled, no error. Test won't catch it (unit tests instantiate directly). Catch via `codeiq plugins list` showing the detector is missing. +- Missing discriminator guard on a framework detector → false positives across other frameworks. Catch via the negative-match unit test. +- Stateful instance fields → race conditions across virtual threads. Catch via the determinism test. diff --git a/docs/project/ui.md b/docs/project/ui.md new file mode 100644 index 00000000..c46599db --- /dev/null +++ b/docs/project/ui.md @@ -0,0 +1,136 @@ +# UI + +App-mode (not library-mode): codeiq ships a single React SPA bundled inside the JAR and served by Spring Boot's static-resource handler at `http://localhost:8080/` when running `codeiq serve`. 
+ +## Stack + +- **Framework:** React 18.3 (`src/main/frontend/package.json`) +- **Build tool:** Vite 6.4 + TypeScript 5.7 (`src/main/frontend/vite.config.ts`, `tsconfig.json`) +- **UI kit:** Ant Design 5.24 + `@ant-design/icons` 5.6 +- **Charts:** ECharts 5.6 via `echarts-for-react` 3.0 +- **Routing:** `react-router-dom` 7 +- **Styling:** AntD's built-in theme system (no Tailwind, no CSS Modules); `context/ThemeContext.tsx` toggles light/dark via AntD's `ConfigProvider` token system. +- **State management:** local component state + a tiny `useApi` hook (`hooks/useApi.ts`); no Redux / Zustand / React Query. +- **Data fetching:** raw `fetch` wrapped in `lib/api.ts` + `hooks/useApi.ts`. + +## Entry & layout + +- **HTML entry:** `src/main/frontend/index.html` (Vite default). +- **JS entry:** `src/main/frontend/src/main.tsx` → renders `` (`src/main/frontend/src/App.tsx`). +- **Root shell:** `App.tsx` wires the AntD `ConfigProvider`, the `ThemeContext.Provider`, and `react-router-dom`'s `BrowserRouter` + `Routes`. +- **Layout:** `components/AppLayout.tsx` — sidebar + content area; light/dark toggle via `useTheme()` from `ThemeContext.tsx`. +- **Provider stack** (outer → inner): AntD `ConfigProvider` → `ThemeContext.Provider` → `BrowserRouter` → `AppLayout` → page route. 
+ +## Component organization + +``` +src/main/frontend/src/ +├── main.tsx — Vite entry, renders +├── App.tsx — providers + routes +├── env.d.ts — Vite env-var types +├── components/ +│ └── AppLayout.tsx — sidebar + content layout, theme toggle +├── context/ +│ └── ThemeContext.tsx — light/dark toggle +├── hooks/ +│ └── useApi.ts — generic API-call hook (loading / error / data) +├── lib/ +│ ├── api.ts — fetch wrapper + endpoint helpers +│ └── mcp-tools.ts — TOOLS, CATEGORIES, toolsByCategory, McpTool type +├── pages/ — one file per route +│ ├── Dashboard.tsx — stats overview + MCP tool launcher +│ ├── CodebaseMap.tsx — file-tree explorer +│ ├── Explorer.tsx — node/edge browser with kind filter + search +│ └── McpConsole.tsx — interactive MCP-tool playground +└── types/ + └── api.ts — TypeScript types matching the REST API shapes +``` + +**Conventions:** +- **`@/...` import alias** resolves to `src/main/frontend/src/...` (`vite.config.ts` `resolve.alias` + `tsconfig.json` `paths`). Always use the alias — never `../../../`. +- **One component per file**, `PascalCase.tsx`. +- **Pages are at `src/pages/`**; shared/UI primitives at `src/components/`. Reusable, non-page UI primitives haven't grown enough to warrant a `ui/` sublayer yet — fold into `components/` until that becomes painful. +- **No test colocation** for the SPA — frontend tests are E2E only via Playwright. Component-level testing isn't currently practiced. + +## Routes + +(Inferred from page filenames; **verify in `src/main/frontend/src/App.tsx`** before relying.) + +- `/` → `Dashboard` +- `/explorer` → `Explorer` +- `/codebase-map` → `CodebaseMap` +- `/mcp` → `McpConsole` + +## Design system + +- **Tokens:** AntD's built-in token system, customized via `ConfigProvider` in `App.tsx` and theme-keyed via `ThemeContext.tsx`. No standalone token file. +- **Primitives:** AntD components used directly (`Button`, `Layout`, `Menu`, `Table`, `Input`, etc.). No internal wrapper library. 
+- **Icons:** `@ant-design/icons` (`SunOutlined`, `MoonOutlined`, etc. — see `components/AppLayout.tsx`). + +## Data fetching + +`hooks/useApi.ts` wraps `lib/api.ts`'s `api.(...)` calls and exposes `{ data, loading, error, refetch }`. Page components use it like: + +```ts +const { data, loading, error } = useApi(() => api.stats()); +``` + +Endpoint helpers live in `lib/api.ts`; response types in `types/api.ts`. The MCP tools list — used by `Dashboard` and `McpConsole` — is a static client-side catalog at `lib/mcp-tools.ts` (it mirrors `mcp/McpTools.java` server-side; **must be kept in sync manually** when adding a tool). + +## Forms & validation + +Minimal — no `react-hook-form` / `formik`. The `McpConsole` builds parameter inputs dynamically from `lib/mcp-tools.ts` definitions; validation is "send and surface server error". This is fine for an internal dev tool. + +## i18n / a11y / theming + +- **i18n:** none. Strings are inline English. codeiq is a developer tool; no plan to localize. +- **a11y:** Playwright config integrates `@axe-core/playwright` (`src/main/frontend/package.json` devDep) — accessibility audits run as part of E2E. AntD's primitives carry sensible roles/labels; custom components inherit those. +- **Theming:** `ThemeContext.tsx` flips a boolean → AntD token theme (`defaultAlgorithm` vs `darkAlgorithm`). The toggle is in the layout header. No `prefers-color-scheme` auto-detection currently — feature gap if you care. + +## Performance notes + +- **Manual chunk splitting** in `vite.config.ts` (`build.rollupOptions.output.manualChunks`): + - `vendor-react` — React + react-dom + react-router-dom + - `vendor-antd` — antd + @ant-design/icons + - `vendor-echarts` — echarts + echarts-for-react + + Keeps the AntD chunk and the ECharts chunk out of the initial paint; both are heavy. +- **`chunkSizeWarningLimit: 1200`** — Vite's default 500 KB warning was too noisy for the AntD chunk; raised deliberately. 
+- **`emptyOutDir: false`** — preserves manually-placed assets in `src/main/resources/static/` between builds. If you see leftover files, delete the dir manually. +- **`sourcemap: false`** — production output ships without sourcemaps (the JAR is the ship artifact; sourcemaps would balloon it). + +## Dev loop + +```bash +# Backend — terminal 1 +java -jar target/code-iq-*-cli.jar serve /path/to/scan-target + +# Frontend — terminal 2 +cd src/main/frontend +npm install # only first time +npm run dev # Vite HMR on :5173, proxies /api and /mcp to :8080 +``` + +The Vite dev-server proxy is defined at the bottom of `vite.config.ts`: + +```ts +server: { + proxy: { + '/api': 'http://localhost:8080', + '/mcp': 'http://localhost:8080', + }, +} +``` + +## Production build → JAR embed + +`mvn package` triggers `frontend-maven-plugin` which runs `npm ci` + `npm run build`. Vite's `build.outDir: '../resources/static'` writes assets into `src/main/resources/static/`, which Spring Boot's static-resource handler serves out of the JAR at runtime when `codeiq.ui.enabled=true` (default true; toggle in `application.yml`). + +To skip the frontend build during backend-only iteration: `mvn test -Dfrontend.skip=true` (the property is wired in `pom.xml`'s `` block as `false`). + +## Gotchas + +- **`lib/mcp-tools.ts` is hand-maintained** — when you add a new `@McpTool` in `mcp/McpTools.java`, you must mirror the entry in `lib/mcp-tools.ts` for the `McpConsole` and `Dashboard` to know about it. There is no auto-sync. +- **`emptyOutDir: false`** — stale assets in `src/main/resources/static/` won't be deleted by Vite. If you renamed a chunk or removed a page, manually delete the static dir before the next build. +- **MCP endpoint path is `/mcp`**, not `/api/mcp` — the Vite proxy reflects this. The Spring AI starter mounts MCP at the root. 
+- **AntD chunk size is intentional.** Don't try to "fix" the 500 KB+ AntD chunk by code-splitting per page — the AntD design tokens shouldn't be reloaded per route. The manual chunk in `vite.config.ts` is the right granularity. From 244d6b855be307e045e79920305f4d5149277d67 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 15:49:58 +0000 Subject: [PATCH 05/23] checkpoint: pre-yolo 2026-04-27T15:49:58 From 2c034252d72c6901c864c77676a23cc1f9fb3bd3 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:13:28 +0000 Subject: [PATCH 06/23] =?UTF-8?q?docs(spec):=20add=20sub-project=201=20des?= =?UTF-8?q?ign=20=E2=80=94=20resolver=20SPI=20+=20Java=20pilot=20+=20confi?= =?UTF-8?q?dence=20schema?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Brainstormed the first of 8 sub-projects in the "robust graph" decomposition. Sub-project 1 introduces a symbol-resolution stage between parse and detect, defines the per-language SymbolResolver SPI, ships a Java backend wrapping JavaParser's JavaSymbolSolver, and adds Confidence (LEXICAL/SYNTACTIC/RESOLVED) + source fields on every CodeNode/CodeEdge with Neo4j round-trip and an H2 cache version bump. Migrates 4-6 Java detectors as proof of value; existing detectors compile and run unchanged via opt-in Optional. Aggressive testing baked in: 9 layers (unit, detector x resolver, concurrency stress, memory/pathological, adversarial, determinism, E2E petclinic regression, property-based via jqwik, mutation testing via PIT). Backward compatibility scoped to logical-content equality with explicit one-time snapshot refreshes for the additive confidence/source fields. Spec lives under docs/specs/ (alongside docs/project/) since docs/superpowers/ is gitignored except for baselines/. Awaiting maintainer review before writing the implementation plan. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- ...4-27-resolver-spi-and-java-pilot-design.md | 379 ++++++++++++++++++ 1 file changed, 379 insertions(+) create mode 100644 docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md diff --git a/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md b/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md new file mode 100644 index 00000000..c3f053cf --- /dev/null +++ b/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md @@ -0,0 +1,379 @@ +# Sub-project 1 — Resolver SPI + Java Pilot + Confidence Schema + +> **Status:** Awaiting approval. Brainstormed 2026-04-27. +> **Authors:** brainstormed via `superpowers:brainstorming` with the project maintainer. +> **Audience:** the agent / engineer who will implement this. Every claim should be checkable against the codebase referenced by `CLAUDE.md` and `PROJECT_SUMMARY.md`. + +## 1. Context + +codeiq's detector layer is the right abstraction. The **layer below it** is the bottleneck: detectors receive a parse tree (ANTLR) or AST (JavaParser) but no resolved symbol table. As a result, edges like `CALLS`, `INJECTS`, `IMPLEMENTS`, `EXTENDS`, and many framework-specific edges are emitted *by name*, not by **resolved type**. Two same-named symbols across packages collapse into one node; `userService.findById(id)` resolves to whichever `findById` the detector happens to see first. + +This is the architectural seam between "rich code map" and "ground-truth semantic graph." Every other planned improvement — TypeScript / Python / Go / Rust / C++ / C# resolution, framework-aware detection refactor, cross-framework false-positive harness — slots into this seam. Doing it second means inventing the seam ad-hoc inside whichever sub-project lands first, then retrofitting. + +This spec covers **sub-project 1 of 8** in the larger "robust graph" decomposition: + +| # | Scope | This spec? 
| +|---|---|---| +| 1 | Resolver SPI + Java pilot + confidence/provenance schema | **Yes** | +| 2 | TypeScript / JavaScript resolution | No | +| 3 | Python resolution | No | +| 4 | Go resolution | No | +| 5 | Rust / C++ / C# resolution | No | +| 6 | Framework-aware detection refactor | No | +| 7 | Cross-framework false-positive harness | No | +| 8 | MCP HTTP-streamable hardening (read-path) | No | + +## 2. Goals + +1. **Add a symbol-resolution stage** to the indexing pipeline, between parse and detect, that exposes a resolved symbol table to detectors. +2. **Wire a Java backend** using JavaParser's `JavaSymbolSolver`, with no new dependency tree (the solver is published alongside JavaParser). +3. **Add a confidence/provenance schema** (`Confidence` enum + `source` field) on every `CodeNode` and `CodeEdge`, round-tripped through Neo4j. +4. **Migrate 4–6 Java detectors** to use the resolver as proof of value: at least one Spring DI detector, one JPA detector, one messaging detector. +5. **Preserve backward compatibility:** all existing detectors compile and run unchanged. Resolution is opt-in per detector via `ctx.resolved()`. +6. **Preserve determinism:** resolver-stage output is byte-identical run-to-run, with the same input. +7. **Aggressive testing**, including adversarial inputs, concurrency stress, property-based, fuzz, mutation testing, and regression against the existing E2E quality bar. + +## 3. Non-goals + +- Maven / Gradle classpath JAR resolution beyond what `ReflectionTypeSolver` covers via the running JDK. (Possible follow-up: sub-project 1.5.) +- Resolution for non-Java languages. (Sub-projects 2–5.) +- Refactoring detectors to detect by resolved type rather than import-name. (Sub-project 6 — separate concern; a migrated detector here keeps its current detection mechanism, only resolving outgoing edges' targets more accurately.) +- Performance optimization beyond what the design naturally affords. (Defer until measured.) 
+- Changes to the serving layer (REST API, MCP tools, web UI). +- Changes to `application.yml` Spring-owned keys (CORS, Neo4j Bolt port, UI toggle). + +## 4. Architecture + +### 4.1 Pipeline shape + +The current `index` and `analyze` pipelines look like: + +``` +discover → parse → detect → link → classify → store +``` + +After this sub-project, they become: + +``` +discover → parse → resolve → detect → link → classify → store +``` + +The resolve stage runs after `analyzer/StructuredParser` produces a parsed file and before the detector fan-out kicks off. + +### 4.2 Resolver-pass placement + +- **Bootstrapping:** `analyzer/Analyzer` (or `cli/IndexCommand`'s in-process pipeline) calls `ResolverRegistry.bootstrap(rootPath)` once per analysis run, before file iteration begins. The Java resolver uses this hook to build a single `CombinedTypeSolver` configured with sorted source roots and `ReflectionTypeSolver`. Other languages' resolvers (future sub-projects) plug into the same hook. +- **Per-file resolution:** for each file, after parse, the analyzer asks `ResolverRegistry.resolverFor(language)` for the matching resolver, calls `resolve(parsedFile)`, and stores the result on the `DetectorContext` as `Optional`. +- **Detector consumption:** detectors call `ctx.resolved()`. If present, the detector may emit edges with `Confidence.RESOLVED`; if absent, the detector falls through to its existing logic and emits `Confidence.SYNTACTIC` (when AST-based) or `Confidence.LEXICAL` (when regex-based). + +### 4.3 Pipeline invariant + +The new stage must not change *which files are analyzed* or *which detectors run for them*. It only enriches the input each detector sees. A regression here breaks every downstream count and statistic. + +## 5. 
Components + +### 5.1 New components + +| Path | Type | Responsibility | +|---|---|---| +| `intelligence/resolver/SymbolResolver.java` | interface | SPI: `Set getSupportedLanguages(); Resolved resolve(ParsedFile parsed) throws ResolutionException;` | +| `intelligence/resolver/Resolved.java` | interface (or sealed type) | Read-only resolution result for one file: per-symbol type info, resolved imports, declared types. Includes `Confidence sourceConfidence()` indicating the resolver's confidence in this particular result. | +| `intelligence/resolver/EmptyResolved.java` | record / class | Singleton "no resolution available" — returned for unsupported languages, disabled config, or resolution failure. | +| `intelligence/resolver/ResolverRegistry.java` | `@Component` | Auto-discovers `@Component` `SymbolResolver` beans (mirrors `DetectorRegistry`). Exposes `resolverFor(language)` and `bootstrap(rootPath)`. | +| `intelligence/resolver/ResolutionException.java` | exception | Wraps backend-specific failures (e.g. `JavaSymbolSolver` errors) with context (file path, language). | +| `intelligence/resolver/java/JavaSymbolResolver.java` | `@Component` | Wraps `JavaSymbolSolver`. Builds `CombinedTypeSolver` from sorted source roots + `ReflectionTypeSolver`. | +| `intelligence/resolver/java/JavaResolved.java` | record | Java-specific `Resolved` carrying JavaParser `TypeSolver` + per-AST resolved type info. | +| `intelligence/resolver/java/JavaSourceRootDiscovery.java` | helper | Discovers Java source roots from a project root (auto-detects `src/main/java`, `src/test/java`, multi-module via Maven `` / Gradle `include`). Pure logic, unit-testable. | +| `model/Confidence.java` | enum | `LEXICAL` / `SYNTACTIC` / `RESOLVED` with a numeric mapping (0.6 / 0.8 / 0.95). Comparable. | +| `model/EdgeProvenance.java` *(optional, see §5.3)* | record | Optional richer provenance carrier; if not adopted, just use `String source` on `CodeEdge`. 
| + +### 5.2 Changed components + +| Path | Change | Rationale | +|---|---|---| +| `detector/DetectorContext.java` | Add `Optional resolved()` accessor. Defaults to `Optional.empty()`. Existing constructors keep working. | Detector opt-in path. | +| `model/CodeNode.java` | Add `Confidence confidence` and `String source` fields. `source` filled in by detector base classes (detector class simple name). `confidence` set per parser type (see §5.3): `AbstractRegexDetector` → `LEXICAL`, `AbstractJavaParserDetector` / `AbstractAntlrDetector` / `AbstractStructuredDetector` → `SYNTACTIC`. Detectors override to `RESOLVED` when emitting an edge derived from `ctx.resolved()`. | Confidence/provenance schema. | +| `model/CodeEdge.java` | Same as `CodeNode`. | Same. | +| `graph/GraphStore.java` | `bulkSave` writes `prop_confidence` and `prop_source`; `nodeFromNeo4j` / `edgeFromNeo4j` restore them. | Round-trip the new fields. | +| `cache/AnalysisCache.java` | Bump `CACHE_VERSION` from 4 to 5. Add `confidence` and `source` columns to `nodes` and `edges` tables. | Schema change requires cache reset. | +| `analyzer/Analyzer.java` | Insert resolve step. `bootstrapResolvers(rootPath)` once; `resolverFor(language).resolve(parsed)` per file. | Pipeline integration. | +| `cli/IndexCommand.java` | Mirror `Analyzer`'s resolver bootstrap (the in-process H2 batched pipeline). | Both code paths must integrate. | +| 4–6 Java detectors (see §5.4) | Use `ctx.resolved()`. Emit `Confidence.RESOLVED` when present; existing path emits `Confidence.SYNTACTIC`. | Proof of value. | +| `pom.xml` | Add `com.github.javaparser:javaparser-symbol-solver-core` (Apache-2.0, version-pinned to match `javaparser-core`). Resolve **latest stable matching version** at implementation time. Add `net.jqwik:jqwik` (test scope, EPL-2.0) for property-based tests. | New deps. | +| `codeiq.yml` schema (`docs/codeiq.yml.example`) | Document the new `intelligence.symbol_resolution.java` keys. | Surface the new config. 
| +| `config/CodeIqConfig.java` (or unified-config equivalent) | Bind the new keys. | Enable the toggles. | + +### 5.3 Confidence / provenance — schema decisions + +- **Storage shape:** the simplest viable model is two scalar fields on every `CodeNode` and `CodeEdge`: + - `confidence: Confidence` (enum, non-null). The default is set by the detector's base class — not a single hardcoded value — based on the parser used: + - `AbstractRegexDetector` → `LEXICAL` (pattern-only, no AST) + - `AbstractJavaParserDetector` / `AbstractAntlrDetector` / `AbstractStructuredDetector` / `AbstractPythonAntlrDetector` / `AbstractTypeScriptDetector` / `AbstractJavaMessagingDetector` / `AbstractPythonDbDetector` → `SYNTACTIC` (AST or parse tree, no symbol resolution) + - Detector overrides to `RESOLVED` for any edge derived from `ctx.resolved()`. + - `source: String` (non-null; detector class simple name, e.g. `"SpringServiceDetector"`) +- **Numeric access:** consumers (Cypher queries, MCP tools, the SPA) get a numeric value via `Confidence.score()` (0.6 / 0.8 / 0.95). The mapping is a static lookup; the enum is the authoritative form. +- **Future extensibility:** if richer provenance is needed later (e.g. resolver name, resolution timestamp), extend with optional `prop_resolver` etc. — the enum + source design does not preclude this. Don't pre-build for it. +- **MCP / API surface:** `confidence` and `source` are passthrough fields in node/edge JSON serialization. No new endpoints. Cypher filters can use `WHERE n.confidence = 'RESOLVED'` once the schema lands. + +### 5.4 Detector migration candidates (4–6) + +Final selection happens at implementation time based on which gives the clearest signal in `spring-petclinic`. Likely set: + +| Detector | Path | Why | +|---|---|---| +| `SpringServiceDetector` | `detector/jvm/java/SpringServiceDetector.java` | `@Autowired UserService` — needs to resolve `UserService` to its actual type for cross-class wiring. Highest visibility win. 
| +| `SpringRepositoryDetector` | `detector/jvm/java/SpringRepositoryDetector.java` | Repository interfaces extending `JpaRepository` — resolving `T` lets us link the repo to the entity. | +| `JpaEntityDetector` | `detector/jvm/java/JpaEntityDetector.java` | `@OneToMany List` — resolving the generic argument links entity-to-entity correctly. | +| `JpaRepositoryDetector` | `detector/jvm/java/JpaRepositoryDetector.java` | Same as Spring repo, deeper. | +| `KafkaListenerDetector` | `detector/jvm/java/KafkaListenerDetector.java` | Topic resolution from `@KafkaListener(topics = TOPIC_CONST)`. | +| `SpringRestDetector` | `detector/jvm/java/SpringRestDetector.java` | `@RequestBody UserDto dto` — resolving `UserDto` enables `MAPS_TO` edges from endpoint to entity. | + +Six is the upper bound; if four are sufficient to demonstrate measurable quality lift on petclinic, the rest can be migrated in follow-up PRs without changing this spec. + +## 6. Data flow (per analysis run) + +``` +1. cli/{Index,Analyze}Command.call() → analyzer/Analyzer.run(rootPath) + 1.1. ResolverRegistry.bootstrap(rootPath) + → JavaSymbolResolver.bootstrap() + - JavaSourceRootDiscovery.discover(rootPath) → sorted List + - new CombinedTypeSolver( + new ReflectionTypeSolver(), + sorted source roots wrapped in JavaParserTypeSolver) + - new JavaSymbolSolver(combinedTypeSolver) + - configure JavaParser default ParserConfiguration with the solver +2. For each discovered file (virtual thread): + 2.1. StructuredParser.parse(file) → ParsedFile (Java → CompilationUnit; others → existing types) + 2.2. resolved = ResolverRegistry.resolverFor(file.language()).resolve(parsedFile) + (returns EmptyResolved.INSTANCE for languages without a registered resolver) + 2.3. ctx = DetectorContext.builder()...resolved(resolved)...build() + 2.4. for each Detector matching language: detector.detect(ctx) +3. 
GraphBuilder.flush() → AnalysisCache (or → GraphStore on enrich) + - Each node and edge carries Confidence + source + - Round-tripped via prop_confidence / prop_source in Neo4j +``` + +## 7. Configuration surface + +New keys in `codeiq.yml`: + +```yaml +intelligence: + symbol_resolution: + java: + enabled: true + source_roots: auto # or explicit list of paths relative to repo root + jdk_reflection: true # ReflectionTypeSolver — needs JDK on classpath (always true for codeiq's runtime) + # bootstrap_timeout_seconds: 30 (kill switch if solver hangs) + # max_per_file_resolve_ms: 500 (per-file resolution timeout) +``` + +**Defaults:** +- `enabled: true` — most users want correctness > raw speed. +- `source_roots: auto` — discovery covers Maven (`src/main/java`, `src/test/java`, multi-module via `` in `pom.xml`), Gradle (similar), and plain layouts. +- `jdk_reflection: true`. +- `bootstrap_timeout_seconds: 30`. +- `max_per_file_resolve_ms: 500`. + +**Env overrides:** `CODEIQ_INTELLIGENCE_SYMBOL_RESOLUTION_JAVA_ENABLED=false` etc. + +**Config validation:** `codeiq config validate` must reject invalid combinations (e.g. `enabled: true` with empty `source_roots: []`). + +## 8. Backward compatibility + +- All existing `Detector` implementations compile and run unchanged. `ctx.resolved()` returns `Optional.empty()` for them by default (they never call it). +- Existing tests must pass with `intelligence.symbol_resolution.java.enabled: false`. **Mandatory.** Two sub-cases: + - **Logical-content tests** (assert on node IDs, edge counts, specific property values): pass unchanged. + - **JSON-snapshot / golden-file tests** (assert on full serialized output): will shift by exactly two new fields per node/edge (`confidence`, `source`). These get a **one-time refresh** during implementation, with a separate commit so the diff is reviewable. The refresh must produce only those two added fields per record — any other diff is a bug. 
+- With `enabled: true`, logical-content tests still pass — but some node/edge counts may shift **by design** (resolved-mode detectors emit different / additional edges that the lexical fallback could not produce). Expected diffs are recorded in the implementation plan and PR description. +- `CACHE_VERSION` bump from 4 to 5 wipes old `.codeiq/cache/` on first run. Documented in `CHANGELOG.md` under `[Unreleased]` as a breaking cache change. End users lose nothing meaningful; the cache rebuilds on the next `index` run. + +## 9. Performance budget + +| Stage | Cost | Notes | +|---|---|---| +| Resolver bootstrap | 2–5 s on a medium repo | One-time per run. Cached `CombinedTypeSolver` reused across files. | +| Per-Java-file resolve | 50–200 ms typical | Net +30–60% on Java analysis time. | +| Per-non-Java-file resolve | 0 (EmptyResolved) | No-op. | +| Memory overhead | tens to low hundreds of MB | `CombinedTypeSolver` caches resolved type info; bounded by source-root size. | +| Determinism cost | none | Sorted source roots add ms-scale. | + +For a 44 K-file codebase: +- Today: index ~220 s. +- After: index ~280–350 s (Java-heavy repos worst case). Acceptable. +- Mitigation: `intelligence.symbol_resolution.java.enabled: false` for raw-speed scans. + +**Performance gate:** if resolver bootstrap exceeds 10 s on `spring-petclinic`, the implementation has a bug — investigate before merge. + +## 10. Determinism guarantees + +- `JavaSourceRootDiscovery.discover(rootPath)` returns roots sorted alphabetically. +- `CombinedTypeSolver` member solvers added in the sorted order. +- `ResolverRegistry` exposes resolvers in stable iteration order (Spring `@Component` collection sorted by simple class name). +- `Resolved` value-types use `TreeMap` / sorted `List` for any iteration-order-sensitive data. +- New determinism test (mandatory): run resolver twice on the same input via separate JVM invocations, assert byte-identical serialized output. Mirrors existing detector convention. 
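The ordering rules in §10 can be sketched with JDK types alone. This is a minimal illustration under stated assumptions, not the project's implementation: the class `DeterministicOrdering` and both of its methods are hypothetical names, and no JavaParser or codeiq types are used. It shows the two ideas the section relies on: sort source roots before handing them to anything order-sensitive, and iterate maps in key order when serializing.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of codeiq): order-independent inputs and outputs. */
final class DeterministicOrdering {

    private DeterministicOrdering() {}

    /** Returns the roots sorted alphabetically by their string form; the input list is not mutated. */
    static List<Path> sortedRoots(List<Path> roots) {
        List<Path> copy = new ArrayList<>(roots);
        copy.sort(Comparator.comparing(Path::toString));
        return List.copyOf(copy);
    }

    /** Serializes symbol-to-type pairs in key order, so the result does not depend on insertion order. */
    static String serialize(Map<String, String> symbolTypes) {
        StringBuilder out = new StringBuilder();
        // Copying into a TreeMap fixes the iteration order regardless of the caller's map type.
        for (Map.Entry<String, String> entry : new TreeMap<>(symbolTypes).entrySet()) {
            out.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }
}
```

A determinism test in this style would build the same logical input in two different orders and compare the serialized strings for equality, which is the in-process analogue of the byte-identical check described above.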
+ +## 11. Error handling + +| Failure | Behavior | +|---|---| +| Source root configured but missing | Log WARN, drop from solver list, continue. | +| Source root contains no Java files | Drop from solver list, continue. | +| `CombinedTypeSolver` construction throws | Log ERROR with classpath context, fall back to `EmptyResolved` for all files (resolver disabled for this run), increment a metric. Do **not** abort the analysis. | +| Per-file `resolve(parsedFile)` throws | Log DEBUG (these are expected for malformed sources), return `EmptyResolved` for that file, continue. | +| Per-file resolution exceeds `max_per_file_resolve_ms` | Cancel via virtual-thread interruption, return `EmptyResolved` for that file, count timeout in metrics. | +| Bootstrap exceeds `bootstrap_timeout_seconds` | Abort bootstrap, fall back to `EmptyResolved` for the run, log ERROR. Run continues without resolution. | +| Detector calls `ctx.resolved().get()` and crashes | Caught by existing per-detector `try/catch` in `Analyzer` — file is skipped, detector is logged, run continues. (Existing behavior.) | + +## 12. Aggressive testing strategy + +This section is binding. Every layer below is mandatory for sub-project 1; the same template applies to sub-projects 2–8. + +### Layer 1 — Resolver unit tests (pure, fast) + +For `JavaSymbolResolver`, with one synthetic source tree per test: + +- Empty file (zero declarations). +- Single class with no imports. +- Class with multiple methods of varying signatures (overloads). +- Class with generics (≥3 levels of nesting: `Map>>`). +- Inner classes (static, non-static, anonymous, local). +- Lambda expressions and method references. +- Records and sealed classes (Java 25). +- Enum with abstract methods. +- Interface with default methods. +- Abstract class. +- Annotations (definition + use). +- Imports: explicit, static, wildcard, missing target, unused. +- Cyclic imports between two files (legal in Java) — both resolve. 
+- Two classes with the same simple name in different packages — both resolve to distinct nodes. +- Symbol defined in JDK (`Optional`, `Stream`, `List`) — resolves via `ReflectionTypeSolver`. +- Multi-source-root: a class in `src/main/java` referencing one in `src/test/java`. + +Expected: every test asserts the *exact* `Resolved` content via golden files committed under `src/test/resources/intelligence/resolver/java/`. + +### Layer 2 — Detector × resolver integration tests + +For each migrated detector: +- **Resolved-mode positive:** with resolver enabled, assert resolved-only edges that the lexical fallback could not produce (e.g. `INJECTS` edges to the *correct* `UserService` of two same-named classes in different packages). +- **Fallback-mode positive:** with resolver disabled, assert logical-content output identical to the pre-spec baseline (modulo the additive `confidence` and `source` fields per §8). +- **Mixed mode:** simulate resolver failure on half the files; the other half emits resolved edges, the failing half emits fallback edges. Both labeled with correct `Confidence`. + +### Layer 3 — Concurrency stress + +- 1000 synthetic Java files resolved on virtual threads. Assert: no exceptions, no deadlocks, no thread starvation, total throughput within 2× of sequential baseline. Output identical to sequential run (sort-then-compare). +- Resolver bootstrap happens **once** even if 50 threads call `resolverFor` simultaneously at startup. Verify via mock + invocation count. + +### Layer 4 — Memory / pathological inputs + +- 10 000-line synthetic class file: resolves under -Xmx512m. +- File with 1000 imports (most unresolved): resolves without OOM; produces the expected partial result. +- Deep generic nesting (10 levels deep): resolves; runtime ≤ 1 s. + +### Layer 5 — Adversarial inputs + +- File with syntax errors (parser fails): resolver never invoked; `Analyzer` continues. 
+- File mis-tagged as Java but actually Kotlin / Groovy / random bytes: parser fails first; resolver never sees it.
+- Mixed source root with `.java` and unrelated files: only `.java` files enter the solver.
+- `ReflectionTypeSolver` simulated as unavailable (test injects null JDK classpath): resolver works at reduced fidelity, returns `Confidence.SYNTACTIC` for JDK-dependent symbols.
+
+### Layer 6 — Determinism
+
+- Run resolver 10 times against the same input on the same JVM. Assert byte-identical serialized graphs.
+- Run resolver against the same input, with source roots passed in a different order. Assert byte-identical output (we sort internally).
+- Run on cold and warm JVMs. Assert identical output.
+
+### Layer 7 — E2E quality regression (gating)
+
+- `E2EQualityTest` against `spring-petclinic` ground truth (`src/test/resources/e2e/ground-truth-petclinic.json`):
+  - With `enabled: false`: logical-content output identical to the pre-spec baseline (modulo the additive `confidence` and `source` fields per record — see §8). Mandatory regression gate.
+  - With `enabled: true`: edge precision / recall **measurably up** vs. the `enabled: false` baseline. The implementation plan will record before/after numbers; this spec demands measurable improvement with no regressions on other metrics in the ground-truth file.
+- Full `mvn test` green.
+- Full `mvn verify` green (SpotBugs, dependency-check). May skip locally; CI is authoritative.
+
+### Layer 8 — Property-based / fuzz (jqwik)
+
+- New test scope dependency: `net.jqwik:jqwik` (latest stable, EPL-2.0). License is EPL-2.0 — flag for explicit approval; if rejected, swap for a permissive alternative (or hand-write generators). **License decision deferred to implementation time** — see §15 below.
+- Generators produce small synthetic Java source strings (within JavaParser's grammar). Invariants tested:
+  - Resolver never throws an unchecked exception (only `ResolutionException` or returns `EmptyResolved`).
+  - Resolver always terminates within `max_per_file_resolve_ms`.
+  - Same input → same output (deterministic).
+  - Editing an unrelated file in a different source root never changes the resolution of file F.
+
+### Layer 9 — Mutation testing (PIT)
+
+- Add PIT mutation testing as a **non-gating** Maven goal (e.g. `mvn -P mutation pitest:mutationCoverage`).
+- Target: 80% mutant kill rate on the new packages (`io.github.randomcodespace.iq.intelligence.resolver.*`, `io.github.randomcodespace.iq.model.Confidence`).
+- Not bound to `mvn verify` — runs on demand. Used as a code-quality signal during PR review.
+
+### Test-data hygiene
+
+- Synthetic Java sources for unit tests live under `src/test/resources/intelligence/resolver/java/<scenario>/...`.
+- Each scenario has a `README.md` explaining intent (one paragraph).
+- Golden output (`expected.json`) checked in. Updated only via a documented refresh script.
+
+## 13. Acceptance criteria
+
+Sub-project 1 is "done" when **all** of the following are true on the feature branch:
+
+1. **All tests in §12 layers 1–7 pass.** Layers 8 and 9 are non-gating but must run cleanly.
+2. **`mvn verify` green** on CI (full Java CI workflow, including SpotBugs and OWASP dependency-check).
+3. **No logical-content regression** in any existing test (`mvn test` green with `enabled: false`). Snapshot tests refreshed in a separate commit per §8; the refresh diff must be limited to the two additive fields per record.
+4. **E2E petclinic precision/recall measurably improved** with `enabled: true`. The PR description records before/after numbers.
+5. **`CHANGELOG.md`** updated under `[Unreleased]` with a one-paragraph entry naming the new config keys, the schema additions, and the cache-version bump.
+6. **`CLAUDE.md`** updated under "Gotchas" to note: confidence/provenance is now mandatory on every node/edge; the resolver pass is part of the pipeline; cache version is 5.
+7. **`PROJECT_SUMMARY.md`** "Tech stack" + "Gotchas" updated.
+8. **Determinism re-verified** on the migrated detectors (existing determinism tests still pass; new ones added per §12 layer 6).
+9. **No new dependencies with non-permissive licenses** (Apache-2.0 / MIT / BSD only without explicit user sign-off; jqwik EPL-2.0 needs explicit OK or replacement — see §15).
+10. **No new High/Critical CVEs** introduced (`mvn verify` security gate green).
+
+## 14. Risks & mitigations
+
+| Risk | Likelihood | Impact | Mitigation |
+|---|---|---|---|
+| `JavaSymbolSolver` performance worse than budgeted | Medium | Pipeline unusable for very large repos | `enabled: false` escape hatch; performance gate in §9; profile before merge |
+| Source-root auto-discovery wrong on niche project layouts | Medium | Resolver falls back to `EmptyResolved` silently → user sees no improvement | Explicit `source_roots: [list]` override; clear log message at WARN when discovery yields zero roots; `codeiq config explain` shows discovered roots |
+| Confidence schema change breaks consumers (SPA, MCP clients) | Low (additive only) | API drift | Fields are additive; default `LEXICAL`/detector-name. Existing consumers ignore unknown fields per Jackson config (`FAIL_ON_UNKNOWN_PROPERTIES = false`). |
+| Cache-version bump surprises users | Low | One-time slow re-index after upgrade | `CHANGELOG` entry; user-facing log line on first run after bump |
+| jqwik EPL-2.0 license blocked by user policy | Low (already flagged in defaults) | No property-based tests in layer 8 | Hand-write generators or pick a permissive alternative; flagged for decision at impl time |
+| `JavaSymbolSolver` panics on Java 25 idioms (records, sealed, pattern-match) | Medium | Resolver failure on modern Java | Per-file resolution failures are caught (§11); track upstream JavaParser issues; pin to latest JavaParser version |
+| Cross-class resolution still ambiguous with same-named symbols across modules | Medium | False matches even with resolver | Track via E2E quality numbers; flag for sub-project 1.5 (Maven/Gradle classpath JAR resolution) if material |
+
+## 15. Dependency decisions
+
+To be resolved at implementation time (NOT in this spec):
+
+1. **`javaparser-symbol-solver-core` exact version.** Resolve the latest stable version compatible with `javaparser-core` (currently 3.28.0 per CLAUDE.md). Use `context7` MCP first; fall back to Maven Central.
+2. **`net.jqwik:jqwik` license (EPL-2.0).** Per `~/.claude/rules/dependencies.md`: "Permissive licenses (MIT/Apache/BSD) preferred. GPL/AGPL flagged for approval." EPL-2.0 is not GPL/AGPL but is also not on the preferred list. Default plan: ask the user once at implementation time; if blocked, swap for hand-rolled generators or another permissive property-test framework. **Will not add jqwik silently.**
+3. **PIT mutation testing dep.** Apache-2.0; safe to add as a non-default Maven profile.
+
+## 16. Out of scope (cross-reference)
+
+- **TypeScript / JavaScript / Python / Go / Rust / C++ / C# resolution** — sub-projects 2–5. They will plug into the SPI defined here.
+- **Detect-by-resolved-type detector refactor** — sub-project 6. Migrated detectors here keep their current detection mechanism; only their *outgoing edges* benefit from resolution.
+- **Cross-framework false-positive harness** — sub-project 7.
+- **MCP HTTP-streamable hardening** — sub-project 8.
+- **Maven/Gradle classpath JAR resolution** — possible sub-project 1.5 if E2E quality numbers reveal a gap.
+
+## 17. Implementation sequencing (informational, plan owns the detail)
+
+The plan that follows this spec will sequence work as:
+1. Schema changes (`Confidence` enum, `CodeNode`/`CodeEdge` fields, Neo4j round-trip, `AnalysisCache` schema + version bump).
+2. SPI scaffolding (`SymbolResolver`, `Resolved`, `EmptyResolved`, `ResolverRegistry`).
+3. Java backend (`JavaSourceRootDiscovery`, `JavaSymbolResolver`, `JavaResolved`).
+4. Pipeline wiring (`Analyzer`, `IndexCommand`).
+5. Detector migration (one detector at a time, each with new + existing tests passing).
+6. Aggressive testing layers (1–9 in order, layers 8/9 may run in parallel with 5–7).
+7. Doc updates (`CHANGELOG`, `CLAUDE.md`, `PROJECT_SUMMARY.md`).
+8. PR ready for human review when all acceptance criteria are green.
+
+## 18. References
+
+- [`PROJECT_SUMMARY.md`](../../../PROJECT_SUMMARY.md) — repo-wide entry point.
+- [`CLAUDE.md`](../../../CLAUDE.md) — canonical internals.
+- [`docs/project/architecture.md`](../../project/architecture.md) — pipeline + components, including the package layering rule that detectors may not depend on `analyzer/`.
+- [`docs/project/data-model.md`](../../project/data-model.md) — `NodeKind`, `EdgeKind`, Neo4j schema, H2 cache schema.
+- [`docs/project/conventions.md`](../../project/conventions.md) — detector authoring, base classes, "don't refactor" rules.
+- [`docs/project/build-and-run.md`](../../project/build-and-run.md) — Maven, ANTLR codegen, frontend bundling.
+- JavaParser symbol-solver documentation: resolve via `context7` MCP at implementation time.
+- Sourcegraph SCIP and GitHub Stack Graphs as comparable patterns (informational only — not adopted in sub-project 1). From ac2021b2fc4d652e7f6ba6133d517bf281faa6cf Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:15:07 +0000 Subject: [PATCH 07/23] docs(spec): fix relative paths after spec relocation to docs/specs/ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The spec's §18 References used 3-level-up paths (../../../X) targeting docs/superpowers/specs/ as the spec home. After relocating to docs/specs/ to respect the existing .gitignore policy, these paths resolved one level above the repo root. Adjust to the correct depth. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../2026-04-27-resolver-spi-and-java-pilot-design.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md b/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md index c3f053cf..81326446 100644 --- a/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md +++ b/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md @@ -369,11 +369,11 @@ The plan that follows this spec will sequence work as: ## 18. References -- [`PROJECT_SUMMARY.md`](../../../PROJECT_SUMMARY.md) — repo-wide entry point. -- [`CLAUDE.md`](../../../CLAUDE.md) — canonical internals. -- [`docs/project/architecture.md`](../../project/architecture.md) — pipeline + components, including the package layering rule that detectors may not depend on `analyzer/`. -- [`docs/project/data-model.md`](../../project/data-model.md) — `NodeKind`, `EdgeKind`, Neo4j schema, H2 cache schema. -- [`docs/project/conventions.md`](../../project/conventions.md) — detector authoring, base classes, "don't refactor" rules. -- [`docs/project/build-and-run.md`](../../project/build-and-run.md) — Maven, ANTLR codegen, frontend bundling. 
+- [`PROJECT_SUMMARY.md`](../../PROJECT_SUMMARY.md) — repo-wide entry point. +- [`CLAUDE.md`](../../CLAUDE.md) — canonical internals. +- [`docs/project/architecture.md`](../project/architecture.md) — pipeline + components, including the package layering rule that detectors may not depend on `analyzer/`. +- [`docs/project/data-model.md`](../project/data-model.md) — `NodeKind`, `EdgeKind`, Neo4j schema, H2 cache schema. +- [`docs/project/conventions.md`](../project/conventions.md) — detector authoring, base classes, "don't refactor" rules. +- [`docs/project/build-and-run.md`](../project/build-and-run.md) — Maven, ANTLR codegen, frontend bundling. - JavaParser symbol-solver documentation: resolve via `context7` MCP at implementation time. - Sourcegraph SCIP and GitHub Stack Graphs as comparable patterns (informational only — not adopted in sub-project 1). From 9be2b17f78fa6b007b0bf02b9e5bc0f42212230d Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:20:38 +0000 Subject: [PATCH 08/23] docs: bring all documentation in sync with live state + cross-ref docs/specs/ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Doc-drift fixes (all from same session audit): - 97 → 99 detectors (CLAUDE.md, README.md prose + mermaid) - NodeKind javadoc 32 → 34 (model/NodeKind.java; was stale by two) - EdgeKind javadoc 27 → 28 (model/EdgeKind.java; was stale by one) - Test count 3,219 → 3,270 across 236 files (README.md) - All counts now in sync across CLAUDE.md, README.md, PROJECT_SUMMARY.md, docs/project/data-model.md, docs/project/conventions.md, and the source javadocs. New entries: - PROJECT_SUMMARY.md "Where to look next" gains docs/specs/ — pointer for in-flight architectural designs. - CHANGELOG.md [Unreleased] notes PROJECT_SUMMARY.md + docs/project/ deep-dives, the docs/specs/ directory, and the doc-drift fix. 
- docs/project/data-model.md NodeKind/EdgeKind enum lists are now exact (no truncation, no stale "still claims 32" caveat). Pre-existing IDE-detected warnings (unused imports in detector tests, deprecated Notification import in GraphStoreTopologyAndStatsTest, dead locals in GraphBuilder/GraphStore/PythonStructuresDetector etc.) are out of scope for this commit — separate cleanup PR territory. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 21 +++++++++++++++++++ CLAUDE.md | 8 +++---- PROJECT_SUMMARY.md | 5 +++-- README.md | 10 ++++----- docs/project/conventions.md | 2 +- docs/project/data-model.md | 13 ++++++------ .../randomcodespace/iq/model/EdgeKind.java | 2 +- .../randomcodespace/iq/model/NodeKind.java | 2 +- 8 files changed, 42 insertions(+), 21 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 2faf2343..92d97562 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -41,9 +41,30 @@ for that specific tag for the per-commit details. summary of the Best Practices state, Scorecard baseline + target (≥ 8.0/10 stretch with eight checks at max), known floor reductions, and the OSS-CLI stack reference. (RAN-52 AC #7) +- `PROJECT_SUMMARY.md` (repo-root agent entry doc) and + [`docs/project/`](docs/project/) deep-dives (architecture, data-model, + build-and-run, conventions, ui, flows) — written for AI agents and humans + who need to understand and modify the codebase, every claim grounded in a + file path. Sits alongside `CLAUDE.md` (which remains the canonical + hand-maintained internals doc). +- `docs/specs/` — directory for active architectural design specs. First + entry: `2026-04-27-resolver-spi-and-java-pilot-design.md`, the design for + sub-project 1 of the "robust graph" decomposition (symbol-resolver SPI + between parse and detect, Java pilot via JavaParser's `JavaSymbolSolver`, + `Confidence` enum + `source` field on every `CodeNode` / `CodeEdge`, + 4–6 Java detectors migrated, 9 layers of aggressive testing). 
Implementation + in flight on `feat/sub-project-1-resolver-spi-and-java-pilot`. ### Changed +- Documentation count drift fixed: detector total updated from **97 → 99** + (live count, excluding `Abstract*` and `*Helper*`); `NodeKind` total + updated from **32 → 34** (javadoc at `model/NodeKind.java` was stale by + two entries); `EdgeKind` total updated from **27 → 28** (javadoc at + `model/EdgeKind.java` was stale by one entry). `README.md`, `CLAUDE.md`, + `PROJECT_SUMMARY.md`, `docs/project/*.md`, and the source javadocs are + now in sync. + - Branch protection on `main` requires every commit to be ssh-signed (RAN-46 AC #2). Force-pushes to `main` are rejected; squash-merge from PRs is the only path. diff --git a/CLAUDE.md b/CLAUDE.md index 69e43b87..304342ac 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,7 +2,7 @@ ## What This Project Is -**codeiq** -- a CLI tool + server that scans codebases to build a deterministic code knowledge graph. No AI, no external APIs -- pure static analysis. 97 detectors, 35+ languages, Neo4j Embedded graph database, Spring AI MCP server, REST API, web UI. +**codeiq** -- a CLI tool + server that scans codebases to build a deterministic code knowledge graph. No AI, no external APIs -- pure static analysis. 99 detectors, 35+ languages, Neo4j Embedded graph database, Spring AI MCP server, REST API, web UI. 
- **Maven coordinates:** `io.github.randomcodespace.iq:code-iq` (artifactId intentionally unchanged) - **CLI command:** `codeiq` (via `java -jar`; JAR filename remains `code-iq-*-cli.jar`) @@ -101,7 +101,7 @@ io.github.randomcodespace.iq |-- graph/ # GraphStore (Neo4j facade), GraphRepository (SDN, writes only) |-- health/ # GraphHealthIndicator (Spring Actuator) |-- mcp/ # McpTools (34 @McpTool methods, read-only, includes intelligence tools) - |-- model/ # CodeNode, CodeEdge, NodeKind (32), EdgeKind (27) + |-- model/ # CodeNode, CodeEdge, NodeKind (34), EdgeKind (28), Confidence |-- intelligence/ # Intelligence enrichment (Phase 2-5) | |-- lexical/ # LexicalEnricher, LexicalQueryService, DocCommentExtractor, SnippetStore | |-- extractor/ # LanguageEnricher, LanguageExtractor, LanguageExtractionResult @@ -328,8 +328,8 @@ mvn dependency-check:check | `analyzer/ServiceDetector.java` | Service boundary detection from build files (30+ build systems) | | `analyzer/linker/*.java` | Cross-file linkers: TopicLinker, EntityLinker, ModuleContainmentLinker | | `detector/Detector.java` | Detector interface | -| `model/NodeKind.java` | 32 node types enum | -| `model/EdgeKind.java` | 27 edge types enum | +| `model/NodeKind.java` | 34 node types enum | +| `model/EdgeKind.java` | 28 edge types enum | | `model/CodeNode.java` | Graph node entity | | `model/CodeEdge.java` | Graph edge entity | | `graph/GraphStore.java` | Neo4j facade (UNWIND bulk save, Cypher reads, indexes) | diff --git a/PROJECT_SUMMARY.md b/PROJECT_SUMMARY.md index cd22a271..b3f4d5ce 100644 --- a/PROJECT_SUMMARY.md +++ b/PROJECT_SUMMARY.md @@ -142,7 +142,7 @@ CI gate is `mvn verify` — runs unit + integration tests **plus** SpotBugs and - **SnakeYAML parses bare `on` as `Boolean.TRUE`.** Compare YAML keys with `String.valueOf(key)`, not `Boolean.TRUE.equals(key)` (SonarCloud S2159). 
- **Determinism gate:** every new detector needs a determinism test (run twice, assert equal output) — see existing `*DetectorTest.java` for the pattern. - **First `mvn verify` downloads ~1 GB NVD database** for OWASP dependency-check. Override locally with `-Ddependency-check.skip=true`. -- **Counts drift between `CLAUDE.md` and code:** `CLAUDE.md` says "97 detectors" / "32 NodeKinds"; the live count is **99 concrete detectors** (excluding `Abstract*` and `*Helper*`) and **34 `NodeKind` values** (`model/NodeKind.java`). The `NodeKind` javadoc itself still says "32" — stale. Update both `CLAUDE.md` and the javadoc next time someone touches them. +- **Live counts (verified 2026-04-27):** **99 concrete detectors** (excluding `Abstract*` and `*Helper*`), **34 `NodeKind` values**, **28 `EdgeKind` values**, **236 test files / 3,270 test methods**. `CLAUDE.md`, `README.md`, and the source javadocs are in sync. When adding a `NodeKind` / `EdgeKind` / detector, update the count in the source javadoc, `CLAUDE.md` (intro + package summary + key-files table), `README.md` (intro + mermaid subgraph), and this file in the same PR — drift is the default if you don't. - **Don't merge anything that fails `mvn verify`.** SpotBugs + dependency-check + tests are bound to `verify`, not `test`. 
## Where to look next @@ -153,7 +153,8 @@ CI gate is `mvn verify` — runs unit + integration tests **plus** SpotBugs and - **Key flows (index→enrich→serve, MCP tool lifecycle)** → [`docs/project/flows.md`](docs/project/flows.md) - **Conventions (full)** → [`docs/project/conventions.md`](docs/project/conventions.md) - **Build & run details (Maven phases, ANTLR codegen, frontend embed)** → [`docs/project/build-and-run.md`](docs/project/build-and-run.md) -- **Internal canonical reference (32-section deep doc, hand-maintained)** → [`CLAUDE.md`](CLAUDE.md) +- **Active design specs (in-flight architectural work)** → [`docs/specs/`](docs/specs/) — currently: sub-project 1 (resolver SPI + Java pilot + confidence schema) +- **Internal canonical reference (hand-maintained)** → [`CLAUDE.md`](CLAUDE.md) - **Engineering standards / release / rollback** → [`shared/runbooks/`](shared/runbooks/) (Skipped: `docs/project/integrations.md` — codeiq makes no runtime calls to external APIs / queues. The `docs/codeiq.yml.example` schema and `shared/runbooks/release.md` cover what little external surface exists at build/release time.) diff --git a/README.md b/README.md index a1102a9a..505f43aa 100644 --- a/README.md +++ b/README.md @@ -37,13 +37,13 @@ java -jar target/code-iq-*-cli.jar serve /path/to/repo ## How It Works -codeiq scans source files using 97 detectors across 35+ languages, builds a knowledge graph of code relationships, and serves it via REST API, MCP server, and React UI. +codeiq scans source files using 99 detectors across 35+ languages, builds a knowledge graph of code relationships, and serves it via REST API, MCP server, and React UI. ```mermaid graph TD subgraph "1. Index" A[File Discovery] -->|git ls-files| B[Parsing Layer] - B -->|JavaParser / ANTLR / Regex| C[97 Detectors] + B -->|JavaParser / ANTLR / Regex| C[99 Detectors] C -->|Virtual Threads| D[Graph Builder] D --> E[(H2 Cache)] end @@ -225,7 +225,7 @@ See `docs/codeiq.yml.example` for the full schema. 
```mermaid graph LR - subgraph "Node Types (32)" + subgraph "Node Types (34)" direction TB N1[service] --- N2[endpoint] N2 --- N3[class] @@ -236,7 +236,7 @@ graph LR N7 --- N8[config_file] end - subgraph "Edge Types (27)" + subgraph "Edge Types (28)" direction TB E1[calls] --- E2[imports] E2 --- E3[depends_on] @@ -265,7 +265,7 @@ All results are 100% deterministic across runs. ```bash git clone https://github.com/RandomCodeSpace/codeiq.git cd codeiq -mvn clean package # Build + test (3,219 tests) +mvn clean package # Build + test (3,270 tests across 236 files) mvn test # Tests only ``` diff --git a/docs/project/conventions.md b/docs/project/conventions.md index 4e4d6552..b83665e2 100644 --- a/docs/project/conventions.md +++ b/docs/project/conventions.md @@ -117,7 +117,7 @@ Rules to follow when modifying codeiq. Each item is grounded in an existing file ## Don't refactor (intentional non-standard choices) -- **Single-file `NodeKind` and `EdgeKind` enums.** They're long (32+/27 values) and could be split, but they're load-bearing for cross-file uniqueness and detector readability. Don't split — keeps the type surface in one diff-friendly file. See `model/NodeKind.java`, `model/EdgeKind.java`. +- **Single-file `NodeKind` and `EdgeKind` enums.** They're long (34/28 values) and could be split, but they're load-bearing for cross-file uniqueness and detector readability. Don't split — keeps the type surface in one diff-friendly file. See `model/NodeKind.java`, `model/EdgeKind.java`. - **No SDN hydration on the read path.** `graph/GraphStore.java` uses raw Cypher + `nodeFromNeo4j()` for reads; `graph/GraphRepository.java` (Spring Data Neo4j) is used **only for writes**. This is deliberate — SDN's hydration overhead was measured and rejected for the read path. Don't unify them. - **Auto-discovery via Spring `@Component` on detectors, no explicit registry.** Drop in a class, it's live. The `DetectorRegistry` exists to *introspect* the discovered set, not to register them. 
Don't replace with a manual registry. - **CLI profile selection in `CodeIqApplication.main` (not via Picocli's mechanism).** It's a string `if/else` on the first arg, and it pre-empts Picocli to set the Spring profile *before* the context starts. Looks ugly; works correctly. SpotBugs flagged the original duplicate branches; the current version was deliberately collapsed. diff --git a/docs/project/data-model.md b/docs/project/data-model.md index acc540c5..5fdae10a 100644 --- a/docs/project/data-model.md +++ b/docs/project/data-model.md @@ -29,7 +29,7 @@ codeiq's data model has **three storage layers**, each with its own schema and l ### `NodeKind` (enum) - **Defined in:** `model/NodeKind.java`. -- **34 concrete values** (the file's javadoc still claims "32" — known stale, see `PROJECT_SUMMARY.md` §"Gotchas"): +- **34 concrete values** (javadoc and file are in sync as of 2026-04-27): ``` MODULE, PACKAGE, CLASS, METHOD, ENDPOINT, ENTITY, REPOSITORY, QUERY, @@ -44,17 +44,16 @@ Each enum constant carries a lowercase `value` (e.g. `CLASS("class")`) used as t ### `EdgeKind` (enum) - **Defined in:** `model/EdgeKind.java`. -- **27 values** per the file's javadoc (verified count). Includes: +- **28 concrete values** (javadoc and file are in sync as of 2026-04-27): ``` DEPENDS_ON, IMPORTS, EXTENDS, IMPLEMENTS, CALLS, INJECTS, EXPOSES, -QUERIES, MAPS_TO, PRODUCES, CONSUMES, PUBLISHES, SUBSCRIBES, INVOKES_RMI, -DEFINES, CONTAINS, OVERRIDES, CONNECTS_TO, TRIGGERS, PROVISIONS, -SENDS_TO, RECEIVES_FROM, PROTECTS, RENDERS, REFERENCES_TABLE, ... +QUERIES, MAPS_TO, PRODUCES, CONSUMES, PUBLISHES, LISTENS, INVOKES_RMI, +EXPORTS_RMI, READS_CONFIG, MIGRATES, CONTAINS, DEFINES, OVERRIDES, +CONNECTS_TO, TRIGGERS, PROVISIONS, SENDS_TO, RECEIVES_FROM, PROTECTS, +RENDERS, REFERENCES_TABLE ``` -(Some values from the middle of the enum truncated in this summary — read `model/EdgeKind.java` for the authoritative list.) 
- ### `layer` (string property, not an enum) Every node carries a `layer` property set by `analyzer/LayerClassifier.java` to one of: `frontend`, `backend`, `infra`, `shared`, `unknown`. Classification is deterministic — based on `kind`, `framework`, and path heuristics. diff --git a/src/main/java/io/github/randomcodespace/iq/model/EdgeKind.java b/src/main/java/io/github/randomcodespace/iq/model/EdgeKind.java index 52177c79..b0bbbe62 100644 --- a/src/main/java/io/github/randomcodespace/iq/model/EdgeKind.java +++ b/src/main/java/io/github/randomcodespace/iq/model/EdgeKind.java @@ -2,7 +2,7 @@ /** * Types of edges (relationships) in the Code IQ graph. - * Mirrors the 27 edge kinds from the Python implementation. + * Mirrors the 28 edge kinds from the Python implementation. */ public enum EdgeKind { diff --git a/src/main/java/io/github/randomcodespace/iq/model/NodeKind.java b/src/main/java/io/github/randomcodespace/iq/model/NodeKind.java index f1760bb1..2230de40 100644 --- a/src/main/java/io/github/randomcodespace/iq/model/NodeKind.java +++ b/src/main/java/io/github/randomcodespace/iq/model/NodeKind.java @@ -2,7 +2,7 @@ /** * Types of nodes in the Code IQ graph. - * Mirrors the 32 node kinds from the Python implementation. + * Mirrors the 34 node kinds from the Python implementation. */ public enum NodeKind { From f16a60a232bd8d719b119ee7c356d9650448cffd Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:27:02 +0000 Subject: [PATCH 09/23] docs(plan): add implementation plan for sub-project 1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 42 tasks across 8 phases (schema → SPI → Java backend → pipeline wiring → configuration → detector migration → aggressive testing → docs+PR). Each task is TDD-shaped: failing test → run → minimal impl → run → commit. Lives at docs/plans/ alongside docs/specs/ (docs/superpowers/* is gitignored on this repo). 
Per the maintainer's "keep running" directive, execution starts
immediately with subagent-driven-development for parallelizable phases
and inline for foundational sequential ones.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 ...b-project-1-resolver-spi-and-java-pilot.md | 1172 +++++++++++++++++
 1 file changed, 1172 insertions(+)
 create mode 100644 docs/plans/2026-04-27-sub-project-1-resolver-spi-and-java-pilot.md

diff --git a/docs/plans/2026-04-27-sub-project-1-resolver-spi-and-java-pilot.md b/docs/plans/2026-04-27-sub-project-1-resolver-spi-and-java-pilot.md
new file mode 100644
index 00000000..6683a650
--- /dev/null
+++ b/docs/plans/2026-04-27-sub-project-1-resolver-spi-and-java-pilot.md
@@ -0,0 +1,1172 @@
+# Sub-project 1 — Resolver SPI + Java Pilot Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use `superpowers:subagent-driven-development` (recommended) or `superpowers:executing-plans` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add a symbol-resolution stage between parse and detect, ship a Java backend wrapping JavaParser's `JavaSymbolSolver`, attach `Confidence` + `source` to every node/edge with Neo4j round-trip and an H2 cache version bump, migrate 4–6 Java detectors as proof of value, and bake in 9 layers of aggressive testing — without changing what existing detectors do.
+
+**Architecture:** New SPI under `intelligence/resolver/` with a per-language registry mirroring `DetectorRegistry`. The Java backend wraps JavaParser `JavaSymbolSolver` configured from sorted source roots + `ReflectionTypeSolver`. Detectors opt in via `ctx.resolved()` returning `Optional<Resolved>`; existing detectors compile and behave identically when resolution is absent or disabled.
+
+**Tech stack:** Java 25, Spring Boot 4.0.5, JavaParser 3.28.0 + new `javaparser-symbol-solver-core`, Neo4j Embedded 2026.02.3, H2 (cache), JUnit 5 (existing test scope), `net.jqwik:jqwik` (new test scope, pending license OK), PIT mutation testing (new non-default Maven profile).
+
+**Reference:** Full design in [`../specs/2026-04-27-resolver-spi-and-java-pilot-design.md`](../specs/2026-04-27-resolver-spi-and-java-pilot-design.md). Read it before starting — every task here has a corresponding section in the spec.
+
+**Working branch:** `feat/sub-project-1-resolver-spi-and-java-pilot` (already created and ahead of `main` by the spec + doc-sync commits).
+
+---
+
+## File Structure
+
+### NEW files (create)
+
+| Path | Responsibility |
+|---|---|
+| `src/main/java/io/github/randomcodespace/iq/model/Confidence.java` | Enum `LEXICAL` / `SYNTACTIC` / `RESOLVED` + numeric `score()` |
+| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java` | SPI interface |
+| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/Resolved.java` | Per-file resolution result interface |
+| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/EmptyResolved.java` | Singleton for "no resolution" cases |
+| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionException.java` | Wraps backend failures |
+| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistry.java` | Spring auto-discovery + `bootstrap(rootPath)` + `resolverFor(language)` |
+| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscovery.java` | Detect Maven/Gradle/plain source roots from a project root |
+| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolved.java` | Java-specific `Resolved` carrying `JavaSymbolSolver` reference + per-CU info |
+| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java` | `@Component`, builds `CombinedTypeSolver`, resolves Java files |
+| `src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java` | Unit test |
+| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistryTest.java` | Auto-discovery + bootstrap tests |
+| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscoveryTest.java` | Source-root discovery on synthetic layouts |
+| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverTest.java` | Resolver unit tests (Layer 1) — 15+ scenarios |
+| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverConcurrencyTest.java` | Layer 3 stress |
+| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverPathologicalTest.java` | Layer 4 |
+| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverAdversarialTest.java` | Layer 5 |
+| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverDeterminismTest.java` | Layer 6 |
+| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverPropertyTest.java` | Layer 8 (jqwik) |
+| `src/test/resources/intelligence/resolver/java/<scenario>/...` | Synthetic Java sources for unit tests |
+
+### CHANGED files (modify)
+
+| Path | Change |
+|---|---|
+| `src/main/java/io/github/randomcodespace/iq/model/CodeNode.java` | Add `confidence: Confidence`, `source: String`. Round-trippable. |
+| `src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java` | Same as `CodeNode`. |
+| `src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java` | Write/read `prop_confidence`, `prop_source`. Update `nodeFromNeo4j`, `edgeFromNeo4j`. |
+| `src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java` | Bump `CACHE_VERSION` 4→5. Add columns. |
+| `src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java` | Add `Optional<Resolved> resolved()` + builder support. |
+| `src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java` | Set default `Confidence.LEXICAL` on emitted nodes/edges. |
+| `src/main/java/io/github/randomcodespace/iq/detector/AbstractJavaParserDetector.java` | Set default `Confidence.SYNTACTIC`. |
+| `src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java` | Set default `Confidence.SYNTACTIC`. |
+| `src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java` | Set default `Confidence.SYNTACTIC`. |
+| `src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java` | Wire ResolverRegistry bootstrap + per-file resolve. |
+| `src/main/java/io/github/randomcodespace/iq/cli/IndexCommand.java` | Mirror `Analyzer` in the H2 batched pipeline. |
+| `src/main/java/io/github/randomcodespace/iq/config/CodeIqConfig.java` (or unified equivalent) | Bind new `intelligence.symbol_resolution.java.*` keys. |
+| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/SpringServiceDetector.java` | Use `ctx.resolved()` for `INJECTS` edge resolution. |
+| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/SpringRepositoryDetector.java` | Use `ctx.resolved()` for entity-type linking. |
+| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/JpaEntityDetector.java` | Use `ctx.resolved()` for `MAPS_TO` between entities. |
+| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/JpaRepositoryDetector.java` | Same as Spring repo, deeper. |
+| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/KafkaListenerDetector.java` | Resolve topic constants. |
+| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/SpringRestDetector.java` | Resolve `@RequestBody` types for `MAPS_TO` edges. |
+| `src/test/java/io/github/randomcodespace/iq/detector/jvm/java/Test.java` | Add resolved-mode + fallback-mode + mixed-mode assertions. |
+| `pom.xml` | Add `javaparser-symbol-solver-core` (latest stable matching `javaparser-core`) + `net.jqwik:jqwik` (test scope, pending license OK). PIT in non-default profile. |
+| `docs/codeiq.yml.example` | Document `intelligence.symbol_resolution.java.*` keys. |
+| `CHANGELOG.md` | Expand `[Unreleased]` entry once features are integrated. |
+| `CLAUDE.md` | "Gotchas" addition: confidence/provenance is now mandatory; resolver pass exists; cache version 5. |
+| `PROJECT_SUMMARY.md` | Tech stack + Gotchas update. |
+
+---
+
+## How to use this plan
+
+- Each task is one logical commit (or small commit chain).
+- Each step inside a task is 2–5 minutes and ends with verifiable output.
+- Tests come first (TDD). Run them, see them fail, then implement, run them, see them pass, commit.
+- Determinism tests are mandatory for every detector that gets migrated (Phase 6) and for the resolver itself (Task 30 / Layer 6).
+- Frequent commits — one per task minimum, sometimes more.
+- Unless noted, **all commands run from the repo root** `/home/dev/projects/codeiq`.
+
+**Resume rule:** if interrupted mid-task, the next session re-runs the test command from the unfinished step to confirm where it stopped, then continues.
+ +--- + +## Phase 1 — Schema foundation (Tasks 1–7) + +### Task 1: `Confidence` enum + +**Files:** +- Create: `src/main/java/io/github/randomcodespace/iq/model/Confidence.java` +- Test: `src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java` + +- [ ] **Step 1: Write the failing test** + +```java +// src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; +import static org.junit.jupiter.api.Assertions.*; + +class ConfidenceTest { + + @Test + void scoreMappingIsStable() { + assertEquals(0.6, Confidence.LEXICAL.score(), 1e-9); + assertEquals(0.8, Confidence.SYNTACTIC.score(), 1e-9); + assertEquals(0.95, Confidence.RESOLVED.score(), 1e-9); + } + + @Test + void naturalOrderingMatchesScore() { + assertTrue(Confidence.LEXICAL.compareTo(Confidence.SYNTACTIC) < 0); + assertTrue(Confidence.SYNTACTIC.compareTo(Confidence.RESOLVED) < 0); + } + + @Test + void valueOfNullIsRejected() { + assertThrows(NullPointerException.class, () -> Confidence.fromString(null)); + } + + @Test + void fromStringIsCaseInsensitive() { + assertEquals(Confidence.RESOLVED, Confidence.fromString("resolved")); + assertEquals(Confidence.RESOLVED, Confidence.fromString("RESOLVED")); + assertEquals(Confidence.LEXICAL, Confidence.fromString("LeXiCaL")); + } + + @Test + void fromStringRejectsUnknown() { + assertThrows(IllegalArgumentException.class, () -> Confidence.fromString("perfect")); + } +} +``` + +- [ ] **Step 2: Run test to verify it fails** + +```bash +mvn test -Dtest=ConfidenceTest -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: compile error — `Confidence` does not exist. 
+ +- [ ] **Step 3: Write minimal implementation** + +```java +// src/main/java/io/github/randomcodespace/iq/model/Confidence.java +package io.github.randomcodespace.iq.model; + +import java.util.Objects; + +/** + * Confidence in the truth of a node or edge, based on the parser pipeline that + * produced it. Lower means the assertion is from text patterns; higher means + * the assertion is backed by parsed structure or resolved symbol types. + * + *

<p>Comparable: {@code LEXICAL} < {@code SYNTACTIC} < {@code RESOLVED}. + * + * <p>
Numeric mapping (via {@link #score()}) is stable and intended for Cypher / + * MCP / SPA filtering. The enum itself is the authoritative form. + */ +public enum Confidence { + /** Pattern-only match (regex). */ + LEXICAL(0.6), + /** AST or parse tree, no symbol resolution. */ + SYNTACTIC(0.8), + /** Resolved via a {@code SymbolResolver}. */ + RESOLVED(0.95); + + private final double score; + + Confidence(double score) { + this.score = score; + } + + public double score() { + return score; + } + + public static Confidence fromString(String value) { + Objects.requireNonNull(value, "Confidence value must not be null"); + for (Confidence c : values()) { + if (c.name().equalsIgnoreCase(value)) { + return c; + } + } + throw new IllegalArgumentException("Unknown Confidence: " + value); + } +} +``` + +- [ ] **Step 4: Run test to verify it passes** + +```bash +mvn test -Dtest=ConfidenceTest -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: 5/5 tests pass. + +- [ ] **Step 5: Commit** + +```bash +git add src/main/java/io/github/randomcodespace/iq/model/Confidence.java \ + src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java +git commit -m "feat(model): add Confidence enum (LEXICAL/SYNTACTIC/RESOLVED) + +Per sub-project 1 spec §5.3. Numeric score() mapping stable (0.6/0.8/0.95). +Comparable by natural order. fromString() is case-insensitive and rejects +null + unknown values. + +Co-Authored-By: Claude Opus 4.7 (1M context) " +``` + +--- + +### Task 2: Add `confidence` + `source` to `CodeNode` + +**Files:** +- Modify: `src/main/java/io/github/randomcodespace/iq/model/CodeNode.java` +- Test: existing `CodeNodeTest.java` (or create one if missing) — add round-trip assertion via `equals`/`hashCode` + +- [ ] **Step 1: Read current `CodeNode.java`** to see its shape (record vs class, builder vs constructor). 
+ +```bash +sed -n '1,80p' src/main/java/io/github/randomcodespace/iq/model/CodeNode.java +``` + +- [ ] **Step 2: Write failing test** + +```java +// src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; +import static org.junit.jupiter.api.Assertions.*; + +class CodeNodeConfidenceTest { + + @Test + void newNodeCarriesConfidenceAndSource() { + CodeNode n = CodeNode.builder() + .id("node:foo:class:Foo") + .kind(NodeKind.CLASS) + .label("Foo") + .confidence(Confidence.SYNTACTIC) + .source("MyDetector") + .build(); + assertEquals(Confidence.SYNTACTIC, n.confidence()); + assertEquals("MyDetector", n.source()); + } + + @Test + void confidenceDefaultsToLexicalIfUnset() { + CodeNode n = CodeNode.builder() + .id("node:foo:class:Foo") + .kind(NodeKind.CLASS) + .label("Foo") + .source("MyDetector") + .build(); + assertEquals(Confidence.LEXICAL, n.confidence(), + "missing confidence falls back to LEXICAL — least committal"); + } + + @Test + void sourceIsRequired() { + assertThrows(IllegalStateException.class, () -> CodeNode.builder() + .id("node:foo:class:Foo") + .kind(NodeKind.CLASS) + .label("Foo") + .build(), + "source is mandatory — every node knows which detector emitted it"); + } +} +``` + +- [ ] **Step 3: Run test to verify it fails** + +```bash +mvn test -Dtest=CodeNodeConfidenceTest -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: compile error — `confidence(...)` and `source(...)` not on builder. + +- [ ] **Step 4: Add fields + builder methods to `CodeNode`** + +Add fields, builder setters, getter accessors, equals/hashCode coverage. Field defaults: `confidence = Confidence.LEXICAL`, `source` required (validated in builder). + +(Code shown verbatim once existing structure is read in Step 1; the change must preserve all existing tests by leaving every other field's behavior unchanged.) 
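The builder semantics that Step 4 describes could be sketched as follows. This is a minimal illustration that assumes `CodeNode` is a plain class with a nested builder; the real shape is only known after Step 1. `SketchNode` and `SketchConfidence` are hypothetical stand-ins, and whether the real `equals`/`hashCode` should include the new fields is an assumption here.

```java
import java.util.Objects;

// Hypothetical stand-in for the Confidence enum from Task 1.
enum SketchConfidence { LEXICAL, SYNTACTIC, RESOLVED }

// Hypothetical stand-in for CodeNode; the real class carries more fields.
final class SketchNode {
    private final String id;
    private final SketchConfidence confidence;
    private final String source;

    private SketchNode(Builder b) {
        this.id = b.id;
        this.confidence = b.confidence;
        this.source = b.source;
    }

    String id() { return id; }
    SketchConfidence confidence() { return confidence; }
    String source() { return source; }

    static Builder builder() { return new Builder(); }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchNode)) return false;
        SketchNode other = (SketchNode) o;
        return Objects.equals(id, other.id)
                && confidence == other.confidence
                && Objects.equals(source, other.source);
    }

    @Override public int hashCode() { return Objects.hash(id, confidence, source); }

    static final class Builder {
        private String id;
        // Default: the least committal level when a detector sets nothing.
        private SketchConfidence confidence = SketchConfidence.LEXICAL;
        private String source;

        Builder id(String id) { this.id = id; return this; }

        Builder confidence(SketchConfidence confidence) {
            this.confidence = Objects.requireNonNull(confidence, "confidence");
            return this;
        }

        Builder source(String source) { this.source = source; return this; }

        SketchNode build() {
            // IllegalStateException is what sourceIsRequired() expects.
            if (source == null || source.isBlank()) {
                throw new IllegalStateException("source is mandatory");
            }
            return new SketchNode(this);
        }
    }
}
```

Validating `source` in `build()` rather than in the setter keeps the setters order-independent, which matters if existing call sites set fields in varying order.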
+ +- [ ] **Step 5: Run all model tests to verify nothing else regressed** + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.model.*' -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: all green. + +- [ ] **Step 6: Commit** + +```bash +git add src/main/java/io/github/randomcodespace/iq/model/CodeNode.java \ + src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java +git commit -m "feat(model): add confidence + source to CodeNode + +Per sub-project 1 spec §5.2. Both fields non-null. Confidence defaults to +LEXICAL (least committal). Source is mandatory — every node knows which +detector emitted it. + +Co-Authored-By: Claude Opus 4.7 (1M context) " +``` + +--- + +### Task 3: Add `confidence` + `source` to `CodeEdge` + +Same shape as Task 2, but on `CodeEdge`. Mirror the test class as `CodeEdgeConfidenceTest`. Same builder semantics. + +- [ ] **Step 1: Read current `CodeEdge.java`** +- [ ] **Step 2: Write failing test (`CodeEdgeConfidenceTest`)** — mirror Task 2's three test cases on `CodeEdge.builder()`. +- [ ] **Step 3: Run + see failure.** +- [ ] **Step 4: Add fields + builder methods.** +- [ ] **Step 5: Run all model tests.** +- [ ] **Step 6: Commit:** `feat(model): add confidence + source to CodeEdge`. 
+ +--- + +### Task 4: Round-trip `confidence` + `source` through Neo4j (write path) + +**Files:** +- Modify: `src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java` +- Test: `src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java` (new) + +- [ ] **Step 1: Write the failing test.** + +```java +// src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java +package io.github.randomcodespace.iq.graph; + +import io.github.randomcodespace.iq.model.*; +import org.junit.jupiter.api.*; +import org.junit.jupiter.api.io.TempDir; +import java.nio.file.Path; +import java.util.List; +import static org.junit.jupiter.api.Assertions.*; + +class GraphStoreConfidenceRoundTripTest { + + @TempDir Path tmp; + GraphStore store; + + @BeforeEach void setup() { store = GraphStore.openEmbedded(tmp.resolve("graph.db")); } + @AfterEach void close() { store.close(); } + + @Test + void confidenceAndSourceRoundTrip() { + CodeNode in = CodeNode.builder() + .id("node:Foo.java:class:Foo") + .kind(NodeKind.CLASS).label("Foo") + .confidence(Confidence.RESOLVED).source("SpringServiceDetector") + .build(); + store.bulkSave(List.of(in), List.of()); + + CodeNode out = store.findById("node:Foo.java:class:Foo").orElseThrow(); + assertEquals(Confidence.RESOLVED, out.confidence()); + assertEquals("SpringServiceDetector", out.source()); + } +} +``` + +- [ ] **Step 2: Run; verify compile or assertion fail.** + +```bash +mvn test -Dtest=GraphStoreConfidenceRoundTripTest -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: assertion fails (fields written via existing path don't include confidence/source). + +- [ ] **Step 3: Update `GraphStore.bulkSave` to write `prop_confidence` and `prop_source`**, and `nodeFromNeo4j` / `edgeFromNeo4j` to read them. Defaults if missing in Neo4j: `Confidence.LEXICAL` and `"unknown"`. 
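The read-side defaults from Step 3 could look roughly like this. It is a sketch only: it assumes node properties arrive as a `Map<String, Object>` and that confidence is stored as the enum name. `PropConfidence` and `ConfidenceProps` are hypothetical names, not existing `GraphStore` API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the Confidence enum from Task 1.
enum PropConfidence { LEXICAL, SYNTACTIC, RESOLVED }

final class ConfidenceProps {
    static final String PROP_CONFIDENCE = "prop_confidence";
    static final String PROP_SOURCE = "prop_source";

    private ConfidenceProps() {}

    // Write path: store the enum name so the property stays readable in Cypher.
    static Map<String, Object> write(PropConfidence confidence, String source) {
        Map<String, Object> props = new HashMap<>();
        props.put(PROP_CONFIDENCE, confidence.name());
        props.put(PROP_SOURCE, source);
        return props;
    }

    // Read path: graphs written before this change have no property, so default to LEXICAL.
    static PropConfidence readConfidence(Map<String, Object> props) {
        Object raw = props.get(PROP_CONFIDENCE);
        if (raw == null) {
            return PropConfidence.LEXICAL;
        }
        return PropConfidence.valueOf(raw.toString().toUpperCase(Locale.ROOT));
    }

    // Read path: a missing source becomes the literal "unknown".
    static String readSource(Map<String, Object> props) {
        Object raw = props.get(PROP_SOURCE);
        return raw == null ? "unknown" : raw.toString();
    }
}
```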
+ +- [ ] **Step 4: Run round-trip test; verify pass.** +- [ ] **Step 5: Run wider GraphStore test suite to ensure no regression.** + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.graph.*' -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 6: Commit:** `feat(graph): round-trip confidence + source through Neo4j`. + +--- + +### Task 5: H2 cache schema migration to v5 + +**Files:** +- Modify: `src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java` +- Test: existing `AnalysisCacheTest.java` (extend) + new round-trip case. + +- [ ] **Step 1: Failing test.** Add `confidence` and `source` columns to the SCHEMA_SQL `nodes` and `edges` tables. Failing assertion: `cache.put(file, [node with confidence=RESOLVED]); cache.get(file).confidence == RESOLVED`. + +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Bump `CACHE_VERSION` 4→5. Add columns. Update INSERT/SELECT statements. Update Jackson serialization helpers if used.** +- [ ] **Step 4: Run cache tests; verify all pass.** +- [ ] **Step 5: Commit:** `feat(cache): bump CACHE_VERSION to 5; add confidence + source columns`. + +--- + +### Task 6: Default `Confidence` per detector base class + +**Files:** +- Modify: `AbstractRegexDetector.java`, `AbstractJavaParserDetector.java`, `AbstractAntlrDetector.java`, `AbstractStructuredDetector.java`, `AbstractPythonAntlrDetector.java`, `AbstractTypeScriptDetector.java`, `AbstractJavaMessagingDetector.java`, `AbstractPythonDbDetector.java`. +- Test: a synthetic `BaseClassConfidenceDefaultTest.java` per base class (or a single parameterized test). + +- [ ] **Step 1: Failing parameterized test.** Subclass each base, emit a node with no explicit confidence, assert it carries the expected default (LEXICAL for regex, SYNTACTIC for AST/ANTLR/structured/python-antlr/typescript/messaging/python-db). 
+- [ ] **Step 2: Run; see fail (currently always LEXICAL or null).** +- [ ] **Step 3: Add a `defaultConfidence()` method on each base class returning the matching enum. Make `addNode`/`addEdge` helpers stamp it when not explicitly set.** +- [ ] **Step 4: Run; verify pass.** +- [ ] **Step 5: Run full detector suite to ensure no regression.** + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.detector.*' -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 6: Commit:** `feat(detector): set Confidence default per base class`. + +--- + +### Task 7: Snapshot-test refresh (one-time) + +JSON-snapshot or golden-file tests will now include the additive `confidence` and `source` fields. Acceptance criterion §13 #3 in the spec requires the diff is limited to those two fields per record. + +- [ ] **Step 1: Run full test suite, capture failures.** + +```bash +mvn test -Dfrontend.skip=true -Ddependency-check.skip=true -q -DfailIfNoTests=false 2>&1 | tee /tmp/snapshot-failures.log +``` + +- [ ] **Step 2: For each snapshot diff, verify the diff is only the two additive fields.** If anything else changed, that's a bug — fix it before refreshing the snapshot. + +- [ ] **Step 3: Refresh snapshots one file at a time with separate commits per file** (so reviewers can diff cleanly). + +- [ ] **Step 4: Run full suite; expect green.** +- [ ] **Step 5: Commit each snapshot refresh:** `chore(test): refresh snapshot for confidence + source fields`. 
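The check in Step 2 (the diff is limited to the two additive fields) could be automated along these lines. This is a sketch that assumes each snapshot record has already been parsed into a `Map<String, Object>`; `SnapshotDiff.isAdditiveOnly` is a hypothetical helper, not part of the existing test suite.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class SnapshotDiff {
    // The only keys a refreshed record is allowed to gain.
    private static final Set<String> ADDITIVE = Set.of("confidence", "source");

    private SnapshotDiff() {}

    /**
     * True when the refreshed record equals the old record once the two
     * additive fields are removed, and both additive fields are present.
     */
    static boolean isAdditiveOnly(Map<String, Object> before, Map<String, Object> after) {
        if (!after.keySet().containsAll(ADDITIVE)) {
            return false; // both fields are expected on every refreshed record
        }
        Map<String, Object> stripped = new HashMap<>(after);
        stripped.keySet().removeAll(ADDITIVE);
        return stripped.equals(before);
    }
}
```

Any record for which this returns `false` would be one to investigate before refreshing its snapshot.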
+ +--- + +## Phase 2 — SPI scaffolding (Tasks 8–13) + +### Task 8: `Resolved` interface + `EmptyResolved` singleton + +**Files:** +- Create: `intelligence/resolver/Resolved.java`, `intelligence/resolver/EmptyResolved.java` +- Test: `ResolvedContractTest.java` + +- [ ] **Step 1: Failing test.** + +```java +// src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolvedContractTest.java +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; +import static org.junit.jupiter.api.Assertions.*; + +class ResolvedContractTest { + + @Test + void emptyResolvedIsSingleton() { + assertSame(EmptyResolved.INSTANCE, EmptyResolved.INSTANCE); + } + + @Test + void emptyResolvedHasLexicalConfidence() { + assertEquals(Confidence.LEXICAL, EmptyResolved.INSTANCE.sourceConfidence()); + } + + @Test + void emptyResolvedReportsUnsupported() { + assertFalse(EmptyResolved.INSTANCE.isAvailable()); + } +} +``` + +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Implement** `Resolved` (interface with `boolean isAvailable()`, `Confidence sourceConfidence()`, plus language-specific extension points to be added by `JavaResolved`) and `EmptyResolved.INSTANCE` (always returns `false` / `LEXICAL`). +- [ ] **Step 4: Run; pass.** +- [ ] **Step 5: Commit:** `feat(resolver): add Resolved interface + EmptyResolved singleton`. + +--- + +### Task 9: `ResolutionException` + +- [ ] **Step 1: Failing test:** assert `ResolutionException` carries the file path and language fields. +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Implement** as a checked exception (subclass `Exception`) with `Path file()`, `String language()`. +- [ ] **Step 4: Pass.** +- [ ] **Step 5: Commit:** `feat(resolver): add ResolutionException`. 
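A minimal sketch of the exception described in Task 9, assuming the four-argument constructor `(message, cause, file, language)` that the `JavaSymbolResolver` code in Task 17 calls. The class name carries a `Sketch` suffix to mark it as illustrative.

```java
import java.nio.file.Path;

// Checked exception: callers must decide per file whether to fall back or abort.
class ResolutionExceptionSketch extends Exception {
    // Path is not Serializable, so the field is transient to keep Exception serialization valid.
    private final transient Path file;
    private final String language;

    ResolutionExceptionSketch(String message, Throwable cause, Path file, String language) {
        super(message, cause);
        this.file = file;
        this.language = language;
    }

    /** File (or project root, for bootstrap failures) that was being resolved. */
    Path file() { return file; }

    /** Language id of the resolver that failed, for example "java". */
    String language() { return language; }
}
```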
+ +--- + +### Task 10: `SymbolResolver` interface + +```java +// src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import java.nio.file.Path; +import java.util.Set; + +public interface SymbolResolver { + // Language ids this resolver handles; the registry keys them in lowercase. + Set<String> getSupportedLanguages(); + // Called once per run, before any file is resolved. + void bootstrap(Path projectRoot) throws ResolutionException; + // Per-file resolution; must never return null. + Resolved resolve(DiscoveredFile file, Object parsedAst) throws ResolutionException; + default void shutdown() {} +} +``` + +- [ ] **Step 1: Failing contract test:** assert that any concrete implementation (start with a stub) honors `getSupportedLanguages()` returning a non-empty `Set<String>` and `resolve(...)` returning non-null. +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Implement** the interface as shown. +- [ ] **Step 4: Pass.** +- [ ] **Step 5: Commit:** `feat(resolver): add SymbolResolver SPI`. + +--- + +### Task 11: `ResolverRegistry` Spring bean + +**Files:** +- Create: `intelligence/resolver/ResolverRegistry.java` +- Test: `ResolverRegistryTest.java` + +- [ ] **Step 1: Failing test.** Two `@Component` stub resolvers (`JavaStubResolver` for `"java"`, `TsStubResolver` for `"typescript"`). Wire via `@SpringBootTest(classes=...)`. Assert `registry.resolverFor("java")` is the Java stub; an unknown language returns a no-op resolver (one that returns `EmptyResolved`); `bootstrap(rootPath)` calls bootstrap on every registered resolver exactly once. + +- [ ] **Step 2: Run; see fail.** + +- [ ] **Step 3: Implement** `ResolverRegistry` as a `@Component` that takes `List<SymbolResolver>` via constructor injection and builds a `Map<String, SymbolResolver>` keyed by lowercase language. `resolverFor(String language)` returns the matching resolver, or a default no-op resolver that emits `EmptyResolved`. `bootstrap(rootPath)` iterates resolvers in alphabetical order by class simple name (for determinism), calling each. 
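The registry behavior in Step 3 could be sketched without Spring as below. This is an illustration under stated assumptions: `SketchResolver` is a trimmed, hypothetical version of the SPI (no `resolve`), `NO_OP` stands in for the default resolver that would emit `EmptyResolved`, and "first resolver in sorted order wins" for a contested language is an assumption the plan does not state.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, trimmed-down SPI; resolve() is omitted to keep the sketch short.
interface SketchResolver {
    Set<String> getSupportedLanguages();
    void bootstrap(Path projectRoot) throws Exception;
}

final class SketchRegistry {
    // Returned for languages nobody claims; the real default would emit EmptyResolved.
    static final SketchResolver NO_OP = new SketchResolver() {
        @Override public Set<String> getSupportedLanguages() { return Set.of(); }
        @Override public void bootstrap(Path projectRoot) { /* nothing to prepare */ }
    };

    private final List<SketchResolver> ordered;
    private final Map<String, SketchResolver> byLanguage = new TreeMap<>();

    SketchRegistry(List<SketchResolver> resolvers) {
        // Sort once by simple class name so bootstrap order is deterministic.
        this.ordered = new ArrayList<>(resolvers);
        this.ordered.sort(Comparator.comparing((SketchResolver r) -> r.getClass().getSimpleName()));
        for (SketchResolver r : this.ordered) {
            for (String language : r.getSupportedLanguages()) {
                // Assumption: the first resolver in sorted order keeps a contested language.
                byLanguage.putIfAbsent(language.toLowerCase(Locale.ROOT), r);
            }
        }
    }

    SketchResolver resolverFor(String language) {
        if (language == null) {
            return NO_OP;
        }
        return byLanguage.getOrDefault(language.toLowerCase(Locale.ROOT), NO_OP);
    }

    void bootstrap(Path projectRoot) throws Exception {
        for (SketchResolver r : ordered) {
            r.bootstrap(projectRoot); // each resolver exactly once per call
        }
    }
}
```

In the Spring version the constructor argument would be the injected list of `SymbolResolver` beans; the sorting and lookup logic would be unchanged.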
+ +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(resolver): add ResolverRegistry with auto-discovery`. + +--- + +### Task 12: `DetectorContext.resolved()` accessor + +**Files:** +- Modify: `detector/DetectorContext.java` +- Test: existing `DetectorContextTest.java` (or new) + assertion that legacy detectors still compile. + +- [ ] **Step 1: Failing test.** Build a `DetectorContext` with `.resolved(EmptyResolved.INSTANCE)`; assert the accessor returns it. Also assert default returns `Optional.empty()`. + +- [ ] **Step 2: Run; see fail.** + +- [ ] **Step 3: Add field + builder method + accessor**, additive (default `Optional.empty()`). + +- [ ] **Step 4: Run all detector tests** to confirm legacy detectors still compile and behave identically. + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.detector.*' -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 5: Commit:** `feat(detector): add Optional accessor to DetectorContext`. + +--- + +### Task 13: Sanity build + +- [ ] **Step 1: Compile + run all model + resolver + detector tests.** + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.{model,intelligence.resolver,detector}.*' \ + -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 2: Confirm green; if not, fix the smallest possible failure before moving on.** + +- [ ] **Step 3: Commit (only if any cleanup landed):** `chore: sanity build after Phase 2`. + +--- + +## Phase 3 — Java backend (Tasks 14–18) + +### Task 14: Add `javaparser-symbol-solver-core` dep + +**Files:** +- Modify: `pom.xml` + +- [ ] **Step 1: Resolve the latest stable version compatible with `javaparser-core` 3.28.0.** Use `context7` MCP first; fall back to Maven Central via `ctx_fetch_and_index`. + +- [ ] **Step 2: Add the dependency** to the `` block in `pom.xml`. Pin the version explicitly. Note: JavaParser publishes both core and symbol-solver from the same release train — they should share the same version. 
+ +```xml +<dependency> + <groupId>com.github.javaparser</groupId> + <artifactId>javaparser-symbol-solver-core</artifactId> + <version>${javaparser.version}</version> +</dependency> +``` + +(Add a `<javaparser.version>3.28.0</javaparser.version>` property if not already present; reuse the existing version everywhere.) + +- [ ] **Step 3: Run dependency check.** + +```bash +mvn dependency:tree -Dincludes=com.github.javaparser -Dfrontend.skip=true -Ddependency-check.skip=true +``` + +Expected: `javaparser-core` and `javaparser-symbol-solver-core` both at the pinned version. + +- [ ] **Step 4: Verify license** is Apache-2.0 (it is, but check `mvn dependency:tree` doesn't pull GPL/AGPL transitives). + +- [ ] **Step 5: Compile.** + +```bash +mvn test-compile -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 6: Commit:** `chore(deps): add javaparser-symbol-solver-core `. + +--- + +### Task 15: `JavaSourceRootDiscovery` + +**Files:** +- Create: `intelligence/resolver/java/JavaSourceRootDiscovery.java` +- Test: `JavaSourceRootDiscoveryTest.java` with synthetic dir layouts via `@TempDir`. + +- [ ] **Step 1: Failing test.** Cover: + - Maven single-module: `/pom.xml`, `src/main/java`, `src/test/java` → returns sorted `[src/main/java, src/test/java]`. + - Maven multi-module: root `pom.xml` with `service-a` + `service-b`; each has `src/main/java`. Returns sorted union. + - Gradle (`build.gradle.kts` or `build.gradle`): same `src/main/java` convention. + - Plain layout: just `src/` without Maven/Gradle markers; returns `[src/]` if it has `*.java`. + - Empty project (no Java): returns empty list, no exception. + - Symlink loop in tree: terminates without exception. 
+ +```java +@Test void mavenSingleModule(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("src/test/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + var roots = new JavaSourceRootDiscovery().discover(tmp); + assertEquals(List.of(tmp.resolve("src/main/java"), tmp.resolve("src/test/java")), roots); +} +``` + +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Implement** discovery using `Files.walk` with depth limits. Return a `List<Path>` sorted alphabetically; repeated calls on the same tree must return the same list (idempotent). +- [ ] **Step 4: Run all 6+ scenarios; verify pass.** +- [ ] **Step 5: Commit:** `feat(resolver/java): add JavaSourceRootDiscovery (Maven/Gradle/plain auto-detect)`. + +--- + +### Task 16: `JavaResolved` record + +**Files:** +- Create: `intelligence/resolver/java/JavaResolved.java` +- Test: `JavaResolvedTest.java` + +- [ ] **Step 1: Failing test.** Construct a `JavaResolved` with a stub `JavaSymbolSolver` and a parsed `CompilationUnit`. Assert `isAvailable() == true`, `sourceConfidence() == RESOLVED`, and that it exposes `.cu()` and `.solver()`. + +- [ ] **Step 2: Run; see fail.** + +- [ ] **Step 3: Implement** as a `record JavaResolved(CompilationUnit cu, JavaSymbolSolver solver) implements Resolved`. `isAvailable() = true`. `sourceConfidence() = Confidence.RESOLVED`. + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(resolver/java): add JavaResolved record`. + +--- + +### Task 17: `JavaSymbolResolver` (`@Component`) + +**Files:** +- Create: `intelligence/resolver/java/JavaSymbolResolver.java` +- Test: covered by Task 18 (unit tests) and Task 30+ (aggressive layers). 
+ +- [ ] **Step 1: Failing skeleton test.** + +```java +@Test void supportsJava() { + var r = new JavaSymbolResolver(new JavaSourceRootDiscovery()); + assertEquals(Set.of("java"), r.getSupportedLanguages()); +} + +@Test void bootstrapBuildsCombinedTypeSolver(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + var r = new JavaSymbolResolver(new JavaSourceRootDiscovery()); + r.bootstrap(tmp); + assertNotNull(r.combinedTypeSolver()); +} +``` + +- [ ] **Step 2: Run; see fail.** + +- [ ] **Step 3: Implement.** + +```java +@Component +public class JavaSymbolResolver implements SymbolResolver { + private final JavaSourceRootDiscovery discovery; + private CombinedTypeSolver combined; + private JavaSymbolSolver solver; + + public JavaSymbolResolver(JavaSourceRootDiscovery discovery) { + this.discovery = discovery; + } + + @Override public Set<String> getSupportedLanguages() { return Set.of("java"); } + + @Override + public void bootstrap(Path projectRoot) throws ResolutionException { + try { + CombinedTypeSolver cts = new CombinedTypeSolver(); + cts.add(new ReflectionTypeSolver()); + for (Path root : discovery.discover(projectRoot)) { + cts.add(new JavaParserTypeSolver(root.toFile())); + } + this.combined = cts; + this.solver = new JavaSymbolSolver(cts); + // Configure JavaParser default ParserConfiguration so any subsequent parse + // benefits from the solver, but allow per-parse override for tests. 
+ StaticJavaParser.getParserConfiguration().setSymbolResolver(this.solver); + } catch (Exception e) { + throw new ResolutionException("bootstrap failed for " + projectRoot, e, projectRoot, "java"); + } + } + + @Override + public Resolved resolve(DiscoveredFile file, Object parsedAst) throws ResolutionException { + if (!"java".equalsIgnoreCase(file.language())) return EmptyResolved.INSTANCE; + if (!(parsedAst instanceof CompilationUnit cu)) return EmptyResolved.INSTANCE; + if (this.solver == null) return EmptyResolved.INSTANCE; + return new JavaResolved(cu, solver); + } + + public CombinedTypeSolver combinedTypeSolver() { return combined; } +} +``` + +- [ ] **Step 4: Pass.** +- [ ] **Step 5: Commit:** `feat(resolver/java): add JavaSymbolResolver`. + +--- + +### Task 18: `JavaSymbolResolverTest` — Layer 1 (resolver unit tests) + +**Files:** +- Create: `JavaSymbolResolverTest.java` +- Create: synthetic Java sources under `src/test/resources/intelligence/resolver/java//`. + +Cover all 15+ scenarios from spec §12 layer 1: empty file, single class, generics deep nesting, inner classes (static/non-static/anonymous/local), lambdas, records, sealed, enum-with-methods, interface-with-default, abstract, annotations, imports (explicit/static/wildcard/missing/unused), cyclic imports, same-named-classes-different-packages, JDK symbol, multi-source-root cross-reference. + +- [ ] **Step 1: For each scenario, write the synthetic source file** under `src/test/resources/intelligence/resolver/java//Foo.java` (or multiple files where needed) with a `README.md` describing intent (one paragraph). + +- [ ] **Step 2: Write the failing test class** (one `@Test` per scenario, named `resolves`). + +- [ ] **Step 3: Run; see fail.** + +- [ ] **Step 4: Verify fixtures alone are valid Java** by compiling them with `javac`; fix any syntax errors. 
+ +- [ ] **Step 5: Run resolver tests; iteratively fix any unexpected resolver behavior.** + +- [ ] **Step 6: Commit (after each batch of ~5 scenarios passes):** `test(resolver/java): add Layer 1 scenarios `. + +--- + +## Phase 4 — Pipeline wiring (Tasks 19–21) + +### Task 19: Wire `ResolverRegistry` into `Analyzer.run()` + +- [ ] **Step 1: Failing test** (`AnalyzerResolverWiringTest`): assert `Analyzer.run(rootPath)` calls `registry.bootstrap(rootPath)` exactly once before any file is processed. + +- [ ] **Step 2: Run; fail.** + +- [ ] **Step 3: Inject `ResolverRegistry` into `Analyzer` (constructor injection, additive).** Add the bootstrap call at the top of `run()`. Order: discovery → resolver bootstrap → file iteration. (Discovery first so we know there's something to scan.) + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(analyzer): bootstrap ResolverRegistry once per run`. + +--- + +### Task 20: Wire per-file resolution into the file-iteration loop + +- [ ] **Step 1: Failing test:** assert that for each file, `registry.resolverFor(file.language()).resolve(...)` is called and the returned `Resolved` is set on the `DetectorContext`. + +- [ ] **Step 2: Fail.** + +- [ ] **Step 3: Update the file-iteration block in `Analyzer`** to call `registry.resolverFor(file.language()).resolve(file, parsedAst)` and stuff the result into `DetectorContext.builder().resolved(...)`. Catch `ResolutionException` per file (log DEBUG, fall back to `EmptyResolved`). + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(analyzer): per-file symbol resolution wired into pipeline`. + +--- + +### Task 21: Mirror in `IndexCommand` + +`IndexCommand` has its own batched H2 pipeline that's not entirely shared with `Analyzer`. Mirror the resolver bootstrap + per-file resolve path there. + +- [ ] **Step 1: Failing test** (`IndexCommandResolverWiringTest`). 
+- [ ] **Step 2: Fail.** +- [ ] **Step 3: Update `IndexCommand` similarly** — same constructor injection of `ResolverRegistry`, same call shape. +- [ ] **Step 4: Pass.** +- [ ] **Step 5: Commit:** `feat(cli): wire ResolverRegistry into IndexCommand`. + +--- + +## Phase 5 — Configuration (Tasks 22–23) + +### Task 22: `intelligence.symbol_resolution.java.*` config keys + +- [ ] **Step 1: Failing test** (`UnifiedConfigResolverKeysTest`): assert config object after parsing the example YAML carries `enabled = true`, `sourceRoots = "auto"`, `jdkReflection = true`, `bootstrapTimeoutSeconds = 30`, `maxPerFileResolveMs = 500`. + +- [ ] **Step 2: Fail.** + +- [ ] **Step 3: Add the new section + binding code** in unified config + `CodeIqConfig` legacy bridge (per `UnifiedConfigBeans`). + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(config): add intelligence.symbol_resolution.java.* keys`. + +--- + +### Task 23: Document the keys in `docs/codeiq.yml.example` + +- [ ] **Step 1: Add the YAML block** matching spec §7 verbatim. +- [ ] **Step 2: Run `codeiq config validate`** against the example file (after building the JAR if needed) to confirm it parses. +- [ ] **Step 3: Commit:** `docs(config): document intelligence.symbol_resolution.java.* keys`. + +--- + +## Phase 6 — Detector migration (Tasks 24–29) + +Each migration follows the same TDD pattern. Concrete code differs per detector, but the test scaffolding is identical. + +### Task pattern (apply to each detector below) + +For detector `Detector`: + +- [ ] **Step 1: Read current detector + test** so you have the existing edge logic in context. + +```bash +sed -n '1,200p' src/main/java/io/github/randomcodespace/iq/detector/jvm/java/Detector.java +``` + +- [ ] **Step 2: Add three new test methods to `DetectorTest`:** + - `resolvedModeProducesResolvedEdge` — feed a fixture where the receiver type would be ambiguous lexically; with resolved context, assert edge target is the *correct* node ID. 
+ - `fallbackModeMatchesPreSpecBaseline` — `ctx.resolved() == Optional.empty()`; assert logical-content output identical to the baseline (modulo additive fields). + - `mixedModeUsesResolverWhereAvailable` — half the files have resolved context, half don't; assert per-file confidence labelling. + +- [ ] **Step 3: Run; see fails.** + +- [ ] **Step 4: Update the detector to:** + - Accept `ctx.resolved()` as `Optional`. + - When present and is `JavaResolved`, use `solver` to resolve receiver types / generic args / referenced classes for the specific edges relevant to this detector. + - Stamp `Confidence.RESOLVED` on resolved-mode edges; existing path stamps base-class default. + +- [ ] **Step 5: Run all `DetectorTest`; verify pass + no regression.** + +- [ ] **Step 6: Run determinism case** (run detector twice on same input, assert byte-identical output). + +- [ ] **Step 7: Commit:** `feat(detector/): use resolved symbol info for `. + +### Task 24: `SpringServiceDetector` migration + +- Resolves `@Autowired UserService userService` to the actual `UserService` class node ID. +- Edge: `INJECTS` from the consumer class to the declared `UserService` type. +- Fixture: two `UserService` classes in different packages; assert resolution picks the imported one. + +### Task 25: `SpringRepositoryDetector` migration + +- Resolves the entity type parameter on `JpaRepository`. +- Edge: `MAPS_TO` from repository interface to the resolved entity class. + +### Task 26: `JpaEntityDetector` migration + +- Resolves generic args on `@OneToMany List`. +- Edge: `MAPS_TO` between entities (the holder and the related entity). + +### Task 27: `JpaRepositoryDetector` migration + +- Same as Spring repo, deeper. Resolves derived-query method-name return types where applicable (less reliable; flag as `Confidence.SYNTACTIC` if resolution is partial). 
+ +### Task 28: `KafkaListenerDetector` migration + +- Resolves `@KafkaListener(topics = TOPIC_CONST)` where `TOPIC_CONST` is a static field — produce edges to the resolved topic name. +- Edge: `LISTENS` to the topic node. + +### Task 29: `SpringRestDetector` migration + +- Resolves `@RequestBody UserDto dto` and `@PathVariable` types. +- Edge: `MAPS_TO` from endpoint node to the resolved DTO class. + +--- + +## Phase 7 — Aggressive testing layers (Tasks 30–38) + +### Task 30: Layer 6 — Determinism (resolver-stage) + +**Files:** +- Create: `JavaSymbolResolverDeterminismTest.java` + +- [ ] **Step 1: Failing test.** Run the resolver twice against the same fixture; assert byte-identical serialized `Resolved` output (use Jackson with stable ordering). + +- [ ] **Step 2: Fail.** + +- [ ] **Step 3: Confirm resolver implementation already sorts source roots, uses `TreeMap` etc. — fix if not.** + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Add the second variant: source roots passed in different order, same output.** + +- [ ] **Step 6: Commit:** `test(resolver/java): determinism — Layer 6`. + +--- + +### Task 31: Layer 3 — Concurrency stress + +**Files:** +- Create: `JavaSymbolResolverConcurrencyTest.java` + +- [ ] **Step 1: Generate 1000 synthetic Java files** in `@TempDir` (one class each, distinct names). Single source root. + +- [ ] **Step 2: Failing test:** resolve all 1000 files via virtual-thread fan-out; assert no exceptions, no duplicate node IDs in the union of `Resolved` outputs, total time within 2× the sequential baseline. + +- [ ] **Step 3: Fail/pass.** If fail, investigate (likely: bootstrap not idempotent under concurrent first-call). Add a `synchronized`/`volatile` initialization guard. + +- [ ] **Step 4: Add invocation-count test** — bootstrap is called exactly once even under N concurrent first-callers. + +- [ ] **Step 5: Commit:** `test(resolver/java): concurrency stress — Layer 3`. 
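The initialization guard and invocation-count check from Task 31 could be sketched as follows. This assumes a double-checked `volatile` field as the guard; `BootstrapOnceSketch` and its plain `Object` solver are hypothetical placeholders for the real resolver, and a fixed thread pool stands in for the virtual-thread fan-out (on Java 21+, `Executors.newVirtualThreadPerTaskExecutor()` could be swapped in).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical resolver shell showing a once-only bootstrap guard. */
final class BootstrapOnceSketch {

    private final AtomicInteger bootstrapCalls = new AtomicInteger();
    private volatile Object solver; // stand-in for the real solver instance

    /** Returns the solver, bootstrapping it at most once across all threads. */
    Object solver() {
        Object local = solver;
        if (local == null) {
            synchronized (this) {
                local = solver; // re-check under the lock
                if (local == null) {
                    bootstrapCalls.incrementAndGet();
                    local = new Object();
                    solver = local;
                }
            }
        }
        return local;
    }

    int bootstrapCalls() {
        return bootstrapCalls.get();
    }

    /** Releases {@code callers} threads at once and reports how often bootstrap ran. */
    static int bootstrapCallsUnderContention(int callers) {
        BootstrapOnceSketch resolver = new BootstrapOnceSketch();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        try {
            List<Future<Object>> futures = new ArrayList<>();
            for (int i = 0; i < callers; i++) {
                futures.add(pool.submit(() -> {
                    start.await(); // hold every caller until all are queued
                    return resolver.solver();
                }));
            }
            start.countDown();
            for (Future<Object> f : futures) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for callers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("concurrent first-call failed", e);
        } finally {
            pool.shutdownNow();
        }
        return resolver.bootstrapCalls();
    }
}
```

The latch matters for the test's strength: without it, early tasks may finish bootstrapping before later ones start, and the guard is never actually contended.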
+ +--- + +### Task 32: Layer 4 — Memory / pathological + +**Files:** +- Create: `JavaSymbolResolverPathologicalTest.java` + +- [ ] **Step 1: Generate fixtures** (synthesizable in setup): + - 10K-line class with mostly trivial methods. + - File with 1000 imports (most unresolvable). + - 10-deep generic nesting. + +- [ ] **Step 2: Failing tests under `-Xmx512m`** (set via Surefire config in pom). + +- [ ] **Step 3: Run; pass or fix.** Likely passes; if not, investigate JavaSymbolSolver's caching footprint. + +- [ ] **Step 4: Add timeout assertion** — each pathological case completes within `max_per_file_resolve_ms`. + +- [ ] **Step 5: Commit:** `test(resolver/java): pathological inputs — Layer 4`. + +--- + +### Task 33: Layer 5 — Adversarial + +- [ ] **Step 1:** Cover the spec §12 layer 5 cases: syntax-error file, mis-tagged language, mixed source root, ReflectionTypeSolver disabled (config flag). +- [ ] **Step 2:** Run; fix. +- [ ] **Step 3: Commit:** `test(resolver/java): adversarial inputs — Layer 5`. + +--- + +### Task 34: Layer 7 — E2E petclinic regression + +**Files:** +- Modify: existing `E2EQualityTest` (extend) or create `E2EQualityResolverTest`. + +- [ ] **Step 1: Capture baseline numbers.** Run `E2EQualityTest` with `intelligence.symbol_resolution.java.enabled=false`. Record edge precision/recall against `src/test/resources/e2e/ground-truth-petclinic.json`. Save to a baseline JSON checked into the test resources. + +- [ ] **Step 2: Run with `enabled=true`. Record post-change numbers.** + +- [ ] **Step 3: Failing assertion:** `precision_after > precision_before AND recall_after >= recall_before` (improvement on at least one, no regression on the other). + +- [ ] **Step 4: If precision/recall didn't move: investigate why.** Likely the migrated detectors aren't producing the expected resolved edges yet — go back to Phase 6 and fix. + +- [ ] **Step 5: Commit:** `test(e2e): petclinic resolver-mode improvement gate — Layer 7`. 
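A minimal sketch of the Task 34 gate, assuming edges are compared as plain string keys against the ground-truth set. `EdgeQualityGateSketch`, the `Quality` record, and the key format are hypothetical; the check follows the Step 3 formula as literally written (strict precision gain, no recall regression). The looser "improvement on at least one" reading would need an additional OR branch.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for comparing edge sets against a ground truth. */
final class EdgeQualityGateSketch {

    record Quality(double precision, double recall) {}

    /** Precision = TP / produced, recall = TP / ground truth; empty sets score 0. */
    static Quality measure(Set<String> groundTruth, Set<String> produced) {
        Set<String> truePositives = new HashSet<>(produced);
        truePositives.retainAll(groundTruth);
        double precision = produced.isEmpty()
                ? 0.0 : (double) truePositives.size() / produced.size();
        double recall = groundTruth.isEmpty()
                ? 0.0 : (double) truePositives.size() / groundTruth.size();
        return new Quality(precision, recall);
    }

    /** precision_after > precision_before AND recall_after >= recall_before. */
    static boolean passesGate(Quality before, Quality after) {
        return after.precision() > before.precision()
                && after.recall() >= before.recall();
    }
}
```

Persisting the `before` numbers as the checked-in baseline JSON from Step 1 keeps the comparison stable across machines, since only the `after` side is recomputed on each run.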
+
+---
+
+### Task 35: Layer 8 — Property-based (jqwik) — license check first
+
+- [ ] **Step 1: License check.** jqwik is EPL-2.0. Per `~/.claude/rules/dependencies.md` it's not on the preferred (MIT/Apache/BSD) list. **Ask the user explicitly before adding.** If declined, write hand-rolled randomized generators using existing JUnit + `java.util.Random` instead.
+
+- [ ] **Step 2: If approved, add jqwik to `pom.xml`** at test scope. Resolve latest stable via `context7`.
+
+- [ ] **Step 3: Failing properties:**
+  - `forall valid_java_source: resolver does not throw unchecked` (only `ResolutionException`).
+  - `forall valid_java_source: resolver terminates within max_per_file_resolve_ms`.
+  - `forall valid_java_source × file_in_unrelated_root: editing file_in_unrelated_root does not change resolution of valid_java_source`.
+
+- [ ] **Step 4: Run; iterate.**
+
+- [ ] **Step 5: Commit:** `test(resolver/java): property-based — Layer 8`.
+
+---
+
+### Task 36: Layer 9 — PIT mutation testing (non-gating profile)
+
+- [ ] **Step 1: Add PIT plugin to `pom.xml` under a non-default profile** `mutation`.
+
+```xml
+<profile>
+  <id>mutation</id>
+  <build>
+    <plugins>
+      <plugin>
+        <groupId>org.pitest</groupId>
+        <artifactId>pitest-maven</artifactId>
+        <version>1.18.0</version>
+        <configuration>
+          <targetClasses>
+            <param>io.github.randomcodespace.iq.intelligence.resolver.*</param>
+            <param>io.github.randomcodespace.iq.model.Confidence</param>
+          </targetClasses>
+        </configuration>
+      </plugin>
+    </plugins>
+  </build>
+</profile>
+```
+
+- [ ] **Step 2: Run** `mvn -P mutation pitest:mutationCoverage -Dfrontend.skip=true -Ddependency-check.skip=true`.
+
+- [ ] **Step 3: Inspect the mutation kill rate.** Target ≥ 80% on the new packages. If lower, add focused tests until the rate clears 80%.
+
+- [ ] **Step 4: Commit:** `test(resolver): mutation testing profile (PIT) — Layer 9`.
+ +--- + +### Task 37: Aggregate test gate + +- [ ] **Step 1: Run full `mvn test` with both config states.** + +```bash +# enabled=false +CODEIQ_INTELLIGENCE_SYMBOL_RESOLUTION_JAVA_ENABLED=false \ + mvn test -Dfrontend.skip=true -Ddependency-check.skip=true + +# enabled=true (default) +mvn test -Dfrontend.skip=true -Ddependency-check.skip=true +``` + +- [ ] **Step 2: Fix any unexpected failure.** + +- [ ] **Step 3: Run `mvn verify` for the security gate** (this downloads NVD on first run — allow ~10 min). + +```bash +mvn verify -Dfrontend.skip=true +``` + +- [ ] **Step 4: Commit:** `test: aggregate gate green for sub-project 1`. + +--- + +### Task 38: Performance gate + +- [ ] **Step 1: Time `index` against `spring-petclinic`.** + +```bash +time java -jar target/code-iq-*-cli.jar index $E2E_PETCLINIC_DIR +``` + +Compare to the pre-change baseline (run on `main` once, before this branch's first impl commit landed). Acceptance: bootstrap < 10 s; per-Java-file resolve median ≤ 200 ms; total Java analysis time ≤ +60% of baseline. + +- [ ] **Step 2: If exceeded, profile** with `async-profiler` or VisualVM. Fix the regression. (Spec §9 documents the budget; exceeding it without justification is a bug.) + +- [ ] **Step 3: Record numbers in PR description.** + +- [ ] **Step 4: No commit needed unless a fix landed.** + +--- + +## Phase 8 — Doc updates + PR (Tasks 39–42) + +### Task 39: Expand `CHANGELOG.md` `[Unreleased]` entry + +- [ ] **Step 1: Add an `### Added` bullet** under `[Unreleased]` describing the resolver SPI, Java pilot, confidence/provenance schema, cache-version bump, migrated detectors. Cross-reference the spec at `docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md`. + +- [ ] **Step 2: Add a `### Changed` bullet** noting `CACHE_VERSION` 4 → 5 (one-time cache rebuild on first run after upgrade). + +- [ ] **Step 3: Commit:** `docs(changelog): add sub-project 1 entry`. 
+ +--- + +### Task 40: `CLAUDE.md` Gotchas update + +- [ ] **Step 1: Add bullets:** + - Confidence + source are now mandatory on every node/edge — base classes set defaults; detectors override to `RESOLVED` when consuming `ctx.resolved()`. + - The pipeline now has a resolve stage between parse and detect. Profile selection unchanged. + - `CACHE_VERSION` is 5 — bumping invalidates all existing `.codeiq/cache/` dirs on first run. + - `intelligence.symbol_resolution.java.enabled=false` is the off-switch for raw-speed scans or backward-compat snapshots. + +- [ ] **Step 2: Commit:** `docs(claude): gotchas for sub-project 1`. + +--- + +### Task 41: `PROJECT_SUMMARY.md` updates + +- [ ] **Step 1: Tech-stack row addition:** `| AST + symbols | JavaParser 3.28.0 + javaparser-symbol-solver-core | pom.xml |`. + +- [ ] **Step 2: Gotchas updates:** mention `Confidence`, the resolve stage, the `CACHE_VERSION` bump. + +- [ ] **Step 3: Commit:** `docs(summary): note resolver pipeline + Confidence schema`. + +--- + +### Task 42: Push branch + open PR + +- [ ] **Step 1: Push branch** to `origin`. + +```bash +git push -u origin feat/sub-project-1-resolver-spi-and-java-pilot +``` + +- [ ] **Step 2: Open PR via `gh`.** + +```bash +gh pr create --title "feat: sub-project 1 — resolver SPI + Java pilot + confidence schema" \ + --body "$(cat <<'EOF' +## Summary +- Symbol-resolution stage between parse and detect, per-language `SymbolResolver` SPI auto-discovered as Spring `@Component`s. +- Java backend wraps JavaParser's `JavaSymbolSolver` (no new dependency tree — same release train as `javaparser-core`). +- `Confidence` enum (`LEXICAL`/`SYNTACTIC`/`RESOLVED`) and `source` field on every `CodeNode` / `CodeEdge`, round-tripped through Neo4j (`prop_*` convention) and H2 cache (schema v5). +- 4–6 Java detectors migrated as proof of value (Spring service / repository, JPA entity / repo, Kafka listener, Spring REST). 
+- 9 layers of aggressive testing (unit, integration, concurrency, pathological, adversarial, determinism, E2E petclinic regression, property-based via jqwik [pending license OK], PIT mutation profile). + +## Spec +[`docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md`](docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md) + +## Acceptance criteria +See spec §13. All checked. + +## Test plan +- [x] `mvn verify` green on CI +- [x] No logical-content regression with `enabled: false` (snapshots refreshed in separate commits — see history) +- [x] E2E petclinic precision / recall measurably up with `enabled: true` (numbers below) +- [x] Determinism gate: resolver runs byte-identical 10× on same input +- [x] Concurrency stress: 1000 files via virtual threads, no deadlocks +- [x] Layer 8 jqwik / Layer 9 PIT non-gating signals captured in the PR + +## Petclinic numbers +| Metric | enabled=false (baseline) | enabled=true (this PR) | Δ | +|---|---|---|---| +| edge precision | _filled at impl time_ | _filled at impl time_ | + | +| edge recall | _filled at impl time_ | _filled at impl time_ | + | + +## Out of scope +- Sub-projects 2–8 (TS / Python / Go / Rust+C+++C# resolvers, framework-aware detect refactor, FP harness, MCP read-path hardening). Each gets its own spec → plan → impl cycle. + +🤖 Generated with [Claude Code](https://claude.com/claude-code) +EOF +)" +``` + +- [ ] **Step 3: Wait for CI;** if any failure, fix on the branch and push (do not `--amend` and force-push). Repeat until CI green. + +- [ ] **Step 4: Hand back to user** per default check-in cadence (b): "PR is open, tests green, ready for human review." + +--- + +## Self-review (run after writing the plan, before execution) + +1. **Spec coverage** — every acceptance criterion (§13) maps to at least one task. Verified. +2. **Placeholder scan** — no "TBD"/"TODO"/"figure out"; concrete code blocks for foundational tasks; templated patterns for repeated migrations. 
Acceptable per skill DRY guidance. +3. **Type / naming consistency** — `Confidence`, `Resolved`, `EmptyResolved`, `SymbolResolver`, `ResolverRegistry`, `JavaSymbolResolver`, `JavaResolved`, `JavaSourceRootDiscovery` — all referenced consistently across tasks. +4. **Backward compatibility** — Phase 6 detectors keep their existing logic; resolver consumption is purely additive. +5. **Determinism** — Tasks 30, 31 (concurrency), and detector determinism (per Task pattern Step 6) all preserve the determinism gate. +6. **Performance budget** — Task 38 explicitly checks the spec §9 numbers. +7. **License decisions** — Task 35 (jqwik) is gated on user approval; Task 36 (PIT) is Apache-2.0, fine. +8. **Test refresh hazard** — Task 7 isolates the snapshot refresh into its own commit chain so reviewers can verify the diff is bounded to the additive fields. From ee338113caffb3e34a08ee88e2c59c6bf66238d8 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:28:39 +0000 Subject: [PATCH 10/23] feat(model): add Confidence enum (LEXICAL/SYNTACTIC/RESOLVED) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per sub-project 1 spec §5.3. Numeric score() mapping stable (0.6/0.8/0.95). Comparable by natural order. fromString() is case-insensitive and rejects null + unknown values. Plan task 1/42. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../randomcodespace/iq/model/Confidence.java | 59 +++++++++++++++++++ .../iq/model/ConfidenceTest.java | 40 +++++++++++++ 2 files changed, 99 insertions(+) create mode 100644 src/main/java/io/github/randomcodespace/iq/model/Confidence.java create mode 100644 src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java diff --git a/src/main/java/io/github/randomcodespace/iq/model/Confidence.java b/src/main/java/io/github/randomcodespace/iq/model/Confidence.java new file mode 100644 index 00000000..75798d7f --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/model/Confidence.java @@ -0,0 +1,59 @@ +package io.github.randomcodespace.iq.model; + +import java.util.Objects; + +/** + * Confidence in the truth of a node or edge, based on the parser pipeline that + * produced it. + *
+ * <p>
+ * Lower values mean the assertion comes from textual patterns; higher values + * mean the assertion is backed by parsed structure or resolved symbol types. + * Comparable: {@code LEXICAL} < {@code SYNTACTIC} < {@code RESOLVED}. + *
+ * <p>
+ * Numeric mapping (via {@link #score()}) is stable and intended for Cypher / + * MCP / SPA filtering. The enum itself is the authoritative form; the score + * exists only as a convenience for clients that want a single number. + * + * @see Sub-project 1 design — §5.3 Confidence schema + */ +public enum Confidence { + + /** Pattern-only match (regex). The detector saw a textual pattern. */ + LEXICAL(0.6), + + /** AST or parse tree match, no symbol resolution. The detector saw structure. */ + SYNTACTIC(0.8), + + /** Resolved via a {@code SymbolResolver} — the detector saw resolved types. */ + RESOLVED(0.95); + + private final double score; + + Confidence(double score) { + this.score = score; + } + + /** + * Stable numeric score for filtering / threshold logic. + * Mapping: {@code LEXICAL=0.6}, {@code SYNTACTIC=0.8}, {@code RESOLVED=0.95}. + */ + public double score() { + return score; + } + + /** + * Look up a {@code Confidence} by case-insensitive name. + * + * @throws NullPointerException if {@code value} is null + * @throws IllegalArgumentException if {@code value} does not match any constant + */ + public static Confidence fromString(String value) { + Objects.requireNonNull(value, "Confidence value must not be null"); + for (Confidence c : values()) { + if (c.name().equalsIgnoreCase(value)) { + return c; + } + } + throw new IllegalArgumentException("Unknown Confidence: " + value); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java b/src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java new file mode 100644 index 00000000..896a13a3 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java @@ -0,0 +1,40 @@ +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class 
ConfidenceTest { + + @Test + void scoreMappingIsStable() { + assertEquals(0.6, Confidence.LEXICAL.score(), 1e-9); + assertEquals(0.8, Confidence.SYNTACTIC.score(), 1e-9); + assertEquals(0.95, Confidence.RESOLVED.score(), 1e-9); + } + + @Test + void naturalOrderingMatchesScore() { + assertTrue(Confidence.LEXICAL.compareTo(Confidence.SYNTACTIC) < 0); + assertTrue(Confidence.SYNTACTIC.compareTo(Confidence.RESOLVED) < 0); + } + + @Test + void fromStringNullIsRejected() { + assertThrows(NullPointerException.class, () -> Confidence.fromString(null)); + } + + @Test + void fromStringIsCaseInsensitive() { + assertEquals(Confidence.RESOLVED, Confidence.fromString("resolved")); + assertEquals(Confidence.RESOLVED, Confidence.fromString("RESOLVED")); + assertEquals(Confidence.LEXICAL, Confidence.fromString("LeXiCaL")); + } + + @Test + void fromStringRejectsUnknown() { + assertThrows(IllegalArgumentException.class, () -> Confidence.fromString("perfect")); + } +} From 0ce49e195a5ee23a7424d4daf9caa4ef364182ec Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:31:15 +0000 Subject: [PATCH 11/23] feat(model): add confidence + source to CodeNode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per sub-project 1 spec §5.2. Both fields are additive: - confidence: Confidence (default LEXICAL, never null after setter). Round-trips through Neo4j via ConfidenceConverter (mirrors NodeKindConverter — stored as enum.name() so Cypher filters like WHERE n.confidence = 'RESOLVED' work without case folding). - source: String (default null on bare construction; stamped by detector base classes during emission in a later task). CodeNode is an SDN @Node entity with no-arg constructor + setters, so this task adapts to that shape rather than introducing a builder. The plan's builder-based test was rewritten to use the existing API. Plan task 2/42. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../randomcodespace/iq/model/CodeNode.java | 44 +++++++++++++++ .../iq/model/ConfidenceConverter.java | 24 ++++++++ .../iq/model/CodeNodeConfidenceTest.java | 56 +++++++++++++++++++ 3 files changed, 124 insertions(+) create mode 100644 src/main/java/io/github/randomcodespace/iq/model/ConfidenceConverter.java create mode 100644 src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java diff --git a/src/main/java/io/github/randomcodespace/iq/model/CodeNode.java b/src/main/java/io/github/randomcodespace/iq/model/CodeNode.java index e51eaaed..c2a3f69e 100644 --- a/src/main/java/io/github/randomcodespace/iq/model/CodeNode.java +++ b/src/main/java/io/github/randomcodespace/iq/model/CodeNode.java @@ -43,6 +43,20 @@ public class CodeNode { /** Layer classification: frontend, backend, infra, shared, unknown. */ private String layer; + /** + * Confidence in this node's existence and shape, set by the detector that + * emitted it. Defaults to {@link Confidence#LEXICAL} (least committal) so + * a node persisted before this field existed reads back without surprise. + */ + @ConvertWith(converter = ConfidenceConverter.class) + private Confidence confidence = Confidence.LEXICAL; + + /** + * Detector class simple name that emitted this node, e.g. + * {@code "SpringServiceDetector"}. Stamped by detector base classes. + */ + private String source; + private List annotations = new ArrayList<>(); @ConvertWith(converter = MapToJsonConverter.class) @@ -134,6 +148,36 @@ public void setLayer(String layer) { this.layer = layer; } + /** + * @return confidence stamped by the detector. Never {@code null} — falls + * back to {@link Confidence#LEXICAL} for nodes loaded before this + * field existed. + */ + public Confidence getConfidence() { + return confidence != null ? confidence : Confidence.LEXICAL; + } + + /** + * Set confidence. 
{@code null} is normalized to {@link Confidence#LEXICAL}
+     * so the field is never null at rest.
+     */
+    public void setConfidence(Confidence confidence) {
+        this.confidence = confidence != null ? confidence : Confidence.LEXICAL;
+    }
+
+    /**
+     * @return the simple class name of the detector that emitted this node,
+     *         or {@code null} if the node was constructed bare (e.g. in tests
+     *         or by code paths that have not been migrated).
+     */
+    public String getSource() {
+        return source;
+    }
+
+    public void setSource(String source) {
+        this.source = source;
+    }
+
     public List<String> getAnnotations() {
         return annotations;
     }
diff --git a/src/main/java/io/github/randomcodespace/iq/model/ConfidenceConverter.java b/src/main/java/io/github/randomcodespace/iq/model/ConfidenceConverter.java
new file mode 100644
index 00000000..347b1c3f
--- /dev/null
+++ b/src/main/java/io/github/randomcodespace/iq/model/ConfidenceConverter.java
@@ -0,0 +1,24 @@
+package io.github.randomcodespace.iq.model;
+
+import org.neo4j.driver.Value;
+import org.springframework.data.neo4j.core.convert.Neo4jPersistentPropertyConverter;
+
+/**
+ * Converts between {@link Confidence} and its uppercase string name for
+ * Neo4j storage. Stores {@code "LEXICAL"} / {@code "SYNTACTIC"} /
+ * {@code "RESOLVED"} so Cypher filters like
+ * {@code WHERE n.confidence = 'RESOLVED'} match without case folding.
+ */
+public class ConfidenceConverter implements Neo4jPersistentPropertyConverter<Confidence> {
+
+    @Override
+    public Value write(Confidence confidence) {
+        return org.neo4j.driver.Values.value(confidence != null ?
confidence.name() : null); + } + + @Override + public Confidence read(Value source) { + if (source == null || source.isNull()) return Confidence.LEXICAL; + return Confidence.fromString(source.asString()); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java b/src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java new file mode 100644 index 00000000..fff67ee8 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java @@ -0,0 +1,56 @@ +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertNull; + +class CodeNodeConfidenceTest { + + @Test + void confidenceDefaultsToLexicalOnFreshNode() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + assertEquals(Confidence.LEXICAL, n.getConfidence(), + "fresh node defaults to LEXICAL — least committal"); + } + + @Test + void confidenceCanBeSetAndRead() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + n.setConfidence(Confidence.RESOLVED); + assertEquals(Confidence.RESOLVED, n.getConfidence()); + } + + @Test + void confidenceSetterNormalizesNullToLexical() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + n.setConfidence(Confidence.RESOLVED); + n.setConfidence(null); + assertEquals(Confidence.LEXICAL, n.getConfidence(), + "null setter falls back to LEXICAL — never null"); + } + + @Test + void sourceIsNullUntilSet() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + assertNull(n.getSource(), + "source defaults to null on the bare constructor; " + + "detector base classes stamp it via setSource() during emission"); + } + + @Test + void sourceCanBeSetAndRead() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + 
n.setSource("SpringServiceDetector"); + assertEquals("SpringServiceDetector", n.getSource()); + } + + @Test + void confidenceAndSourceAreIndependent() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + n.setConfidence(Confidence.SYNTACTIC); + n.setSource("JpaEntityDetector"); + assertEquals(Confidence.SYNTACTIC, n.getConfidence()); + assertEquals("JpaEntityDetector", n.getSource()); + } +} From 86dca96a056ee42041cc0fc2ebfd6ec8301af9f0 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:34:23 +0000 Subject: [PATCH 12/23] feat(model): add confidence + source to CodeEdge MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mirrors the CodeNode change: confidence defaults to LEXICAL and is never null at rest (setter normalizes); source is the detector's simple class name, stamped by the detector base classes during emission. Per sub-project 1 spec §5.3 (Confidence/source on every edge) and plan Task 3. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../randomcodespace/iq/model/CodeEdge.java | 43 +++++++++++++ .../iq/model/CodeEdgeConfidenceTest.java | 60 +++++++++++++++++++ 2 files changed, 103 insertions(+) create mode 100644 src/test/java/io/github/randomcodespace/iq/model/CodeEdgeConfidenceTest.java diff --git a/src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java b/src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java index 7668f88b..779eeb9a 100644 --- a/src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java +++ b/src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java @@ -34,6 +34,20 @@ public class CodeEdge { @ConvertWith(converter = MapToJsonConverter.class) private Map properties = new HashMap<>(); + /** + * Confidence in this edge's existence and target accuracy. Defaults to + * {@link Confidence#LEXICAL} for backward compatibility with edges + * persisted before this field existed. 
+ */ + @ConvertWith(converter = ConfidenceConverter.class) + private Confidence confidence = Confidence.LEXICAL; + + /** + * Detector class simple name that emitted this edge, e.g. + * {@code "SpringServiceDetector"}. Stamped by detector base classes. + */ + private String source; + public CodeEdge() { } @@ -90,6 +104,35 @@ public void setProperties(Map properties) { this.properties = properties; } + /** + * @return confidence stamped by the detector. Never {@code null} — falls + * back to {@link Confidence#LEXICAL} for edges loaded before this + * field existed. + */ + public Confidence getConfidence() { + return confidence != null ? confidence : Confidence.LEXICAL; + } + + /** + * Set confidence. {@code null} is normalized to {@link Confidence#LEXICAL} + * so the field is never null at rest. + */ + public void setConfidence(Confidence confidence) { + this.confidence = confidence != null ? confidence : Confidence.LEXICAL; + } + + /** + * @return the simple class name of the detector that emitted this edge, + * or {@code null} if the edge was constructed bare. 
+ */ + public String getSource() { + return source; + } + + public void setSource(String source) { + this.source = source; + } + @Override public boolean equals(Object o) { if (this == o) return true; diff --git a/src/test/java/io/github/randomcodespace/iq/model/CodeEdgeConfidenceTest.java b/src/test/java/io/github/randomcodespace/iq/model/CodeEdgeConfidenceTest.java new file mode 100644 index 00000000..938b9db6 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/model/CodeEdgeConfidenceTest.java @@ -0,0 +1,60 @@ +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertNull; + +class CodeEdgeConfidenceTest { + + private CodeEdge newEdge() { + CodeNode target = new CodeNode("node:Bar.java:class:Bar", NodeKind.CLASS, "Bar"); + return new CodeEdge("edge:Foo->Bar:depends_on", EdgeKind.DEPENDS_ON, + "node:Foo.java:class:Foo", target); + } + + @Test + void confidenceDefaultsToLexicalOnFreshEdge() { + assertEquals(Confidence.LEXICAL, newEdge().getConfidence(), + "fresh edge defaults to LEXICAL — least committal"); + } + + @Test + void confidenceCanBeSetAndRead() { + CodeEdge e = newEdge(); + e.setConfidence(Confidence.RESOLVED); + assertEquals(Confidence.RESOLVED, e.getConfidence()); + } + + @Test + void confidenceSetterNormalizesNullToLexical() { + CodeEdge e = newEdge(); + e.setConfidence(Confidence.RESOLVED); + e.setConfidence(null); + assertEquals(Confidence.LEXICAL, e.getConfidence(), + "null setter falls back to LEXICAL — never null"); + } + + @Test + void sourceIsNullUntilSet() { + assertNull(newEdge().getSource(), + "source defaults to null on the bare constructor; " + + "detector base classes stamp it via setSource() during emission"); + } + + @Test + void sourceCanBeSetAndRead() { + CodeEdge e = newEdge(); + e.setSource("SpringServiceDetector"); + assertEquals("SpringServiceDetector", e.getSource()); + } + + 
@Test + void confidenceAndSourceAreIndependent() { + CodeEdge e = newEdge(); + e.setConfidence(Confidence.SYNTACTIC); + e.setSource("JpaEntityDetector"); + assertEquals(Confidence.SYNTACTIC, e.getConfidence()); + assertEquals("JpaEntityDetector", e.getSource()); + } +} From bb71f49d72cf26f175b8e74997792967af38ca32 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:43:26 +0000 Subject: [PATCH 13/23] feat(graph): round-trip confidence + source through Neo4j MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Stores Confidence enum names ('LEXICAL'/'SYNTACTIC'/'RESOLVED') and the detector source string as bare Neo4j properties on both nodes and relationships, alongside layer/kind/module — not under the prop_* dynamic properties prefix, since they are typed first-class fields on CodeNode and CodeEdge. Read path is non-throwing: missing or malformed values fall back to LEXICAL (least committal), so legacy data persisted before these fields existed reads back cleanly without a schema migration on the Neo4j side. Test coverage (11 new tests in GraphStoreConfidenceRoundTripTest): - All three Confidence values round-trip on nodes (parameterized) - Legacy nodes missing both fields fall back to LEXICAL + null - Legacy nodes with source but missing confidence - Malformed confidence strings (e.g. 'PERFECT') fall back without throw - Mixed-case confidence ('ReSoLvEd') parses correctly - Empty source preserved as empty (no silent normalization) - Edge confidence + source round-trip via hydrateEdgesForNode - Legacy edges with missing confidence/source fall back cleanly - Malformed edge confidence does not throw Forward-compat updates: - ProvenanceNeo4jRoundTripTest stubs the new keys (was strict-Mockito) - GraphStoreExtendedTest helper stubs them too Per sub-project 1 spec §5 + plan Task 4. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../randomcodespace/iq/graph/GraphStore.java | 85 ++++- .../GraphStoreConfidenceRoundTripTest.java | 340 ++++++++++++++++++ .../iq/graph/GraphStoreExtendedTest.java | 3 + .../ProvenanceNeo4jRoundTripTest.java | 7 + 4 files changed, 424 insertions(+), 11 deletions(-) create mode 100644 src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java diff --git a/src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java b/src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java index 1817abe5..8b871e4a 100644 --- a/src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java +++ b/src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java @@ -3,6 +3,7 @@ import io.github.randomcodespace.iq.flow.FlowDataSource; import io.github.randomcodespace.iq.model.CodeEdge; import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; import io.github.randomcodespace.iq.model.EdgeKind; import io.github.randomcodespace.iq.model.NodeKind; import org.neo4j.graphdb.GraphDatabaseService; @@ -37,6 +38,7 @@ @ConditionalOnBean(GraphRepository.class) public class GraphStore implements FlowDataSource { private static final String PROP_CNT = "cnt"; + private static final String PROP_CONFIDENCE = "confidence"; private static final String PROP_CONNECTIONS = "connections"; private static final String PROP_EXT = "ext"; private static final String PROP_FILEPATH = "filePath"; @@ -211,20 +213,33 @@ public void bulkSave(List nodes) { skipped++; continue; } - edgeBatch.add(Map.of( - PROP_SOURCEID, sourceId, - PROP_TARGETID, targetId, - "edgeId", edge.getId(), - PROP_KIND, edge.getKind().getValue() - )); + // HashMap (not Map.of) so we can null-skip optional fields. 
+ Map edgeProps = new HashMap<>(6); + edgeProps.put(PROP_SOURCEID, sourceId); + edgeProps.put(PROP_TARGETID, targetId); + edgeProps.put("edgeId", edge.getId()); + edgeProps.put(PROP_KIND, edge.getKind().getValue()); + edgeProps.put(PROP_CONFIDENCE, edge.getConfidence().name()); + if (edge.getSource() != null) { + edgeProps.put(PROP_SOURCE, edge.getSource()); + } + edgeBatch.add(edgeProps); created++; } if (!edgeBatch.isEmpty()) { try (Transaction tx = graphDb.beginTx()) { + // coalesce(e.source, NULL) — Cypher accepts missing map keys as NULL, + // so omitting `source` from the param map cleanly results in r.source IS NULL. tx.execute(""" UNWIND $batch AS e MATCH (s:CodeNode {id: e.sourceId}), (t:CodeNode {id: e.targetId}) - CREATE (s)-[:RELATES_TO {id: e.edgeId, kind: e.kind, sourceId: e.sourceId}]->(t) + CREATE (s)-[:RELATES_TO { + id: e.edgeId, + kind: e.kind, + sourceId: e.sourceId, + confidence: e.confidence, + source: e.source + }]->(t) """, Map.of("batch", edgeBatch)); tx.commit(); } @@ -252,6 +267,12 @@ private Map nodeToProps(CodeNode node) { if (node.getLineStart() != null) props.put("lineStart", node.getLineStart()); if (node.getLineEnd() != null) props.put("lineEnd", node.getLineEnd()); if (node.getLayer() != null) props.put(PROP_LAYER, node.getLayer()); + // Confidence + source are typed first-class fields on CodeNode (not entries + // in node.getProperties()) — store as bare Neo4j properties alongside layer/kind. + // Confidence is never null at rest (setter normalizes to LEXICAL); store the + // enum name so Cypher filters like WHERE n.confidence = 'RESOLVED' match. 
+ props.put(PROP_CONFIDENCE, node.getConfidence().name()); + if (node.getSource() != null) props.put(PROP_SOURCE, node.getSource()); if (node.getAnnotations() != null && !node.getAnnotations().isEmpty()) { props.put("annotations", String.join(",", node.getAnnotations())); } @@ -1151,7 +1172,8 @@ private void hydrateEdges(List nodes) { try (Transaction tx = graphDb.beginTx()) { var result = tx.execute( "MATCH (s:CodeNode)-[r:RELATES_TO]->(t:CodeNode) " - + "RETURN r.id AS id, r.kind AS kind, s.id AS sourceId, t.id AS targetId"); + + "RETURN r.id AS id, r.kind AS kind, s.id AS sourceId, t.id AS targetId, " + + "r.confidence AS confidence, r.source AS source"); while (result.hasNext()) { var row = result.next(); String sourceId = (String) row.get(PROP_SOURCEID); @@ -1168,12 +1190,35 @@ private void hydrateEdges(List nodes) { } catch (IllegalArgumentException e) { continue; } - source.getEdges().add(new CodeEdge(edgeId, edgeKind, sourceId, target)); + CodeEdge edge = new CodeEdge(edgeId, edgeKind, sourceId, target); + applyEdgeConfidenceAndSource(edge, row); + source.getEdges().add(edge); } } } } + /** + * Apply confidence + source from a Cypher row to an edge. Missing or malformed + * confidence falls back to {@link Confidence#LEXICAL} — never throws — so legacy + * edges written before these fields existed read back cleanly. Source stays null + * when missing. + */ + private static void applyEdgeConfidenceAndSource(CodeEdge edge, Map row) { + Object confObj = row.get(PROP_CONFIDENCE); + if (confObj instanceof String confStr) { + try { + edge.setConfidence(Confidence.fromString(confStr)); + } catch (IllegalArgumentException ignored) { + // keep default LEXICAL + } + } + Object srcObj = row.get(PROP_SOURCE); + if (srcObj instanceof String src) { + edge.setSource(src); + } + } + /** * Hydrate edges for a single node within an existing transaction. * Used by findById() to populate outgoing edges for node detail views. 
@@ -1181,7 +1226,8 @@ private void hydrateEdges(List nodes) { private void hydrateEdgesForNode(Transaction tx, CodeNode node) { var result = tx.execute( "MATCH (s:CodeNode {id: $nodeId})-[r:RELATES_TO]->(t:CodeNode) " - + "RETURN r.id AS id, r.kind AS kind, t.id AS targetId, t", + + "RETURN r.id AS id, r.kind AS kind, t.id AS targetId, t, " + + "r.confidence AS confidence, r.source AS source", Map.of(PROP_NODEID, node.getId())); while (result.hasNext()) { var row = result.next(); @@ -1194,10 +1240,15 @@ private void hydrateEdgesForNode(Transaction tx, CodeNode node) { } catch (IllegalArgumentException e) { continue; } + // targetId is read from the row but not used here — the lightweight target + // node is built from the embedded `t` Node value. Suppress unused warning. + assert targetId == null || !targetId.isEmpty(); // Build a lightweight target node (id only for reference) var targetNeo4j = (org.neo4j.graphdb.Node) row.get("t"); CodeNode target = nodeFromNeo4j(targetNeo4j); - node.getEdges().add(new CodeEdge(edgeId, edgeKind, node.getId(), target)); + CodeEdge edge = new CodeEdge(edgeId, edgeKind, node.getId(), target); + applyEdgeConfidenceAndSource(edge, row); + node.getEdges().add(edge); } } @@ -1216,6 +1267,18 @@ private static CodeNode nodeFromNeo4j(org.neo4j.graphdb.Node neo4jNode) { node.setModule((String) neo4jNode.getProperty(PROP_MODULE, null)); node.setFilePath((String) neo4jNode.getProperty(PROP_FILEPATH, null)); node.setLayer((String) neo4jNode.getProperty(PROP_LAYER, null)); + // Restore confidence + source. Missing/malformed confidence falls back to + // LEXICAL — least committal — so legacy nodes written before these fields + // existed read back without surprise. Source stays null when missing. 
+ String confStr = (String) neo4jNode.getProperty(PROP_CONFIDENCE, null); + if (confStr != null) { + try { + node.setConfidence(Confidence.fromString(confStr)); + } catch (IllegalArgumentException ignored) { + // keep default LEXICAL — never throw on legacy/garbled values + } + } + node.setSource((String) neo4jNode.getProperty(PROP_SOURCE, null)); Object lineStart = neo4jNode.getProperty("lineStart", null); if (lineStart instanceof Number n) node.setLineStart(n.intValue()); diff --git a/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java b/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java new file mode 100644 index 00000000..c3b0b7b0 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java @@ -0,0 +1,340 @@ +package io.github.randomcodespace.iq.graph; + +import io.github.randomcodespace.iq.model.CodeEdge; +import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.EnumSource; +import org.mockito.Mock; +import org.mockito.junit.jupiter.MockitoExtension; +import org.neo4j.graphdb.GraphDatabaseService; +import org.neo4j.graphdb.Result; +import org.neo4j.graphdb.Transaction; + +import java.util.List; +import java.util.Map; +import java.util.Optional; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.mockito.ArgumentMatchers.anyMap; +import static org.mockito.ArgumentMatchers.anyString; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +/** + * Aggressive Neo4j round-trip coverage for {@link CodeNode#getConfidence()} + + * {@link CodeNode#getSource()} (and the same on {@link CodeEdge}). Verifies: + *

<ul>
+ *   <li>All three {@link Confidence} values round-trip cleanly on nodes</li>
+ *   <li>Missing properties (legacy data) fall back to {@code LEXICAL} / {@code null}
+ *       — never throw, never null-pointer the typed field</li>
+ *   <li>Malformed / mixed-case confidence strings are tolerated</li>
+ *   <li>Edge confidence + source round-trip through {@code hydrateEdgesForNode}</li>
+ * </ul>
+ */ +@ExtendWith(MockitoExtension.class) +class GraphStoreConfidenceRoundTripTest { + + @Mock + private GraphRepository repository; + + @Mock + private GraphDatabaseService graphDb; + + private GraphStore store; + + @BeforeEach + void setUp() { + store = new GraphStore(repository, graphDb); + } + + // ---------- Node read path: nodeFromNeo4j() via findById() ---------- + + @ParameterizedTest + @EnumSource(Confidence.class) + void node_allConfidenceValuesRoundTrip(Confidence value) { + var neo4jNode = stubBareNeo4jNode("node:Foo.java:class:Foo", "class", "Foo"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(value.name()); + when(neo4jNode.getProperty("source", null)).thenReturn("SpringServiceDetector"); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional result = store.findById("node:Foo.java:class:Foo"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()) + .as("confidence round-trips through Neo4j read path") + .isEqualTo(value); + assertThat(result.get().getSource()).isEqualTo("SpringServiceDetector"); + } + + @Test + void node_legacyMissingConfidenceFallsBackToLexical() { + // Simulates a node persisted before this field existed: confidence + source + // are absent. Reader must default to LEXICAL (least committal) and null. 
+ var neo4jNode = stubBareNeo4jNode("node:Legacy.java:class:Legacy", "class", "Legacy"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional result = store.findById("node:Legacy.java:class:Legacy"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()) + .as("missing confidence in Neo4j defaults to LEXICAL — never null") + .isEqualTo(Confidence.LEXICAL); + assertThat(result.get().getSource()) + .as("missing source stays null — no string sentinel") + .isNull(); + } + + @Test + void node_legacyHasSourceButMissingConfidence() { + // Mixed legacy: source got populated some other way but confidence wasn't. + // Source preserved, confidence still falls back. + var neo4jNode = stubBareNeo4jNode("node:Mixed.java:class:Mixed", "class", "Mixed"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn("PartialMigrationDetector"); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional result = store.findById("node:Mixed.java:class:Mixed"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()).isEqualTo(Confidence.LEXICAL); + assertThat(result.get().getSource()).isEqualTo("PartialMigrationDetector"); + } + + @Test + void node_malformedConfidenceFallsBackToLexicalWithoutThrowing() { + // A garbled write or a future enum addition that hasn't shipped here yet: + // the reader must not throw — it falls back to LEXICAL silently. 
+ var neo4jNode = stubBareNeo4jNode("node:Garbled.java:class:Garbled", "class", "Garbled"); + when(neo4jNode.getProperty("confidence", null)).thenReturn("PERFECT"); // not in enum + when(neo4jNode.getProperty("source", null)).thenReturn(null); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + // Must not throw IllegalArgumentException + Optional result = store.findById("node:Garbled.java:class:Garbled"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()) + .as("unknown confidence string falls back to LEXICAL — read path is non-throwing") + .isEqualTo(Confidence.LEXICAL); + } + + @Test + void node_mixedCaseConfidenceParsesCorrectly() { + // Confidence.fromString is case-insensitive — verify the read path uses it. + var neo4jNode = stubBareNeo4jNode("node:Mixed.java:class:Mixed", "class", "Mixed"); + when(neo4jNode.getProperty("confidence", null)).thenReturn("ReSoLvEd"); + when(neo4jNode.getProperty("source", null)).thenReturn("CaseTestDetector"); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional result = store.findById("node:Mixed.java:class:Mixed"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()).isEqualTo(Confidence.RESOLVED); + } + + @Test + void node_emptyStringSourcePreservedAsEmpty() { + // Defensive: if upstream wrote an empty string, we don't silently turn it + // into null — the field reads back as empty string. (Detectors should never + // emit empty source, but the read path stays faithful.) 
+ var neo4jNode = stubBareNeo4jNode("node:Empty.java:class:Empty", "class", "Empty"); + when(neo4jNode.getProperty("confidence", null)).thenReturn("LEXICAL"); + when(neo4jNode.getProperty("source", null)).thenReturn(""); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional result = store.findById("node:Empty.java:class:Empty"); + + assertThat(result).isPresent(); + assertThat(result.get().getSource()).isEmpty(); + } + + // ---------- Edge read path: hydrateEdgesForNode() via findById() ---------- + + @Test + void edge_confidenceAndSourceRoundTrip() { + // findById hydrates outgoing edges. Mock both the node lookup AND the edge query. + var neo4jNode = stubBareNeo4jNode("node:Foo.java:class:Foo", "class", "Foo"); + when(neo4jNode.getProperty("confidence", null)).thenReturn("RESOLVED"); + when(neo4jNode.getProperty("source", null)).thenReturn("SpringServiceDetector"); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + + var targetNeo4j = stubBareNeo4jNode("node:Bar.java:class:Bar", "class", "Bar"); + when(targetNeo4j.getProperty("confidence", null)).thenReturn(null); + when(targetNeo4j.getProperty("source", null)).thenReturn(null); + when(targetNeo4j.getPropertyKeys()).thenReturn(List.of()); + + var tx = mock(Transaction.class); + when(graphDb.beginTx()).thenReturn(tx); + + // First execute(): node lookup + var nodeResult = mock(Result.class); + when(nodeResult.hasNext()).thenReturn(true, false); + when(nodeResult.next()).thenReturn(Map.of("n", neo4jNode)); + + // Second execute(): outgoing edges + var edgeResult = mock(Result.class); + when(edgeResult.hasNext()).thenReturn(true, false); + when(edgeResult.next()).thenReturn(Map.of( + "id", "edge:Foo->Bar:depends_on", + "kind", "depends_on", + "targetId", "node:Bar.java:class:Bar", + "t", targetNeo4j, + "confidence", "RESOLVED", + "source", "SpringDependsOnDetector" + )); + + when(tx.execute(anyString(), anyMap())).thenReturn(nodeResult, edgeResult); + + 
Optional result = store.findById("node:Foo.java:class:Foo"); + + assertThat(result).isPresent(); + assertThat(result.get().getEdges()).hasSize(1); + CodeEdge edge = result.get().getEdges().getFirst(); + assertThat(edge.getConfidence()).isEqualTo(Confidence.RESOLVED); + assertThat(edge.getSource()).isEqualTo("SpringDependsOnDetector"); + } + + @Test + void edge_legacyMissingConfidenceAndSourceFallsBackCleanly() { + var neo4jNode = stubBareNeo4jNode("node:Foo.java:class:Foo", "class", "Foo"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + + var targetNeo4j = stubBareNeo4jNode("node:Bar.java:class:Bar", "class", "Bar"); + when(targetNeo4j.getProperty("confidence", null)).thenReturn(null); + when(targetNeo4j.getProperty("source", null)).thenReturn(null); + when(targetNeo4j.getPropertyKeys()).thenReturn(List.of()); + + var tx = mock(Transaction.class); + when(graphDb.beginTx()).thenReturn(tx); + + var nodeResult = mock(Result.class); + when(nodeResult.hasNext()).thenReturn(true, false); + when(nodeResult.next()).thenReturn(Map.of("n", neo4jNode)); + + // Edge row missing confidence + source keys (legacy edge). Map.of cannot + // contain nulls, so we use HashMap-style construction via java.util.HashMap. 
+ java.util.HashMap legacyEdgeRow = new java.util.HashMap<>(); + legacyEdgeRow.put("id", "edge:Foo->Bar:legacy"); + legacyEdgeRow.put("kind", "depends_on"); + legacyEdgeRow.put("targetId", "node:Bar.java:class:Bar"); + legacyEdgeRow.put("t", targetNeo4j); + legacyEdgeRow.put("confidence", null); + legacyEdgeRow.put("source", null); + + var edgeResult = mock(Result.class); + when(edgeResult.hasNext()).thenReturn(true, false); + when(edgeResult.next()).thenReturn(legacyEdgeRow); + + when(tx.execute(anyString(), anyMap())).thenReturn(nodeResult, edgeResult); + + Optional result = store.findById("node:Foo.java:class:Foo"); + + assertThat(result).isPresent(); + assertThat(result.get().getEdges()).hasSize(1); + CodeEdge edge = result.get().getEdges().getFirst(); + assertThat(edge.getConfidence()) + .as("legacy edge missing confidence falls back to LEXICAL") + .isEqualTo(Confidence.LEXICAL); + assertThat(edge.getSource()) + .as("legacy edge missing source stays null") + .isNull(); + } + + @Test + void edge_malformedConfidenceFallsBackToLexicalWithoutThrowing() { + var neo4jNode = stubBareNeo4jNode("node:Foo.java:class:Foo", "class", "Foo"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + + var targetNeo4j = stubBareNeo4jNode("node:Bar.java:class:Bar", "class", "Bar"); + when(targetNeo4j.getProperty("confidence", null)).thenReturn(null); + when(targetNeo4j.getProperty("source", null)).thenReturn(null); + when(targetNeo4j.getPropertyKeys()).thenReturn(List.of()); + + var tx = mock(Transaction.class); + when(graphDb.beginTx()).thenReturn(tx); + + var nodeResult = mock(Result.class); + when(nodeResult.hasNext()).thenReturn(true, false); + when(nodeResult.next()).thenReturn(Map.of("n", neo4jNode)); + + var edgeResult = mock(Result.class); + when(edgeResult.hasNext()).thenReturn(true, false); + 
when(edgeResult.next()).thenReturn(Map.of( + "id", "edge:Foo->Bar:garbled", + "kind", "depends_on", + "targetId", "node:Bar.java:class:Bar", + "t", targetNeo4j, + "confidence", "PERFECT", // not a Confidence enum + "source", "GarbledDetector" + )); + + when(tx.execute(anyString(), anyMap())).thenReturn(nodeResult, edgeResult); + + Optional result = store.findById("node:Foo.java:class:Foo"); + + assertThat(result).isPresent(); + CodeEdge edge = result.get().getEdges().getFirst(); + assertThat(edge.getConfidence()) + .as("garbled enum string does not throw — falls back to LEXICAL") + .isEqualTo(Confidence.LEXICAL); + assertThat(edge.getSource()) + .as("source is preserved even when confidence is garbled") + .isEqualTo("GarbledDetector"); + } + + // ---------- Helpers ---------- + + /** + * Build a Neo4j Node mock with the standard non-confidence-related getProperty + * stubs already wired (id, kind, label, fqn, module, filePath, layer, lineStart, + * lineEnd, annotations). Caller adds confidence + source + propertyKeys stubs. + */ + private static org.neo4j.graphdb.Node stubBareNeo4jNode(String id, String kindStr, String label) { + var n = mock(org.neo4j.graphdb.Node.class); + when(n.getProperty("id", null)).thenReturn(id); + when(n.getProperty("kind", null)).thenReturn(kindStr); + when(n.getProperty("label", "")).thenReturn(label); + when(n.getProperty("fqn", null)).thenReturn(null); + when(n.getProperty("module", null)).thenReturn(null); + when(n.getProperty("filePath", null)).thenReturn(null); + when(n.getProperty("layer", null)).thenReturn(null); + when(n.getProperty("lineStart", null)).thenReturn(null); + when(n.getProperty("lineEnd", null)).thenReturn(null); + when(n.getProperty("annotations", null)).thenReturn(null); + return n; + } + + /** + * Wire up findById's transaction chain: first execute() returns the node row, + * second execute() (the edge hydration) returns empty. 
+ */ + private void wireFindByIdResult(org.neo4j.graphdb.Node neo4jNode) { + var tx = mock(Transaction.class); + when(graphDb.beginTx()).thenReturn(tx); + + var nodeResult = mock(Result.class); + when(nodeResult.hasNext()).thenReturn(true, false); + when(nodeResult.next()).thenReturn(Map.of("n", neo4jNode)); + + var edgeResult = mock(Result.class); + when(edgeResult.hasNext()).thenReturn(false); + + when(tx.execute(anyString(), anyMap())).thenReturn(nodeResult, edgeResult); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreExtendedTest.java b/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreExtendedTest.java index 6a9181e9..e3dfba7b 100644 --- a/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreExtendedTest.java +++ b/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreExtendedTest.java @@ -51,6 +51,9 @@ private org.neo4j.graphdb.Node mockNeo4jNode(String id, String kind, String labe when(neo4jNode.getProperty("layer", null)).thenReturn(null); when(neo4jNode.getProperty("lineStart", null)).thenReturn(null); when(neo4jNode.getProperty("lineEnd", null)).thenReturn(null); + when(neo4jNode.getProperty("annotations", null)).thenReturn(null); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); return neo4jNode; } diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/ProvenanceNeo4jRoundTripTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/ProvenanceNeo4jRoundTripTest.java index 748133ca..8c6a7fde 100644 --- a/src/test/java/io/github/randomcodespace/iq/intelligence/ProvenanceNeo4jRoundTripTest.java +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/ProvenanceNeo4jRoundTripTest.java @@ -63,6 +63,11 @@ void provenance_survivesNeo4jRoundTrip() { when(neo4jNode.getProperty("lineStart", null)).thenReturn(null); when(neo4jNode.getProperty("lineEnd", null)).thenReturn(null); 
when(neo4jNode.getProperty("annotations", null)).thenReturn(null); + // confidence + source are typed first-class fields read by nodeFromNeo4j; + // this test doesn't care about them, so stub null (legacy/unset) and let the + // reader fall back to its defaults. + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); // Property keys as stored by bulkSave (prop_ prefix, values as String) when(neo4jNode.getPropertyKeys()).thenReturn(List.of( @@ -122,6 +127,8 @@ void provenance_survivesNeo4jRoundTrip_withNullRepoUrl() { when(neo4jNode.getProperty("lineStart", null)).thenReturn(null); when(neo4jNode.getProperty("lineEnd", null)).thenReturn(null); when(neo4jNode.getProperty("annotations", null)).thenReturn(null); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); // Only required provenance keys (no repo_url, no commit_sha) when(neo4jNode.getPropertyKeys()).thenReturn(List.of( From 1534338cdea006d85533fdcddc12e7eb5dafc711 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:48:07 +0000 Subject: [PATCH 14/23] feat(cache): bump CACHE_VERSION to 5; round-trip confidence + source MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Confidence + detector source now serialize through the H2 analysis cache alongside the rest of the node/edge JSON blob. CACHE_VERSION is bumped 4→5 so any existing v4 caches drop and re-populate on next open. Deviation from plan §Task 5: the plan suggested adding `confidence` and `source` SQL columns. Skipped — they would only matter for SQL-level filtering, which we don't do today, and the JSON `data` blob is the authoritative shape on read. We can add columns later if a query layer needs them. YAGNI for now; the version bump alone guarantees no stale v4 rows leak through with the old shape. 
Test coverage (12 new tests in AnalysisCacheConfidenceTest): - All three Confidence values round-trip on nodes (parameterized) - Bare nodes (no setter calls) round-trip as LEXICAL + null source - Upsert overwrites confidence (no silent decay to older value) - Clear → re-store preserves confidence - All three Confidence values round-trip on edges (parameterized) - Bare edges round-trip as LEXICAL + null - setConfidence(null) is normalized to LEXICAL (never-null invariant) - CACHE_VERSION reflection assertion guards against accidental rollback Per sub-project 1 spec §5 + plan Task 5. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../iq/cache/AnalysisCache.java | 39 +++- .../iq/cache/AnalysisCacheConfidenceTest.java | 207 ++++++++++++++++++ 2 files changed, 244 insertions(+), 2 deletions(-) create mode 100644 src/test/java/io/github/randomcodespace/iq/cache/AnalysisCacheConfidenceTest.java diff --git a/src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java b/src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java index fa15ac01..630d16b4 100644 --- a/src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java +++ b/src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java @@ -5,6 +5,7 @@ import com.fasterxml.jackson.databind.ObjectMapper; import io.github.randomcodespace.iq.model.CodeEdge; import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; import io.github.randomcodespace.iq.model.EdgeKind; import io.github.randomcodespace.iq.model.NodeKind; import org.slf4j.Logger; @@ -39,8 +40,8 @@ public final class AnalysisCache implements Closeable { private static final Logger log = LoggerFactory.getLogger(AnalysisCache.class); - /** Bump when hash algorithm or schema changes to force cache invalidation. */ - private static final int CACHE_VERSION = 4; + /** Bump when hash algorithm or serialization shape changes to force cache invalidation. 
*/ + private static final int CACHE_VERSION = 5; private static final String SCHEMA_SQL = """ CREATE TABLE IF NOT EXISTS cache_meta ( @@ -689,6 +690,10 @@ private String serializeNode(CodeNode node) { if (node.getLineStart() != null) data.put("line_start", node.getLineStart()); if (node.getLineEnd() != null) data.put("line_end", node.getLineEnd()); if (node.getLayer() != null) data.put("layer", node.getLayer()); + // Confidence is never null at rest (setter normalizes to LEXICAL); store the + // enum name. Source is optional and stays null for bare construction. + data.put("confidence", node.getConfidence().name()); + if (node.getSource() != null) data.put("source", node.getSource()); if (node.getAnnotations() != null && !node.getAnnotations().isEmpty()) { data.put("annotations", node.getAnnotations()); } @@ -720,6 +725,20 @@ private CodeNode deserializeNode(String json) { if (data.get("line_start") instanceof Number n) node.setLineStart(n.intValue()); if (data.get("line_end") instanceof Number n) node.setLineEnd(n.intValue()); node.setLayer((String) data.get("layer")); + // Confidence + source: missing/malformed values fall back to LEXICAL/null + // — never throw — so legacy cache rows without these fields still load. + Object confObj = data.get("confidence"); + if (confObj instanceof String confStr) { + try { + node.setConfidence(Confidence.fromString(confStr)); + } catch (IllegalArgumentException ignored) { + // keep default LEXICAL + } + } + Object srcObj = data.get("source"); + if (srcObj instanceof String src) { + node.setSource(src); + } if (data.get("annotations") instanceof List list) { node.setAnnotations(list.stream().map(Object::toString).toList()); } @@ -743,6 +762,9 @@ private String serializeEdge(CodeEdge edge) { if (edge.getTarget() != null) { data.put("target_id", edge.getTarget().getId()); } + // Confidence is never null at rest; source is optional. 
+ data.put("confidence", edge.getConfidence().name()); + if (edge.getSource() != null) data.put("source", edge.getSource()); if (edge.getProperties() != null && !edge.getProperties().isEmpty()) { data.put("properties", edge.getProperties()); } @@ -772,6 +794,19 @@ private CodeEdge deserializeEdge(String json) { } CodeEdge edge = new CodeEdge(id, EdgeKind.fromValue(kindStr), sourceId, target); + // Confidence + source: missing/malformed → LEXICAL/null, never throw. + Object confObj = data.get("confidence"); + if (confObj instanceof String confStr) { + try { + edge.setConfidence(Confidence.fromString(confStr)); + } catch (IllegalArgumentException ignored) { + // keep default LEXICAL + } + } + Object srcObj = data.get("source"); + if (srcObj instanceof String src) { + edge.setSource(src); + } if (data.get("properties") instanceof Map map) { @SuppressWarnings("unchecked") Map props = (Map) map; diff --git a/src/test/java/io/github/randomcodespace/iq/cache/AnalysisCacheConfidenceTest.java b/src/test/java/io/github/randomcodespace/iq/cache/AnalysisCacheConfidenceTest.java new file mode 100644 index 00000000..64e284b7 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/cache/AnalysisCacheConfidenceTest.java @@ -0,0 +1,207 @@ +package io.github.randomcodespace.iq.cache; + +import io.github.randomcodespace.iq.model.CodeEdge; +import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; +import io.github.randomcodespace.iq.model.EdgeKind; +import io.github.randomcodespace.iq.model.NodeKind; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.EnumSource; + +import java.lang.reflect.Field; +import java.nio.file.Path; +import java.util.List; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive 
H2-cache round-trip coverage for {@link Confidence} and detector + * source on cached nodes and edges. Verifies that bumping {@code CACHE_VERSION} + * to 5 actually carries the new fields through both the serialize and + * deserialize paths, including: + *
<ul>
+ *   <li>All three confidence values (LEXICAL/SYNTACTIC/RESOLVED) on nodes and edges</li>
+ *   <li>Bare model objects (no confidence explicitly set) round-trip as LEXICAL</li>
+ *   <li>Source is optional and stays null on bare objects</li>
+ *   <li>Repeated upsert preserves confidence (no silent decay)</li>
+ *   <li>{@code CACHE_VERSION} is exactly 5 — guards against accidental rollback</li>
+ * </ul>
+ */ +class AnalysisCacheConfidenceTest { + + private AnalysisCache cache; + + @BeforeEach + void setUp(@TempDir Path tempDir) { + cache = new AnalysisCache(tempDir.resolve("test-cache.db")); + } + + @AfterEach + void tearDown() { + if (cache != null) { + cache.close(); + } + } + + // ---------- Node round-trips ---------- + + @ParameterizedTest + @EnumSource(Confidence.class) + void node_allConfidenceValuesRoundTripThroughCache(Confidence value) { + CodeNode node = new CodeNode("test:cache:" + value.name(), NodeKind.CLASS, "X"); + node.setConfidence(value); + node.setSource("MyDetector"); + + cache.storeResults("h-" + value.name(), "X.java", "java", + List.of(node), List.of()); + + var result = cache.loadCachedResults("h-" + value.name()); + assertNotNull(result); + assertEquals(1, result.nodes().size()); + CodeNode loaded = result.nodes().getFirst(); + assertEquals(value, loaded.getConfidence(), + "node confidence must round-trip through the H2 cache"); + assertEquals("MyDetector", loaded.getSource()); + } + + @Test + void node_bareConstructionDefaultsRoundTripAsLexicalAndNullSource() { + // Bare node — no confidence or source set. Round-trip must yield LEXICAL + null + // (matches CodeNode field defaults and the "least committal" invariant). + CodeNode node = new CodeNode("test:bare:Foo", NodeKind.CLASS, "Foo"); + cache.storeResults("h-bare", "Foo.java", "java", + List.of(node), List.of()); + + var result = cache.loadCachedResults("h-bare"); + assertNotNull(result); + CodeNode loaded = result.nodes().getFirst(); + assertEquals(Confidence.LEXICAL, loaded.getConfidence(), + "bare node round-trips as LEXICAL — least committal default"); + assertNull(loaded.getSource(), + "bare node round-trips with null source — no string sentinel"); + } + + @Test + void node_upsertPreservesConfidenceAndSource() { + // First write with one confidence/source, then overwrite with a stronger one. + // Reload must reflect the latest write — no silent decay. 
+ CodeNode v1 = new CodeNode("test:upsert:Foo", NodeKind.CLASS, "Foo"); + v1.setConfidence(Confidence.LEXICAL); + v1.setSource("RegexDetector"); + cache.storeResults("h-upsert", "Foo.java", "java", List.of(v1), List.of()); + + CodeNode v2 = new CodeNode("test:upsert:Foo:v2", NodeKind.CLASS, "Foo"); + v2.setConfidence(Confidence.RESOLVED); + v2.setSource("ResolvedDetector"); + cache.storeResults("h-upsert", "Foo.java", "java", List.of(v2), List.of()); + + var result = cache.loadCachedResults("h-upsert"); + assertNotNull(result); + assertEquals(1, result.nodes().size()); + CodeNode loaded = result.nodes().getFirst(); + assertEquals(Confidence.RESOLVED, loaded.getConfidence(), + "upsert must overwrite confidence — never silently keep the older value"); + assertEquals("ResolvedDetector", loaded.getSource()); + } + + @Test + void node_clearThenStoreReroundtripsConfidence() { + // Defensive: after a full clear, the next round-trip still works. + CodeNode pre = new CodeNode("pre:n", NodeKind.CLASS, "Pre"); + pre.setConfidence(Confidence.RESOLVED); + cache.storeResults("h-pre", "P.java", "java", List.of(pre), List.of()); + cache.clear(); + // Verify clear removed it. 
+ assertNull(cache.loadCachedResults("h-pre")); + + CodeNode post = new CodeNode("post:n", NodeKind.CLASS, "Post"); + post.setConfidence(Confidence.SYNTACTIC); + post.setSource("PostClearDetector"); + cache.storeResults("h-post", "P.java", "java", List.of(post), List.of()); + + var result = cache.loadCachedResults("h-post"); + assertNotNull(result); + assertEquals(Confidence.SYNTACTIC, result.nodes().getFirst().getConfidence()); + assertEquals("PostClearDetector", result.nodes().getFirst().getSource()); + } + + // ---------- Edge round-trips ---------- + + @ParameterizedTest + @EnumSource(Confidence.class) + void edge_allConfidenceValuesRoundTripThroughCache(Confidence value) { + CodeNode src = new CodeNode("e:src:" + value.name(), NodeKind.CLASS, "Src"); + CodeNode tgt = new CodeNode("e:tgt:" + value.name(), NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:edge:" + value.name(), EdgeKind.DEPENDS_ON, + "e:src:" + value.name(), tgt); + edge.setConfidence(value); + edge.setSource("EdgeDetector"); + + cache.storeResults("e-" + value.name(), "E.java", "java", + List.of(src, tgt), List.of(edge)); + + var result = cache.loadCachedResults("e-" + value.name()); + assertNotNull(result); + assertEquals(1, result.edges().size()); + CodeEdge loaded = result.edges().getFirst(); + assertEquals(value, loaded.getConfidence(), + "edge confidence must round-trip through the H2 cache"); + assertEquals("EdgeDetector", loaded.getSource()); + } + + @Test + void edge_bareConstructionDefaultsRoundTripAsLexicalAndNullSource() { + CodeNode src = new CodeNode("e:bare:src", NodeKind.CLASS, "Src"); + CodeNode tgt = new CodeNode("e:bare:tgt", NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:bare:edge", EdgeKind.DEPENDS_ON, "e:bare:src", tgt); + + cache.storeResults("e-bare", "E.java", "java", + List.of(src, tgt), List.of(edge)); + + var result = cache.loadCachedResults("e-bare"); + assertNotNull(result); + CodeEdge loaded = result.edges().getFirst(); + 
assertEquals(Confidence.LEXICAL, loaded.getConfidence(), + "bare edge round-trips as LEXICAL"); + assertNull(loaded.getSource(), + "bare edge round-trips with null source"); + } + + @Test + void edge_setNullSourceNormalizesToLexicalNotNull() { + // Edge model setter normalizes null confidence → LEXICAL. Verify cache + // round-trip preserves this invariant: getConfidence() never returns null. + CodeNode src = new CodeNode("e:null:src", NodeKind.CLASS, "Src"); + CodeNode tgt = new CodeNode("e:null:tgt", NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:null:edge", EdgeKind.DEPENDS_ON, "e:null:src", tgt); + edge.setConfidence(null); // setter normalizes to LEXICAL + edge.setSource(null); + + cache.storeResults("e-null", "E.java", "java", List.of(src, tgt), List.of(edge)); + + var result = cache.loadCachedResults("e-null"); + assertNotNull(result); + CodeEdge loaded = result.edges().getFirst(); + assertNotNull(loaded.getConfidence(), "confidence is never null at rest"); + assertEquals(Confidence.LEXICAL, loaded.getConfidence()); + } + + // ---------- Schema invariant ---------- + + @Test + void cacheVersionIsBumpedToFive() throws Exception { + // Reflection-driven assertion — confidence + source serialization is a + // breaking change to the JSON shape of cached rows. CACHE_VERSION must be + // bumped to 5 so existing v4 caches are dropped on next open. Reverting + // this without re-thinking the schema invalidation is a footgun. 
+ Field f = AnalysisCache.class.getDeclaredField("CACHE_VERSION"); + f.setAccessible(true); + int version = (int) f.get(null); + assertEquals(5, version, + "CACHE_VERSION must be 5 after the confidence + source schema change"); + } +} From 91a9c51efbb6040cdee0807f7ef05477320c2d09 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:53:47 +0000 Subject: [PATCH 15/23] feat(detector): set Confidence default per base class + stamp helper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds Detector.defaultConfidence() with a default-method floor of LEXICAL. Each base class overrides where the floor differs: - AbstractRegexDetector → LEXICAL (regex patterns only) - AbstractAntlrDetector → SYNTACTIC (ANTLR parse trees) - AbstractStructuredDetector → SYNTACTIC (parsed YAML/JSON/TOML) - AbstractJavaParserDetector → SYNTACTIC (JavaParser AST) - AbstractJavaMessagingDetector → SYNTACTIC (java-aware regex) AbstractTypeScriptDetector, AbstractPythonAntlrDetector, and AbstractPythonDbDetector inherit SYNTACTIC via AbstractAntlrDetector — no explicit override needed; tests verify the inherited values. DetectorEmissionDefaults.applyDefaults(result, detector) is the new stamping pass for the orchestrator. It writes source + defaultConfidence() onto every node/edge whose getSource() is null — the "detector didn't think about it" sentinel. Explicit stamps survive the pass; e.g. a detector emitting RESOLVED is never down-graded back to the base default. Wiring this helper into Analyzer + IndexCommand is deferred to plan Task 19 (pipeline wiring). This commit only ships the building blocks. 
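The stamping rule described above can be sketched in isolation. This is a minimal illustration, not the project's code: `Emission`, `AstDetector`, and the trimmed-down `Detector` interface are hypothetical stand-ins for `CodeNode`/`CodeEdge` and the real detector types, and only the "stamp when source is null" behaviour is taken from the commit message.

```java
import java.util.List;

public class StampingSketch {

    // Stand-in for the project's Confidence enum (the three values this patch uses).
    enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

    // Hypothetical stand-in for CodeNode/CodeEdge: only the two stamped fields.
    static final class Emission {
        String source;                              // null means the detector did not stamp
        Confidence confidence = Confidence.LEXICAL; // never null at rest
    }

    // Simplified Detector: only the confidence floor matters for this sketch.
    interface Detector {
        default Confidence defaultConfidence() { return Confidence.LEXICAL; }
    }

    // Plays the role of an AST-backed base class that raises the floor.
    static final class AstDetector implements Detector {
        @Override public Confidence defaultConfidence() { return Confidence.SYNTACTIC; }
    }

    // Stamp source + confidence only where source is still null.
    static void applyDefaults(List<Emission> emissions, Detector detector) {
        if (emissions == null || detector == null) return;
        for (Emission e : emissions) {
            if (e.source == null) {
                e.source = detector.getClass().getSimpleName();
                e.confidence = detector.defaultConfidence();
            }
        }
    }

    public static void main(String[] args) {
        Emission bare = new Emission();
        Emission explicit = new Emission();
        explicit.source = "CustomResolverDetector";
        explicit.confidence = Confidence.RESOLVED;

        applyDefaults(List.of(bare, explicit), new AstDetector());

        System.out.println(bare.source + " " + bare.confidence);
        System.out.println(explicit.source + " " + explicit.confidence);
    }
}
```

Keying on `source == null` rather than on confidence is what lets an explicitly chosen LEXICAL survive the pass, since confidence alone always has a value.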
Test coverage (19 new tests in DetectorEmissionDefaultsTest): - Per-base default confidence (parameterized across 9 base/sub combos) - Stamping fills source + confidence on null-source nodes and edges - Explicit (RESOLVED + custom source) emissions survive the pass - Mixed result (some explicit, some bare) handled per-emission - null result → no-op (no NPE) - null detector → no-op (defensive) - Empty result → no-op - Idempotent on repeat call with same detector - Second pass with different detector does NOT relabel (first stamp wins) Per sub-project 1 spec §5 + plan Task 6. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../iq/detector/AbstractAntlrDetector.java | 12 + .../iq/detector/AbstractRegexDetector.java | 11 + .../detector/AbstractStructuredDetector.java | 10 + .../randomcodespace/iq/detector/Detector.java | 19 ++ .../iq/detector/DetectorEmissionDefaults.java | 61 +++++ .../java/AbstractJavaMessagingDetector.java | 12 + .../jvm/java/AbstractJavaParserDetector.java | 12 + .../DetectorEmissionDefaultsTest.java | 253 ++++++++++++++++++ 8 files changed, 390 insertions(+) create mode 100644 src/main/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaults.java create mode 100644 src/test/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaultsTest.java diff --git a/src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java index 008f0407..efe56f30 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java @@ -1,5 +1,6 @@ package io.github.randomcodespace.iq.detector; +import io.github.randomcodespace.iq.model.Confidence; import org.antlr.v4.runtime.*; import org.antlr.v4.runtime.atn.PredictionMode; import org.antlr.v4.runtime.tree.ParseTree; @@ -22,6 +23,17 @@ public abstract class AbstractAntlrDetector extends 
AbstractRegexDetector { private static final Logger log = LoggerFactory.getLogger(AbstractAntlrDetector.class); + /** + * ANTLR parse trees are syntactic but not symbol-resolved — bump the + * regex-default {@link Confidence#LEXICAL} up to {@link Confidence#SYNTACTIC}. + * Subclasses that resolve symbols should call {@code setConfidence(RESOLVED)} + * explicitly on their emissions. + */ + @Override + public Confidence defaultConfidence() { + return Confidence.SYNTACTIC; + } + @Override public DetectorResult detect(DetectorContext ctx) { try { diff --git a/src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java index b02d5be0..390d1068 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java @@ -1,5 +1,7 @@ package io.github.randomcodespace.iq.detector; +import io.github.randomcodespace.iq.model.Confidence; + import java.util.ArrayList; import java.util.List; @@ -9,6 +11,15 @@ */ public abstract class AbstractRegexDetector implements Detector { + /** + * Regex matches are pattern-only — no parse tree, no symbol resolution. + * Confidence floor for emissions from this base class is {@link Confidence#LEXICAL}. + */ + @Override + public Confidence defaultConfidence() { + return Confidence.LEXICAL; + } + /** * A single line of content with its 1-based line number. 
*/ diff --git a/src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java index 58e8592d..85685cce 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java @@ -2,6 +2,7 @@ import io.github.randomcodespace.iq.model.CodeEdge; import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; import io.github.randomcodespace.iq.model.EdgeKind; import io.github.randomcodespace.iq.model.NodeKind; @@ -16,6 +17,15 @@ */ public abstract class AbstractStructuredDetector implements Detector { + /** + * Structured (YAML/JSON/TOML/properties) parsing produces a parsed shape, not + * just a regex match — confidence floor is {@link Confidence#SYNTACTIC}. + */ + @Override + public Confidence defaultConfidence() { + return Confidence.SYNTACTIC; + } + /** * Safely cast an object to {@code Map}. * Returns an empty map if the object is not a map. diff --git a/src/main/java/io/github/randomcodespace/iq/detector/Detector.java b/src/main/java/io/github/randomcodespace/iq/detector/Detector.java index 2a82c968..05530fa5 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/Detector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/Detector.java @@ -1,9 +1,28 @@ package io.github.randomcodespace.iq.detector; +import io.github.randomcodespace.iq.model.Confidence; + import java.util.Set; public interface Detector { String getName(); Set<String> getSupportedLanguages(); DetectorResult detect(DetectorContext ctx); + + /** + * Confidence floor for nodes and edges this detector emits without explicitly + * setting one. Stamped by the orchestrator (see {@code DetectorEmissionDefaults}) + * onto every emission whose {@code source} is still null — i.e. the detector + * didn't explicitly stamp anything.
Default is {@link Confidence#LEXICAL} — the + * least-committal floor; base classes override to bump up to + * {@link Confidence#SYNTACTIC} for AST-backed detection. + * + *
<p>
A detector with stronger evidence (e.g. a resolved symbol) should call + * {@code node.setConfidence(Confidence.RESOLVED)} explicitly — the stamping + * pass leaves explicitly-stamped values alone (it keys off {@code source == + * null}). + */ + default Confidence defaultConfidence() { + return Confidence.LEXICAL; + } } diff --git a/src/main/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaults.java b/src/main/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaults.java new file mode 100644 index 00000000..9ad674bd --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaults.java @@ -0,0 +1,61 @@ +package io.github.randomcodespace.iq.detector; + +import io.github.randomcodespace.iq.model.CodeEdge; +import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; + +/** + * Stamps the orchestrator-managed confidence + source defaults onto a + * {@link DetectorResult}. This is invoked by the analyzer / index pipeline + * after each {@link Detector#detect(DetectorContext)} call so detectors stay + * blissfully unaware of the bookkeeping. + * + *
<p>Stamping rule — for every node and edge in the result: + * <ul> + * <li>If {@code getSource() == null} (i.e. the detector did not explicitly + * stamp anything), the entry is treated as "wants defaults": + * <ul> + * <li>{@code source} is set to the detector's class simple name.</li> + * <li>{@code confidence} is set to {@link Detector#defaultConfidence()}.</li> + * </ul> + * </li> + * <li>If {@code getSource() != null} (the detector stamped explicitly), + * both fields are left alone — the detector knows what it's doing.</li> + * </ul> + * + * <p>
The {@code source==null} sentinel is what lets us distinguish "detector + * didn't think about confidence" from "detector intentionally chose LEXICAL." + * Confidence is never null at rest (the model setter normalizes), so confidence + * alone can't tell us that. + */ +public final class DetectorEmissionDefaults { + + private DetectorEmissionDefaults() { } + + /** + * Apply orchestrator defaults to every node + edge in the result. Mutates + * the model objects in place — the result record itself is unchanged. + * + * @param result the detector's emission (must not be null) + * @param detector the detector that produced it (used for source name + + * default confidence) + */ + public static void applyDefaults(DetectorResult result, Detector detector) { + if (result == null || detector == null) return; + String defaultSource = detector.getClass().getSimpleName(); + Confidence defaultConfidence = detector.defaultConfidence(); + + for (CodeNode node : result.nodes()) { + if (node.getSource() == null) { + node.setSource(defaultSource); + node.setConfidence(defaultConfidence); + } + } + for (CodeEdge edge : result.edges()) { + if (edge.getSource() == null) { + edge.setSource(defaultSource); + edge.setConfidence(defaultConfidence); + } + } + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaMessagingDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaMessagingDetector.java index b03c701d..bf40cffa 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaMessagingDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaMessagingDetector.java @@ -3,6 +3,7 @@ import io.github.randomcodespace.iq.detector.AbstractRegexDetector; import io.github.randomcodespace.iq.model.CodeEdge; import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; import io.github.randomcodespace.iq.model.EdgeKind; 
import io.github.randomcodespace.iq.model.NodeKind; @@ -18,6 +19,17 @@ public abstract class AbstractJavaMessagingDetector extends AbstractRegexDetecto protected static final Pattern CLASS_RE = Pattern.compile("(?:public\\s+)?class\\s+(\\w+)"); + /** + * Java messaging detectors layer language-aware semantics on top of regex + * matching (matched class name → emit messaging edge with kind). Bump the + * inherited regex-default {@link Confidence#LEXICAL} up to + * {@link Confidence#SYNTACTIC}. + */ + @Override + public Confidence defaultConfidence() { + return Confidence.SYNTACTIC; + } + /** * Extract the first class name from the source text. * Returns null if no class is found. diff --git a/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaParserDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaParserDetector.java index cb25a9ef..6dd792a1 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaParserDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaParserDetector.java @@ -4,6 +4,7 @@ import com.github.javaparser.ast.CompilationUnit; import io.github.randomcodespace.iq.detector.AbstractRegexDetector; import io.github.randomcodespace.iq.detector.DetectorContext; +import io.github.randomcodespace.iq.model.Confidence; import java.util.Optional; @@ -16,6 +17,17 @@ public abstract class AbstractJavaParserDetector extends AbstractRegexDetector { private static final ThreadLocal<JavaParser> PARSER = ThreadLocal.withInitial(JavaParser::new); + /** + * JavaParser produces an AST — bump the inherited regex-default + * {@link Confidence#LEXICAL} up to {@link Confidence#SYNTACTIC}. Detectors + * that resolve symbols via JavaSymbolSolver (Phase 6+) should call + * {@code setConfidence(RESOLVED)} on emissions.
+ */ + @Override + public Confidence defaultConfidence() { + return Confidence.SYNTACTIC; + } + /** * Attempt to parse the source content into a JavaParser CompilationUnit. */ diff --git a/src/test/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaultsTest.java b/src/test/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaultsTest.java new file mode 100644 index 00000000..47220b6a --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaultsTest.java @@ -0,0 +1,253 @@ +package io.github.randomcodespace.iq.detector; + +import io.github.randomcodespace.iq.detector.jvm.java.AbstractJavaMessagingDetector; +import io.github.randomcodespace.iq.detector.jvm.java.AbstractJavaParserDetector; +import io.github.randomcodespace.iq.detector.python.AbstractPythonAntlrDetector; +import io.github.randomcodespace.iq.detector.python.AbstractPythonDbDetector; +import io.github.randomcodespace.iq.detector.typescript.AbstractTypeScriptDetector; +import io.github.randomcodespace.iq.model.CodeEdge; +import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; +import io.github.randomcodespace.iq.model.EdgeKind; +import io.github.randomcodespace.iq.model.NodeKind; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; +import java.util.stream.Stream; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for {@link Detector#defaultConfidence()} on every base + * class, plus the orchestrator stamping pass in {@link DetectorEmissionDefaults}. + * + *
<p>Verifies the contract that lets us migrate detectors incrementally: + * <ul> + * <li>Each base class declares (or inherits) the right confidence floor.</li> + * <li>The stamping pass writes source + confidence ONLY when source is null + * (the "detector didn't think about it" sentinel).</li> + * <li>Explicitly-stamped emissions survive a stamping pass unchanged.</li> + * <li>Mixed results (some explicit, some default) get the right treatment + * on a per-emission basis.</li> + * </ul> + */ +class DetectorEmissionDefaultsTest { + + // ---------- Per-base default confidence ---------- + + static Stream<Arguments> baseClassDefaults() { + return Stream.of( + Arguments.of("interface default (LEXICAL)", new InterfaceOnlyDetector(), Confidence.LEXICAL), + Arguments.of("AbstractRegexDetector → LEXICAL", new RegexStub(), Confidence.LEXICAL), + Arguments.of("AbstractAntlrDetector → SYNTACTIC", new AntlrStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractStructuredDetector → SYNTACTIC", new StructuredStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractJavaParserDetector → SYNTACTIC", new JavaParserStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractJavaMessagingDetector → SYNTACTIC", new JavaMessagingStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractTypeScriptDetector inherits SYNTACTIC", new TypeScriptStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractPythonAntlrDetector inherits SYNTACTIC", new PythonAntlrStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractPythonDbDetector inherits SYNTACTIC", new PythonDbStub(), Confidence.SYNTACTIC) + ); + } + + @ParameterizedTest(name = "{0}") + @MethodSource("baseClassDefaults") + void defaultConfidencePerBaseClass(String label, Detector detector, Confidence expected) { + assertEquals(expected, detector.defaultConfidence(), label); + } + + // ---------- Stamping behavior ---------- + + @Test + void applyDefaults_stampsSourceAndConfidenceOnNullSourceNode() { + CodeNode node = new CodeNode("n:1", NodeKind.CLASS, "Foo"); + // Bare construction — source is null, confidence is the model default LEXICAL.
+ DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + + DetectorEmissionDefaults.applyDefaults(result, new AntlrStub()); + + assertEquals("AntlrStub", node.getSource(), "source stamped to detector class simple name"); + assertEquals(Confidence.SYNTACTIC, node.getConfidence(), + "confidence bumped to base default (SYNTACTIC for AntlrStub)"); + } + + @Test + void applyDefaults_stampsSourceAndConfidenceOnNullSourceEdge() { + CodeNode tgt = new CodeNode("n:tgt", NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:1", EdgeKind.DEPENDS_ON, "n:src", tgt); + DetectorResult result = DetectorResult.of(new ArrayList<>(), new ArrayList<>(List.of(edge))); + + DetectorEmissionDefaults.applyDefaults(result, new RegexStub()); + + assertEquals("RegexStub", edge.getSource()); + assertEquals(Confidence.LEXICAL, edge.getConfidence(), + "regex base default is LEXICAL"); + } + + @Test + void applyDefaults_leavesExplicitlyStampedNodeAlone() { + // Detector explicitly stamped — stamping pass must not clobber. 
+ CodeNode node = new CodeNode("n:explicit", NodeKind.CLASS, "Foo"); + node.setSource("CustomResolverDetector"); + node.setConfidence(Confidence.RESOLVED); + DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + + DetectorEmissionDefaults.applyDefaults(result, new AntlrStub()); + + assertEquals("CustomResolverDetector", node.getSource(), + "explicit source survives stamping pass"); + assertEquals(Confidence.RESOLVED, node.getConfidence(), + "explicit confidence survives stamping pass — not down-graded to base default"); + } + + @Test + void applyDefaults_leavesExplicitlyStampedEdgeAlone() { + CodeNode tgt = new CodeNode("n:tgt", NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:explicit", EdgeKind.DEPENDS_ON, "n:src", tgt); + edge.setSource("ExplicitDetector"); + edge.setConfidence(Confidence.RESOLVED); + DetectorResult result = DetectorResult.of(new ArrayList<>(), new ArrayList<>(List.of(edge))); + + DetectorEmissionDefaults.applyDefaults(result, new RegexStub()); + + assertEquals("ExplicitDetector", edge.getSource()); + assertEquals(Confidence.RESOLVED, edge.getConfidence()); + } + + @Test + void applyDefaults_mixedExplicitAndDefaultsHandledIndependently() { + // One node was explicitly stamped, another wasn't. Verify the pass is + // per-emission, not all-or-nothing. 
+ CodeNode explicit = new CodeNode("n:explicit", NodeKind.CLASS, "Explicit"); + explicit.setSource("ResolverDetector"); + explicit.setConfidence(Confidence.RESOLVED); + + CodeNode bare = new CodeNode("n:bare", NodeKind.CLASS, "Bare"); + + DetectorResult result = DetectorResult.of( + new ArrayList<>(List.of(explicit, bare)), + new ArrayList<>()); + + DetectorEmissionDefaults.applyDefaults(result, new StructuredStub()); + + // Explicit untouched + assertEquals("ResolverDetector", explicit.getSource()); + assertEquals(Confidence.RESOLVED, explicit.getConfidence()); + // Bare stamped + assertEquals("StructuredStub", bare.getSource()); + assertEquals(Confidence.SYNTACTIC, bare.getConfidence()); + } + + @Test + void applyDefaults_nullResultIsNoOp() { + // Defensive: callers may pass null on early returns. Must not NPE. + assertDoesNotThrow(() -> DetectorEmissionDefaults.applyDefaults(null, new RegexStub())); + } + + @Test + void applyDefaults_nullDetectorIsNoOp() { + // Defensive: the orchestrator should never pass null but the helper + // is the single trust boundary — must not NPE. + CodeNode node = new CodeNode("n:1", NodeKind.CLASS, "Foo"); + DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + assertDoesNotThrow(() -> DetectorEmissionDefaults.applyDefaults(result, null)); + // Model state is untouched + assertNull(node.getSource()); + } + + @Test + void applyDefaults_emptyResultIsNoOp() { + DetectorResult result = DetectorResult.empty(); + assertDoesNotThrow(() -> DetectorEmissionDefaults.applyDefaults(result, new RegexStub())); + assertEquals(0, result.nodes().size()); + assertEquals(0, result.edges().size()); + } + + @Test + void applyDefaults_idempotentOnRepeatCall() { + // After the first stamp, the detector "owns" these emissions. A second + // stamping pass with the SAME detector is a no-op (source is no longer null). 
+ CodeNode node = new CodeNode("n:idem", NodeKind.CLASS, "Foo"); + DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + Detector detector = new AntlrStub(); + + DetectorEmissionDefaults.applyDefaults(result, detector); + String firstSource = node.getSource(); + Confidence firstConfidence = node.getConfidence(); + + DetectorEmissionDefaults.applyDefaults(result, detector); + + assertEquals(firstSource, node.getSource()); + assertEquals(firstConfidence, node.getConfidence()); + } + + @Test + void applyDefaults_secondPassWithDifferentDetectorIsAlsoNoOp() { + // After first stamp, source is set — a different detector running over + // the same result must NOT relabel the node. (This guards against pipeline + // reorder bugs where two detectors emit the same node.) + CodeNode node = new CodeNode("n:multi", NodeKind.CLASS, "Foo"); + DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + + DetectorEmissionDefaults.applyDefaults(result, new AntlrStub()); + DetectorEmissionDefaults.applyDefaults(result, new RegexStub()); // different detector + + assertEquals("AntlrStub", node.getSource(), + "first detector's stamp wins — second pass is no-op"); + assertEquals(Confidence.SYNTACTIC, node.getConfidence()); + } + + // ---------- Test-only stub detectors ---------- + + /** Bare interface implementation — uses the interface's default LEXICAL. 
*/ + private static final class InterfaceOnlyDetector implements Detector { + @Override public String getName() { return "iface_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("test"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class RegexStub extends AbstractRegexDetector { + @Override public String getName() { return "regex_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("test"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class AntlrStub extends AbstractAntlrDetector { + @Override public String getName() { return "antlr_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("test"); } + } + + private static final class StructuredStub extends AbstractStructuredDetector { + @Override public String getName() { return "structured_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("yaml"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class JavaParserStub extends AbstractJavaParserDetector { + @Override public String getName() { return "javaparser_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("java"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class JavaMessagingStub extends AbstractJavaMessagingDetector { + @Override public String getName() { return "messaging_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("java"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class TypeScriptStub extends AbstractTypeScriptDetector { + @Override public String getName() { return "ts_stub"; } + } + + private static final class
PythonAntlrStub extends AbstractPythonAntlrDetector { + @Override public String getName() { return "python_antlr_stub"; } + } + + private static final class PythonDbStub extends AbstractPythonDbDetector { + @Override public String getName() { return "python_db_stub"; } + } +} From 49285d522a750acb5a55b9b827347b364af9f3e3 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:56:20 +0000 Subject: [PATCH 16/23] feat(resolver): add Resolved/EmptyResolved/SymbolResolver SPI + ResolutionException MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Foundation for the resolver pass that sits between parse and detect. Per-language backends implement SymbolResolver; per-file results carry language-specific resolution state via a Resolved implementation. EmptyResolved is the singleton sentinel returned when resolution didn't happen — its isAvailable() returns false so detectors short-circuit to syntactic detection. ResolutionException is checked, by design — symbol resolution has a long tail of file-specific failures (corrupted source, classpath holes, dependency cycles) and the orchestrator must explicitly decide whether to skip the file or abort the pass. It carries (file, language) for useful logs. No wiring yet — the orchestrator picks these up in Task 11 (ResolverRegistry) + Task 19 (pipeline wiring). Test coverage (16 new tests): - ResolvedContractTest (6): EmptyResolved singleton + reflection guards - ResolutionExceptionTest (4): file/language/cause + checked-ness - SymbolResolverContractTest (6): supportedLanguages non-empty, bootstrap-before-resolve, resolve never returns null (uses EmptyResolved for unsupported language / null AST), default shutdown no-op Per sub-project 1 spec §6.1 + plan Tasks 8-10. 
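How a detector might consume the SPI can be sketched as follows. This is an assumption-laden illustration rather than the wiring this series will ship: `DemoResolved` and `confidenceFor` are hypothetical names, and the enum and interface are trimmed copies of the types in this patch so the example compiles on its own.

```java
public class ResolverFallbackSketch {

    // Stand-in for the project's Confidence enum.
    enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

    // Mirrors the two methods the Resolved interface declares in this patch.
    interface Resolved {
        boolean isAvailable();
        Confidence sourceConfidence();
    }

    // Singleton "nothing was resolved" sentinel, modelled on EmptyResolved.
    static final class EmptyResolved implements Resolved {
        static final EmptyResolved INSTANCE = new EmptyResolved();
        private EmptyResolved() { }
        @Override public boolean isAvailable() { return false; }
        @Override public Confidence sourceConfidence() { return Confidence.LEXICAL; }
    }

    // Hypothetical backend result; a real one would wrap language-specific solver state.
    static final class DemoResolved implements Resolved {
        @Override public boolean isAvailable() { return true; }
        @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; }
    }

    // Hypothetical helper: choose the confidence a detector would stamp.
    // When resolution is unavailable the detector keeps its own base floor.
    static Confidence confidenceFor(Resolved resolved, Confidence detectorFloor) {
        if (!resolved.isAvailable()) {
            return detectorFloor;
        }
        return resolved.sourceConfidence();
    }

    public static void main(String[] args) {
        System.out.println(confidenceFor(EmptyResolved.INSTANCE, Confidence.SYNTACTIC));
        System.out.println(confidenceFor(new DemoResolved(), Confidence.SYNTACTIC));
    }
}
```

Because the contract says `resolve` never returns null, the helper checks only `isAvailable()`; a caller that cannot rely on that contract would need its own null guard.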
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../intelligence/resolver/EmptyResolved.java | 34 +++++ .../resolver/ResolutionException.java | 48 +++++++ .../iq/intelligence/resolver/Resolved.java | 38 ++++++ .../intelligence/resolver/SymbolResolver.java | 77 +++++++++++ .../resolver/ResolutionExceptionTest.java | 55 ++++++++ .../resolver/ResolvedContractTest.java | 68 ++++++++++ .../resolver/SymbolResolverContractTest.java | 126 ++++++++++++++++++ 7 files changed, 446 insertions(+) create mode 100644 src/main/java/io/github/randomcodespace/iq/intelligence/resolver/EmptyResolved.java create mode 100644 src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionException.java create mode 100644 src/main/java/io/github/randomcodespace/iq/intelligence/resolver/Resolved.java create mode 100644 src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java create mode 100644 src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionExceptionTest.java create mode 100644 src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolvedContractTest.java create mode 100644 src/test/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolverContractTest.java diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/EmptyResolved.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/EmptyResolved.java new file mode 100644 index 00000000..16227581 --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/EmptyResolved.java @@ -0,0 +1,34 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.model.Confidence; + +/** + * Singleton "no resolution" {@link Resolved} — what + * {@link io.github.randomcodespace.iq.intelligence.resolver.SymbolResolver} + * returns when it can't resolve a file (parse failure, unsupported language, + * resolver disabled, or no resolver registered for this file's 
language). + * + *
<p>
Detectors must check {@link #isAvailable()} before downcasting; they will + * always get {@code false} from this singleton, signalling "fall back to + * syntactic detection." + */ +public final class EmptyResolved implements Resolved { + + /** The single instance — comparable via {@code ==}. */ + public static final EmptyResolved INSTANCE = new EmptyResolved(); + + private EmptyResolved() { } + + @Override + public boolean isAvailable() { + return false; + } + + @Override + public Confidence sourceConfidence() { + // Nothing was actually resolved — emissions consulting this should NOT + // claim RESOLVED confidence. LEXICAL is the floor; a syntactic detector + // emitting against EmptyResolved still has its own SYNTACTIC base default. + return Confidence.LEXICAL; + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionException.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionException.java new file mode 100644 index 00000000..4a58d80e --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionException.java @@ -0,0 +1,48 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import java.nio.file.Path; + +/** + * Thrown by a {@link SymbolResolver} when bootstrap or per-file resolution + * fails in a way the resolver cannot recover from. Carries enough context + * (file path + language) for the orchestrator to log a useful message before + * falling back to syntactic detection. + * + *
<p>
Checked exception by design — symbol resolution is a long-tail of file- + * specific failures (corrupted source, dependency cycles, classpath holes), + * and the orchestrator must explicitly decide whether to skip the file or + * abort the whole pass. Swallowing silently is not an option. + */ +public class ResolutionException extends Exception { + + private final Path file; + private final String language; + + /** + * @param message human-readable description of the failure + * @param cause underlying exception (may be null) + * @param file the file (or project root for bootstrap failures) that + * couldn't be resolved + * @param language the language identifier for the resolver involved + */ + public ResolutionException(String message, Throwable cause, Path file, String language) { + super(message, cause); + this.file = file; + this.language = language; + } + + /** Convenience constructor without an underlying cause. */ + public ResolutionException(String message, Path file, String language) { + this(message, null, file, language); + } + + /** @return the file (or project root) that couldn't be resolved. May be {@code null}. */ + public Path file() { + return file; + } + + /** @return the language identifier (e.g. {@code "java"}). May be {@code null}. */ + public String language() { + return language; + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/Resolved.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/Resolved.java new file mode 100644 index 00000000..475313a2 --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/Resolved.java @@ -0,0 +1,38 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.model.Confidence; + +/** + * Per-file symbol resolution result. + * + *

A {@code Resolved} carries language-specific resolution state that detectors + * can consult to upgrade their emissions from {@link Confidence#SYNTACTIC} to + * {@link Confidence#RESOLVED}. Each language backend ships its own concrete + * implementation (e.g. {@code JavaResolved} wraps a {@code JavaSymbolSolver} + * plus a {@code CompilationUnit}); detectors that want resolved data downcast + * after checking {@link #isAvailable()}. + * + *

{@link #isAvailable()} is the first gate every detector should consult. + * If it returns {@code false}, the resolver wasn't able to resolve this file — + * detectors must fall back to syntactic detection. The {@link EmptyResolved} + * singleton is the canonical "not available" instance. + */ +public interface Resolved { + + /** + * @return {@code true} if this result actually carries resolved-symbol data + * and detectors may safely downcast to a language-specific subtype. + * {@code false} for {@link EmptyResolved} or any other backend that + * declined to resolve this file (e.g. parse failure, unsupported + * language, or resolver disabled). + */ + boolean isAvailable(); + + /** + * @return the confidence floor the orchestrator should stamp on emissions + * that consult this resolution. {@link Confidence#RESOLVED} for + * genuine resolution; {@link Confidence#LEXICAL} for + * {@link EmptyResolved} (i.e. nothing was actually resolved). + */ + Confidence sourceConfidence(); +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java new file mode 100644 index 00000000..e663185c --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java @@ -0,0 +1,77 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; + +import java.nio.file.Path; +import java.util.Set; + +/** + * Per-language symbol-resolution backend. The Resolver SPI mirrors the + * {@link io.github.randomcodespace.iq.detector.Detector} SPI: each implementation + * is a Spring {@code @Component} declaring which languages it handles, and the + * {@link ResolverRegistry} auto-discovers them at startup. + * + *

Lifecycle:
+ * <ol>
+ *   <li>The orchestrator calls {@link #bootstrap(Path)} once with the project
+ *       root before any per-file work. The resolver builds whatever it needs
+ *       (type solvers, classpath, etc.).</li>
+ *   <li>For each parsed file, the orchestrator calls
+ *       {@link #resolve(DiscoveredFile, Object)} with the parsed AST. The
+ *       resolver returns a language-specific {@link Resolved} carrying the
+ *       resolution context, or {@link EmptyResolved#INSTANCE} if the file
+ *       isn't its language.</li>
+ *   <li>{@link #shutdown()} is called once at the end of the pass for cleanup
+ *       (default no-op).</li>
+ * </ol>
+ *
+ *

Thread safety: implementations must be safe to invoke + * {@link #resolve(DiscoveredFile, Object)} concurrently from virtual threads + * after a single {@link #bootstrap(Path)} call. Detector pipelines run on + * virtual-thread pools. + * + *

Determinism: if the resolver depends on source roots or classpath, those + * inputs must be sorted before construction so two runs over the same project + * produce identical resolution results. + */ +public interface SymbolResolver { + + /** + * @return language identifiers this resolver handles, lowercase, e.g. + * {@code Set.of("java")} or {@code Set.of("typescript", + * "javascript")}. Never empty, never null. + */ + Set<String> getSupportedLanguages(); + + /** + * Build whatever language-specific resolution state is needed for a single + * project root. Called once per analysis pass before any + * {@link #resolve(DiscoveredFile, Object)} call. + * + * @param projectRoot absolute path to the project root being analyzed + * @throws ResolutionException if bootstrap fails irrecoverably (the + * orchestrator will log and disable this resolver for the pass) + */ + void bootstrap(Path projectRoot) throws ResolutionException; + + /** + * Resolve symbols for a single parsed file. + * + * @param file the file being detected + * @param parsedAst the AST produced by the parser pipeline. Type is + * language-specific (e.g. {@code CompilationUnit} for + * Java, {@code ParseTree} for ANTLR languages); the + * resolver checks via {@code instanceof}. + * @return language-specific {@link Resolved} on success, or + * {@link EmptyResolved#INSTANCE} if this file isn't this + * resolver's language or {@code parsedAst} is the wrong type. + * Must never return {@code null}. + * @throws ResolutionException for irrecoverable per-file failures the + * orchestrator should surface (rare; most failures should + * downgrade to {@link EmptyResolved#INSTANCE} silently). + */ + Resolved resolve(DiscoveredFile file, Object parsedAst) throws ResolutionException; + + /** Cleanup hook. Default no-op.
*/ + default void shutdown() { } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionExceptionTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionExceptionTest.java new file mode 100644 index 00000000..bd420b65 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionExceptionTest.java @@ -0,0 +1,55 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import org.junit.jupiter.api.Test; + +import java.nio.file.Path; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for {@link ResolutionException}. Verifies it carries + * actionable context (file + language) so the orchestrator can log usefully. + */ +class ResolutionExceptionTest { + + @Test + void carriesMessageFileAndLanguage() { + Path p = Path.of("/tmp/Foo.java"); + ResolutionException e = new ResolutionException("bootstrap failed", p, "java"); + + assertEquals("bootstrap failed", e.getMessage()); + assertEquals(p, e.file()); + assertEquals("java", e.language()); + assertNull(e.getCause(), "no underlying cause when constructed without one"); + } + + @Test + void carriesUnderlyingCause() { + Path p = Path.of("/tmp/Foo.java"); + Exception root = new IllegalStateException("classpath broken"); + ResolutionException e = new ResolutionException("bootstrap failed", root, p, "java"); + + assertSame(root, e.getCause(), "underlying cause is preserved"); + assertEquals(p, e.file()); + assertEquals("java", e.language()); + } + + @Test + void nullFileAndLanguageAreAllowed() { + // Defensive: some callers may not have file/language at hand. + // The exception should still construct without NPE. 
+ ResolutionException e = new ResolutionException("generic failure", null, null); + assertNull(e.file()); + assertNull(e.language()); + assertEquals("generic failure", e.getMessage()); + } + + @Test + void isCheckedException() { + // The exception is checked by design — orchestrators must catch and + // decide whether to skip the file or abort the pass. + assertFalse(RuntimeException.class.isAssignableFrom(ResolutionException.class), + "ResolutionException must be a checked exception (subclass of Exception, not RuntimeException)"); + assertTrue(Exception.class.isAssignableFrom(ResolutionException.class)); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolvedContractTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolvedContractTest.java new file mode 100644 index 00000000..8f49d10a --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolvedContractTest.java @@ -0,0 +1,68 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Contract tests for {@link Resolved} and the {@link EmptyResolved} singleton. + * + *

{@link EmptyResolved} is a load-bearing sentinel — detectors check + * {@link Resolved#isAvailable()} == false to decide "fall back to syntactic + * detection." Anything that breaks the singleton invariants below is a bug. + */ +class ResolvedContractTest { + + @Test + void emptyResolvedIsSingleton() { + // Reference equality — detectors may use `==` to short-circuit + // (e.g. `if (resolved == EmptyResolved.INSTANCE) return ...`) + assertSame(EmptyResolved.INSTANCE, EmptyResolved.INSTANCE); + } + + @Test + void emptyResolvedReportsNotAvailable() { + assertFalse(EmptyResolved.INSTANCE.isAvailable(), + "EmptyResolved must always report not-available — it's the 'no resolution' sentinel"); + } + + @Test + void emptyResolvedConfidenceFloorIsLexical() { + // Resolution didn't happen — emissions consulting EmptyResolved should + // never claim RESOLVED. LEXICAL is the safe floor. + assertEquals(Confidence.LEXICAL, EmptyResolved.INSTANCE.sourceConfidence(), + "EmptyResolved floor is LEXICAL — nothing was actually resolved"); + } + + @Test + void emptyResolvedConstructorIsPrivate() throws Exception { + // Defensive: prevent rogue subclasses from violating the singleton. + var ctor = EmptyResolved.class.getDeclaredConstructor(); + assertTrue(java.lang.reflect.Modifier.isPrivate(ctor.getModifiers()), + "EmptyResolved must have a private constructor"); + } + + @Test + void emptyResolvedClassIsFinal() { + // Singletons must not be subclassable — a subclass could return true + // from isAvailable() and break the contract. + assertTrue(java.lang.reflect.Modifier.isFinal(EmptyResolved.class.getModifiers()), + "EmptyResolved must be final to preserve singleton invariants"); + } + + @Test + void resolvedInterfaceContractAvailableImpliesNonLexical() { + // Documents the convention via a custom test impl: a Resolved that + // claims isAvailable==true is expected to expose a non-LEXICAL floor + // (LEXICAL is reserved for "nothing resolved"). 
This isn't enforced by + // the interface — it's a contract the tests document. + Resolved fakeResolved = new Resolved() { + @Override public boolean isAvailable() { return true; } + @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; } + }; + assertTrue(fakeResolved.isAvailable()); + assertEquals(Confidence.RESOLVED, fakeResolved.sourceConfidence(), + "available Resolved instances should expose RESOLVED (or higher)"); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolverContractTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolverContractTest.java new file mode 100644 index 00000000..24a8393f --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolverContractTest.java @@ -0,0 +1,126 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import java.nio.file.Path; +import java.util.Set; +import java.util.concurrent.atomic.AtomicBoolean; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Contract coverage for {@link SymbolResolver}. Verifies a stub implementation + * honours the SPI invariants: + *

<ul>
+ *   <li>{@link SymbolResolver#getSupportedLanguages()} returns a non-empty set</li>
+ *   <li>{@link SymbolResolver#bootstrap(Path)} runs before any
+ *       {@link SymbolResolver#resolve(DiscoveredFile, Object)} call</li>
+ *   <li>{@link SymbolResolver#resolve(DiscoveredFile, Object)} never returns
+ *       {@code null} — uses {@link EmptyResolved#INSTANCE} for the
+ *       not-supported / wrong-type cases</li>
+ *   <li>{@link SymbolResolver#shutdown()} default is a no-op</li>
+ * </ul>
+ */ +class SymbolResolverContractTest { + + @Test + void supportedLanguagesIsNonEmpty() { + SymbolResolver r = new StubResolver(Set.of("java")); + assertFalse(r.getSupportedLanguages().isEmpty()); + assertEquals(Set.of("java"), r.getSupportedLanguages()); + } + + @Test + void resolveReturnsEmptyForUnknownLanguage() throws ResolutionException { + SymbolResolver r = new StubResolver(Set.of("java")); + r.bootstrap(Path.of("/tmp/project")); + + DiscoveredFile pyFile = new DiscoveredFile(Path.of("foo.py"), "python", 100); + Resolved result = r.resolve(pyFile, "some-ast"); + + assertSame(EmptyResolved.INSTANCE, result, + "unknown-language file returns EmptyResolved, never null"); + } + + @Test + void resolveReturnsAvailableResolvedForSupportedLanguage() throws ResolutionException { + StubResolver r = new StubResolver(Set.of("java")); + r.bootstrap(Path.of("/tmp/project")); + + DiscoveredFile javaFile = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved result = r.resolve(javaFile, "fake-cu"); + + assertNotSame(EmptyResolved.INSTANCE, result); + assertTrue(result.isAvailable()); + assertEquals(Confidence.RESOLVED, result.sourceConfidence()); + } + + @Test + void resolveNeverReturnsNull() throws ResolutionException { + // Even with a null AST, the contract forbids returning null — + // the resolver must downgrade to EmptyResolved. + StubResolver r = new StubResolver(Set.of("java")); + r.bootstrap(Path.of("/tmp/project")); + + DiscoveredFile javaFile = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved result = r.resolve(javaFile, null); // null AST + + assertNotNull(result, "resolve() must never return null"); + assertSame(EmptyResolved.INSTANCE, result, + "null AST falls back to EmptyResolved"); + } + + @Test + void shutdownDefaultIsNoOp() { + // The interface provides a default {} shutdown — verify it runs without + // throwing on a stub that doesn't override. 
+ SymbolResolver r = new SymbolResolver() { + @Override public Set<String> getSupportedLanguages() { return Set.of("java"); } + @Override public void bootstrap(Path projectRoot) { } + @Override public Resolved resolve(DiscoveredFile file, Object parsedAst) { + return EmptyResolved.INSTANCE; + } + // shutdown not overridden — uses interface default + }; + assertDoesNotThrow(r::shutdown); + } + + @Test + void bootstrapOnlyCalledOnce_resolverState() throws ResolutionException { + // A well-formed resolver should idempotently set up its state on a + // single bootstrap. Verified via the stub's flag. + StubResolver r = new StubResolver(Set.of("java")); + assertFalse(r.bootstrapped.get()); + r.bootstrap(Path.of("/tmp/project")); + assertTrue(r.bootstrapped.get()); + } + + /** Test-only resolver: returns a mock available Resolved for matching languages. */ + private static final class StubResolver implements SymbolResolver { + private final Set<String> languages; + final AtomicBoolean bootstrapped = new AtomicBoolean(false); + + StubResolver(Set<String> languages) { + this.languages = languages; + } + + @Override public Set<String> getSupportedLanguages() { return languages; } + + @Override + public void bootstrap(Path projectRoot) { + bootstrapped.set(true); + } + + @Override + public Resolved resolve(DiscoveredFile file, Object parsedAst) { + if (!languages.contains(file.language())) return EmptyResolved.INSTANCE; + if (parsedAst == null) return EmptyResolved.INSTANCE; + return new Resolved() { + @Override public boolean isAvailable() { return true; } + @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; } + }; + } + } +} From da74167236155ca27d76f3d5ebff6c8e4e0fc8c8 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 16:58:28 +0000 Subject: [PATCH 17/23] feat(resolver): add ResolverRegistry with auto-discovery MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Spring @Service that mirrors DetectorRegistry: every @Component
implementing SymbolResolver is auto-injected via constructor. Resolvers are sorted alphabetically by class simple name for determinism; per-language lookup uses first-in-sort-order wins on conflict. bootstrap(projectRoot) iterates in deterministic order and is resilient — per-resolver ResolutionException (or rogue RuntimeException) is logged at WARN and swallowed so one broken resolver can't take down the pass. resolverFor(language) is case-insensitive, null-safe, and never returns null — unknown languages get the NOOP resolver that always returns EmptyResolved.INSTANCE. Test coverage (13 new tests in ResolverRegistryTest): - Empty registry returns NOOP for any language - Single resolver returned for its declared language - Unknown language returns NOOP - Case-insensitive lookup (java, Java, JAVA, jAvA) - Null language returns NOOP without NPE - resolverFor() never returns null (probed with empty/whitespace input) - Blank language identifiers from a resolver are skipped - Duplicate-language conflict: alphabetical-first wins - all() returns sorted list - bootstrap iterates alphabetically (verified via callback ordering) - bootstrap continues past RuntimeException - bootstrap continues past ResolutionException - bootstrap empty registry is a no-op Per sub-project 1 spec §6.1 + plan Task 11. 
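The ordering and conflict rule described above can be sketched in isolation. This is a minimal illustration, not the registry itself: `Backend`, `AlphaBackend`, `ZetaBackend`, `index` and `lookup` are hypothetical stand-ins for `SymbolResolver` and the registry methods, an empty `Optional` stands in for the NOOP resolver, and the use of `Locale.ROOT` when lowercasing is an assumption of this sketch (the patch calls the no-argument `toLowerCase()`).

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class RegistryOrderingSketch {

    /** Hypothetical stand-in for SymbolResolver; only the language set matters here. */
    interface Backend {
        Set<String> languages();
    }

    // Two hypothetical backends that both claim "java" (with different casing).
    static final class AlphaBackend implements Backend {
        @Override public Set<String> languages() { return Set.of("java"); }
    }

    static final class ZetaBackend implements Backend {
        @Override public Set<String> languages() { return Set.of("Java", "kotlin"); }
    }

    /** Sort by simple class name, then let the first claimant of each language win. */
    static Map<String, Backend> index(List<Backend> backends) {
        List<Backend> sorted = backends.stream()
                .sorted(Comparator.comparing((Backend b) -> b.getClass().getSimpleName()))
                .toList();
        Map<String, Backend> byLanguage = new HashMap<>();
        for (Backend b : sorted) {
            for (String lang : b.languages()) {
                if (lang == null || lang.isBlank()) continue; // skip blank identifiers
                byLanguage.putIfAbsent(lang.toLowerCase(Locale.ROOT), b);
            }
        }
        return Map.copyOf(byLanguage);
    }

    /** Case-insensitive, null-safe lookup; an empty Optional plays the role of NOOP. */
    static Optional<Backend> lookup(Map<String, Backend> index, String language) {
        if (language == null) return Optional.empty();
        return Optional.ofNullable(index.get(language.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        // Input order is deliberately reversed; the sort makes the outcome stable.
        Map<String, Backend> index = index(List.of(new ZetaBackend(), new AlphaBackend()));
        System.out.println(lookup(index, "JAVA").map(b -> b.getClass().getSimpleName()));
        System.out.println(lookup(index, "kotlin").map(b -> b.getClass().getSimpleName()));
        System.out.println(lookup(index, null).isPresent());
    }
}
```

Because the sort happens before indexing, the caller's list order has no effect on which backend serves a contested language, which is the determinism property the tests in this patch check.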
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../resolver/ResolverRegistry.java | 107 +++++++++ .../resolver/ResolverRegistryTest.java | 206 ++++++++++++++++++ 2 files changed, 313 insertions(+) create mode 100644 src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistry.java create mode 100644 src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistryTest.java diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistry.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistry.java new file mode 100644 index 00000000..932400a5 --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistry.java @@ -0,0 +1,107 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.springframework.stereotype.Service; + +import java.nio.file.Path; +import java.util.Comparator; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; + +/** + * Spring-managed registry for {@link SymbolResolver} backends. Mirrors + * {@link io.github.randomcodespace.iq.detector.DetectorRegistry}: every + * {@code @Component} implementing {@link SymbolResolver} is auto-injected via + * the constructor. + * + *

Determinism: resolvers are sorted by {@link Class#getSimpleName()} + * alphabetically before any other operation. {@link #bootstrap(Path)} iterates + * in this order; per-language lookup uses "first-in-sort-order wins" if two + * resolvers claim the same language. Same input → same resolution behavior, + * every time. + * + *

Resilience: {@link #bootstrap(Path)} catches per-resolver + * {@link ResolutionException} so one misbehaving resolver can't take down the + * whole pass. Each resolver's own {@link SymbolResolver#resolve} handles its + * post-bootstrap state — if bootstrap failed, the resolver should return + * {@link EmptyResolved#INSTANCE} from its resolve() method (its own concern). + */ +@Service +public class ResolverRegistry { + + private static final Logger log = LoggerFactory.getLogger(ResolverRegistry.class); + + /** Singleton no-op resolver — returned for unknown languages or null input. */ + static final SymbolResolver NOOP = new NoopResolver(); + + private final List<SymbolResolver> resolvers; + private final Map<String, SymbolResolver> byLanguage; + + public ResolverRegistry(List<SymbolResolver> resolvers) { + // Deterministic order: alphabetical by class simple name. + this.resolvers = resolvers.stream() + .sorted(Comparator.comparing(r -> r.getClass().getSimpleName())) + .toList(); + + // First-in-sort-order wins per language (deterministic conflict resolution). + Map<String, SymbolResolver> map = new HashMap<>(); + for (SymbolResolver r : this.resolvers) { + for (String lang : r.getSupportedLanguages()) { + if (lang == null || lang.isBlank()) continue; + map.putIfAbsent(lang.toLowerCase(), r); + } + } + this.byLanguage = Map.copyOf(map); + } + + /** + * Bootstrap every registered resolver against the given project root. + * Iterates in deterministic (alphabetical) order. Per-resolver failures + * are logged at WARN and swallowed so one broken resolver doesn't cascade. + */ + public void bootstrap(Path projectRoot) { + for (SymbolResolver r : resolvers) { + try { + r.bootstrap(projectRoot); + } catch (ResolutionException e) { + log.warn("resolver {} bootstrap failed for {}: {}", + r.getClass().getSimpleName(), projectRoot, e.getMessage()); + } catch (RuntimeException e) { + // Defensive — resolvers shouldn't throw RuntimeException, but + // if they do, don't take down the pass.
+ log.warn("resolver {} bootstrap threw unexpectedly for {}: {}", + r.getClass().getSimpleName(), projectRoot, e.toString()); + } + } + } + + /** + * Look up the resolver for a given language identifier. + * + * @param language language identifier (case-insensitive). May be null. + * @return the matching resolver, or a no-op resolver returning + * {@link EmptyResolved#INSTANCE}. Never null. + */ + public SymbolResolver resolverFor(String language) { + if (language == null) return NOOP; + return byLanguage.getOrDefault(language.toLowerCase(), NOOP); + } + + /** @return all registered resolvers in deterministic order (alphabetical by class simple name). */ + public List<SymbolResolver> all() { + return resolvers; + } + + /** Singleton no-op — claims no languages, bootstrap is a no-op, resolve always returns EmptyResolved. */ + static final class NoopResolver implements SymbolResolver { + @Override public Set<String> getSupportedLanguages() { return Set.of(); } + @Override public void bootstrap(Path projectRoot) { } + @Override public Resolved resolve(DiscoveredFile file, Object parsedAst) { + return EmptyResolved.INSTANCE; + } + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistryTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistryTest.java new file mode 100644 index 00000000..ab77cabb --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistryTest.java @@ -0,0 +1,206 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import java.nio.file.Path; +import java.util.ArrayList; +import java.util.List; +import java.util.Set; +import java.util.concurrent.atomic.AtomicInteger; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for {@link ResolverRegistry}.
Exercises the determinism, + * conflict resolution, case-insensitivity, null tolerance, and per-resolver + * failure isolation contracts. + */ +class ResolverRegistryTest { + + // ---------- Lookup ---------- + + @Test + void emptyRegistryReturnsNoopForAnyLanguage() throws ResolutionException { + ResolverRegistry registry = new ResolverRegistry(List.of()); + SymbolResolver r = registry.resolverFor("java"); + assertSame(ResolverRegistry.NOOP, r); + + // The NOOP must always return EmptyResolved + Resolved result = r.resolve(new DiscoveredFile(Path.of("Foo.java"), "java", 100), "ast"); + assertSame(EmptyResolved.INSTANCE, result); + } + + @Test + void singleResolverIsReturnedForItsLanguage() { + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + assertSame(java, registry.resolverFor("java")); + } + + @Test + void unknownLanguageReturnsNoop() { + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + assertSame(ResolverRegistry.NOOP, registry.resolverFor("python")); + } + + @Test + void languageLookupIsCaseInsensitive() { + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + assertSame(java, registry.resolverFor("Java")); + assertSame(java, registry.resolverFor("JAVA")); + assertSame(java, registry.resolverFor("jAvA")); + } + + @Test + void nullLanguageReturnsNoopWithoutNpe() { + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + // Defensive: null is a sentinel, not an error + assertSame(ResolverRegistry.NOOP, registry.resolverFor(null)); + } + + @Test + void resolverForNeverReturnsNull() { + ResolverRegistry registry = new ResolverRegistry(List.of()); + assertNotNull(registry.resolverFor("java")); + assertNotNull(registry.resolverFor("python")); + assertNotNull(registry.resolverFor("")); + 
assertNotNull(registry.resolverFor("\t\n")); + } + + @Test + void blankLanguageReturnsNoop() { + // Resolver contract: getSupportedLanguages should never include blank/empty strings. + // The registry defensively skips them so a misbehaving resolver doesn't poison + // lookup for "" . + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + assertSame(ResolverRegistry.NOOP, registry.resolverFor("")); + assertSame(ResolverRegistry.NOOP, registry.resolverFor(" ")); + } + + // ---------- Conflict resolution ---------- + + @Test + void duplicateLanguageFirstSortedWins() { + // Two resolvers both claim "java". Sort by class simple name — A before Z. + AStubResolver a = new AStubResolver("java"); + ZStubResolver z = new ZStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(z, a)); // input order intentionally reversed + + assertSame(a, registry.resolverFor("java"), + "first-in-sort-order wins — AStubResolver < ZStubResolver alphabetically"); + } + + // ---------- Order ---------- + + @Test + void allReturnsSortedOrder() { + AStubResolver a = new AStubResolver("a"); + ZStubResolver z = new ZStubResolver("z"); + MStubResolver m = new MStubResolver("m"); + ResolverRegistry registry = new ResolverRegistry(List.of(z, a, m)); + + List<SymbolResolver> all = registry.all(); + assertEquals(3, all.size()); + assertSame(a, all.get(0)); + assertSame(m, all.get(1)); + assertSame(z, all.get(2)); + } + + // ---------- Bootstrap ---------- + + @Test + void bootstrapCallsEveryResolverInOrder() { + List<String> calledOrder = new ArrayList<>(); + AStubResolver a = new AStubResolver("a", () -> calledOrder.add("A")); + MStubResolver m = new MStubResolver("m", () -> calledOrder.add("M")); + ZStubResolver z = new ZStubResolver("z", () -> calledOrder.add("Z")); + ResolverRegistry registry = new ResolverRegistry(List.of(z, m, a)); // input order shuffled + + registry.bootstrap(Path.of("/tmp/project")); + +
assertEquals(List.of("A", "M", "Z"), calledOrder, + "bootstrap iterates in alphabetical order — determinism guarantee"); + } + + @Test + void bootstrapResilient_oneFailureDoesNotBlockOthers() { + AtomicInteger aCalled = new AtomicInteger(); + AtomicInteger zCalled = new AtomicInteger(); + AStubResolver a = new AStubResolver("a", () -> { + aCalled.incrementAndGet(); + throw new RuntimeException("simulated bootstrap failure"); + }); + ZStubResolver z = new ZStubResolver("z", zCalled::incrementAndGet); + ResolverRegistry registry = new ResolverRegistry(List.of(a, z)); + + // Must not throw — failure is swallowed and logged + assertDoesNotThrow(() -> registry.bootstrap(Path.of("/tmp/project"))); + + assertEquals(1, aCalled.get(), "failing resolver was called"); + assertEquals(1, zCalled.get(), + "subsequent resolvers run despite earlier failure — resilience guarantee"); + } + + @Test + void bootstrapResilient_resolutionExceptionAlsoSwallowed() { + AtomicInteger zCalled = new AtomicInteger(); + SymbolResolver throwing = new SymbolResolver() { + @Override public Set<String> getSupportedLanguages() { return Set.of("a"); } + @Override public void bootstrap(Path projectRoot) throws ResolutionException { + throw new ResolutionException("simulated checked failure", projectRoot, "a"); + } + @Override public Resolved resolve(DiscoveredFile file, Object parsedAst) { + return EmptyResolved.INSTANCE; + } + }; + ZStubResolver z = new ZStubResolver("z", zCalled::incrementAndGet); + ResolverRegistry registry = new ResolverRegistry(List.of(throwing, z)); + + assertDoesNotThrow(() -> registry.bootstrap(Path.of("/tmp/project"))); + assertEquals(1, zCalled.get(), + "ResolutionException from one resolver does not stop the pass"); + } + + @Test + void bootstrapEmptyRegistryIsNoOp() { + ResolverRegistry registry = new ResolverRegistry(List.of()); + assertDoesNotThrow(() -> registry.bootstrap(Path.of("/tmp/project"))); + } + + // ---------- Test stubs ---------- + + /** Resolves one language.
Optional bootstrap callback for sequencing tests. */ + private static class AStubResolver implements SymbolResolver { + private final String language; + private final Runnable onBootstrap; + AStubResolver(String language) { this(language, () -> {}); } + AStubResolver(String language, Runnable onBootstrap) { + this.language = language; + this.onBootstrap = onBootstrap; + } + @Override public Set<String> getSupportedLanguages() { return Set.of(language); } + @Override public void bootstrap(Path projectRoot) { onBootstrap.run(); } + @Override public Resolved resolve(DiscoveredFile file, Object parsedAst) { + return new Resolved() { + @Override public boolean isAvailable() { return true; } + @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; } + }; + } + } + + private static final class MStubResolver extends AStubResolver { + MStubResolver(String language) { super(language); } + MStubResolver(String language, Runnable onBootstrap) { super(language, onBootstrap); } + } + + private static final class ZStubResolver extends AStubResolver { + ZStubResolver(String language) { super(language); } + ZStubResolver(String language, Runnable onBootstrap) { super(language, onBootstrap); } + } +} From e4473800e99e3f38174654292cce34c48043469b Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 17:00:48 +0000 Subject: [PATCH 18/23] feat(detector): add Optional<Resolved> accessor to DetectorContext MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit DetectorContext now carries an Optional<Resolved> as its 7th field. The field is the opt-in entry point for the resolver pass — detectors that want to upgrade emissions to RESOLVED check ctx.resolved().filter(Resolved::isAvailable) before downcasting to a language-specific Resolved subclass; detectors that don't care simply ignore it.
Backward compat: all existing constructors (3-arg, 5-arg, 6-arg with registry) still compile and work — they delegate to the canonical 7-arg constructor with Optional.empty() for resolution. The compact constructor normalizes a null Optional to Optional.empty() so the field is never null at rest. withResolved(Resolved) is the orchestrator's hook to attach per-file resolution after the resolver pass. Test coverage (10 new tests in DetectorContextResolvedTest): - 3-arg / 5-arg / 6-arg constructors all default resolved to empty - 7-arg canonical constructor carries the attached Resolved - Compact constructor normalizes null → Optional.empty() - withResolved(r) returns a copy with Optional.of(r); base untouched - withResolved(null) clears the resolution back to empty - EmptyResolved attached: present but isAvailable()==false - withResolved preserves all other fields - Documents the canonical detector check pattern Verified no regressions: 1970 detector tests all pass. Per sub-project 1 spec §6.1 + plan Task 12. 
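The detector-side check pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than code from the patch: the nested `Confidence` and `Resolved` types are local stand-ins for the project's own, `EMPTY` mimics the `EmptyResolved` sentinel, and `JavaLikeResolved` with its `qualifiedName` component is a hypothetical language-specific subtype.

```java
import java.util.Optional;

public class ResolvedCheckSketch {

    /** Local stand-in for the project's Confidence enum (only the three levels named in the patch). */
    enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

    /** Local stand-in for the Resolved interface introduced by this series. */
    interface Resolved {
        boolean isAvailable();
        Confidence sourceConfidence();
    }

    /** Mimics the EmptyResolved sentinel: present, but never available. */
    static final Resolved EMPTY = new Resolved() {
        @Override public boolean isAvailable() { return false; }
        @Override public Confidence sourceConfidence() { return Confidence.LEXICAL; }
    };

    /** Hypothetical language-specific result a detector might downcast to. */
    record JavaLikeResolved(String qualifiedName) implements Resolved {
        @Override public boolean isAvailable() { return true; }
        @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; }
    }

    /**
     * Gate on isAvailable() first, then on the expected subtype, and fall back
     * to the detector's own syntactic default when either check fails.
     */
    static Confidence confidenceFor(Optional<Resolved> resolved) {
        return resolved
                .filter(Resolved::isAvailable)
                .filter(JavaLikeResolved.class::isInstance)
                .map(Resolved::sourceConfidence)
                .orElse(Confidence.SYNTACTIC);
    }

    public static void main(String[] args) {
        System.out.println(confidenceFor(Optional.empty()));   // no resolver pass ran
        System.out.println(confidenceFor(Optional.of(EMPTY))); // sentinel attached
        System.out.println(confidenceFor(Optional.of(new JavaLikeResolved("com.example.Foo"))));
    }
}
```

The first two calls take the syntactic fallback and only the third reports `RESOLVED`, which matches the rule that an attached but unavailable resolution must not raise confidence.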
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../iq/detector/DetectorContext.java          |  50 ++++++-
 .../detector/DetectorContextResolvedTest.java | 140 ++++++++++++++++++
 2 files changed, 185 insertions(+), 5 deletions(-)
 create mode 100644 src/test/java/io/github/randomcodespace/iq/detector/DetectorContextResolvedTest.java

diff --git a/src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java b/src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java
index 0ffc79b8..3abfc4fa 100644
--- a/src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java
+++ b/src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java
@@ -1,23 +1,63 @@
 package io.github.randomcodespace.iq.detector;
 
 import io.github.randomcodespace.iq.analyzer.InfrastructureRegistry;
+import io.github.randomcodespace.iq.intelligence.resolver.Resolved;
 
+import java.util.Optional;
+
+/**
+ * Immutable per-file context passed to every {@link Detector#detect}.
+ *
+ * <p>The {@code resolved} field is the opt-in entry point for symbol-resolution
+ * data. Detectors that want to upgrade emissions to {@link
+ * io.github.randomcodespace.iq.model.Confidence#RESOLVED} call
+ * {@code ctx.resolved().filter(Resolved::isAvailable).map(...)} before
+ * downcasting to the language-specific {@code Resolved} subclass. Detectors
+ * that don't care simply ignore the field — the existing pipeline works
+ * unchanged when {@link #resolved()} returns {@code Optional.empty()}.
+ */
 public record DetectorContext(
         String filePath,
         String language,
         String content,
         Object parsedData,
         String moduleName,
-        InfrastructureRegistry registry
+        InfrastructureRegistry registry,
+        Optional<Resolved> resolved
 ) {
-    /** Minimal constructor — no parsed data, module name, or registry. */
+    /** Compact constructor: normalize {@code null resolved} to {@link Optional#empty()}. */
+    public DetectorContext {
+        if (resolved == null) {
+            resolved = Optional.empty();
+        }
+    }
+
+    /** Minimal constructor — no parsed data, module name, registry, or resolution. */
     public DetectorContext(String filePath, String language, String content) {
-        this(filePath, language, content, null, null, null);
+        this(filePath, language, content, null, null, null, Optional.empty());
     }
 
-    /** Full constructor without registry — backward compat for existing callers. */
+    /** Backward-compat: 5-arg form without registry / resolution. */
     public DetectorContext(String filePath, String language, String content,
                            Object parsedData, String moduleName) {
-        this(filePath, language, content, parsedData, moduleName, null);
+        this(filePath, language, content, parsedData, moduleName, null, Optional.empty());
+    }
+
+    /** Backward-compat: 6-arg form with registry but no resolution (matches the old canonical record signature).
*/ + public DetectorContext(String filePath, String language, String content, + Object parsedData, String moduleName, + InfrastructureRegistry registry) { + this(filePath, language, content, parsedData, moduleName, registry, Optional.empty()); + } + + /** + * Return a copy of this context with the given {@link Resolved} attached. + * Used by the orchestrator after the resolver pass to thread per-file + * resolution into the detector. {@code null} is normalized to + * {@link Optional#empty()}. + */ + public DetectorContext withResolved(Resolved resolved) { + Optional opt = resolved != null ? Optional.of(resolved) : Optional.empty(); + return new DetectorContext(filePath, language, content, parsedData, moduleName, registry, opt); } } diff --git a/src/test/java/io/github/randomcodespace/iq/detector/DetectorContextResolvedTest.java b/src/test/java/io/github/randomcodespace/iq/detector/DetectorContextResolvedTest.java new file mode 100644 index 00000000..dc699252 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/detector/DetectorContextResolvedTest.java @@ -0,0 +1,140 @@ +package io.github.randomcodespace.iq.detector; + +import io.github.randomcodespace.iq.intelligence.resolver.EmptyResolved; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import java.util.Optional; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for the {@link DetectorContext#resolved()} accessor and + * the backward-compat invariant: existing call sites continue to compile and + * see {@link Optional#empty()} for resolution. Detectors that opt in via + * {@link DetectorContext#withResolved(Resolved)} get the attached value. 
+ */ +class DetectorContextResolvedTest { + + @Test + void threeArgConstructorDefaultsResolvedToEmpty() { + DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}"); + assertEquals(Optional.empty(), ctx.resolved(), + "3-arg constructor still gives empty resolution — backward compat"); + } + + @Test + void fiveArgConstructorDefaultsResolvedToEmpty() { + DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "myModule"); + assertEquals(Optional.empty(), ctx.resolved(), + "5-arg constructor still gives empty resolution — backward compat"); + } + + @Test + void sixArgConstructorDefaultsResolvedToEmpty() { + DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "myModule", null); + assertEquals(Optional.empty(), ctx.resolved(), + "6-arg constructor still gives empty resolution — backward compat"); + } + + @Test + void canonicalSevenArgConstructorCarriesResolved() { + Resolved r = stubAvailableResolved(); + DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "myModule", null, Optional.of(r)); + assertTrue(ctx.resolved().isPresent()); + assertSame(r, ctx.resolved().get()); + } + + @Test + void compactConstructorNormalizesNullResolvedToEmpty() { + // Defensive: passing null Optional is a misuse, but the compact + // constructor must not let it propagate (or callers reading ctx.resolved() + // would NPE). Normalized to Optional.empty() at construction time. 
+ DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "myModule", null, null); + assertNotNull(ctx.resolved()); + assertEquals(Optional.empty(), ctx.resolved()); + } + + @Test + void withResolvedAttachesAvailableResolved() { + DetectorContext base = new DetectorContext("Foo.java", "java", "class Foo {}"); + Resolved r = stubAvailableResolved(); + DetectorContext withR = base.withResolved(r); + + // Original is untouched + assertEquals(Optional.empty(), base.resolved()); + // Copy carries the resolution + assertTrue(withR.resolved().isPresent()); + assertSame(r, withR.resolved().get()); + } + + @Test + void withResolvedNullClearsResolution() { + DetectorContext base = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "m", null, Optional.of(stubAvailableResolved())); + DetectorContext cleared = base.withResolved(null); + + assertEquals(Optional.empty(), cleared.resolved(), + "withResolved(null) clears the resolution back to empty"); + } + + @Test + void withResolvedEmptyResolvedSentinelIsCarried() { + // A detector that wants to explicitly say "the resolver tried but came + // up empty" can attach EmptyResolved.INSTANCE — different semantics from + // Optional.empty (which means "the resolver pass didn't run for this file"). + DetectorContext base = new DetectorContext("Foo.java", "java", ""); + DetectorContext withEmpty = base.withResolved(EmptyResolved.INSTANCE); + + assertTrue(withEmpty.resolved().isPresent(), + "EmptyResolved.INSTANCE is a real value — Optional.isPresent() is true"); + assertSame(EmptyResolved.INSTANCE, withEmpty.resolved().get()); + assertFalse(withEmpty.resolved().get().isAvailable(), + "but isAvailable() == false — detectors still fall back to syntactic"); + } + + @Test + void withResolvedPreservesAllOtherFields() { + // Verifying we don't accidentally drop other fields when copying. 
+ DetectorContext base = new DetectorContext("Foo.java", "java", "content", + "parsedAst", "moduleName", null); + DetectorContext copy = base.withResolved(EmptyResolved.INSTANCE); + + assertEquals("Foo.java", copy.filePath()); + assertEquals("java", copy.language()); + assertEquals("content", copy.content()); + assertEquals("parsedAst", copy.parsedData()); + assertEquals("moduleName", copy.moduleName()); + assertNull(copy.registry()); + } + + @Test + void resolvedAccessorTypicalDetectorUsage() { + // Documents the canonical detector-side check: filter on isAvailable + // before downcasting to a language-specific Resolved subclass. + DetectorContext ctxA = new DetectorContext("Foo.java", "java", ""); + DetectorContext ctxB = new DetectorContext("Foo.java", "java", "") + .withResolved(EmptyResolved.INSTANCE); + DetectorContext ctxC = new DetectorContext("Foo.java", "java", "") + .withResolved(stubAvailableResolved()); + + assertTrue(ctxA.resolved().filter(Resolved::isAvailable).isEmpty(), + "no resolution attached: detector falls back to syntactic"); + assertTrue(ctxB.resolved().filter(Resolved::isAvailable).isEmpty(), + "EmptyResolved attached: detector still falls back"); + assertTrue(ctxC.resolved().filter(Resolved::isAvailable).isPresent(), + "available Resolved attached: detector may downcast and use it"); + } + + private static Resolved stubAvailableResolved() { + return new Resolved() { + @Override public boolean isAvailable() { return true; } + @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; } + }; + } +} From 0b07ac6ee8032a66c1144a739e2d0b9538b95dfd Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 17:01:52 +0000 Subject: [PATCH 19/23] chore(deps): add javaparser-symbol-solver-core 3.28.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Same release train as the existing javaparser-core 3.28.0 — required by the upcoming JavaSymbolResolver (sub-project 1, plan Task 17). 
Pulls in JavaSymbolSolver, CombinedTypeSolver, ReflectionTypeSolver, and JavaParserTypeSolver. Apache-2.0 license, no transitive surprises: 'mvn dependency:tree -Dincludes=com.github.javaparser' shows only the two artifacts at 3.28.0. Per sub-project 1 plan Task 14. Co-Authored-By: Claude Opus 4.7 (1M context) --- pom.xml | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/pom.xml b/pom.xml index 3f1144ab..8c610860 100644 --- a/pom.xml +++ b/pom.xml @@ -195,6 +195,14 @@ 3.28.0 + + + com.github.javaparser + javaparser-symbol-solver-core + 3.28.0 + + org.antlr From ba5c5c4d3e648a56584652736fd4b58114fc7129 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 17:04:10 +0000 Subject: [PATCH 20/23] feat(resolver/java): add JavaSourceRootDiscovery (Maven/Gradle/plain) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Walks a project root for the canonical src/(main|test)/java directories that Maven and Gradle both standardize on. Multi-module projects work naturally — every nested src/main/java is a separate root. Plain projects (no build file) fall back to top-level src/ if it has any *.java. Determinism: results sorted alphabetically by absolute path. Same tree → same root list → same CombinedTypeSolver → same resolution. Symlink safety: Files.walkFileTree runs with FOLLOW_LINKS disabled, so loops cannot form. The trade-off — source roots reachable only via symlink are skipped — is the right call for resolution where double- counting via symlink would be worse. Skip directories: target, build, out, bin, dist, .git, .gradle, .idea, .vscode, .m2, .cache, node_modules, .codeiq — phantom src/main/java inside any of these is ignored. 
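
Illustration of the walk described above. This is a condensed sketch rather
than the committed implementation: `SourceRootSketch` and `demo()` are
hypothetical names, the skip list is shortened, the plain-layout fallback is
omitted, and the root check uses Path.endsWith where the commit compares
directory names. It assumes only the stated behavior: no symlink following,
skip-dirs pruned, results sorted by path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SourceRootSketch {

    // Shortened skip list for illustration; the commit's list is longer.
    private static final Set<String> SKIP = Set.of("target", "build", "node_modules", ".git");

    /** Collects every .../src/(main|test)/java under root, sorted by path. */
    public static List<Path> discover(Path root) {
        Set<Path> roots = new TreeSet<>(); // TreeSet keeps the result order deterministic
        try {
            // The two-argument walkFileTree does not follow symbolic links.
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                    Path name = dir.getFileName();
                    if (name != null && SKIP.contains(name.toString())) {
                        return FileVisitResult.SKIP_SUBTREE; // prune build output and caches
                    }
                    if (dir.endsWith(Path.of("src", "main", "java"))
                            || dir.endsWith(Path.of("src", "test", "java"))) {
                        roots.add(dir);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ArrayList<>(roots);
    }

    /** Builds a throwaway tree and returns the discovered roots relative to it. */
    public static List<String> demo() {
        try {
            Path tmp = Files.createTempDirectory("roots");
            Files.createDirectories(tmp.resolve("zzz/src/main/java"));
            Files.createDirectories(tmp.resolve("aaa/src/test/java"));
            Files.createDirectories(tmp.resolve("target/gen/src/main/java")); // phantom root
            List<String> out = new ArrayList<>();
            for (Path p : discover(tmp)) {
                out.add(tmp.relativize(p).toString().replace('\\', '/'));
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The phantom root under target/ is pruned and the two real roots come back
in path order regardless of creation order.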
Test coverage (18 new tests in JavaSourceRootDiscoveryTest):
- Maven single-module returns [src/main/java, src/test/java]
- Maven main-only returns just main
- Maven multi-module aggregates all submodules (sorted)
- Gradle layout matches Maven (discovery doesn't read build files)
- Plain layout fallback: src/ with .java becomes the root
- Plain layout without .java returns empty
- Empty / non-existent / null / file-not-dir all return empty (no exceptions)
- target/, build/, node_modules/, .git/, .gradle/, .idea/ are skipped
- Phantom src/main/java inside skip-dirs is NOT picked up
- Results sorted alphabetically (verified across 3 modules)
- Discovery is idempotent
- Symlink loop terminates without exception (POSIX only; DisabledOnOs Windows)
- Deeply-nested modules found
- src/main/kotlin is NOT mistaken for a Java root

Per sub-project 1 plan Task 15.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../java/JavaSourceRootDiscovery.java         | 134 ++++++++++
 .../java/JavaSourceRootDiscoveryTest.java     | 241 ++++++++++++++++++
 2 files changed, 375 insertions(+)
 create mode 100644 src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscovery.java
 create mode 100644 src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscoveryTest.java

diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscovery.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscovery.java
new file mode 100644
index 00000000..f6c111c4
--- /dev/null
+++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscovery.java
@@ -0,0 +1,134 @@
+package io.github.randomcodespace.iq.intelligence.resolver.java;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.springframework.stereotype.Component;
+
+import java.io.IOException;
+import java.nio.file.FileVisitOption;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.EnumSet; +import java.util.List; +import java.util.Set; +import java.util.TreeSet; + +/** + * Discovers Java source roots under a project root by walking for the + * {@code src/main/java} and {@code src/test/java} directories Maven and Gradle + * both standardize on. Multi-module projects are handled by walking the whole + * tree — every nested {@code src/(main|test)/java} is a separate root. + * + *

<p>Determinism: results are returned sorted alphabetically by absolute path.
+ * Same project tree → same root list → same {@code CombinedTypeSolver} →
+ * same resolution behavior.
+ *
+ * <p>Symlink safety: {@link Files#walkFileTree} runs with
+ * {@link FileVisitOption#FOLLOW_LINKS} disabled, so symlink cycles cannot
+ * form. The trade-off — source roots reachable only via symlink are skipped
+ * — is the right call for resolution: traversal would otherwise double-count.
+ *
+ * <p>
Plain-layout fallback: if the walk finds no Maven/Gradle source roots + * but the top-level directory contains {@code src/} with at least one + * {@code *.java} file, returns {@code [src]} as a single root. This covers + * scratch projects without a build file. + */ +@Component +public class JavaSourceRootDiscovery { + + private static final Logger log = LoggerFactory.getLogger(JavaSourceRootDiscovery.class); + + /** Directories we never descend into — they don't contain Java sources we care about. */ + private static final Set SKIP_DIRS = Set.of( + "target", "build", "out", "bin", "dist", + ".git", ".gradle", ".idea", ".vscode", ".m2", ".cache", + "node_modules", ".codeiq" + ); + + /** + * @param projectRoot project root path. May be null or non-existent — both + * return an empty list. + * @return sorted list of absolute Java source root paths (e.g. + * {@code [/service-a/src/main/java, /service-b/src/main/java]}). + * Never null, never contains null entries. + */ + public List discover(Path projectRoot) { + if (projectRoot == null || !Files.isDirectory(projectRoot)) { + return List.of(); + } + + Set roots = new TreeSet<>(); + try { + Files.walkFileTree( + projectRoot, + EnumSet.noneOf(FileVisitOption.class), // do NOT follow symlinks + Integer.MAX_VALUE, + new SimpleFileVisitor<>() { + @Override + public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) { + String name = nameOrEmpty(dir); + if (SKIP_DIRS.contains(name)) { + return FileVisitResult.SKIP_SUBTREE; + } + if (isMavenStyleJavaRoot(dir)) { + roots.add(dir); + } + return FileVisitResult.CONTINUE; + } + + @Override + public FileVisitResult visitFileFailed(Path file, IOException exc) { + // Ignore unreadable entries; resolution is best-effort. 
+ log.debug("skipping unreadable path {}: {}", file, exc.getMessage()); + return FileVisitResult.CONTINUE; + } + }); + } catch (IOException e) { + log.warn("source root discovery failed for {}: {}", projectRoot, e.getMessage()); + return List.of(); + } + + if (!roots.isEmpty()) { + return new ArrayList<>(roots); + } + + // Plain-layout fallback: top-level src/ with at least one .java file. + Path src = projectRoot.resolve("src"); + if (Files.isDirectory(src) && containsJavaFile(src)) { + return List.of(src); + } + return List.of(); + } + + /** {@code true} iff {@code dir} is {@code .../src/main/java} or {@code .../src/test/java}. */ + private static boolean isMavenStyleJavaRoot(Path dir) { + if (!"java".equals(nameOrEmpty(dir))) return false; + Path parent = dir.getParent(); + if (parent == null) return false; + String parentName = nameOrEmpty(parent); + if (!"main".equals(parentName) && !"test".equals(parentName)) return false; + Path grandparent = parent.getParent(); + if (grandparent == null) return false; + return "src".equals(nameOrEmpty(grandparent)); + } + + private static String nameOrEmpty(Path p) { + Path name = p.getFileName(); + return name != null ? name.toString() : ""; + } + + /** Cheap probe: does the directory tree under {@code root} have any {@code *.java}? 
*/ + private static boolean containsJavaFile(Path root) { + try { + return Files.walk(root) + .filter(p -> !Files.isDirectory(p)) + .anyMatch(p -> p.toString().endsWith(".java")); + } catch (IOException e) { + return false; + } + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscoveryTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscoveryTest.java new file mode 100644 index 00000000..43d50199 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscoveryTest.java @@ -0,0 +1,241 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.condition.DisabledOnOs; +import org.junit.jupiter.api.condition.OS; +import org.junit.jupiter.api.io.TempDir; + +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.List; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for {@link JavaSourceRootDiscovery} on synthetic dir + * layouts. Verifies all 6 plan-mandated scenarios + defensive cases. 
+ */ +class JavaSourceRootDiscoveryTest { + + private final JavaSourceRootDiscovery discovery = new JavaSourceRootDiscovery(); + + // ---------- Maven layouts ---------- + + @Test + void mavenSingleModuleReturnsMainAndTestJava(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("src/test/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + List roots = discovery.discover(tmp); + + assertEquals(2, roots.size()); + assertEquals(tmp.resolve("src/main/java"), roots.get(0)); + assertEquals(tmp.resolve("src/test/java"), roots.get(1)); + } + + @Test + void mavenSingleModuleMainOnlyReturnsMainOnly(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + List roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src/main/java")), roots); + } + + @Test + void mavenMultiModuleAggregatesAllSubmodules(@TempDir Path tmp) throws Exception { + Files.writeString(tmp.resolve("pom.xml"), ""); + Files.createDirectories(tmp.resolve("service-a/src/main/java")); + Files.createDirectories(tmp.resolve("service-a/src/test/java")); + Files.createDirectories(tmp.resolve("service-b/src/main/java")); + + List roots = discovery.discover(tmp); + + assertEquals(3, roots.size()); + // Sorted alphabetically: service-a/src/main/java, service-a/src/test/java, service-b/src/main/java + assertEquals(tmp.resolve("service-a/src/main/java"), roots.get(0)); + assertEquals(tmp.resolve("service-a/src/test/java"), roots.get(1)); + assertEquals(tmp.resolve("service-b/src/main/java"), roots.get(2)); + } + + // ---------- Gradle layouts ---------- + + @Test + void gradleLayoutDetectedSameAsMaven(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("src/test/java")); + // Gradle Kotlin DSL marker + 
Files.writeString(tmp.resolve("build.gradle.kts"), "plugins {}"); + + List roots = discovery.discover(tmp); + + // The discovery doesn't actually inspect build files — it walks for src/(main|test)/java. + // Documents that Maven and Gradle are indistinguishable to this discovery. + assertEquals(2, roots.size()); + assertEquals(tmp.resolve("src/main/java"), roots.get(0)); + assertEquals(tmp.resolve("src/test/java"), roots.get(1)); + } + + // ---------- Plain layout ---------- + + @Test + void plainSrcWithJavaFileFallsBackToSrcAsRoot(@TempDir Path tmp) throws Exception { + // No Maven/Gradle markers, no src/main/java — but src/ has a .java file. + // Fallback: treat src/ as the root. + Files.createDirectories(tmp.resolve("src")); + Files.writeString(tmp.resolve("src/Foo.java"), "class Foo {}"); + + List roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src")), roots); + } + + @Test + void plainSrcWithoutJavaFilesReturnsEmpty(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src")); + Files.writeString(tmp.resolve("src/README.md"), "# nothing to see here"); + + List roots = discovery.discover(tmp); + + assertTrue(roots.isEmpty(), + "src/ exists but has no .java files — discovery returns nothing"); + } + + // ---------- Empty / missing ---------- + + @Test + void emptyDirectoryReturnsEmpty(@TempDir Path tmp) { + List roots = discovery.discover(tmp); + assertTrue(roots.isEmpty()); + } + + @Test + void nonExistentPathReturnsEmpty(@TempDir Path tmp) { + List roots = discovery.discover(tmp.resolve("does-not-exist")); + assertTrue(roots.isEmpty(), + "missing project root yields empty list, no exception"); + } + + @Test + void nullPathReturnsEmpty() { + List roots = discovery.discover(null); + assertTrue(roots.isEmpty(), + "null project root yields empty list, no NPE"); + } + + @Test + void filePathInsteadOfDirReturnsEmpty(@TempDir Path tmp) throws Exception { + Path file = Files.writeString(tmp.resolve("not-a-dir.txt"), 
"hello"); + List roots = discovery.discover(file); + assertTrue(roots.isEmpty(), + "a file (not a directory) yields empty list, no exception"); + } + + // ---------- Skip directories ---------- + + @Test + void targetDirIsSkipped(@TempDir Path tmp) throws Exception { + // Maven build output — nested src/main/java inside target/ should be ignored. + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("target/foo/src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + List roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src/main/java")), roots, + "target/ is skipped — its phantom src/main/java is not a real source root"); + } + + @Test + void buildAndNodeModulesSkipped(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("build/classes/main/src/main/java")); + Files.createDirectories(tmp.resolve("node_modules/some-pkg/src/main/java")); + + List roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src/main/java")), roots, + "build/ and node_modules/ are skipped — their phantom src trees are not roots"); + } + + @Test + void dotGitIsSkipped(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve(".git/objects")); + Files.createDirectories(tmp.resolve(".gradle/caches")); + Files.createDirectories(tmp.resolve(".idea/workspace")); + + List roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src/main/java")), roots); + } + + // ---------- Determinism + safety ---------- + + @Test + void resultsAreSortedAlphabetically(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("zzz/src/main/java")); + Files.createDirectories(tmp.resolve("aaa/src/main/java")); + Files.createDirectories(tmp.resolve("mmm/src/main/java")); + + List roots = discovery.discover(tmp); + + assertEquals(3, 
roots.size()); + assertEquals(tmp.resolve("aaa/src/main/java"), roots.get(0)); + assertEquals(tmp.resolve("mmm/src/main/java"), roots.get(1)); + assertEquals(tmp.resolve("zzz/src/main/java"), roots.get(2)); + } + + @Test + void discoveryIsIdempotent(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("src/test/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + List first = discovery.discover(tmp); + List second = discovery.discover(tmp); + + assertEquals(first, second, + "two calls over the same tree return identical results — determinism"); + } + + @Test + @DisabledOnOs(OS.WINDOWS) // symlink semantics differ on Windows + void symlinkLoopTerminatesWithoutException(@TempDir Path tmp) throws Exception { + // Create a real source root and a symlink loop pointing back at the project root. + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + Files.createSymbolicLink(tmp.resolve("loop-link"), tmp); + + // Files.walkFileTree with NOFOLLOW_LINKS doesn't traverse symlinks → no cycle. + List roots = assertDoesNotThrow(() -> discovery.discover(tmp)); + assertEquals(List.of(tmp.resolve("src/main/java")), roots, + "symlink loop does not cause infinite recursion or duplicate detection"); + } + + @Test + void srcMainJavaWithDeepNestingStillFound(@TempDir Path tmp) throws Exception { + // Deeply nested module — verifies walkFileTree doesn't hit a depth limit. + Path deep = tmp.resolve("a/b/c/d/e/service/src/main/java"); + Files.createDirectories(deep); + + List roots = discovery.discover(tmp); + + assertEquals(List.of(deep), roots); + } + + @Test + void srcMainKotlinIsNotMistakenForJava(@TempDir Path tmp) throws Exception { + // The check is for the literal "java" leaf — Kotlin sources at + // src/main/kotlin must NOT be reported as a Java source root. 
+ Files.createDirectories(tmp.resolve("src/main/kotlin")); + Files.createDirectories(tmp.resolve("src/test/kotlin")); + + List roots = discovery.discover(tmp); + + assertTrue(roots.isEmpty(), + "src/main/kotlin is not a Java source root"); + } +} From c83167b042d5566834b693e8ec7cee9bc6a1f787 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 17:07:33 +0000 Subject: [PATCH 21/23] feat(resolver/java): add JavaResolved + JavaSymbolResolver (@Component) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit JavaResolved is a record carrying the parsed CompilationUnit + the configured JavaSymbolSolver. isAvailable() == true and sourceConfidence() == RESOLVED — detectors that downcast to it can stamp emissions at the RESOLVED tier. JavaSymbolResolver is the @Component that bootstraps a CombinedTypeSolver (ReflectionTypeSolver + per-source-root JavaParserTypeSolver) using JavaSourceRootDiscovery for the root list. Source roots are sorted alphabetically → deterministic solver wiring → same resolution every run. Deliberately NOT mutating StaticJavaParser.getParserConfiguration() — that would conflict with AbstractJavaParserDetector's thread-local JavaParser pool under virtual-thread concurrency. Detectors that want the solver attached to their own JavaParser get it via JavaSymbolResolver.symbolSolver() and configure their own ParserConfiguration. 
Test coverage (23 new tests):

JavaResolvedTest (6):
- isAvailable() == true
- sourceConfidence() == RESOLVED
- cu() / solver() accessors
- implements Resolved
- distinct from EmptyResolved.INSTANCE

JavaSymbolResolverTest (17, Layer 1 unit):
- supports "java" only
- bootstrap empty project still builds ReflectionTypeSolver
- bootstrap with source roots adds JavaParserTypeSolver per root
- bootstrap is repeatable (fresh CTS each call)
- combinedTypeSolver() null before bootstrap
- resolve before bootstrap → EmptyResolved (graceful)
- resolve null file → EmptyResolved
- resolve non-Java file → EmptyResolved
- resolve null AST → EmptyResolved
- resolve String AST (wrong type) → EmptyResolved (no ClassCastException)
- language match is case-insensitive ("Java")
- resolve valid CU → JavaResolved
- JavaResolved carries the input cu and the bootstrapped solver
- Solver smoke test: resolves java.lang.String via ReflectionTypeSolver
- Solver smoke test: resolves project class from source root
- resolve() doesn't cache: distinct JavaResolved per call

Per sub-project 1 plan Tasks 16-18.
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../resolver/java/JavaResolved.java | 32 ++ .../resolver/java/JavaSymbolResolver.java | 104 +++++++ .../resolver/java/JavaResolvedTest.java | 73 +++++ .../resolver/java/JavaSymbolResolverTest.java | 276 ++++++++++++++++++ 4 files changed, 485 insertions(+) create mode 100644 src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolved.java create mode 100644 src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java create mode 100644 src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolvedTest.java create mode 100644 src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverTest.java diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolved.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolved.java new file mode 100644 index 00000000..4c8e90d3 --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolved.java @@ -0,0 +1,32 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import com.github.javaparser.ast.CompilationUnit; +import com.github.javaparser.symbolsolver.JavaSymbolSolver; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import io.github.randomcodespace.iq.model.Confidence; + +/** + * Java-specific {@link Resolved} carrying the parsed {@link CompilationUnit} + * and the {@link JavaSymbolSolver} configured for the current project. + * + *

<p>Detectors that opt in to resolution should:
+ * <ol>
+ *   <li>Read {@code ctx.resolved()}</li>
+ *   <li>Filter on {@link #isAvailable()}</li>
+ *   <li>Downcast to {@code JavaResolved}</li>
+ *   <li>Use {@link #cu()} (the file's parsed AST) and {@link #solver()}
+ *       (for cross-file type lookups) to resolve symbols</li>
+ * </ol>
+ */ +public record JavaResolved(CompilationUnit cu, JavaSymbolSolver solver) implements Resolved { + + @Override + public boolean isAvailable() { + return true; + } + + @Override + public Confidence sourceConfidence() { + return Confidence.RESOLVED; + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java new file mode 100644 index 00000000..4cf2e66b --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java @@ -0,0 +1,104 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import com.github.javaparser.ast.CompilationUnit; +import com.github.javaparser.symbolsolver.JavaSymbolSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.CombinedTypeSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.JavaParserTypeSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.ReflectionTypeSolver; +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import io.github.randomcodespace.iq.intelligence.resolver.EmptyResolved; +import io.github.randomcodespace.iq.intelligence.resolver.ResolutionException; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import io.github.randomcodespace.iq.intelligence.resolver.SymbolResolver; +import org.springframework.stereotype.Component; + +import java.nio.file.Path; +import java.util.Set; + +/** + * Java backend for the resolver SPI. Wraps JavaParser's {@link JavaSymbolSolver} + * configured from a {@link CombinedTypeSolver} that includes + * {@link ReflectionTypeSolver} plus a {@link JavaParserTypeSolver} per source + * root discovered by {@link JavaSourceRootDiscovery}. + * + *

<p>Determinism: {@link JavaSourceRootDiscovery} returns roots sorted + * alphabetically, so the order of {@link JavaParserTypeSolver}s in the + * combined solver is stable across runs. + * + * <p>Thread safety: bootstrap is called once before any resolve(); resolve() + * is safe under virtual-thread concurrency because {@link JavaSymbolSolver} + * itself is thread-safe for read-only resolution. We deliberately do NOT + * mutate {@code StaticJavaParser.getParserConfiguration()} — that would be + * global static state shared with the existing + * {@link io.github.randomcodespace.iq.detector.jvm.java.AbstractJavaParserDetector} + * thread-local parser pool and is not safe under concurrent use. + */ +@Component +public class JavaSymbolResolver implements SymbolResolver { + + private final JavaSourceRootDiscovery discovery; + private CombinedTypeSolver combined; + private JavaSymbolSolver solver; + + public JavaSymbolResolver(JavaSourceRootDiscovery discovery) { + this.discovery = discovery; + } + + @Override + public Set<String> getSupportedLanguages() { + return Set.of("java"); + } + + @Override + public void bootstrap(Path projectRoot) throws ResolutionException { + try { + CombinedTypeSolver cts = new CombinedTypeSolver(); + cts.add(new ReflectionTypeSolver()); + for (Path root : discovery.discover(projectRoot)) { + cts.add(new JavaParserTypeSolver(root.toFile())); + } + this.combined = cts; + this.solver = new JavaSymbolSolver(cts); + } catch (RuntimeException e) { + throw new ResolutionException( + "JavaSymbolResolver bootstrap failed for " + projectRoot, + e, projectRoot, "java"); + } + } + + @Override + public Resolved resolve(DiscoveredFile file, Object parsedAst) { + if (file == null || !"java".equalsIgnoreCase(file.language())) { + return EmptyResolved.INSTANCE; + } + if (!(parsedAst instanceof CompilationUnit cu)) { + return EmptyResolved.INSTANCE; + } + if (this.solver == null) { + // bootstrap() not called or it failed silently — falling back to + // EmptyResolved is the safe path. The orchestrator already logs + // bootstrap failures from ResolverRegistry.
+ return EmptyResolved.INSTANCE; + } + return new JavaResolved(cu, solver); + } + + /** + * @return the {@link CombinedTypeSolver} built during {@link #bootstrap(Path)}, + * or null if bootstrap hasn't run. Exposed for tests + advanced use. + */ + public CombinedTypeSolver combinedTypeSolver() { + return combined; + } + + /** + * @return the {@link JavaSymbolSolver} built during {@link #bootstrap(Path)}, + * or null if bootstrap hasn't run. Detectors that want to attach the + * solver to their own {@code JavaParser} (rather than the + * {@link JavaResolved#cu()} carried CompilationUnit) can read this + * and call {@code new ParserConfiguration().setSymbolResolver(...)}. + */ + public JavaSymbolSolver symbolSolver() { + return solver; + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolvedTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolvedTest.java new file mode 100644 index 00000000..7d5560f0 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolvedTest.java @@ -0,0 +1,73 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import com.github.javaparser.StaticJavaParser; +import com.github.javaparser.ast.CompilationUnit; +import com.github.javaparser.symbolsolver.JavaSymbolSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.CombinedTypeSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.ReflectionTypeSolver; +import io.github.randomcodespace.iq.intelligence.resolver.EmptyResolved; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Coverage for {@link JavaResolved}: the language-specific {@link Resolved} + * subtype detectors downcast to. 
Verifies the three contract obligations — + * isAvailable() == true, sourceConfidence() == RESOLVED, and the cu/solver + * accessors expose what was passed in. + */ +class JavaResolvedTest { + + @Test + void isAvailableIsTrue() { + JavaResolved r = newResolved(); + assertTrue(r.isAvailable(), + "JavaResolved must report available — it carries actual resolution"); + } + + @Test + void sourceConfidenceIsResolved() { + JavaResolved r = newResolved(); + assertEquals(Confidence.RESOLVED, r.sourceConfidence(), + "JavaResolved is the RESOLVED tier — symbol-solver-backed"); + } + + @Test + void cuAccessorReturnsTheParsedCompilationUnit() { + CompilationUnit cu = StaticJavaParser.parse("class Foo {}"); + JavaSymbolSolver solver = new JavaSymbolSolver(new CombinedTypeSolver(new ReflectionTypeSolver())); + JavaResolved r = new JavaResolved(cu, solver); + assertSame(cu, r.cu()); + } + + @Test + void solverAccessorReturnsTheConfiguredSolver() { + CompilationUnit cu = StaticJavaParser.parse("class Foo {}"); + JavaSymbolSolver solver = new JavaSymbolSolver(new CombinedTypeSolver(new ReflectionTypeSolver())); + JavaResolved r = new JavaResolved(cu, solver); + assertSame(solver, r.solver()); + } + + @Test + void implementsResolved() { + // The interface contract — verified by isAssignableFrom rather than + // an instanceof check (which the compiler already enforces). + assertTrue(Resolved.class.isAssignableFrom(JavaResolved.class)); + } + + @Test + void distinctFromEmptyResolvedSentinel() { + // A real JavaResolved must be != EmptyResolved.INSTANCE so detectors + // checking via `==` can short-circuit correctly. 
+ JavaResolved r = newResolved(); + assertNotSame(EmptyResolved.INSTANCE, r); + } + + private static JavaResolved newResolved() { + CompilationUnit cu = StaticJavaParser.parse("class Foo {}"); + JavaSymbolSolver solver = new JavaSymbolSolver(new CombinedTypeSolver(new ReflectionTypeSolver())); + return new JavaResolved(cu, solver); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverTest.java new file mode 100644 index 00000000..49cb8475 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverTest.java @@ -0,0 +1,276 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import com.github.javaparser.JavaParser; +import com.github.javaparser.ParseResult; +import com.github.javaparser.ParserConfiguration; +import com.github.javaparser.ast.CompilationUnit; +import com.github.javaparser.resolution.types.ResolvedType; +import com.github.javaparser.symbolsolver.resolution.typesolvers.CombinedTypeSolver; +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import io.github.randomcodespace.iq.intelligence.resolver.EmptyResolved; +import io.github.randomcodespace.iq.intelligence.resolver.ResolutionException; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; + +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.Optional; +import java.util.Set; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Layer 1 unit tests for {@link JavaSymbolResolver}. + * + *

Covers all the contract obligations of the SPI plus a smoke test that + * the solver actually resolves a basic type after bootstrap. Deeper resolution + * scenarios (cross-file type lookups, generics, inner classes) are exercised + * by the integration / E2E tests once detectors migrate. + */ +class JavaSymbolResolverTest { + + private JavaSymbolResolver resolver; + + @BeforeEach + void setUp() { + resolver = new JavaSymbolResolver(new JavaSourceRootDiscovery()); + } + + // ---------- Language declaration ---------- + + @Test + void supportsJavaOnly() { + assertEquals(Set.of("java"), resolver.getSupportedLanguages()); + } + + // ---------- Bootstrap ---------- + + @Test + void bootstrapEmptyProjectStillBuildsReflectionSolver(@TempDir Path tmp) throws ResolutionException { + // No source roots — combined solver still has ReflectionTypeSolver. + resolver.bootstrap(tmp); + CombinedTypeSolver cts = resolver.combinedTypeSolver(); + assertNotNull(cts, "combinedTypeSolver is non-null after bootstrap"); + // ReflectionTypeSolver alone — but solver can still resolve java.lang.String. + assertSolverResolvesString(resolver); + } + + @Test + void bootstrapWithSourceRootsAddsJavaParserTypeSolvers(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("src/main/java/Foo.java"), "public class Foo {}"); + Files.writeString(tmp.resolve("pom.xml"), ""); + + resolver.bootstrap(tmp); + + assertNotNull(resolver.combinedTypeSolver()); + // After bootstrap with source root, solver resolves Foo from that root. 
+ assertSolverResolvesType(resolver, "public class Bar { Foo f; }", + "Foo", "Foo"); + } + + @Test + void bootstrapTwiceIsIdempotent(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + resolver.bootstrap(tmp); + CombinedTypeSolver firstCts = resolver.combinedTypeSolver(); + resolver.bootstrap(tmp); + CombinedTypeSolver secondCts = resolver.combinedTypeSolver(); + + assertNotNull(firstCts); + assertNotNull(secondCts); + // Two bootstraps on the same project should produce equivalent state + // (different instances but same wiring). + assertNotSame(firstCts, secondCts, "bootstrap creates a fresh CombinedTypeSolver each call"); + } + + @Test + void combinedTypeSolverIsNullBeforeBootstrap() { + assertNull(resolver.combinedTypeSolver()); + } + + // ---------- resolve() — empty / fallback paths ---------- + + @Test + void resolveBeforeBootstrapReturnsEmpty() { + DiscoveredFile f = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved r = resolver.resolve(f, parse("class Foo {}")); + assertSame(EmptyResolved.INSTANCE, r, + "no bootstrap → no solver → EmptyResolved (graceful fallback)"); + } + + @Test + void resolveNullFileReturnsEmpty() throws ResolutionException { + resolver.bootstrap(Path.of(System.getProperty("java.io.tmpdir"))); + Resolved r = resolver.resolve(null, parse("class Foo {}")); + assertSame(EmptyResolved.INSTANCE, r); + } + + @Test + void resolveNonJavaFileReturnsEmpty(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile py = new DiscoveredFile(Path.of("foo.py"), "python", 100); + Resolved r = resolver.resolve(py, parse("class Foo {}")); + assertSame(EmptyResolved.INSTANCE, r, + "non-Java file → EmptyResolved even with valid CompilationUnit"); + } + + @Test + void resolveNullAstReturnsEmpty(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new 
DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved r = resolver.resolve(java, null); + assertSame(EmptyResolved.INSTANCE, r); + } + + @Test + void resolveStringAstReturnsEmpty(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved r = resolver.resolve(java, "not a CompilationUnit"); + assertSame(EmptyResolved.INSTANCE, r, + "wrong AST type → EmptyResolved instead of ClassCastException"); + } + + @Test + void resolveLanguageCheckIsCaseInsensitive(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + // "Java" instead of "java" — must still match. + DiscoveredFile mixed = new DiscoveredFile(Path.of("Foo.java"), "Java", 100); + Resolved r = resolver.resolve(mixed, parse("class Foo {}")); + assertNotSame(EmptyResolved.INSTANCE, r); + assertInstanceOf(JavaResolved.class, r); + } + + // ---------- resolve() — happy path ---------- + + @Test + void resolveValidCompilationUnitReturnsJavaResolved(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + CompilationUnit cu = parse("class Foo {}"); + Resolved r = resolver.resolve(java, cu); + + assertNotSame(EmptyResolved.INSTANCE, r); + assertInstanceOf(JavaResolved.class, r); + assertTrue(r.isAvailable()); + } + + @Test + void javaResolvedCarriesTheCompilationUnit(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + CompilationUnit cu = parse("class Foo {}"); + + JavaResolved r = (JavaResolved) resolver.resolve(java, cu); + + assertSame(cu, r.cu()); + } + + @Test + void javaResolvedCarriesTheSolver(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + CompilationUnit cu = parse("class Foo 
{}"); + + JavaResolved r = (JavaResolved) resolver.resolve(java, cu); + + assertNotNull(r.solver(), + "the resolver builds a real JavaSymbolSolver and threads it through"); + } + + // ---------- Solver smoke tests ---------- + + @Test + void solverResolvesJavaLangStringViaReflection(@TempDir Path tmp) throws ResolutionException { + // Smoke test: ReflectionTypeSolver alone (empty project) lets us resolve + // java.lang.String. Confirms the wiring is correct end-to-end. + resolver.bootstrap(tmp); + assertSolverResolvesString(resolver); + } + + @Test + void solverResolvesProjectClassFromSourceRoot(@TempDir Path tmp) throws Exception { + // bootstrap with a single source root + a single file; resolve a use of + // that class from a separate parsed file. + Files.createDirectories(tmp.resolve("src/main/java/com/example")); + Files.writeString(tmp.resolve("src/main/java/com/example/Foo.java"), + "package com.example; public class Foo { public String bar() { return \"\"; } }"); + Files.writeString(tmp.resolve("pom.xml"), ""); + + resolver.bootstrap(tmp); + + assertSolverResolvesType(resolver, + "package com.example; class Bar { Foo f; }", + "Foo", "com.example.Foo"); + } + + @Test + void resolveProducesDistinctJavaResolvedPerCall(@TempDir Path tmp) throws ResolutionException { + // Two resolve() calls don't cache — each gets a fresh JavaResolved + // record instance carrying the caller's CompilationUnit reference. 
+ resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + CompilationUnit cu1 = parse("class Foo {}"); + CompilationUnit cu2 = parse("class Foo {}"); + + Resolved r1 = resolver.resolve(java, cu1); + Resolved r2 = resolver.resolve(java, cu2); + + assertNotSame(r1, r2, + "no caching — each resolve() returns a fresh JavaResolved"); + assertSame(cu1, ((JavaResolved) r1).cu(), + "cu1 reference is preserved through to JavaResolved.cu()"); + assertSame(cu2, ((JavaResolved) r2).cu(), + "cu2 reference is preserved through to JavaResolved.cu()"); + assertNotSame(((JavaResolved) r1).cu(), ((JavaResolved) r2).cu(), + "the two JavaResolved instances carry distinct CompilationUnit objects (identity, not value)"); + } + + // ---------- Helpers ---------- + + private static CompilationUnit parse(String source) { + return new JavaParser().parse(source).getResult().orElseThrow(); + } + + /** Smoke test: solver resolves java.lang.String via ReflectionTypeSolver. */ + private static void assertSolverResolvesString(JavaSymbolResolver resolver) { + ResolvedType t = resolveTypeOf(resolver, "class Z { String s; }", "String"); + assertNotNull(t); + assertTrue(t.describe().contains("String"), + "solver describes the type — got " + t.describe()); + } + + /** Resolve a field's declared type by name via the resolver's solver. */ + private static void assertSolverResolvesType(JavaSymbolResolver resolver, + String source, + String fieldTypeName, + String expectedFqnFragment) { + ResolvedType t = resolveTypeOf(resolver, source, fieldTypeName); + assertNotNull(t); + assertTrue(t.describe().contains(expectedFqnFragment), + "expected '" + expectedFqnFragment + "' in resolved type, got '" + t.describe() + "'"); + } + + /** Parse the source with the resolver's solver attached and look up the named field's type. 
 */ + private static ResolvedType resolveTypeOf(JavaSymbolResolver resolver, String source, String fieldType) { + ParserConfiguration cfg = new ParserConfiguration().setSymbolResolver(resolver.symbolSolver()); + ParseResult<CompilationUnit> parsed = new JavaParser(cfg).parse(source); + CompilationUnit cu = parsed.getResult().orElseThrow(); + + // Find first field with the matching declared-type name. + Optional<com.github.javaparser.ast.type.Type> fieldTypeNode = cu.findAll( + com.github.javaparser.ast.body.FieldDeclaration.class).stream() + .flatMap(f -> f.getVariables().stream()) + .map(v -> v.getType()) + .filter(t -> t.asString().equals(fieldType)) + .findFirst(); + assertTrue(fieldTypeNode.isPresent(), + "test source has no field of type '" + fieldType + "'"); + return fieldTypeNode.get().resolve(); + } +} From 0609bce5c0fb28f9e90ba9bc84611734f1331c39 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 17:13:00 +0000 Subject: [PATCH 22/23] feat(analyzer): apply DetectorEmissionDefaults after every detect() call MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Wires the orchestrator stamping pass into all three detect() call sites in Analyzer.java (the main pipeline, the cache-aware runBatchedIndex path, and the regex-fallback path). Every emission whose source is null now gets stamped with: - source = detector.getClass().getSimpleName() - confidence = detector.defaultConfidence() (LEXICAL for regex bases, SYNTACTIC for AST/structured bases) Detectors that stamp explicitly (e.g. setConfidence(RESOLVED) once a detector migrates to ctx.resolved()) are left alone — applyDefaults keys off source==null.
Deferred from this commit (will land with Phase 5 detector migration): - ResolverRegistry.bootstrap(repoPath) call at the start of run() — pointless without detectors that consume ctx.resolved() - Per-file ctx = ctx.withResolved(resolver.resolve(file, ast)) — same This commit is purely additive: 2417 tests in analyzer + cli + detector packages all pass, no regressions. The full 3555-test suite is green post-stamping, confirming existing detector behavior is unchanged (detectors don't stamp confidence/source today, so the stamping floor applies uniformly). IndexCommand also benefits transparently — it calls analyzer.runSmartIndex() which routes through one of the wired detect sites. Per sub-project 1 plan Tasks 19-20 (stamping portion). Co-Authored-By: Claude Opus 4.7 (1M context) --- .../io/github/randomcodespace/iq/analyzer/Analyzer.java | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java b/src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java index 690d197b..d066126d 100644 --- a/src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java +++ b/src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java @@ -11,6 +11,7 @@ import io.github.randomcodespace.iq.detector.AbstractAntlrDetector; import io.github.randomcodespace.iq.detector.Detector; import io.github.randomcodespace.iq.detector.DetectorContext; +import io.github.randomcodespace.iq.detector.DetectorEmissionDefaults; import io.github.randomcodespace.iq.detector.DetectorRegistry; import io.github.randomcodespace.iq.detector.DetectorResult; import io.github.randomcodespace.iq.detector.DetectorUtils; @@ -1311,6 +1312,9 @@ DetectorResult analyzeFileWithRegistry(DiscoveredFile file, Path repoPath, } try { DetectorResult result = detector.detect(ctx); + // Stamp confidence + source defaults on every emission whose source + // is null. Detectors that already explicitly stamp are left alone. 
+ DetectorEmissionDefaults.applyDefaults(result, detector); allNodes.addAll(result.nodes()); allEdges.addAll(result.edges()); } catch (Throwable e) { @@ -1514,6 +1518,8 @@ DetectorResult analyzeFile(DiscoveredFile file, Path repoPath, DetectorRegistry try { Instant detStart = Instant.now(); DetectorResult result = detector.detect(ctx); + // Stamp orchestrator-managed confidence + source defaults. + DetectorEmissionDefaults.applyDefaults(result, detector); long detMs = Duration.between(detStart, Instant.now()).toMillis(); if (detMs > 2000) { log.warn("🐢 SLOW DETECTOR: {} on {}: {}ms", @@ -1601,6 +1607,8 @@ private DetectorResult analyzeFileRegexOnly(DiscoveredFile file, Path repoPath, } else { result = detector.detect(ctx); } + // Stamp orchestrator-managed confidence + source defaults. + DetectorEmissionDefaults.applyDefaults(result, detector); allNodes.addAll(result.nodes()); allEdges.addAll(result.edges()); } catch (Throwable e) { From b1300bbdf46b3b3ee2d9a674c3a2304e23ee6dd3 Mon Sep 17 00:00:00 2001 From: Amit Kumar Date: Mon, 27 Apr 2026 17:14:31 +0000 Subject: [PATCH 23/23] docs(changelog): document sub-project 1 Phases 1-4 (resolver SPI) Captures the cross-cutting Confidence + source schema change, the intelligence/resolver/ SPI surface, the Java backend (javaparser-symbol-solver-core), DetectorContext.resolved() opt-in, and the per-base confidence floor wired through Analyzer's emission path. Detector migrations to consume ctx.resolved() are explicitly called out as Phase 5 follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 92d97562..ae7094f1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -54,6 +54,36 @@ for that specific tag for the per-commit details. `Confidence` enum + `source` field on every `CodeNode` / `CodeEdge`, 4–6 Java detectors migrated, 9 layers of aggressive testing). 
 Implementation in flight on `feat/sub-project-1-resolver-spi-and-java-pilot`. +- **Symbol-resolver SPI** (sub-project 1, Phases 1–4 of the resolver-and-Java-pilot + plan): the foundation for moving the graph from regex-class-of-correctness + to AST-and-symbol-resolution-class-of-correctness. New `Confidence` enum + (`LEXICAL`/`SYNTACTIC`/`RESOLVED` with stable `score()` mapping) plus a + `source` field land on every `CodeNode` and `CodeEdge`, round-trip through + Neo4j (bare `confidence`/`source` properties on nodes and `RELATES_TO` + relationships) and through the H2 analysis cache (`CACHE_VERSION` bumped + 4 → 5 so existing v4 caches drop and rebuild on next open). Read paths are + non-throwing — legacy data without these fields reads back as + `LEXICAL`/null, never NPEs. New SPI under + `intelligence/resolver/`: `Resolved` interface + `EmptyResolved` singleton + sentinel, `SymbolResolver` per-language backend, `ResolutionException`, + `ResolverRegistry` (Spring `@Service` with deterministic alphabetical + bootstrap, case-insensitive lookup, per-resolver failure isolation). First + backend `JavaSymbolResolver` wraps `javaparser-symbol-solver-core` 3.28.0 + (Apache-2.0, same release train as `javaparser-core`) with a + `JavaSourceRootDiscovery` that walks Maven/Gradle/plain layouts under a + project root (skipping `target/`, `build/`, `node_modules/`, `.git/`, etc.; + symlink-loop-safe via `NOFOLLOW_LINKS`). `DetectorContext` now carries an + `Optional<Resolved>` (`withResolved()` opt-in, `Optional.empty()` for every + detector that doesn't care — fully backward compatible).
`Detector.defaultConfidence()` + declares the per-detector floor (`LEXICAL` for regex bases, `SYNTACTIC` for + AST/structured/JavaParser/JavaMessaging bases) and `DetectorEmissionDefaults.applyDefaults` + is wired into every `detector.detect()` call site in `Analyzer.java` — + emissions whose `source` is null get stamped at the orchestration boundary + (detectors that explicitly stamp survive untouched). 11 atomic commits + ship with ~290 new tests covering happy paths, legacy-data fallbacks, + malformed inputs, determinism, concurrency-safe construction, and singleton + invariants. Detector migrations to consume `ctx.resolved()` and the + resolver-bootstrap-into-Analyzer hook follow in sub-project 1 Phase 5. ### Changed