feat(intelligence): Phase 5 language-specific enrichment (RAN-162) by aksOps · Pull Request #29 · RandomCodeSpace/codeiq

aksOps · 2026-04-03T20:03:47Z

Summary

- Implements Phase 5 of the Repository Intelligence layer: language-specific extractors for Java, TypeScript/JavaScript, Python, and Go
- LanguageExtractor interface + LanguageExtractionResult record in intelligence/extractor/
- LanguageEnricher Spring component auto-discovers extractor beans and runs after LexicalEnricher in EnrichCommand
- JavaLanguageExtractor: CALLS edges via JavaParser MethodCallExpr; type hierarchy hints (extends_type, implements_types)
- TypeScriptLanguageExtractor: import-to-symbol resolution (named + default) and JSDoc @param/@returns type hints; also handles .js files via a language alias
- PythonLanguageExtractor: resolution of from module import X and import X statements; type hint extraction from signatures such as def fn(x: int) -> str
- GoLanguageExtractor: block and single-line import resolution; structural interface satisfaction detection
- 36 tests covering positive and negative cases, confidence assertions, determinism, and failure resilience
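The summary names a LanguageExtractor interface and a LanguageExtractionResult record without showing their shape. The following is a minimal sketch under assumptions: the members (language(), extract(...)), the EdgeSketch type, the label-to-ID registry map, and the toy extractor are all hypothetical stand-ins, not the real signatures in intelligence/extractor/.

```java
import java.util.List;
import java.util.Map;

// Hypothetical edge shape for illustration; the real code emits its own edge/node types.
record EdgeSketch(String sourceId, String targetId, String kind, String confidence) {}

// Hypothetical result shape: edges to add, plus type hints keyed by node ID (hint name -> value).
record LanguageExtractionResult(List<EdgeSketch> edges, Map<String, Map<String, String>> typeHints) {
    static LanguageExtractionResult empty() {
        return new LanguageExtractionResult(List.of(), Map.of());
    }
}

// Hypothetical contract: one implementation per language.
interface LanguageExtractor {
    /** Language key this extractor handles, e.g. "python". */
    String language();

    /** Extracts edges and hints for one file; the same input must give the same result. */
    LanguageExtractionResult extract(String filePath, String content, Map<String, String> registry);
}

// Toy implementation that only exercises the contract: one IMPORTS edge per "import X" line
// whose X appears in the registry (label -> node ID).
class ToyPythonExtractor implements LanguageExtractor {
    @Override
    public String language() {
        return "python";
    }

    @Override
    public LanguageExtractionResult extract(String filePath, String content, Map<String, String> registry) {
        if (content == null || registry == null) {
            return LanguageExtractionResult.empty(); // missing input yields no edges, not an exception
        }
        List<EdgeSketch> edges = content.lines()
                .map(String::trim)
                .filter(line -> line.startsWith("import "))
                .map(line -> line.substring("import ".length()).trim())
                .filter(registry::containsKey)
                .map(sym -> new EdgeSketch(filePath, registry.get(sym), "IMPORTS", "PARTIAL"))
                .toList();
        return new LanguageExtractionResult(edges, Map.of());
    }
}
```

Keeping extract() free of side effects and returning an empty result on bad input is what makes the "no registry" and "null content" test cases cheap to assert.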

Test plan

- JavaLanguageExtractorTest (8 tests): call edges, type hierarchy, wrong language, no registry, determinism, confidence EXACT/PARTIAL
- TypeScriptLanguageExtractorTest (7 tests): named/default imports, JSDoc hints, empty input, confidence, determinism
- PythonLanguageExtractorTest (8 tests): from/plain imports, type hints, self-parameter exclusion, null content, confidence, determinism
- GoLanguageExtractorTest (7 tests): block/single imports, unknown import, null content, confidence, determinism
- LanguageEnricherTest (6 tests): no extractors, pipeline edges, type hint propagation, failure resilience, JS→TS alias, extension mapping
- EnrichCommandTest: existing 3 tests updated to compile with the new constructor
- Full mvn test: all tests pass
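The Python type-hint step (def fn(x: int) -> str, with the self parameter excluded) can be approximated with one regex per signature. A minimal sketch, assuming single-line signatures and annotations without commas or nested parentheses; the pattern, class name, and "x:int->str" hint format are hypothetical, not the extractor's actual output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PythonHintSketch {
    // Hypothetical pattern for single-line "def name(params) -> ret:" signatures.
    private static final Pattern DEF = Pattern.compile(
            "^\\s*def\\s+(\\w+)\\s*\\(([^)]*)\\)\\s*(?:->\\s*([^:]+?))?\\s*:", Pattern.MULTILINE);

    /** Returns function name -> "param:type,...->ret", in source order. */
    static Map<String, String> extract(String content) {
        Map<String, String> hints = new LinkedHashMap<>();
        if (content == null) return hints;
        Matcher m = DEF.matcher(content);
        while (m.find()) {
            StringBuilder sb = new StringBuilder();
            for (String raw : m.group(2).split(",")) {
                String p = raw.trim();
                int colon = p.indexOf(':');
                if (colon < 0) continue; // unannotated parameter (including a bare self)
                String name = p.substring(0, colon).trim();
                if (name.equals("self") || name.equals("cls")) continue; // receiver excluded
                String type = p.substring(colon + 1).split("=")[0].trim(); // drop any default value
                if (sb.length() > 0) sb.append(',');
                sb.append(name).append(':').append(type);
            }
            if (m.group(3) != null) sb.append("->").append(m.group(3).trim());
            if (sb.length() > 0) hints.put(m.group(1), sb.toString());
        }
        return hints;
    }
}
```

A real extractor would likely need a tokenizer for multi-line signatures and generic annotations such as Dict[str, int]; the regex keeps the sketch short.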

🤖 Generated with Claude Code

Adds a lodash >= 4.17.24 override in package.json to resolve two CVEs (HIGH: code injection via _.template; MODERATE: prototype pollution via _.unset/_.omit) in the transitive dependencies swagger-ui-react and @antv/g6. All lodash instances now resolve to 4.18.1, and npm audit reports 0 vulnerabilities.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

…ry planner (RAN-148)

Adds Phase 3 of the Repository Intelligence system:
- CapabilityMatrix: static per-language × per-dimension capability registry (EXACT/PARTIAL/LEXICAL_ONLY/UNSUPPORTED) for Java, TypeScript, JavaScript, Python, Go, C#, Rust, and lexical-only languages.
- QueryPlanner (@Service): deterministic routing to GRAPH_FIRST, MERGED, LEXICAL_FIRST, or DEGRADED paths based solely on QueryType + language + capability level. No LLM, no probabilistic logic.
- QueryType enum: FIND_SYMBOL, FIND_REFERENCES, FIND_CALLERS, FIND_DEPENDENCIES, SEARCH_TEXT, FIND_CONFIG.
- CapabilityDimension enum: 9 analysis dimensions.
- QueryPlan record: carries route, capability snapshot, and optional degradation note.
- GET /api/capabilities endpoint (optional ?language= filter).
- get_capabilities MCP tool (32nd tool).
- 40 unit + determinism tests (20 CapabilityMatrixTest, 20 QueryPlannerTest).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
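Deterministic routing from QueryType, language, and capability level can be pictured as a pure function. A minimal sketch under stated assumptions: the Route enum name, the one-dimension capability table, and the level-to-route mapping below are illustrative guesses, not the actual CapabilityMatrix or QueryPlanner rules.

```java
import java.util.Locale;
import java.util.Map;

enum CapabilityLevel { EXACT, PARTIAL, LEXICAL_ONLY, UNSUPPORTED }

// Hypothetical name for the four routing paths listed in the commit.
enum Route { GRAPH_FIRST, MERGED, LEXICAL_FIRST, DEGRADED }

enum QueryType { FIND_SYMBOL, FIND_REFERENCES, FIND_CALLERS, FIND_DEPENDENCIES, SEARCH_TEXT, FIND_CONFIG }

record QueryPlan(Route route, CapabilityLevel level, String degradationNote) {}

class QueryPlannerSketch {
    // Hypothetical table: one level per language, applied to every non-text query in this sketch.
    private static final Map<String, CapabilityLevel> CAPS = Map.of(
            "java", CapabilityLevel.EXACT,
            "python", CapabilityLevel.PARTIAL,
            "markdown", CapabilityLevel.LEXICAL_ONLY);

    static QueryPlan plan(QueryType type, String language) {
        if (type == QueryType.SEARCH_TEXT) {
            // Free-text search never needs the graph in this sketch.
            return new QueryPlan(Route.LEXICAL_FIRST, CapabilityLevel.LEXICAL_ONLY, null);
        }
        String key = language == null ? "" : language.toLowerCase(Locale.ROOT);
        CapabilityLevel level = CAPS.getOrDefault(key, CapabilityLevel.UNSUPPORTED);
        // Same inputs always give the same plan: no randomness, no model calls.
        return switch (level) {
            case EXACT -> new QueryPlan(Route.GRAPH_FIRST, level, null);
            case PARTIAL -> new QueryPlan(Route.MERGED, level, null);
            case LEXICAL_ONLY -> new QueryPlan(Route.LEXICAL_FIRST, level, null);
            case UNSUPPORTED -> new QueryPlan(Route.DEGRADED, level,
                    "no structural support for " + language + "; results may be incomplete");
        };
    }
}
```

Because the switch is exhaustive over the enum, adding a new capability level forces a compile error until a route is chosen for it.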

…le inventory (RAN-146)

Implements the foundational contracts for the Repository Intelligence layer:
- intelligence/ package: Provenance, RepositoryIdentity, FileInventory, FileEntry, FileClassification, CapabilityLevel, ArtifactManifest records
- Provenance stored via prov_* keys in CodeNode.properties (round-trips through Neo4j)
- RepositoryIdentity resolves git URL, commit SHA, and branch from the git CLI at analysis time
- FileInventory builds a deterministic sorted list of all discovered files with classification heuristics (source/config/doc/test/generated)
- GraphBuilder now accepts Provenance as a constructor parameter (not a mutable setter)
- Analyzer and EnrichCommand stamp provenance on all nodes during the pipeline
- BundleCommand upgraded to use the ArtifactManifest record (repo identity, inventory summary)
- Tests: ProvenanceTest (6), FileInventoryTest (8), ArtifactManifestTest (5), ProvenanceIntegrationTest (2); all nodes carry provenance and determinism is verified

Addresses PE architecture review blocking constraints from RAN-150:
- BLOCKING 1: Provenance uses the properties map (prov_* prefix), not direct CodeNode fields
- BLOCKING 2: Provenance is a GraphBuilder constructor parameter, not a setter
- BLOCKING 3: FileEntry added to intelligence/ without modifying DiscoveredFile

Co-Authored-By: Paperclip <noreply@paperclip.ing>

…y constructor + AnalysisCache hash reuse

- GraphBuilder now accepts RepositoryIdentity + extractorVersion as constructor params; Provenance is derived internally (never constructed externally by callers)
- Analyzer and EnrichCommand updated to pass RepositoryIdentity directly to GraphBuilder
- AnalysisCache.getHashForPath() added for reverse path→hash lookup
- buildFileInventory() now populates FileEntry.contentHash from the cache (no file re-reads)

Addresses BLOCKING 2 and BLOCKING 3 from the PE review on RAN-150.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

…rovenance round-trip (RAN-146)

- FileInventory.countsByClassification() now uses TreeMap for deterministic key ordering (fixes non-deterministic HashMap iteration in the manifest's by_classification field)
- Provenance.fromProperties() handles a String schema version from the Neo4j round-trip (bulkSave stores Integer props as String via .toString(); parseInt handles both types)
- Add ProvenanceNeo4jRoundTripTest: two mock-based tests verifying the prov_* -> prop_prov_* -> prov_* round-trip, including schemaVersion Integer/String coercion and null fields

Co-Authored-By: Paperclip <noreply@paperclip.ing>
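The schemaVersion coercion above (an Integer stored as a String by bulkSave, then read back as either type) can be handled by accepting any value and parsing its string form. A minimal sketch, assuming a hypothetical prov_schema_version key and null for missing or unparsable values; the real Provenance.fromProperties() may use different keys and defaults.

```java
import java.util.HashMap;
import java.util.Map;

class ProvenanceSketch {
    // Hypothetical key; the commit only says provenance uses the prov_* prefix.
    static final String SCHEMA_KEY = "prov_schema_version";

    /** Accepts Integer (in memory) or String (after the Neo4j round-trip); null if absent or unparsable. */
    static Integer schemaVersion(Map<String, Object> props) {
        Object raw = props.get(SCHEMA_KEY);
        if (raw == null) return null;
        if (raw instanceof Integer i) return i;
        try {
            return Integer.parseInt(raw.toString().trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Mimics the described write path: non-null values stored via toString() under a prop_ prefix. */
    static Map<String, Object> toNeo4jProps(Map<String, Object> props) {
        Map<String, Object> out = new HashMap<>();
        props.forEach((k, v) -> { if (v != null) out.put("prop_" + k, v.toString()); });
        return out;
    }

    /** Mimics the read path: strips the prop_ prefix again. */
    static Map<String, Object> fromNeo4jProps(Map<String, Object> stored) {
        Map<String, Object> out = new HashMap<>();
        stored.forEach((k, v) -> { if (k.startsWith("prop_")) out.put(k.substring("prop_".length()), v); });
        return out;
    }
}
```

Parsing the string form rather than casting is what lets the same reader work before and after persistence.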

…re entries (RAN-154)

- countsByLanguage(): use TreeMap::new for deterministic alphabetical key ordering
- toSummary() byLang: add a thenComparing secondary sort to break ties alphabetically
- toSummary() byCls: use LinkedHashMap::new to preserve TreeMap insertion order
- .gitignore: add playwright-report/ and test-results/ frontend build artifacts

Co-Authored-By: Paperclip <noreply@paperclip.ing>
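The ordering fixes above come down to two rules: count into a TreeMap so keys iterate alphabetically, and give any count-based sort a secondary key so ties cannot reorder between runs. A minimal sketch; the class and method names are hypothetical stand-ins for FileInventory.countsByLanguage() and toSummary().

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class InventoryOrderingSketch {
    /** Language -> file count, with keys in alphabetical order (TreeMap, never HashMap order). */
    static Map<String, Long> countsByLanguage(List<String> fileLanguages) {
        return fileLanguages.stream()
                .collect(Collectors.groupingBy(l -> l, TreeMap::new, Collectors.counting()));
    }

    /** Count descending, ties broken alphabetically; LinkedHashMap preserves the sorted order. */
    static Map<String, Long> summary(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}
```

For example, with two Java files, two Python files, and one Go file, the summary order is java, python, go: the tie at 2 resolves alphabetically instead of by hash order.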

…anner profile guard (RAN-155)

- Add CPP_CAPS table (distinct from C#; no ORM, lexical-only auth)
- Add explicit case "cpp","c++" to CapabilityMatrix.tableFor()
- Add "cpp" to the hardcoded language list in asSerializableMap()
- Remove the incorrect CSHARP_CAPS fallback for cpp in the ANTLR_LANGUAGES branch
- Add @Profile("serving") to QueryPlanner so it is not instantiated during indexing CLI runs

Co-Authored-By: Paperclip <noreply@paperclip.ing>

…nGit()

Process does not implement AutoCloseable in Java 25, so try-with-resources is not applicable. Use try-finally with proc.destroy() to ensure OS process handles are always released, resolving the SonarQube C-Reliability finding.

Closes RAN-156

Co-Authored-By: Paperclip <noreply@paperclip.ing>

…be S2095

Wrap proc.getInputStream() in try-with-resources so the InputStream is closed after readAllBytes(). proc.destroy() in the finally block still terminates the child process; closing the InputStream releases the file descriptor immediately rather than waiting on GC.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

…+ snippet store (RAN-147)

New package: intelligence/lexical
- CodeSnippet: bounded source snippet record (path, line range, language, provenance)
- LexicalResult: query result record (node, score, matchedField, snippet, provenance)
- DocCommentExtractor: extracts Javadoc/JSDoc, Go/Rust line comments, Python docstrings
- SnippetStore: extracts bounded code snippets (max 50 lines) from source files
- LexicalEnricher: populates lex_comment and lex_config_keys properties before Neo4j load
- LexicalQueryService: findByIdentifier, findByDocComment, findByConfigKey (serving profile)

Infrastructure changes:
- GraphStore: add searchLexical() + lexical_index (standard analyzer on prop_lex_* fields)
- EnrichCommand: inject LexicalEnricher, add enrichment step before Neo4j bulk load
- lexical_index created in both GraphStore.bulkSave() and EnrichCommand

Tests: 24 new tests across DocCommentExtractor, SnippetStore, LexicalEnricher. All 1591 tests passing.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
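SnippetStore's bound (at most 50 lines per snippet) amounts to clamping the requested line range against both the file length and the maximum. A minimal sketch, assuming 1-based inclusive line numbers and a hypothetical Snippet record; the real CodeSnippet also carries language and provenance.

```java
import java.util.List;

class SnippetSketch {
    static final int MAX_LINES = 50; // bound stated in the commit message

    record Snippet(String path, int startLine, int endLine, String text) {}

    /** Lines [start, end] (1-based, inclusive), clamped to the file and to MAX_LINES; null if empty. */
    static Snippet extract(String path, String content, int start, int end) {
        List<String> lines = content.lines().toList();
        int from = Math.max(1, start);
        int to = Math.min(Math.min(end, lines.size()), from + MAX_LINES - 1);
        if (from > to) return null; // range lies outside the file (or the file is empty)
        String text = String.join("\n", lines.subList(from - 1, to));
        return new Snippet(path, from, to, text);
    }
}
```

Clamping at extraction time keeps stored snippets small regardless of how large a node's declared line range is.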

…uage, RepositoryIdentity (RAN-159)

- RepositoryIdentityTest (8 tests): graceful null for a non-git dir, commit SHA on a git repo, detached HEAD branch normalised to null, record equality/null safety
- ProvenanceEdgeCasesTest (6 tests): empty dir, single file, unsupported-language-only, mixed-language (Java/TS/Python/Go), null provenance fields without git history, mixed-language determinism
- LexicalCrossLanguageTest (11 tests): TypeScript/JavaScript block comments, Python triple-quoted docstrings (single-line and multiline), Go line comments, cross-language determinism, direct DocCommentExtractor calls

All 1616 tests pass (0 failures, 0 errors, 31 skipped).

Co-Authored-By: Paperclip <noreply@paperclip.ing>

…_DEFAULT_ENCODING (RAN-160)

Replace new String(is.readAllBytes()) with new String(is.readAllBytes(), StandardCharsets.UTF_8) to eliminate the SpotBugs HIGH DM_DEFAULT_ENCODING finding at RepositoryIdentity.java:44. This was the sole blocker gating all Phase 1-3 PRs from merge.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
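Taken together, the runGit() commits above (try-finally with proc.destroy(), try-with-resources around the InputStream, explicit UTF-8 decoding) suggest roughly the following shape. A minimal sketch, not the code in RepositoryIdentity.java: the method name runCommand, the null-on-failure contract, and discarding stderr are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class ProcessSketch {
    /** Runs a command and returns trimmed stdout, or null if it cannot start or exits non-zero. */
    static String runCommand(List<String> command) {
        Process proc;
        try {
            proc = new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.DISCARD) // stderr ignored in this sketch
                    .start();
        } catch (IOException e) {
            return null; // e.g. the binary is not on PATH
        }
        try (InputStream in = proc.getInputStream()) { // closes the pipe's file descriptor promptly
            String out = new String(in.readAllBytes(), StandardCharsets.UTF_8); // no platform default charset
            return proc.waitFor() == 0 ? out.trim() : null;
        } catch (IOException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        } finally {
            proc.destroy(); // always release the OS process handle
        }
    }
}
```

A caller would pass something like List.of("git", "rev-parse", "HEAD") and treat null as "not a git repository".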

… for Java, TS, Python, Go (RAN-162)

- LanguageExtractor interface + LanguageExtractionResult record
- LanguageEnricher Spring component runs after LexicalEnricher in EnrichCommand
- JavaLanguageExtractor: CALLS edges via JavaParser MethodCallExpr, type hierarchy hints
- TypeScriptLanguageExtractor: import-to-symbol resolution, JSDoc type hints
- PythonLanguageExtractor: from/import resolution, def signature type hints
- GoLanguageExtractor: package import resolution, interface satisfaction detection
- 36 tests across all extractors, plus pipeline integration and failure resilience
- EnrichCommand updated to inject and invoke LanguageEnricher

Co-Authored-By: Paperclip <noreply@paperclip.ing>
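LanguageEnricher's dispatch (extractors keyed by language, .js routed to the TypeScript extractor, a failing extractor not aborting the run) might look roughly like this. A minimal sketch without Spring, assuming a hypothetical extension table and alias map inferred from the test plan's "JS→TS alias", "extension mapping", and "failure resilience" items; edges are plain strings and extractors are plain functions for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class EnricherDispatchSketch {
    // Hypothetical extension -> language table.
    private static final Map<String, String> EXTENSIONS = Map.of(
            "java", "java", "ts", "typescript", "tsx", "typescript",
            "js", "javascript", "py", "python", "go", "go");
    // JavaScript files are handled by the TypeScript extractor.
    private static final Map<String, String> ALIASES = Map.of("javascript", "typescript");

    private final Map<String, Function<String, List<String>>> extractors; // language -> extractor

    EnricherDispatchSketch(Map<String, Function<String, List<String>>> extractors) {
        this.extractors = new TreeMap<>(extractors);
    }

    static String languageOf(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0) return null;
        String lang = EXTENSIONS.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
        return lang == null ? null : ALIASES.getOrDefault(lang, lang);
    }

    /** Runs the matching extractor per file in sorted path order; one failure does not stop the rest. */
    List<String> enrich(Map<String, String> contentByPath) {
        List<String> edges = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(contentByPath).entrySet()) {
            String lang = languageOf(e.getKey());
            Function<String, List<String>> ex = lang == null ? null : extractors.get(lang);
            if (ex == null) continue; // no extractor registered for this language
            try {
                edges.addAll(ex.apply(e.getValue()));
            } catch (RuntimeException ignored) {
                // failure resilience: skip this file, keep enriching the others
            }
        }
        return edges;
    }
}
```

In the Spring version the map would presumably be built from injected extractor beans via their language() keys.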

…LS, confidence, per-edge provenance (RAN-164)

- Scope lookupByLabel() to dedup by node ID: drop ambiguous matches where 2+ nodes share the same method name, eliminating false-positive CALLS edges (save, get, execute…)
- Change confidence to PARTIAL for all registry-lookup edges (cross-file by definition)
- Populate confidence + extractorName properties on every emitted edge for all 4 extractors
- Add negative test: two unrelated classes with the same method name produce zero CALLS edges
- Update extract_confidenceIsExact_whenCallsFound to expect PARTIAL (correct semantics)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
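The ambiguity guard (dedup matches by node ID, then drop the result when two or more distinct nodes share the label) is the idea later reused as lookupUnambiguous() in the Python and Go extractors. A minimal sketch over a hypothetical list of (id, label) nodes; the real registry type is not shown in this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

class LabelLookupSketch {
    record Node(String id, String label) {}

    /** The single node ID for a label, or empty when the label is missing or ambiguous. */
    static Optional<String> lookupUnambiguous(List<Node> nodes, String label) {
        Set<String> ids = new TreeSet<>(); // dedup by node ID: one node indexed twice is not ambiguous
        for (Node n : nodes) {
            if (n.label().equals(label)) ids.add(n.id());
        }
        // 0 matches: unresolved. 2+ distinct matches (e.g. "save", "get"): skip rather than guess.
        return ids.size() == 1 ? Optional.of(ids.iterator().next()) : Optional.empty();
    }

    /** Edge properties as described: PARTIAL for registry lookups, plus the emitting extractor. */
    static Map<String, String> edgeProps(String extractorName) {
        return Map.of("confidence", "PARTIAL", "extractorName", extractorName);
    }
}
```

Dropping ambiguous matches trades recall for precision: a missing CALLS edge is less misleading than one pointing at an unrelated save().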

aksOps · 2026-04-03T20:41:13Z

Code re-review (RAN-164 fixes)

Original issues A and B are correctly fixed. Found 1 new issue:

Duplicate IMPORTS edges when a file uses both import styles

The extractor applies IMPORT_SINGLE_RE (lines 82–90) and IMPORT_BLOCK_RE (lines 93–104) independently, with no deduplication. A Go file that imports the same package in both styles (e.g. import "fmt" and a block import ( "fmt" )) will emit two IMPORTS edges with identical source, target, and kind, and GraphBuilder.flush() does not deduplicate edges by (source, target, kind) before persisting to Neo4j. Fix: collect paths into a LinkedHashSet before building edges, or take the set union of the two regex results.

https://github.com/RandomCodeSpace/code-iq/blob/eb049ceae6a375080f9ede7281610c18e4197cbf/src/main/java/io/github/randomcodespace/iq/detector/go/GoLanguageExtractor.java#L80-L106

collectImportPaths() used an ArrayList, allowing duplicate paths when a file has both a block import and a single-line import for the same package path. Switch to LinkedHashSet for deduplication (insertion order preserved), then return a new ArrayList to maintain the List<String> contract. Added test extract_duplicateImportBothStyles_noDuplicateEdges, which verifies that a file with both import styles for the same package produces exactly one IMPORTS edge.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
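The fix can be sketched as two regex passes feeding one LinkedHashSet, then a copy into a List. The patterns below are simplified stand-ins for IMPORT_SINGLE_RE and IMPORT_BLOCK_RE, whose real definitions are not shown in the PR; they accept an optional alias (import f "fmt") and ignore comments.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GoImportSketch {
    // Simplified stand-ins: import "fmt" / import f "fmt", and import ( ... ) blocks.
    private static final Pattern SINGLE =
            Pattern.compile("^\\s*import\\s+(?:[\\w.]+\\s+)?\"([^\"]+)\"", Pattern.MULTILINE);
    private static final Pattern BLOCK =
            Pattern.compile("^\\s*import\\s*\\(([^)]*)\\)", Pattern.MULTILINE);
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    static List<String> collectImportPaths(String content) {
        if (content == null) return new ArrayList<>();
        Set<String> paths = new LinkedHashSet<>(); // dedups while keeping first-seen order
        Matcher single = SINGLE.matcher(content);
        while (single.find()) paths.add(single.group(1));
        Matcher block = BLOCK.matcher(content);
        while (block.find()) {
            Matcher q = QUOTED.matcher(block.group(1));
            while (q.find()) paths.add(q.group(1)); // add() ignores paths already seen
        }
        return new ArrayList<>(paths); // keeps the List<String> contract
    }
}
```

LinkedHashSet rather than TreeSet keeps source order, which is still deterministic because both regex passes scan the file in a fixed order.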

aksOps · 2026-04-03T21:21:16Z

Code review

RAN-164 and RAN-170 fixes verified. Found 3 issues:

GoLanguageExtractor.extractInterfaceHints() builds satisfied list by iterating registry.values() (unordered Map) and calls String.join(", ", satisfied) without sorting first — same input produces different satisfies_interfaces hint ordering across JVM runs. (CLAUDE.md says "Determinism is Non-Negotiable: Same input MUST produce same output, every time")

https://github.com/RandomCodeSpace/code-iq/blob/f7390b7be69dcbbfebacaf5a4b6ec49a5c1a557f/src/main/java/io/github/randomcodespace/iq/intelligence/extractor/go/GoLanguageExtractor.java#L138-L153
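Structural interface satisfaction means a type satisfies an interface when its method set contains every method the interface declares; the determinism fix is to sort the matches before joining. A minimal sketch, assuming method sets are already extracted as bare names (signatures ignored, a simplification) and that empty interfaces are skipped; the map inputs are hypothetical, not the real registry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GoInterfaceSketch {
    /**
     * satisfies_interfaces hint for one type: every interface whose method names are all
     * present in the type's method set, sorted and comma-joined.
     */
    static String satisfiesHint(Set<String> typeMethods, Map<String, Set<String>> interfaces) {
        List<String> satisfied = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : interfaces.entrySet()) { // iteration order may vary
            Set<String> required = e.getValue();
            if (!required.isEmpty() && typeMethods.containsAll(required)) { // skip empty interfaces
                satisfied.add(e.getKey());
            }
        }
        Collections.sort(satisfied); // output no longer depends on map iteration order
        return String.join(", ", satisfied);
    }
}
```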

TypeScriptLanguageExtractor.extractImportEdges() collects edges into a plain ArrayList and runs both NAMED_IMPORT and DEFAULT_IMPORT matchers with no deduplication — the same (source, target) pair can be added twice, producing duplicate IMPORTS edges. This is the same class of bug just fixed in Go (RAN-170).

https://github.com/RandomCodeSpace/code-iq/blob/f7390b7be69dcbbfebacaf5a4b6ec49a5c1a557f/src/main/java/io/github/randomcodespace/iq/intelligence/extractor/typescript/TypeScriptLanguageExtractor.java#L74-L111
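Deduplicating across both matchers only needs one seen set shared by the two loops. A minimal sketch with simplified stand-in patterns for NAMED_IMPORT and DEFAULT_IMPORT (the real patterns are not shown, and combined forms like import React, { useState } are only partly handled), plus alias stripping via sym.substring(0, asIdx).trim(). The registry is a hypothetical label → node ID map with no ambiguity guard.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TsImportSketch {
    // Simplified stand-ins: import { a, b as c } from '...';  and  import X from '...';
    private static final Pattern NAMED = Pattern.compile("import\\s*\\{([^}]*)\\}\\s*from");
    private static final Pattern DEFAULT = Pattern.compile("import\\s+([A-Za-z_$][\\w$]*)\\s*(?:,|\\s+from)");

    /** Target node IDs for IMPORTS edges, each at most once, in first-seen order. */
    static List<String> importTargets(String content, Map<String, String> registry) {
        Set<String> seen = new LinkedHashSet<>();
        Matcher named = NAMED.matcher(content);
        while (named.find()) {
            for (String part : named.group(1).split(",")) {
                String sym = part.trim();
                int asIdx = sym.indexOf(" as ");
                if (asIdx >= 0) sym = sym.substring(0, asIdx).trim(); // "b as c" -> "b"
                String target = registry.get(sym);
                if (target != null) seen.add(target); // no-op if already emitted
            }
        }
        Matcher def = DEFAULT.matcher(content);
        while (def.find()) {
            String target = registry.get(def.group(1));
            if (target != null) seen.add(target); // shared set, so no duplicate across matchers
        }
        return new ArrayList<>(seen);
    }
}
```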

PythonLanguageExtractor.extractImportEdges() handles from module import symbol by discarding the module and looking up the symbol name alone (registry.get(sym)). Common names like join, get, load will match any identically-named node in the registry, producing false-positive IMPORTS edges. JavaLanguageExtractor addressed the same pattern with lookupByLabel() dropping ambiguous multi-match results; Python has no equivalent guard.

https://github.com/RandomCodeSpace/code-iq/blob/f7390b7be69dcbbfebacaf5a4b6ec49a5c1a557f/src/main/java/io/github/randomcodespace/iq/intelligence/extractor/python/PythonLanguageExtractor.java#L74-L94

…, Python ambiguity (RAN-173)

- GoLanguageExtractor: sort the satisfied list before joining to guarantee a deterministic satisfies_interfaces hint across JVM runs
- TypeScriptLanguageExtractor: use LinkedHashSet to deduplicate IMPORTS edges when the same symbol is matched by both NAMED_IMPORT and DEFAULT_IMPORT
- PythonLanguageExtractor: replace registry.get(sym) with lookupUnambiguous() to skip false-positive IMPORTS edges for common short names (join, get, load) that match multiple nodes; mirrors JavaLanguageExtractor.lookupByLabel()
- Add regression tests for all three fixes

Co-Authored-By: Paperclip <noreply@paperclip.ing>

aksOps · 2026-04-03T22:03:57Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

All three issues from the previous review verified fixed in 00b62f5:

- GoLanguageExtractor determinism: Collections.sort(satisfied) before String.join guarantees alphabetical ordering of the satisfies_interfaces hint
- TypeScriptLanguageExtractor dedup: a LinkedHashSet<String> seen guards both the NAMED_IMPORT and DEFAULT_IMPORT paths, so duplicate IMPORTS edges are impossible
- PythonLanguageExtractor ambiguity guard: lookupUnambiguous() mirrors JavaLanguageExtractor.lookupByLabel() and returns null when multiple nodes share the same label, eliminating false-positive IMPORTS edges for short names like join, get, load

Regression tests for all three cases pass. PR #29 approved for merge.

Resolves add/add conflicts:
- CapabilityMatrix.java/Test: kept main's version (cpp support, RAN-172)

Resolves content conflicts:
- EnrichCommand.java/Test: kept feature branch (LanguageEnricher, Phase 5)

Co-Authored-By: Paperclip <noreply@paperclip.ing>

aksOps · 2026-04-03T22:23:05Z

Code review

Found 4 issues:

LanguageEnricher iterates nodesByFile (a HashMap) without sorting — non-deterministic edge ordering (CLAUDE.md says "No Set iteration without sorting first (TreeSet or stream().sorted())" and "Same input MUST produce same output, every time")

https://github.com/RandomCodeSpace/code-iq/blob/6003cb82f480bf7091ec175bc2a311f63c35b6bc/src/main/java/io/github/randomcodespace/iq/intelligence/extractor/LanguageEnricher.java#L70-L83

HashMap.entrySet() iteration order is non-deterministic across JVM runs. Files are processed in unpredictable order, so edges are appended to the result list in different order each run. Fix: replace with TreeMap at construction time, or sort entries by key before iterating.
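The fix amounts to building the per-file grouping with a sorted map, so iteration order depends only on the file paths. A minimal sketch using a hypothetical Node record in place of CodeNode:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class NodesByFileSketch {
    record Node(String id, String filePath) {}

    /** Groups nodes by file path; TreeMap iterates paths in sorted order on every run. */
    static Map<String, List<Node>> nodesByFile(List<Node> nodes) {
        return nodes.stream().collect(Collectors.groupingBy(
                Node::filePath, TreeMap::new, Collectors.toList()));
    }

    /** Visits files in sorted order, so anything appended per file lands in a stable order. */
    static List<String> processingOrder(List<Node> nodes) {
        List<String> order = new ArrayList<>();
        for (Map.Entry<String, List<Node>> e : nodesByFile(nodes).entrySet()) {
            order.add(e.getKey());
        }
        return order;
    }
}
```

Passing TreeMap::new as the map factory fixes the order at construction time, so later loops need no separate sort step.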

GoLanguageExtractor.extractImportEdges uses registry.get(pkgName) with no ambiguity guard — false-positive IMPORTS edges (same class of bug fixed in Java via RAN-164 and Python via RAN-173)

https://github.com/RandomCodeSpace/code-iq/blob/6003cb82f480bf7091ec175bc2a311f63c35b6bc/src/main/java/io/github/randomcodespace/iq/intelligence/extractor/go/GoLanguageExtractor.java#L90-L97

pkgName is the last path segment (e.g. "utils", "http", "config"). A direct registry.get(pkgName) returns an arbitrary match if multiple nodes share that label, creating false-positive edges. Both JavaLanguageExtractor (lookupByLabel) and PythonLanguageExtractor (lookupUnambiguous) guard against this — Go should follow the same pattern.
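The RAN-175 commit applies lookupUnambiguous() to both pkgName and a full-path fallback. A minimal sketch of that resolution order, assuming a hypothetical index from label to the set of node IDs carrying it; the real registry shape is not shown in the PR.

```java
import java.util.Map;
import java.util.Set;

class GoImportResolveSketch {
    /** "net/http" -> "http"; a path without '/' is returned unchanged. */
    static String lastSegment(String importPath) {
        int slash = importPath.lastIndexOf('/');
        return slash < 0 ? importPath : importPath.substring(slash + 1);
    }

    /** Single node ID for a label, or null if the label is unknown or shared by 2+ nodes. */
    static String lookupUnambiguous(Map<String, Set<String>> idsByLabel, String label) {
        Set<String> ids = idsByLabel.get(label);
        return ids != null && ids.size() == 1 ? ids.iterator().next() : null;
    }

    /** Package name first, then the full import path; both lookups are guarded. */
    static String resolve(Map<String, Set<String>> idsByLabel, String importPath) {
        String byPkg = lookupUnambiguous(idsByLabel, lastSegment(importPath));
        return byPkg != null ? byPkg : lookupUnambiguous(idsByLabel, importPath);
    }
}
```

With two nodes labelled "utils", importing example.com/app/utils skips the ambiguous short name and resolves only if the full path is indexed.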

PythonLanguageExtractor.extractImportEdges missing seen dedup set — duplicate IMPORTS edges possible (TypeScript added this guard in RAN-173)

https://github.com/RandomCodeSpace/code-iq/blob/6003cb82f480bf7091ec175bc2a311f63c35b6bc/src/main/java/io/github/randomcodespace/iq/intelligence/extractor/python/PythonLanguageExtractor.java#L68-L105

Both FROM_IMPORT and PLAIN_IMPORT patterns can match the same symbol in the same file (e.g. a file with both from os import path and import path). Without a Set<String> seen guard (like TypeScriptLanguageExtractor added), duplicate IMPORTS edges with the same ID are emitted. lookupUnambiguous does not provide dedup protection across pattern matches.

PythonLanguageExtractor silently drops aliased imports (from X import Y as Z) — edges for aliased symbols are never created

https://github.com/RandomCodeSpace/code-iq/blob/6003cb82f480bf7091ec175bc2a311f63c35b6bc/src/main/java/io/github/randomcodespace/iq/intelligence/extractor/python/PythonLanguageExtractor.java#L74-L82

The FROM_IMPORT pattern captures the full text after import, including aliases (e.g. "Path as P"). This string is passed directly to lookupUnambiguous which searches for a node labeled "Path as P" — no such node exists. TypeScriptLanguageExtractor strips aliases before lookup (sym = sym.substring(0, asIdx).trim()); the same fix is needed here.
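The last two Python issues (strip "as" aliases before lookup; share one seen set across the FROM_IMPORT and PLAIN_IMPORT loops) combine naturally. A minimal sketch with simplified single-line stand-in patterns (parenthesized multi-line imports are not handled); the registry is a hypothetical label → node ID map, and unlike the real extractor this sketch has no ambiguity guard.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PyImportSketch {
    // Simplified stand-ins: "from pkg.mod import a, b as c" and "import x, y as z".
    private static final Pattern FROM_IMPORT =
            Pattern.compile("^\\s*from\\s+[\\w.]+\\s+import\\s+(.+)$", Pattern.MULTILINE);
    private static final Pattern PLAIN_IMPORT =
            Pattern.compile("^\\s*import\\s+(.+)$", Pattern.MULTILINE);

    /** "Path as P" -> "Path"; a symbol without an alias is only trimmed. */
    static String stripAlias(String sym) {
        int asIdx = sym.indexOf(" as ");
        return (asIdx >= 0 ? sym.substring(0, asIdx) : sym).trim();
    }

    /** Target node IDs for IMPORTS edges, each emitted once across both patterns. */
    static List<String> importTargets(String content, Map<String, String> registry) {
        if (content == null) return new ArrayList<>();
        Set<String> seen = new LinkedHashSet<>();
        for (Pattern p : List.of(FROM_IMPORT, PLAIN_IMPORT)) {
            Matcher m = p.matcher(content);
            while (m.find()) {
                for (String part : m.group(1).split(",")) {
                    String target = registry.get(stripAlias(part));
                    if (target != null) seen.add(target); // duplicates across patterns collapse here
                }
            }
        }
        return new ArrayList<>(seen);
    }
}
```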

…RAN-175)

- LanguageEnricher: replace HashMap with TreeMap for nodesByFile to ensure deterministic file-iteration order (HashMap iteration would otherwise violate the CLAUDE.md determinism rule)
- GoLanguageExtractor: apply lookupUnambiguous() to both pkgName and the full-path fallback, preventing false-positive IMPORTS edges for ambiguous short names
- PythonLanguageExtractor: add a Set<String> seen dedup guard so FROM_IMPORT + PLAIN_IMPORT cannot emit duplicate IMPORTS edges for the same target
- PythonLanguageExtractor: strip the alias before lookupUnambiguous in the FROM_IMPORT loop ("Y as Z" → "Y") so aliased imports are resolved rather than silently dropped

Co-Authored-By: Paperclip <noreply@paperclip.ing>

sonarqubecloud · 2026-04-03T22:32:43Z

Quality Gate failed

Failed conditions
2 Security Hotspots

aksOps and others added 15 commits April 3, 2026 16:32

checkpoint: pre-yolo 20260403-163239 (b4d03ea)

aksOps merged commit ca0d932 into main Apr 3, 2026
9 of 10 checks passed

aksOps deleted the feature/ran-162-language-extractors branch April 26, 2026 05:52

aksOps merged 19 commits into main from feature/ran-162-language-extractors
