feat(intelligence): Phase 5 language-specific enrichment (RAN-162) #29
Adds a lodash >= 4.17.24 override in package.json to resolve two CVEs (HIGH code injection via _.template, MODERATE prototype pollution via _.unset/_.omit) in the transitive dependencies swagger-ui-react and @antv/g6. All lodash instances now resolve to 4.18.1. npm audit reports 0 vulnerabilities.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
…ry planner (RAN-148)
Adds Phase 3 of the Repository Intelligence system:
- CapabilityMatrix: static per-language × per-dimension capability registry (EXACT/PARTIAL/LEXICAL_ONLY/UNSUPPORTED) for Java, TypeScript, JavaScript, Python, Go, C#, Rust, and lexical-only languages.
- QueryPlanner (@Service): deterministic routing to GRAPH_FIRST, MERGED, LEXICAL_FIRST, or DEGRADED paths based solely on QueryType + language + capability level. No LLM, no probabilistic logic.
- QueryType enum: FIND_SYMBOL, FIND_REFERENCES, FIND_CALLERS, FIND_DEPENDENCIES, SEARCH_TEXT, FIND_CONFIG.
- CapabilityDimension enum: 9 analysis dimensions.
- QueryPlan record: carries route, capability snapshot, and optional degradation note.
- GET /api/capabilities endpoint (optional ?language= filter).
- get_capabilities MCP tool (32nd tool).
- 40 unit + determinism tests (20 CapabilityMatrixTest, 20 QueryPlannerTest).
Co-Authored-By: Paperclip <noreply@paperclip.ing>
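A minimal sketch of the deterministic routing idea, assuming a simplified rule in which the route depends only on the capability level reported for a language and query type. To stay short it keys the table by QueryType directly instead of by CapabilityDimension; the level-to-route mapping, the `QueryPlannerSketch` class, and the SEARCH_TEXT shortcut are hypothetical, not the actual QueryPlanner logic.

```java
import java.util.Map;

enum CapabilityLevel { EXACT, PARTIAL, LEXICAL_ONLY, UNSUPPORTED }

enum QueryType { FIND_SYMBOL, FIND_REFERENCES, FIND_CALLERS, FIND_DEPENDENCIES, SEARCH_TEXT, FIND_CONFIG }

enum Route { GRAPH_FIRST, MERGED, LEXICAL_FIRST, DEGRADED }

// Carries the route, a capability snapshot, and an optional degradation note.
record QueryPlan(Route route, CapabilityLevel capability, String degradationNote) {}

final class QueryPlannerSketch {
    // Hypothetical table: language -> query type -> capability level.
    private final Map<String, Map<QueryType, CapabilityLevel>> matrix;

    QueryPlannerSketch(Map<String, Map<QueryType, CapabilityLevel>> matrix) {
        this.matrix = matrix;
    }

    QueryPlan plan(QueryType type, String language) {
        // Pure text search never needs the graph (assumed rule).
        if (type == QueryType.SEARCH_TEXT) {
            return new QueryPlan(Route.LEXICAL_FIRST, CapabilityLevel.LEXICAL_ONLY, null);
        }
        CapabilityLevel level = matrix
                .getOrDefault(language, Map.of())
                .getOrDefault(type, CapabilityLevel.UNSUPPORTED);
        // Same inputs always yield the same plan: no randomness, no model calls.
        return switch (level) {
            case EXACT -> new QueryPlan(Route.GRAPH_FIRST, level, null);
            case PARTIAL -> new QueryPlan(Route.MERGED, level, null);
            case LEXICAL_ONLY -> new QueryPlan(Route.LEXICAL_FIRST, level, null);
            case UNSUPPORTED -> new QueryPlan(Route.DEGRADED, level,
                    type + " is not supported for " + language);
        };
    }
}
```

Because `plan()` reads only its arguments and an immutable table, determinism tests reduce to calling it twice and comparing the records with `equals`.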
…le inventory (RAN-146)
Implements the foundational contracts for the Repository Intelligence layer:
- intelligence/ package: Provenance, RepositoryIdentity, FileInventory, FileEntry, FileClassification, CapabilityLevel, ArtifactManifest records
- Provenance stored via prov_* keys in CodeNode.properties (round-trips through Neo4j)
- RepositoryIdentity resolves git URL, commit SHA, and branch from the git CLI at analysis time
- FileInventory builds a deterministic sorted list of all discovered files with classification heuristics (source/config/doc/test/generated)
- GraphBuilder now accepts Provenance as a constructor parameter (not a mutable setter)
- Analyzer and EnrichCommand stamp provenance on all nodes during the pipeline
- BundleCommand upgraded to use the ArtifactManifest record (repo identity, inventory summary)
- Tests: ProvenanceTest (6), FileInventoryTest (8), ArtifactManifestTest (5), ProvenanceIntegrationTest (2); verifies that all nodes carry provenance and that output is deterministic
Addresses PE architecture review blocking constraints from RAN-150:
- BLOCKING 1: Provenance uses the properties map (prov_* prefix), not direct CodeNode fields
- BLOCKING 2: Provenance is a GraphBuilder constructor parameter, not a setter
- BLOCKING 3: FileEntry added to intelligence/ without modifying DiscoveredFile
Co-Authored-By: Paperclip <noreply@paperclip.ing>
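BLOCKING 1 (provenance lives in the generic properties map under prov_* keys rather than in dedicated CodeNode fields) can be illustrated with a plain `Map<String, Object>` standing in for CodeNode.properties. This is a sketch only: the three fields and the key names `prov_repo_url`, `prov_commit_sha`, and `prov_branch` are hypothetical, and the real Provenance record carries more data.

```java
import java.util.Map;

// Hypothetical subset of provenance fields, for illustration only.
record Provenance(String repoUrl, String commitSha, String branch) {
    static final String PREFIX = "prov_";

    // Copies non-null fields into the node's properties map under prov_* keys,
    // so no dedicated node fields are needed.
    void stampOnto(Map<String, Object> properties) {
        if (repoUrl != null) properties.put(PREFIX + "repo_url", repoUrl);
        if (commitSha != null) properties.put(PREFIX + "commit_sha", commitSha);
        if (branch != null) properties.put(PREFIX + "branch", branch);
    }

    // Rebuilds the record from the same keys; absent keys become null fields.
    static Provenance fromProperties(Map<String, Object> properties) {
        return new Provenance(
                (String) properties.get(PREFIX + "repo_url"),
                (String) properties.get(PREFIX + "commit_sha"),
                (String) properties.get(PREFIX + "branch"));
    }
}
```

Keeping provenance as prefixed map entries means any store that persists the properties map persists provenance too, with no schema change to the node type.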
…y constructor + AnalysisCache hash reuse
- GraphBuilder now accepts RepositoryIdentity + extractorVersion as constructor params; Provenance is derived internally (never constructed externally by callers)
- Analyzer and EnrichCommand updated to pass RepositoryIdentity directly to GraphBuilder
- AnalysisCache.getHashForPath() added for reverse path→hash lookup
- buildFileInventory() now populates FileEntry.contentHash from cache (no file re-reads)
Addresses BLOCKING 2 and BLOCKING 3 from PE review on RAN-150.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
…rovenance round-trip (RAN-146)
- FileInventory.countsByClassification() now uses TreeMap for deterministic key ordering (fixes non-deterministic HashMap iteration in the manifest by_classification field)
- Provenance.fromProperties() handles a String schema version from the Neo4j round-trip (bulkSave stores Integer props as String via .toString(); parseInt handles both types)
- Add ProvenanceNeo4jRoundTripTest: two mock-based tests verifying the prov_* -> prop_prov_* -> prov_* round-trip, including schemaVersion Integer/String coercion and null fields
Co-Authored-By: Paperclip <noreply@paperclip.ing>
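The schemaVersion coercion in the second bullet amounts to accepting whichever shape the value arrives in. The helper below is a sketch, not the actual Provenance.fromProperties() code. `SchemaVersions.parseSchemaVersion` is a hypothetical name, and accepting any `Number` (not just Integer) is an added defensive assumption.

```java
final class SchemaVersions {
    // In memory the value is an Integer; after bulkSave stores it via toString()
    // and it is read back from Neo4j, it arrives as a String. Null means "missing".
    static Integer parseSchemaVersion(Object raw) {
        if (raw == null) {
            return null;
        }
        if (raw instanceof Number n) {
            return n.intValue(); // Integer, or defensively any other Number
        }
        return Integer.parseInt(raw.toString().trim());
    }
}
```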
…re entries (RAN-154)
- countsByLanguage(): use TreeMap::new for deterministic alphabetical key ordering
- toSummary() byLang: add thenComparing secondary sort to break ties alphabetically
- toSummary() byCls: use LinkedHashMap::new to preserve TreeMap insertion order
- .gitignore: add playwright-report/ and test-results/ frontend build artifacts
Co-Authored-By: Paperclip <noreply@paperclip.ing>
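The three ordering fixes combine as follows: count with a `TreeMap::new` map factory, sort entries by count descending with `thenComparing` on the key to break ties, and collect into a `LinkedHashMap::new` so that order survives serialization. The method names echo the commit, but the parameter and return types of this sketch are assumptions.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class InventorySummarySketch {
    // TreeMap::new as the map factory gives alphabetical key order on every run.
    static Map<String, Long> countsByLanguage(List<String> languages) {
        return languages.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Highest count first; equal counts fall back to alphabetical order.
    // LinkedHashMap preserves exactly the order the stream produced.
    static Map<String, Long> byLangSummary(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}
```

Without the secondary comparator, two languages with equal counts would keep whatever order the source map happened to give them, which is exactly the kind of run-to-run drift the commit removes.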
…anner profile guard (RAN-155)
- Add CPP_CAPS table (distinct from C# — no ORM, lexical-only auth)
- Add explicit case "cpp","c++" to CapabilityMatrix.tableFor()
- Add "cpp" to asSerializableMap() hardcoded language list
- Remove incorrect CSHARP_CAPS fallback for cpp in ANTLR_LANGUAGES branch
- Add @Profile("serving") to QueryPlanner so it is not instantiated during indexing CLI runs
Co-Authored-By: Paperclip <noreply@paperclip.ing>
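The explicit "cpp"/"c++" case can be pictured as a multi-label switch arm that routes C++ to its own table instead of falling through to another language's table. This is a sketch: the dimension keys `"orm"` and `"auth"` and every level except C++'s "no ORM, lexical-only auth" are hypothetical, as is the `default` fallback.

```java
import java.util.Locale;
import java.util.Map;

enum CapabilityLevel { EXACT, PARTIAL, LEXICAL_ONLY, UNSUPPORTED }

final class CapabilityTables {
    // Hypothetical dimension keys and C# levels, for illustration only.
    static final Map<String, CapabilityLevel> CSHARP_CAPS =
            Map.of("orm", CapabilityLevel.PARTIAL, "auth", CapabilityLevel.PARTIAL);
    // C++: no ORM support, lexical-only auth (per the commit).
    static final Map<String, CapabilityLevel> CPP_CAPS =
            Map.of("orm", CapabilityLevel.UNSUPPORTED, "auth", CapabilityLevel.LEXICAL_ONLY);
    static final Map<String, CapabilityLevel> LEXICAL_CAPS =
            Map.of("orm", CapabilityLevel.UNSUPPORTED, "auth", CapabilityLevel.LEXICAL_ONLY);

    static Map<String, CapabilityLevel> tableFor(String language) {
        return switch (language.toLowerCase(Locale.ROOT)) {
            case "csharp", "c#" -> CSHARP_CAPS;
            // Explicit arm: C++ no longer borrows the C# table.
            case "cpp", "c++" -> CPP_CAPS;
            default -> LEXICAL_CAPS;
        };
    }
}
```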
…nGit()
Process does not implement AutoCloseable in Java 25, so try-with-resources is not applicable. Use try-finally with proc.destroy() to ensure OS process handles are always released, resolving the SonarQube C-Reliability finding.
Closes RAN-156
Co-Authored-By: Paperclip <noreply@paperclip.ing>
…be S2095
Wrap proc.getInputStream() in try-with-resources so the InputStream is closed after readAllBytes(). proc.destroy() in the finally block remains to terminate the child process; closing the InputStream releases its file descriptor immediately rather than waiting for GC.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
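Put together, the two fixes give roughly the method shape below. It also decodes with an explicit UTF-8 charset, the change made for RAN-160. This is a sketch under stated assumptions: the `GitRunner` class, the varargs signature, the null-on-failure contract, and discarding stderr (so the child cannot block on a full stderr pipe) are all hypothetical, not the actual RepositoryIdentity code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class GitRunner {
    // Hypothetical helper: runs `git <args>` in dir; returns trimmed stdout or null on any failure.
    static String runGit(Path dir, String... args) {
        List<String> command = new ArrayList<>();
        command.add("git");
        command.addAll(List.of(args));
        Process proc = null;
        try {
            proc = new ProcessBuilder(command)
                    .directory(dir.toFile())
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            String out;
            // try-with-resources closes the stream (and its file descriptor) right after reading.
            try (InputStream is = proc.getInputStream()) {
                // Explicit charset instead of the platform default (SpotBugs DM_DEFAULT_ENCODING).
                out = new String(is.readAllBytes(), StandardCharsets.UTF_8).trim();
            }
            return proc.waitFor() == 0 && !out.isEmpty() ? out : null;
        } catch (IOException e) {
            return null; // git missing, bad directory, or I/O error
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            // Process is not AutoCloseable on this JDK, so release it explicitly.
            if (proc != null) {
                proc.destroy();
            }
        }
    }
}
```

For example, `GitRunner.runGit(Path.of("."), "rev-parse", "HEAD")` returns the commit SHA inside a repository and null elsewhere.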
…+ snippet store (RAN-147)
New package: intelligence/lexical
- CodeSnippet: bounded source snippet record (path, line range, language, provenance)
- LexicalResult: query result record (node, score, matchedField, snippet, provenance)
- DocCommentExtractor: extracts Javadoc/JSDoc, Go/Rust line comments, Python docstrings
- SnippetStore: extracts bounded code snippets (max 50 lines) from source files
- LexicalEnricher: populates lex_comment and lex_config_keys properties before Neo4j load
- LexicalQueryService: findByIdentifier, findByDocComment, findByConfigKey (serving profile)
Infrastructure changes:
- GraphStore: add searchLexical() + lexical_index (standard analyzer on prop_lex_* fields)
- EnrichCommand: inject LexicalEnricher, add enrichment step before Neo4j bulk load
- lexical_index created in both GraphStore.bulkSave() and EnrichCommand
Tests: 24 new tests across DocCommentExtractor, SnippetStore, LexicalEnricher
All 1591 tests passing.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
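The 50-line bound on SnippetStore can be pictured as a clamp on a 1-based, inclusive line range. In this sketch, CodeSnippet omits the language and provenance fields the commit lists, and `extract()` with its null-for-empty-range contract is a hypothetical signature.

```java
import java.util.List;

// Reduced stand-in for CodeSnippet: language and provenance omitted.
record CodeSnippet(String path, int startLine, int endLine, String text) {}

final class SnippetStoreSketch {
    static final int MAX_LINES = 50;

    // Returns lines [startLine, endLine] (1-based, inclusive), clamped to the file
    // and capped at MAX_LINES; returns null when nothing of the range is in the file.
    static CodeSnippet extract(String path, List<String> lines, int startLine, int endLine) {
        int start = Math.max(1, startLine);
        int end = Math.min(Math.min(endLine, lines.size()), start + MAX_LINES - 1);
        if (start > end) {
            return null;
        }
        String text = String.join("\n", lines.subList(start - 1, end));
        return new CodeSnippet(path, start, end, text);
    }
}
```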
…uage, RepositoryIdentity (RAN-159)
- RepositoryIdentityTest (8 tests): non-git dir graceful null, commit SHA on git repo, detached HEAD branch normalised to null, record equality/null safety
- ProvenanceEdgeCasesTest (6 tests): empty dir, single-file, unsupported-language-only, mixed-language (Java/TS/Python/Go), no-git-history null provenance fields, mixed-language determinism
- LexicalCrossLanguageTest (11 tests): TypeScript/JavaScript block comments, Python triple-quoted docstrings (single-line and multiline), Go line comments, cross-language determinism, DocCommentExtractor direct calls
All 1616 tests pass (0 failures, 0 errors, 31 skipped).
Co-Authored-By: Paperclip <noreply@paperclip.ing>
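The "detached HEAD branch normalised to null" case follows from how git reports branches. `git rev-parse --abbrev-ref HEAD` prints the literal string `HEAD` when no branch is checked out, as in typical CI checkouts of a single commit. Whether RepositoryIdentity uses that exact command is an assumption, and `BranchNormalizer` is a hypothetical helper.

```java
final class BranchNormalizer {
    // Maps raw `git rev-parse --abbrev-ref HEAD` output to a branch name, treating
    // detached HEAD (literal "HEAD"), blank output, or null as "no branch".
    static String normalizeBranch(String rawGitOutput) {
        if (rawGitOutput == null) {
            return null;
        }
        String branch = rawGitOutput.trim();
        return branch.isEmpty() || branch.equals("HEAD") ? null : branch;
    }
}
```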
…_DEFAULT_ENCODING (RAN-160)
Replace new String(is.readAllBytes()) with new String(is.readAllBytes(), StandardCharsets.UTF_8) to eliminate the SpotBugs HIGH DM_DEFAULT_ENCODING finding on RepositoryIdentity.java:44. This was the sole blocker preventing all Phase 1-3 PRs from merging.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
… for Java, TS, Python, Go (RAN-162)
- LanguageExtractor interface + LanguageExtractionResult record
- LanguageEnricher Spring component runs after LexicalEnricher in EnrichCommand
- JavaLanguageExtractor: CALLS edges via JavaParser MethodCallExpr, type hierarchy hints
- TypeScriptLanguageExtractor: import-to-symbol resolution, JSDoc type hints
- PythonLanguageExtractor: from/import resolution, def signature type hints
- GoLanguageExtractor: package import resolution, interface satisfaction detection
- 36 tests across all extractors + pipeline integration and failure-resilience
- EnrichCommand updated to inject and invoke LanguageEnricher
Co-Authored-By: Paperclip <noreply@paperclip.ing>
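The extractor/enricher split can be sketched as a small interface plus a driver that dispatches each file to the extractor for its language. It visits files in sorted order and keeps going when one extractor throws. Every signature here (`Edge`, `SourceFile`, the `extract(path, content)` method, the alias table, the failure counter) is a hypothetical stand-in; in the real code Spring injects the extractor beans.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shapes; the real LanguageExtractor API may differ.
record Edge(String sourceId, String targetId, String kind) {}

record LanguageExtractionResult(List<Edge> edges, Map<String, String> typeHints) {}

record SourceFile(String path, String language, String content) {}

interface LanguageExtractor {
    String language();

    LanguageExtractionResult extract(String path, String content);
}

final class LanguageEnricherSketch {
    // Hypothetical alias table: JavaScript files go to the TypeScript extractor.
    private static final Map<String, String> ALIASES = Map.of("javascript", "typescript");

    private final Map<String, LanguageExtractor> byLanguage = new HashMap<>();
    private int failures;

    // In the Spring component this list would be the auto-discovered extractor beans.
    LanguageEnricherSketch(List<LanguageExtractor> extractors) {
        for (LanguageExtractor e : extractors) {
            byLanguage.put(e.language(), e);
        }
    }

    List<Edge> enrich(List<SourceFile> files) {
        List<SourceFile> ordered = new ArrayList<>(files);
        ordered.sort(Comparator.comparing(SourceFile::path)); // deterministic file order
        List<Edge> edges = new ArrayList<>();
        for (SourceFile f : ordered) {
            String lang = ALIASES.getOrDefault(f.language(), f.language());
            LanguageExtractor extractor = byLanguage.get(lang);
            if (extractor == null) {
                continue; // no extractor registered for this language
            }
            try {
                edges.addAll(extractor.extract(f.path(), f.content()).edges());
            } catch (RuntimeException ex) {
                failures++; // one bad file must not abort the pass (logging omitted)
            }
        }
        return edges;
    }

    int failures() {
        return failures;
    }
}
```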
…LS, confidence, per-edge provenance (RAN-164)
- Scope lookupByLabel() to dedup by node ID — drop ambiguous matches where 2+ nodes share the same method name, eliminating false-positive CALLS edges (save, get, execute…)
- Change confidence to PARTIAL for all registry-lookup edges (cross-file by definition)
- Populate confidence + extractorName properties on every emitted edge for all 4 extractors
- Add negative test: two unrelated classes with the same method name produce zero CALLS edges
- Update extract_confidenceIsExact_whenCallsFound → expect PARTIAL (correct semantics)
Co-Authored-By: Paperclip <noreply@paperclip.ing>
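The dedup-by-ID rule in the first bullet reduces to "resolve a label only if exactly one distinct node carries it". `Node`, `SymbolRegistry`, and the label-keyed map are hypothetical stand-ins for the extractor's registry.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical node shape: a stable ID plus the simple name used as the lookup label.
record Node(String id, String label) {}

final class SymbolRegistry {
    private final Map<String, List<Node>> nodesByLabel;

    SymbolRegistry(Map<String, List<Node>> nodesByLabel) {
        this.nodesByLabel = nodesByLabel;
    }

    // Collapses duplicate entries for the same node ID first; if 2+ distinct IDs
    // remain, the name is ambiguous (save, get, execute...) and nothing resolves.
    Optional<Node> lookupByLabel(String label) {
        List<Node> candidates = nodesByLabel.getOrDefault(label, List.of());
        Set<String> ids = new LinkedHashSet<>();
        for (Node n : candidates) {
            ids.add(n.id());
        }
        return ids.size() == 1 ? Optional.of(candidates.get(0)) : Optional.empty();
    }
}
```

Returning empty on ambiguity trades recall for precision. A missing CALLS edge is cheaper than a wrong one when downstream queries treat edges as facts.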
Code re-review (RAN-164 fixes)
Original issues A and B are correctly fixed. Found 1 new issue:
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
collectImportPaths() used an ArrayList, allowing duplicate paths when a file has both a block import and a single-line import for the same package path. Switch to LinkedHashSet for deduplication (insertion order preserved), then return a new ArrayList to maintain the List<String> contract.
Added test: extract_duplicateImportBothStyles_noDuplicateEdges verifies that a file with both import styles for the same package produces exactly one IMPORTS edge.
Co-Authored-By: Paperclip <noreply@paperclip.ing>
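A regex-based sketch of collectImportPaths() shows where the duplicate came from. Block and single-line imports are scanned separately, so the same path can be found twice, and a LinkedHashSet collapses it while keeping first-seen order. The patterns are assumptions, not GoLanguageExtractor's actual parsing. They ignore comments and raw-string imports.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GoImportsSketch {
    // import ( ... ) block; DOTALL lets the body span lines.
    private static final Pattern BLOCK = Pattern.compile("import\\s*\\((.*?)\\)", Pattern.DOTALL);
    // Any quoted path inside a block (an alias before it is ignored).
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");
    // Single-line form: import "path" or import alias "path".
    private static final Pattern SINGLE =
            Pattern.compile("(?m)^\\s*import\\s+(?:[\\w.]+\\s+)?\"([^\"]+)\"");

    static List<String> collectImportPaths(String content) {
        Set<String> paths = new LinkedHashSet<>(); // dedup, first-seen order kept
        Matcher block = BLOCK.matcher(content);
        while (block.find()) {
            Matcher q = QUOTED.matcher(block.group(1));
            while (q.find()) {
                paths.add(q.group(1));
            }
        }
        Matcher single = SINGLE.matcher(content);
        while (single.find()) {
            paths.add(single.group(1));
        }
        return new ArrayList<>(paths); // keep the List<String> contract
    }
}
```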
Code review
RAN-164 and RAN-170 fixes verified. Found 3 issues:
…, Python ambiguity (RAN-173)
- GoLanguageExtractor: sort the satisfied list before joining to guarantee a deterministic satisfies_interfaces hint across JVM runs
- TypeScriptLanguageExtractor: use LinkedHashSet to deduplicate IMPORTS edges when the same symbol is matched by both NAMED_IMPORT and DEFAULT_IMPORT
- PythonLanguageExtractor: replace registry.get(sym) with lookupUnambiguous() to skip false-positive IMPORTS edges for common short names (join, get, load) that match multiple nodes — mirrors JavaLanguageExtractor.lookupByLabel()
- Add regression tests for all three fixes
Co-Authored-By: Paperclip <noreply@paperclip.ing>
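The Go fix in the first bullet is easiest to see next to the satisfaction check it follows. The sketch below reduces Go's structural typing to method names: a type satisfies an interface when its method set contains every declared method. Real Go rules also compare signatures and receiver kinds, which this ignores. The class name, the `","` separator, and skipping empty interfaces are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GoInterfaceSatisfaction {
    // Returns the names of interfaces whose method sets are covered by typeMethods,
    // sorted and comma-joined so the hint is identical on every run.
    static String satisfiesInterfaces(Set<String> typeMethods, Map<String, Set<String>> interfaces) {
        List<String> satisfied = new ArrayList<>();
        for (Map.Entry<String, Set<String>> iface : interfaces.entrySet()) {
            Set<String> required = iface.getValue();
            // Skip empty interfaces, which every type satisfies trivially.
            if (!required.isEmpty() && typeMethods.containsAll(required)) {
                satisfied.add(iface.getKey());
            }
        }
        // Map iteration order is not guaranteed, so sort before joining.
        satisfied.sort(Comparator.naturalOrder());
        return String.join(",", satisfied);
    }
}
```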
Code review
No issues found. Checked for bugs and CLAUDE.md compliance. All three issues from the previous review verified fixed in 00b62f5:
Regression tests for all three cases pass. PR #29 approved for merge.
Resolves add/add conflicts:
- CapabilityMatrix.java/Test: kept main's version (cpp support, RAN-172)
Resolves content conflicts:
- EnrichCommand.java/Test: kept feature branch (LanguageEnricher, Phase 5)
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Code review
Found 4 issues:
Both
The
…RAN-175)
- LanguageEnricher: replace HashMap with TreeMap for nodesByFile to ensure
deterministic file-iteration order (HashMap iteration order would violate CLAUDE.md)
- GoLanguageExtractor: apply lookupUnambiguous() on both pkgName and full-path
fallback — prevents false-positive IMPORTS edges for ambiguous short names
- PythonLanguageExtractor: add Set<String> seen dedup guard so FROM_IMPORT +
PLAIN_IMPORT cannot emit duplicate IMPORTS edges for the same target
- PythonLanguageExtractor: strip alias before lookupUnambiguous in FROM_IMPORT
loop ("Y as Z" → "Y") so aliased imports are resolved rather than silently dropped
Co-Authored-By: Paperclip <noreply@paperclip.ing>
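The two Python fixes can be sketched as two small steps: parse the names after `import` and drop any `as` alias, because the registry knows the defined name and not the local alias, then emit each resolved target once across both import styles. The line-based parsing and the `dedupTargets` helper are hypothetical simplifications. Multi-line parenthesised imports are not handled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PythonImportSketch {
    // "from m import a, b as c" -> [a, b]: the alias is stripped before lookup.
    static List<String> importedSymbols(String fromImportLine) {
        int idx = fromImportLine.indexOf(" import ");
        if (!fromImportLine.startsWith("from ") || idx < 0) {
            return List.of();
        }
        String names = fromImportLine.substring(idx + " import ".length())
                .replace("(", "").replace(")", "");
        List<String> out = new ArrayList<>();
        for (String part : names.split(",")) {
            String name = part.trim().split("\\s+as\\s+")[0].trim();
            if (!name.isEmpty()) {
                out.add(name);
            }
        }
        return out;
    }

    // A seen-set guard: a target matched by both FROM_IMPORT and PLAIN_IMPORT yields one edge.
    static List<String> dedupTargets(List<String> fromTargets, List<String> plainTargets) {
        Set<String> seen = new HashSet<>();
        List<String> edges = new ArrayList<>();
        for (String t : fromTargets) if (seen.add(t)) edges.add(t);
        for (String t : plainTargets) if (seen.add(t)) edges.add(t);
        return edges;
    }
}
```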


Summary
- `LanguageExtractor` interface + `LanguageExtractionResult` record in `intelligence/extractor/`
- `LanguageEnricher` Spring component auto-discovers extractor beans and runs after `LexicalEnricher` in `EnrichCommand`
- `JavaLanguageExtractor`: CALLS edges via JavaParser `MethodCallExpr`, type hierarchy hints (`extends_type`, `implements_types`)
- `TypeScriptLanguageExtractor`: import-to-symbol resolution (named + default), JSDoc `@param`/`@returns` type hints; also handles `.js` files via language alias
- `PythonLanguageExtractor`: `from module import X` + `import X` resolution, `def fn(x: int) -> str` type hint extraction
- `GoLanguageExtractor`: block/single import resolution, structural interface satisfaction detection

Test plan
- `JavaLanguageExtractorTest` — 8 tests: call edges, type hierarchy, wrong language, no registry, determinism, confidence EXACT/PARTIAL
- `TypeScriptLanguageExtractorTest` — 7 tests: named/default imports, JSDoc hints, empty, confidence, determinism
- `PythonLanguageExtractorTest` — 8 tests: from/plain imports, type hints, self-param exclusion, null content, confidence, determinism
- `GoLanguageExtractorTest` — 7 tests: block/single imports, unknown import, null content, confidence, determinism
- `LanguageEnricherTest` — 6 tests: no extractors, pipeline edges, type hint propagation, failure resilience, JS→TS alias, extension mapping
- `EnrichCommandTest` — existing 3 tests updated to compile with new constructor
- `mvn test` — all tests pass

🤖 Generated with Claude Code
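For the PythonLanguageExtractor `def fn(x: int) -> str` type-hint extraction and the self-param exclusion test listed above, a single-line regex sketch conveys the idea. The pattern, the `"return"` key, and skipping `cls` as well as `self` are assumptions. Multi-line signatures and generic annotations containing commas (such as `Dict[str, int]`) are out of scope.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PythonTypeHintSketch {
    // Single-line `def name(params) -> ret:`; async defs allowed, multi-line signatures not.
    private static final Pattern DEF = Pattern.compile(
            "^\\s*(?:async\\s+)?def\\s+(\\w+)\\s*\\((.*)\\)\\s*(?:->\\s*([^:]+?))?\\s*:");

    // Returns param -> annotation in declaration order, plus "return" -> return annotation.
    static Map<String, String> typeHints(String defLine) {
        Map<String, String> hints = new LinkedHashMap<>();
        Matcher m = DEF.matcher(defLine);
        if (!m.find()) {
            return hints;
        }
        for (String param : m.group(2).split(",")) {
            // Drop any default value, then split "name: type".
            String[] nameAndType = param.split("=")[0].split(":", 2);
            String name = nameAndType[0].trim();
            if (nameAndType.length < 2 || name.equals("self") || name.equals("cls")) {
                continue; // unannotated, or the implicit receiver
            }
            hints.put(name, nameAndType[1].trim());
        }
        if (m.group(3) != null) {
            hints.put("return", m.group(3).trim());
        }
        return hints;
    }
}
```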