feat(detector): SQL/migration detector + SQL_ENTITY NodeKind (#48) #57
Merged
Adds a SqlMigrationDetector under detector/sql that extracts schema-level entities (tables, views, schemas) from raw SQL DDL and framework-specific migration files: Flyway (V*__*.sql), Liquibase (XML + YAML), Alembic (versions/*.py with alembic/op marker guard), Rails (db/migrate/*.rb), and Prisma (migrations/*/migration.sql). Path/marker discriminators prevent false positives on arbitrary .py/.rb/.xml/.yml.

Enum additions:
- NodeKind.SQL_ENTITY (new): schema-level table/view/schema node, distinct from the code-level ENTITY (JPA/ORM) kind.
- EdgeKind.REFERENCES_TABLE (new): any node (JPA ENTITY, ORM model, raw SQL_ENTITY) -> SQL_ENTITY, pairing with existing ORM detectors.
- EdgeKind.MIGRATES (reused): MIGRATION -> SQL_ENTITY. Unused in production code elsewhere; only referenced by ModelCoverageTest.

LayerClassifier: SQL_ENTITY classified as `infra`.

Deterministic output (sorted by id on emit); detector is stateless.

ALTER TABLE ADD COLUMN enriches the owning entity via columns_added property; did not model columns as child nodes to keep graph size reasonable. DROP TABLE is skipped with a debug log.

Tests: 16 new tests covering positive paths (raw SQL, Flyway, Alembic, Liquibase XML, Liquibase YAML, Rails, Prisma), negative paths (plain .py/.yaml, Alembic path without marker), determinism, and DDL variants (DROP, CREATE INDEX, ALTER TABLE). Test count 3278 -> 3294.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
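
The path/marker discriminators described in the commit message could be sketched roughly as follows. This is a minimal illustration, not the detector's actual code: the class name `MigrationFormatSketch`, the `classify` method, and the `raw_sql`, `rails`, and `alembic` labels are hypothetical (only `format=flyway` and `format=prisma` appear in the PR), the regexes are adapted from the glob-style patterns quoted in the description, and the Liquibase discriminators are left out for brevity.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper: maps a file path (plus content, for Alembic) to a migration format label. */
final class MigrationFormatSketch {
    // Flyway versioned migration, e.g. V1__init.sql or V2_1__add_users.sql.
    private static final Pattern FLYWAY =
            Pattern.compile("(?:.*/)?V\\d+(?:_\\d+)*__[^/]+\\.sql");
    // Rails migration: db/migrate/<14-digit timestamp>_name.rb.
    private static final Pattern RAILS =
            Pattern.compile("(?:.*/)?db/migrate/\\d{14}_[^/]+\\.rb");
    // Prisma migration: migrations/<directory>/migration.sql.
    private static final Pattern PRISMA =
            Pattern.compile("(?:.*/)?migrations/[^/]+/migration\\.sql");
    // Alembic candidate path: versions/*.py (still needs a content marker).
    private static final Pattern ALEMBIC_PATH =
            Pattern.compile("(?:.*/)?versions/[^/]+\\.py");

    static Optional<String> classify(String path, String content) {
        String p = path.replace('\\', '/'); // normalise Windows separators
        if (PRISMA.matcher(p).matches()) return Optional.of("prisma");
        if (FLYWAY.matcher(p).matches()) return Optional.of("flyway");
        if (RAILS.matcher(p).matches()) return Optional.of("rails");
        if (ALEMBIC_PATH.matcher(p).matches()) {
            // The path alone is not enough: require an alembic/op marker in the content.
            boolean marker = content.contains("from alembic") || content.contains("op.create_table");
            return marker ? Optional.of("alembic") : Optional.empty();
        }
        // Any other .sql file is treated as raw SQL DDL.
        if (p.endsWith(".sql")) return Optional.of("raw_sql");
        // Arbitrary .py/.rb/.xml/.yml files fall through and are ignored.
        return Optional.empty();
    }
}
```

With this sketch, `classify("app/utils.py", "import os")` yields an empty result, while a file under `versions/` is only accepted once its content contains one of the two markers.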

Summary
Implements task #48: a SQL / migration detector under `io.github.randomcodespace.iq.detector.sql` that extracts schema-level entities from raw SQL DDL and multi-format migration files.

What the detector handles
- Raw SQL (`*.sql`): `CREATE TABLE`, `CREATE VIEW`, `CREATE SCHEMA`, `ALTER TABLE ... ADD COLUMN`, `CREATE INDEX`, inline `FOREIGN KEY ... REFERENCES`, `DROP TABLE` (skipped with debug log).
- Flyway (`**/V\d+(?:_\d+)*__.+\.sql`): parsed as raw SQL with `format=flyway` and a version parsed from the filename.
- Liquibase XML (`**/changelog.xml`, `**/db.changelog*.xml`): `<createTable>`, `<addColumn>`, `<addForeignKeyConstraint>`.
- Liquibase YAML (`**/db.changelog*.yml` / `.yaml`): `createTable`, `addForeignKeyConstraint`. Regex-based walk to avoid wiring SnakeYAML into the detector.
- Alembic (`**/versions/*.py` + `from alembic` or `op.create_table` marker): `op.create_table`, `op.add_column`, `op.create_index`, `op.create_foreign_key`.
- Rails (`**/db/migrate/<14-digit>_*.rb`): `create_table`, `add_column`, `add_foreign_key`.
- Prisma (`**/migrations/*/migration.sql`): delegates to the raw-SQL path with `format=prisma` and directory-name version.

A file is only treated as a migration if the path/filename matches one of those discriminators (Alembic additionally requires a content marker); arbitrary `.py`, `.rb`, `.xml`, `.yml` files are ignored. The detector is `@Component`-scoped, stateless, and emits nodes/edges sorted by id for byte-equal determinism across runs.

Enum changes
Added:
- `NodeKind.SQL_ENTITY`: new. Schema-level table/view/schema. Distinct from the code-level `ENTITY` (JPA/ORM) kind; the two are deliberately not collapsed.
- `EdgeKind.REFERENCES_TABLE`: new. Any node (a JPA `ENTITY`, SQLAlchemy/TypeORM model, a raw-query node, or another `SQL_ENTITY`) → `SQL_ENTITY`. Pairs with existing ORM detectors for the "which code references which table" join.

Reused:
- `EdgeKind.MIGRATES`: `MIGRATION` → `SQL_ENTITY`. The existing `MIGRATES` is unused in production code (only referenced by `ModelCoverageTest`), so `MIGRATES_SCHEMA` was not needed.
- `NodeKind.MIGRATION`: existing; used for the migration-script-level node.

Test delta
`SqlMigrationDetectorTest`:
- Positive paths: raw SQL, Flyway, Alembic `op.create_table` + `add_column` + `create_foreign_key`, Liquibase XML changeSet, Liquibase YAML changeSet, Rails `create_table` + `add_column` + `add_foreign_key`, Prisma `migration.sql`.
- Negative paths: plain `.py` in `app/utils.py`, arbitrary `.github/workflows/ci.yml`, empty content, `.py` under `versions/` without the alembic marker.
- Determinism: `DetectorTestUtils.assertDeterministic` plus a byte-equal id-list assertion across two runs.

Design call-outs
- `ALTER TABLE ADD COLUMN` enriches properties (`columns_added=csv`) rather than creating column child-nodes, to keep graph size reasonable for large schemas. Same for `CREATE INDEX` (`indexes=csv`).
- `DROP TABLE` is skipped with a debug log: the graph models current state, not deletions.
- Liquibase YAML uses the regex-based walk; … `StructuredParser` if we need nested-key fidelity.
- `SQL_ENTITY` added to `INFRA_NODE_KINDS` → classified as `infra`.
- … (`databases` is `DATABASE_CONNECTION` only; `SQL_ENTITY` surfaces through the node-kinds / edges-by-kind breakdown in `computeGraph`). Follow-up when `/api/serve` consumers need a dedicated schema breakdown.

Test plan
- `mvn -B test`: 3294 passing, 0 failing, 31 skipped
- … (`mvn -B test-compile`)
- … `detect()` calls
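
To illustrate the `columns_added` design call-out, here is a rough sketch of how `ALTER TABLE ... ADD COLUMN` statements might be folded into a CSV property on the owning table instead of producing child nodes, with a sorted map standing in for the sorted-by-id emit. The class `ColumnsAddedSketch`, its `extract` method, and both regexes are assumptions made for illustration; the real detector's parsing and node model are not shown in this PR description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: collects added columns per table as a CSV property value. */
final class ColumnsAddedSketch {
    private static final Pattern CREATE_TABLE =
            Pattern.compile("(?i)CREATE\\s+TABLE\\s+(?:IF\\s+NOT\\s+EXISTS\\s+)?([\\w.]+)");
    private static final Pattern ADD_COLUMN =
            Pattern.compile("(?i)ALTER\\s+TABLE\\s+([\\w.]+)\\s+ADD\\s+COLUMN\\s+(\\w+)");

    /** Returns table name -> columns_added CSV; TreeMap keeps iteration order deterministic. */
    static Map<String, String> extract(String sql) {
        Map<String, List<String>> added = new TreeMap<>();

        // Every created table becomes an entry, even if no column is added later.
        Matcher create = CREATE_TABLE.matcher(sql);
        while (create.find()) {
            added.putIfAbsent(create.group(1), new ArrayList<>());
        }

        // Added columns enrich the owning table's entry rather than becoming child nodes.
        Matcher alter = ADD_COLUMN.matcher(sql);
        while (alter.find()) {
            added.computeIfAbsent(alter.group(1), k -> new ArrayList<>()).add(alter.group(2));
        }

        Map<String, String> out = new TreeMap<>();
        added.forEach((table, cols) -> out.put(table, String.join(",", cols)));
        return out;
    }
}
```

Because the result is keyed in sorted order and columns are appended in statement order, two calls on the same input produce identical output, which is the property the determinism tests assert for the real detector.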