SQL Asset Graph is a command-line tool for repository-scale SQL analysis, optimized for HiveSQL and SparkSQL workflows. It focuses on extracting embedded SQL, exporting table usage, generating direct table lineage, detecting lineage cycles, and querying lineage graphs from stable CSV or JSON outputs.
It is deliberately specialized for warehouse-style SQL repositories rather than broad multi-dialect parsing. In HiveSQL and SparkSQL projects with embedded SQL, dynamic table templates, and batch-style lineage workflows, it is designed to give more predictable results than generic statement-oriented lineage tools.
Install from PyPI:
pip install sql-asset-graph
For local development, you can still install from the current repository:
pip install -e .
After installation:
sql-asset-graph --help
python -m sql_asset_graph.main --help
Generate table usage from SQL files:
sql-asset-graph table-usage -i ./sql_dir -o ./output/table_usage.csv
Generate direct table lineage:
sql-asset-graph lineage -i ./sql_dir -o ./output/table_lineage.csv
Analyze table-level lineage cycles:
sql-asset-graph lineage-cycles ./output/table_lineage.csv
SQL Asset Graph provides an end-to-end workflow for repository-based SQL analysis.
- Extract SQL fragments from Python files.
- Replace placeholder variables in SQL files.
- Export table read and write usage.
- Generate direct table-level lineage rows.
- Analyze table-level lineage cycles.
- Query upstream, downstream, and cyclic relationships from lineage outputs.
Extract SQL strings from Python files or directories.
sql-asset-graph extract-sql /path/to/file.py
sql-asset-graph extract-sql /path/to/python_dir -o ./output
sql-asset-graph extract-sql /path/to/file.py --format json
sql-asset-graph extract-sql /path/to/file.py --format csv
Legacy alias: extract
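To make the idea of pulling SQL out of Python sources concrete, here is a minimal sketch that uses only the standard library. It is an illustration, not the way sql-asset-graph implements extract-sql: the function name find_sql_strings and the keyword set SQL_KEYWORDS are hypothetical, and the real detection rules may differ.

```python
import ast

# Hypothetical keyword set for illustration; the tool's real rules are not documented here.
SQL_KEYWORDS = {"select", "insert", "create", "with", "drop", "alter"}


def find_sql_strings(python_source: str) -> list[str]:
    """Return string literals whose first word looks like a SQL keyword."""
    found = []
    for node in ast.walk(ast.parse(python_source)):
        # Plain string literals appear as ast.Constant nodes holding a str value.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            words = node.value.split()
            if words and words[0].lower() in SQL_KEYWORDS:
                found.append(node.value.strip())
    return found
```

A heuristic like this misses SQL assembled at runtime (string concatenation, templates), which is one reason dedicated extraction and placeholder steps are useful.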
Replace placeholders in SQL files using values from a constants module.
sql-asset-graph fill-placeholder input.sql -c path/to/constants.py
sql-asset-graph fill-placeholder input.sql -c path/to/constants.py -s
cat input.sql | sql-asset-graph fill-placeholder - -c path/to/constants.py
Legacy alias: replace
Export table read and write usage from SQL files.
sql-asset-graph table-usage -i ./sample.sql
sql-asset-graph table-usage -i ./sql_dir -o ./output/table_usage.csv
sql-asset-graph table-usage -i ./sql_dir --format json -o ./output/table_usage.json
cat sample.sql | sql-asset-graph table-usage -i - --source-name sample.sql
cat sample.sql | sql-asset-graph table-usage -i - --format csv --source-name sample.sql
Legacy alias: scan
CSV header:
file_name,access_type,table_name
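Because the CSV header is fixed, the usage export can be consumed with the standard csv module. The sketch below groups table names by access type. The function name tables_by_access_type is hypothetical, and it makes no assumption about which access_type values the tool emits; it groups by whatever values it finds.

```python
import csv
import io
from collections import defaultdict


def tables_by_access_type(csv_text: str) -> dict[str, set[str]]:
    """Group table names by the access_type values present in the CSV."""
    grouped = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["access_type"]].add(row["table_name"])
    return dict(grouped)


# Invented sample rows; "read" and "write" are assumed values, not confirmed tool output.
sample = (
    "file_name,access_type,table_name\n"
    "job.sql,read,app.source_x\n"
    "job.sql,write,app.target_y\n"
)
summary = tables_by_access_type(sample)
```

For a file on disk, pass the contents of ./output/table_usage.csv instead of the inline sample.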
Generate direct table-level lineage from SQL files.
sql-asset-graph lineage -i ./sample.sql
sql-asset-graph lineage -i ./sql_dir -o ./output/table_lineage.csv
sql-asset-graph lineage -i ./sql_dir --format json -o ./output/table_lineage.json
cat sample.sql | sql-asset-graph lineage -i - --source-name sample.sql
cat sample.sql | sql-asset-graph lineage -i - --format csv --source-name sample.sql
CSV header:
file_name,statement_index,statement_type,target_table,source_table,unresolved_dynamic_tables
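Each lineage row describes one direct edge, so indirect dependencies can be derived downstream by walking those edges. The following sketch assumes that every row carries a single source_table and target_table and that rows without a resolved edge leave those fields empty; both are assumptions about the output, not documented behavior. The helper names load_upstream_map and all_upstream are hypothetical.

```python
import csv
import io
from collections import defaultdict, deque


def load_upstream_map(csv_text: str) -> dict[str, set[str]]:
    """Map each target table to the tables it reads from directly."""
    upstream = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        source, target = row["source_table"], row["target_table"]
        if source and target:  # assumption: unresolved rows have empty fields
            upstream[target].add(source)
    return dict(upstream)


def all_upstream(upstream: dict[str, set[str]], table: str) -> set[str]:
    """Breadth-first walk collecting direct and indirect sources of a table."""
    seen = set()
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, ()):
            if parent not in seen:  # the seen set also keeps cycles from looping forever
                seen.add(parent)
                queue.append(parent)
    return seen
```

If the lineage contains a cycle through the starting table, that table appears in its own result set, which can serve as a quick sanity check before running the dedicated cycle analysis.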
Analyze table-level lineage cycles from table_lineage.csv.
sql-asset-graph lineage-cycles output/table_lineage.csv
sql-asset-graph lineage-cycles output/table_lineage.csv --format json -o cycles.json
cat output/table_lineage.csv | sql-asset-graph lineage-cycles -
Legacy alias: analyze
CSV header:
cycle_id,cycle_length,sequence_index,table_name
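The cycle export stores one row per table per cycle, so a consumer usually regroups rows by cycle_id. A minimal sketch, assuming sequence_index is an integer that orders the tables within a cycle (the function name load_cycles is hypothetical):

```python
import csv
import io
from collections import defaultdict


def load_cycles(csv_text: str) -> dict[str, list[str]]:
    """Return {cycle_id: table names ordered by sequence_index}."""
    members = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: sequence_index parses as an integer position within the cycle.
        members[row["cycle_id"]].append((int(row["sequence_index"]), row["table_name"]))
    return {cid: [name for _, name in sorted(rows)] for cid, rows in members.items()}
```

Sorting on the integer index instead of relying on row order keeps the result stable even if the file has been re-sorted or filtered by another tool.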
Query upstream, downstream, and cycle relationships from lineage outputs.
sql-asset-graph lineage-graph output/table_lineage.csv --upstream APP.TARGET_Y
sql-asset-graph lineage-graph output/table_lineage.csv --downstream APP.SOURCE_X
sql-asset-graph lineage-graph output/table_lineage.csv --cycles
cat output/table_lineage.csv | sql-asset-graph lineage-graph - --upstream APP.TARGET_Y
For repository-style SQL projects, the common workflow is:
sql-asset-graph extract-sql ./python_jobs -o ./output
sql-asset-graph fill-placeholder ./output/jobs_extracted_sql_*.sql -c ./constants.py -o ./output/jobs_filled.sql
sql-asset-graph table-usage -i ./output/jobs_filled.sql -o ./output/table_usage.csv
sql-asset-graph lineage -i ./output/jobs_filled.sql -o ./output/table_lineage.csv
sql-asset-graph lineage-cycles ./output/table_lineage.csv
If you already have SQL files, you can skip extraction and placeholder replacement.
- *_extracted_sql_*.sql: extracted SQL collected from Python sources
- *_extracted_sql_*.json: structured extracted SQL records
- *_extracted_sql_*.csv: tabular extracted SQL records
- table_usage_*.csv: table read/write usage rows
- table_usage_*.json: structured table usage payload
- table_lineage_*.csv: direct table lineage rows
- table_lineage_*.json: structured table lineage payload
- table_lineage_cycles_*.csv: detected table-level lineage cycles
- table_lineage_cycles_*.json: structured lineage cycle payload
SQL Asset Graph currently works best for HiveSQL and SparkSQL-style batch SQL workflows, especially when SQL lives in repositories together with Python orchestration scripts.
- Focused on Hive-style DML and lineage paths such as INSERT OVERWRITE, CREATE TABLE AS SELECT, and CREATE VIEW AS SELECT
- Optimized for repository-scale SQL processing instead of one-off interactive parsing
- Suitable when SQL is extracted from Python first and then passed through placeholder replacement, table usage export, lineage, and lineage cycle analysis
- Table-level lineage only; no column-level lineage
- Optimized for HiveSQL and SparkSQL-oriented repository processing rather than broad multi-dialect SQL coverage
- Dynamic table templates are handled conservatively and may be reported separately instead of being forced into guessed lineage edges
- Stable CSV and JSON outputs are prioritized for downstream automation
- Python 3.9+
- Standard library only
See LICENSE.