US Utility Intelligence on Databricks

Natural-language exploration of US electric utility data on Databricks. ~7,400 utilities, ~30,000 power plants, generation, sales, reliability, demand response, distributed generation, FERC Form 1 financials, and county-level service territories -- all queryable in plain English through an AI/BI Genie Space.

The repo is a single Databricks Asset Bundle (DAB) that handles the full ETL: downloads public data, lands it in a Unity Catalog volume, builds raw and curated star-schema layers via Spark Declarative Pipelines (SDP), and creates the Genie Space over the curated tables.

What you get

Raw schema <catalog>.raw_utility_intelligence -- 23 tables landed from EIA Forms 860/861/923, EPA eGRID 2022, PUDL FERC Form 1 + EIA, and Census TIGER/Line.

Curated schema <catalog>.curated_utility_intelligence -- a clean star schema with 4 dimension tables and 14 fact tables, joined and typed, column-level comments included.

Dimensions
`dim_utility`	~7,400 US electric utilities (EIA + PUDL crosswalk)
`dim_plant`	~30,000 power plants (EIA 860 + eGRID emissions)
`dim_generator`	Generators with nameplate capacity, fuel, prime mover
`dim_county`	County boundary polygons (Census TIGER/Line 2024)

Facts
`fact_generation`	Monthly net generation by plant, prime mover, fuel
`fact_fuel_receipts`	Fuel deliveries with heat content, sulfur, cost
`fact_plant_ownership`	Generator ownership shares by utility
`fact_sales`	Annual sales (MWh, $, customers) by sector
`fact_reliability`	SAIDI / SAIFI / CAIDI by utility
`fact_demand_response`	DR program enrollment, savings, costs
`fact_net_metering`	Net-metering capacity, installations, exports
`fact_ami_deployment`	Smart meter deployment by sector
`fact_distributed_generation`	DG capacity and customers by tech
`fact_energy_efficiency`	EE program savings, peak reduction, costs
`fact_ferc1_plants`	FERC Form 1 plant capacity, generation, capex
`fact_ferc1_expenses`	FERC Form 1 operating expenses
`fact_ferc1_revenues`	FERC Form 1 revenues by class
`fact_service_territory`	Counties served per utility-year

A Genie Space named US Utility Intelligence over the 18 curated tables, with sample questions and modeling instructions baked in.

Prerequisites

A Databricks workspace with Unity Catalog and serverless compute enabled.
The Databricks CLI v0.228+ installed and configured for your workspace (e.g. databricks configure).
A UC catalog you can write to (defaults to main).
A serverless SQL warehouse in the workspace (any will do -- the Genie setup picks one automatically).

Deploy

From the repo root:

databricks bundle deploy

# 1. Land + curate the data (downloads ~1 GB, runs two SDP pipelines)
databricks bundle run ingest
databricks bundle run curate

# 2. Create the Genie Space
databricks bundle run setup

When setup finishes, the job output prints the Genie Space URL. Open it and try:

Which 10 utilities have the highest total electricity sales in MWh?

Show me the top 20 power plants by CO2 emissions

What is the total installed solar capacity by state?

Overriding catalog or warehouse

export BUNDLE_VAR_catalog=YOUR_CATALOG
export BUNDLE_VAR_default_warehouse_id=YOUR_WAREHOUSE_ID

Or pass --var=catalog=... / --var=default_warehouse_id=... on each databricks bundle invocation.

What the bundle does

Ingest (ingest job)
- Downloads source files in parallel from EIA, EPA, PUDL, and Census (src/01_download.py).
- Converts Excel/shapefile sources to Parquet and writes to a UC volume.
- Runs the raw_tables SDP pipeline to build 23 materialized views in raw_utility_intelligence (src/raw_*.sql).
- Applies table and column comments plus UC tags (src/03_raw_table_comments.py).
Curate (curate job)
- Runs the curated_tables SDP pipeline to build a clean star schema in curated_utility_intelligence (src/dim_*.sql, src/fact_*.sql).
- Applies curated table and column comments plus UC tags (src/03_curated_table_comments.py).
Setup (setup job)
- Creates or updates the Genie Space US Utility Intelligence over the 18 curated tables, with sample questions and modeling instructions (src/01_create_genie_space.py).

Data sources

Source	What	Year
EIA Form 860	Plant & generator characteristics	2023
EIA Form 861	Utility sales, revenue, reliability	2023
EIA Form 923	Monthly generation & fuel	2023
EPA eGRID	Plant/generator emissions	2022
PUDL	FERC financials, service territory, crosswalks	Stable
Census TIGER/Line	County boundary polygons	2024

All public, no credentials required.

Notes

Serverless only. Both SDP pipelines and the Genie Space's warehouse use Databricks serverless compute.
The first ingest run downloads ~1 GB and runs the SDP pipeline -- expect 10-20 minutes on a fresh deployment. Subsequent runs are incremental.
Genie Spaces aren't yet a native Databricks Asset Bundle resource (see databricks/cli#4191), so creation happens via the REST API from a notebook task in the bundle.

Contributing

This repo is published from an upstream source, so pull requests here will be overwritten on the next release. Please file an issue for bugs, questions, or improvement ideas -- fixes will be made upstream and reflected here on the next push.

License

MIT -- see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
shared/dab_includes		shared/dab_includes
src		src
LICENSE		LICENSE
README.md		README.md
databricks.yml		databricks.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

US Utility Intelligence on Databricks

What you get

Prerequisites

Deploy

Overriding catalog or warehouse

What the bundle does

Data sources

Notes

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

US Utility Intelligence on Databricks

What you get

Prerequisites

Deploy

Overriding catalog or warehouse

What the bundle does

Data sources

Notes

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages