Jm/add model#19
Merged
…plemented first model architecture
Captures the state of the jm/add_model branch on 2026-05-08 prior to a structured refactor driven by numbered notebooks (01–07). Preserves the mid-flight warehouse_ops reorganization (src/susse/io → warehouse_ops/io) and the population ingestion subsystem (jobs, loaders, validators) so they remain available as reference material while the notebook track rebuilds the model, training, validation, and inference layers cleanly.

Also: add .idea/ and .venv/ to .gitignore; remove stale Kampala MERRA sample under notebooks/U10M.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace module-level PROJECT_ID/DATASET constants and the implicit TableRefs() defaults with a structured set of frozen dataclasses:

* WarehouseConfig: GCP project + dataset + region.
* TableSchema + TableSchemas: registry of every warehouse table with its table_id, MERGE-key contract, and description. Single source of truth for both loaders and coverage queries.
* TableRefs: FQTN properties derived from a WarehouseConfig.
* MatchStrategy StrEnum and validated WarehouseOptions.

BigQueryClient now takes a WarehouseConfig directly (no implicit module imports) and gains an existing_keys() method used by ingest jobs to implement idempotent re-runs.

Adds dependencies: google-cloud-bigquery, db-dtypes, pyarrow, pygeohash. Server-side geohash5 in the existing warehouse matches pygeohash output (verified against the Kampala station).

Also adds idempotent DDL for cams_daily_vars_long (the long-format CAMS table introduced for symmetry with nasa_daily_vars_long) and dim_variable so the schema definitions live in version control.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
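The shape described in this commit message can be sketched roughly as follows. This is an illustration only, not the repository's code: the field names (project_id, dataset, region, merge_keys), the fqtn() method, the rows_to_insert() helper, and the MERGE-key columns are all assumptions made for the example, and no BigQuery call is made. The real existing_keys() method presumably queries the warehouse; here a plain set of key tuples stands in for its result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarehouseConfig:
    # Hypothetical field names; the commit only says "GCP project + dataset + region".
    project_id: str
    dataset: str
    region: str


@dataclass(frozen=True)
class TableSchema:
    table_id: str
    merge_keys: tuple[str, ...]  # columns that identify a row for MERGE
    description: str


@dataclass(frozen=True)
class TableRefs:
    config: WarehouseConfig

    def fqtn(self, schema: TableSchema) -> str:
        # Fully qualified table name: project.dataset.table
        return f"{self.config.project_id}.{self.config.dataset}.{schema.table_id}"


# The table name comes from the commit message; the key columns are assumed.
CAMS_LONG = TableSchema(
    table_id="cams_daily_vars_long",
    merge_keys=("geohash5", "date", "variable_id"),
    description="Long-format CAMS daily variables",
)


def rows_to_insert(rows, existing_keys, schema):
    """Keep only rows whose MERGE key is not already in the warehouse.

    `existing_keys` stands in for whatever an existing_keys() lookup returns;
    here it is simply a set of key tuples.
    """
    fresh = []
    for row in rows:
        key = tuple(row[col] for col in schema.merge_keys)
        if key not in existing_keys:
            fresh.append(row)
    return fresh
```

Because the dataclasses are frozen, a config cannot be mutated after construction, which is what makes it safe to pass one instance to both loaders and coverage queries. Filtering on the MERGE key before writing is one simple way to make a re-run a no-op for rows that were already ingested.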
The 6 CrossBoundary Energy daily-GHI CSVs under data/ground_measurements/CBE_Data/ are covered by an NDA and must not be redistributed publicly. They remain ingested in the BigQuery warehouse and the trained model bundle the portal serves; originals live on Google Drive for internal use.

Changes
- Delete the 6 CBE CSVs (egypt/ghana/kenya/madagascar/nigeria/somalia).
- Add /data/ground_measurements/CBE_Data/ to .gitignore so a fresh copy cannot be accidentally re-committed.
- Update data/ground_measurements/README.md: drop the CBE row, add an NDA note, tweak the Schema-row wording.
- warehouse/extending_the_warehouse.ipynb: swap the Pattern-3 demo from CBE somalia.csv to ministry_energy_ug/soroti.csv; switch the Pattern 1/2 example from kenya_location3 / kenya_locations to Uganda-only (Makerere + Min. of Energy); clear all cell outputs.
- notebooks/tutorial/0[1-7]*.ipynb: clear cell outputs (they contained CBE station coords / GHI values baked in from prior runs). NB 04 swaps a single LOSO-example station name (kenya_location13 -> tororo).
- notebooks/papers/mukiibi_mikelson_2026/01_recomputation.ipynb + _build_notebook.py: clear outputs; anonymise two ghana_location3 mentions to "stations in the Gulf of Guinea". CrossBoundary partner-name acknowledgements (paper context) kept.

Followups not handled here
- Cell outputs are now empty; re-execute with whatever CBE-source filter you settle on so the public repo has rich outputs again.
- Older commits on this branch and on main still contain CBE data; a history rewrite (git filter-repo + force-push) is a separate decision.
- Local-only branches (cn/*, jm/new_model, rm/data_cleaning) still carry CBE files at tip; prune or rewrite before any push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
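For readers unfamiliar with what "clear cell outputs" does to a notebook file, a minimal sketch follows. The commit does not say which tool was used, so this is not a record of the actual procedure. It relies only on the standard library and on the nbformat 4 JSON layout, in which each code cell carries an `outputs` list and an `execution_count`. The function names clear_outputs and clear_file are made up for the example.

```python
import copy
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered tables, plots, printed values
            cell["execution_count"] = None  # drop the In[n] counter as well
    return cleaned


def clear_file(path: Path) -> None:
    """Rewrite one .ipynb file in place with its outputs cleared."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(
        json.dumps(clear_outputs(notebook), indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

Clearing outputs removes values baked in from earlier runs, but it does not touch the cell sources, so station names that appear in the code itself still need the manual swaps listed under Changes.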
Each DerivedFeature now declares a DerivedColumnMetadata (column, label, unit, description) per output column, exposed via the new output_metadata abstract property. Derived features have no entry in the warehouse VariableCatalog, so this gives the portal's upcoming variable inspector a single source of truth for their labels and units rather than a hardcoded parallel table that can drift.

Commit 2 (Predictor feature catalog) must reconcile DerivedColumnMetadata.label with VariableSpec.display_name and the unit vocabularies of the two value objects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
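A rough sketch of the pattern this commit message describes, assuming an ABC-based DerivedFeature. The four metadata fields are taken from the message; the tuple return type, the DayOfYearFeature subclass, and its column names are hypothetical and exist only to show how a concrete feature would declare its outputs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedColumnMetadata:
    column: str
    label: str
    unit: str
    description: str


class DerivedFeature(ABC):
    @property
    @abstractmethod
    def output_metadata(self) -> tuple[DerivedColumnMetadata, ...]:
        """One metadata entry per column this feature produces."""


class DayOfYearFeature(DerivedFeature):
    # Hypothetical feature: encodes the day of year as a sine/cosine pair.
    @property
    def output_metadata(self) -> tuple[DerivedColumnMetadata, ...]:
        return (
            DerivedColumnMetadata(
                column="doy_sin",
                label="Day of year (sine)",
                unit="dimensionless",
                description="Sine of the seasonal phase angle",
            ),
            DerivedColumnMetadata(
                column="doy_cos",
                label="Day of year (cosine)",
                unit="dimensionless",
                description="Cosine of the seasonal phase angle",
            ),
        )
```

Making the property abstract means a subclass that forgets to declare its metadata cannot be instantiated, which is what keeps the labels next to the feature code instead of in a separate table.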
Predictor.feature_catalog exposes a FeatureCatalog: one FeatureMetadata (column, label, unit, description, presentation group) per model-input column predict() returns. It composes warehouse VariableCatalog metadata for NASA POWER / CAMS variables with derived features' output_metadata, reconciling display_name onto the shared `label` field, so a variable-inspector UI can show what every input is and where it comes from. Built purely from the bundle and cached.

The new aux_column_prefix / aux_feature_column helpers in warehouse_ops.population.types are now the single source of truth for the <prefix>_<variable_id> aux-column naming; FeatureService and FeatureSelection.aux_columns route through them instead of each hardcoding the per-source prefixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
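The naming rule and the display_name-to-label reconciliation described above might look something like the sketch below. Only the helper names, the <prefix>_<variable_id> pattern, and the FeatureMetadata field list come from the commit message. The helper signatures, the source keys and prefix strings in AUX_PREFIXES, the VariableSpec fields other than display_name, and the warehouse_feature function are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical mapping from data source to column prefix.
AUX_PREFIXES = {"nasa_power": "nasa", "cams": "cams"}


def aux_column_prefix(source: str) -> str:
    try:
        return AUX_PREFIXES[source]
    except KeyError:
        raise ValueError(f"unknown aux source: {source!r}") from None


def aux_feature_column(source: str, variable_id: str) -> str:
    # The one place where the <prefix>_<variable_id> rule is spelled out.
    return f"{aux_column_prefix(source)}_{variable_id}"


@dataclass(frozen=True)
class VariableSpec:
    variable_id: str
    display_name: str
    unit: str
    description: str


@dataclass(frozen=True)
class FeatureMetadata:
    column: str
    label: str
    unit: str
    description: str
    group: str


def warehouse_feature(source: str, spec: VariableSpec) -> FeatureMetadata:
    """Map a warehouse variable onto the shared metadata shape."""
    return FeatureMetadata(
        column=aux_feature_column(source, spec.variable_id),
        label=spec.display_name,  # display_name is reconciled onto `label`
        unit=spec.unit,
        description=spec.description,
        group=source,
    )
```

Routing every caller through aux_feature_column means a prefix change happens in one mapping rather than in each service that builds column names.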
…rradiation into rm/repo_navigation
Rm/repo navigation
rogerzmukiibi
approved these changes
Jun 12, 2026
No description provided.