Feature/meteo manager#9

Merged
sacha-lma merged 35 commits into main from feature/meteo_manager on May 26, 2026

Conversation

@sacha-lma

Collaborator

No description provided.

sacha-lma added 30 commits May 24, 2026 01:26
…ss, distributions, outliers, validity, and effectiveness
sacha-lma and others added 5 commits May 26, 2026 01:27
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Agent-Logs-Url: https://github.com/OpenCz/Tardis/sessions/41f2cac2-b68e-4263-b9f8-3c2c3a29ea60

Co-authored-by: sacha-lma <233736501+sacha-lma@users.noreply.github.com>
# Conflicts:
#	scripts/cleaning/trains/pipeline.py
#	tardis_eda.ipynb
Copilot AI review requested due to automatic review settings May 26, 2026 05:22
@sonarqubecloud


Quality Gate failed

Failed conditions
5.6% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@sacha-lma sacha-lma merged commit baccacd into main May 26, 2026
4 of 6 checks passed

Copilot AI left a comment

Contributor

Pull request overview

This PR restructures the cleaning code into dataset-specific trains and meteo packages and adds a new streaming météo cleaning, auditing, merging, feature engineering, and visualization pipeline alongside the existing train-delay pipeline.

Changes:

  • Moves train cleaning/audit/merge/visualization modules under scripts.cleaning.trains with relative imports.
  • Adds scripts.cleaning.meteo with streaming per-département loading, cleaning, merging, feature engineering, auditing, and plots.
  • Updates package initializers to expose the new nested package layout and train-compatible aliases.

Reviewed changes

Copilot reviewed 28 out of 58 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `scripts/__init__.py` | Reworks package-level exports around the new cleaning package layout. |
| `scripts/cleaning/__init__.py` | Adds train/meteo package entry points and train-default aliases. |
| `scripts/merging/__init__.py` | Removes the old top-level merging export. |
| `scripts/cleaning/trains/__init__.py` | Adds train package exports. |
| `scripts/cleaning/trains/pipeline.py` | Switches train pipeline imports to relative modules. |
| `scripts/cleaning/trains/loading.py` | Adds train CSV loader under the new package. |
| `scripts/cleaning/trains/features.py` | Adds train feature engineering helpers under the new package. |
| `scripts/cleaning/trains/cleaning/__init__.py` | Exposes train cleaning helpers. |
| `scripts/cleaning/trains/cleaning/preprocessing.py` | Adds train preprocessing/drop helpers. |
| `scripts/cleaning/trains/cleaning/type_conversion.py` | Adds train date/numeric/string conversion helpers. |
| `scripts/cleaning/trains/cleaning/normalization.py` | Adds train label normalization. |
| `scripts/cleaning/trains/cleaning/station_clustering.py` | Adds train station fuzzy clustering. |
| `scripts/cleaning/trains/cleaning/nan_recovery.py` | Adds train delay recovery logic. |
| `scripts/cleaning/trains/cleaning/corrections.py` | Adds train consistency corrections and rate recomputation. |
| `scripts/cleaning/trains/audit/__init__.py` | Exposes train audit helpers. |
| `scripts/cleaning/trains/audit/tracker.py` | Adds train audit report tracking. |
| `scripts/cleaning/trains/audit/quality.py` | Updates train audit import path. |
| `scripts/cleaning/trains/merging/__init__.py` | Exposes train merge helper. |
| `scripts/cleaning/trains/merging/merging_trains.py` | Adds train/station merge implementation. |
| `scripts/cleaning/trains/visualization/__init__.py` | Updates train visualization imports to relative paths. |
| `scripts/cleaning/trains/visualization/cleaning_plots.py` | Adds train cleaning diagnostic plots. |
| `scripts/cleaning/trains/visualization/eda_plots.py` | Adds train EDA plots. |
| `scripts/cleaning/meteo/__init__.py` | Adds météo package exports. |
| `scripts/cleaning/meteo/pipeline.py` | Adds streaming météo cleaning pipeline. |
| `scripts/cleaning/meteo/loading.py` | Adds météo index/path grouping and optimized CSV loaders. |
| `scripts/cleaning/meteo/features.py` | Adds météo time, season, wind, precipitation, and temperature features. |
| `scripts/cleaning/meteo/cleaning/__init__.py` | Exposes météo cleaning helpers. |
| `scripts/cleaning/meteo/cleaning/preprocessing.py` | Adds météo quality, sparse-column, deduplication, and critical-NaN handling. |
| `scripts/cleaning/meteo/cleaning/type_conversion.py` | Adds météo date/numeric/category conversion. |
| `scripts/cleaning/meteo/cleaning/normalization.py` | Adds météo station-name normalization. |
| `scripts/cleaning/meteo/cleaning/nan_recovery.py` | Adds météo interpolation recovery. |
| `scripts/cleaning/meteo/cleaning/corrections.py` | Adds météo physical consistency corrections. |
| `scripts/cleaning/meteo/audit/__init__.py` | Updates météo audit exports to relative imports. |
| `scripts/cleaning/meteo/audit/tracker.py` | Adds météo audit report tracking. |
| `scripts/cleaning/meteo/audit/quality.py` | Adds météo quality checks. |
| `scripts/cleaning/meteo/merging/__init__.py` | Exposes météo merge helper. |
| `scripts/cleaning/meteo/merging/merging_meteo.py` | Adds météo vent/parameter outer merge. |
| `scripts/cleaning/meteo/visualization/__init__.py` | Exposes météo visualization helpers. |
| `scripts/cleaning/meteo/visualization/cleaning_plots.py` | Adds météo cleaning diagnostic plots. |
| `scripts/cleaning/meteo/visualization/eda_plots.py` | Adds météo EDA plots. |

Comment on lines +42 to +43
df[col] = df.groupby("NUM_POSTE")[col].transform(
    lambda x: x.interpolate(method="linear", limit=limit, limit_direction="both")
Comment on lines +322 to +326
monthly = pd.DataFrame({
    label: df[col].fillna(0).astype(bool).groupby(df["month"]).sum()
    for col, label in present.items()
})
monthly.index = _MONTH_NAMES
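
Assuming `_MONTH_NAMES` holds twelve labels, the positional index assignment in this excerpt only lines up when every month from 1 to 12 occurs in `df["month"]`; with fewer groups, pandas raises a length-mismatch error. The sketch below shows the keyed alternative in plain Python rather than pandas, so it illustrates the idea and is not the code under review. `MONTH_NAMES`, `monthly_event_counts`, and the `frost` flag are hypothetical names.

```python
from collections import Counter

# Hypothetical stand-in for the module's _MONTH_NAMES constant.
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def monthly_event_counts(records, flag):
    """Count rows per month where `flag` is truthy, keyed by month name.

    Missing or None flags count as "no event", mirroring
    fillna(0).astype(bool) in the excerpt. Months absent from the
    data are reported as 0 instead of being dropped.
    """
    counts = Counter(rec["month"] for rec in records if rec.get(flag))
    return {MONTH_NAMES[m - 1]: counts.get(m, 0) for m in range(1, 13)}


# Toy data covering only two months.
records = [
    {"month": 1, "frost": 1},
    {"month": 1, "frost": 0},
    {"month": 3, "frost": 1},
    {"month": 3, "frost": 1},
    {"month": 3, "frost": None},
]
frost_by_month = monthly_event_counts(records, "frost")
```

Because each label is looked up by month number, a dataset that covers only part of the year still yields twelve correctly labelled entries.
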
Comment thread scripts/__init__.py
Comment on lines 12 to +16
__all__ = [
    "Pipeline",
    "load_data",
    "add_time_features",
    "add_season",
    "add_delay_category",
    "add_cancellation_rate",
    "add_punctuality_rate",
    "cleaning",
    "trains",
    "meteo",
    "audit",
Comment on lines 11 to +15
__all__ = [
    "CRITICAL_COLS",
    "CRITICAL_COMP_COLS",
    "drop_comment_columns",
    "deduplicate",
    "drop_critical_nan",
    "drop_critical_comp_nan",
    "parse_dates",
    "convert_numerics",
    "cast_string_columns",
    "normalize_labels",
    "StationClusterer",
    "recover_departure_delay",
    "recover_arrival_delay",
    "fix_negative_counts",
    "fix_count_overflow",
    "fix_delay_hierarchy",
    "recompute_rates",
    "trains",
    "meteo",
    "cleaning",
    "audit",

# ─ Param file (autres-paramètres): pressure, humidity, radiation, snow …
_PARAM_KEEP: set[str] = {
    "NUM_POSTE", "AAAAMMJJ",
Comment on lines +81 to +85
# Append to output CSV (header only on first write)
df.to_csv(
    self.output_path,
    mode="w" if first_write else "a",
    header=first_write,
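
The write pattern in this excerpt (truncate and emit the header on the first chunk, append without a header afterwards) can be sketched with the standard-library `csv` module. This is a minimal illustration under stated assumptions, not the pipeline's code: `stream_to_csv`, the sample rows, and the output file name are hypothetical.

```python
import csv
import os
import tempfile


def stream_to_csv(chunks, output_path, fieldnames):
    """Write each chunk of rows to output_path, emitting the header once."""
    first_write = True
    for rows in chunks:
        # Truncate the file on the first chunk, append on later ones.
        mode = "w" if first_write else "a"
        with open(output_path, mode, newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            if first_write:
                writer.writeheader()
            writer.writerows(rows)
        # Flip the flag only after a write has actually happened.
        first_write = False


# Hypothetical per-département chunks with two of the météo key columns.
chunks = [
    [{"NUM_POSTE": "01014002", "AAAAMMJJ": "20240101"}],
    [{"NUM_POSTE": "02001001", "AAAAMMJJ": "20240101"}],
]
output_path = os.path.join(tempfile.mkdtemp(), "meteo_clean.csv")
stream_to_csv(chunks, output_path, ["NUM_POSTE", "AAAAMMJJ"])

with open(output_path, newline="", encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

The detail worth checking in any variant of this pattern is where the flag flips: if a chunk can be skipped before it is written, the flag must stay set until the first real write, otherwise the file is appended to without a header or a stale file is never truncated.
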
Comment on lines +184 to +195
# ── 5. Feature engineering ────────────────────────────────────
df = features.add_time_features(df)
df = features.add_season(df)
df = features.add_temperature_amplitude(df)
df = features.add_wind_category(df)
df = features.add_precipitation_category(df)

# ── 6. Consistency fixes ──────────────────────────────────────
df, _ = cleaning.fix_negative_values(df)
df, _ = cleaning.fix_humidity_bounds(df)
df, _ = cleaning.fix_temperature_consistency(df)

Comment on lines +13 to +17
from .eda_plots import (
    plot_correlation_matrix,
    plot_monthly_precipitation_trend,
    plot_monthly_temperature_trend,
    plot_precipitation_distribution,
Comment thread scripts/__init__.py
Comment on lines +5 to +9
audit = cleaning.audit
merging = cleaning.merging
visualization = cleaning.visualization
features = cleaning.features
loading = cleaning.loading
Labels

None yet

Projects

None yet

3 participants