An Early Warning System for Dangerous Heat Conditions (1990–2025)
When are people exposed to dangerous heat conditions in France?
The objective of this project is to build a predictive model that serves as an early heat warning system. We focus on extreme daily temperatures, specifically the daily maximum temperature (TX), to identify patterns and predict events early enough to act.
Scientific Definition: In this study, a heatwave is defined as a period where the daily maximum temperature exceeds the 95th percentile (0.95 quantile) of the local historical climate for at least three consecutive days.
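The definition above can be illustrated with a minimal sketch in plain Python. It is not the project's actual implementation: the function names, the linear-interpolation quantile, and the use of a single threshold for the whole series (rather than, say, one per calendar day or season) are assumptions made for illustration.

```python
def quantile(values, q):
    """Empirical q-quantile (0 <= q <= 1) with linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def flag_heatwave_days(tx, threshold, min_days=3):
    """Mark days that belong to a run of at least `min_days` consecutive
    days whose daily maximum temperature exceeds `threshold`."""
    flags = [False] * len(tx)
    run_start = None
    # A trailing -inf sentinel closes a run that reaches the end of the series.
    for i, value in enumerate(list(tx) + [float("-inf")]):
        if value > threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_days:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags


if __name__ == "__main__":
    # Hypothetical values: a historical TX sample and a short recent series.
    history = [18, 21, 24, 26, 27, 29, 30, 31, 33, 35, 36, 38]
    threshold = quantile(history, 0.95)
    recent_tx = [30, 37, 38, 39, 29, 38, 38, 30]
    print(threshold, flag_heatwave_days(recent_tx, threshold))
```

In this sketch a two-day exceedance is not flagged, because the definition requires at least three consecutive days above the threshold.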
Explore our models and climate analysis in the live Streamlit application.
🔗 Live Demo: Extreme Heat Events App
- Introduction & Motivation: Context on global warming trends in France, including raw data analysis and visualizations of the increasing frequency of heatwaves.
- Data Explanation & First Model: View the initial approach using Météo-France daily data and Gradient Boosting.
- Improvements & Final Model: Explore the optimized XGBoost model trained on Copernicus data (1990–2025).
- Conclusion & Next Steps: Summary of key findings and our technical roadmap.
We analyzed four French départements representing diverse climate zones, identified here by their main city and département number:
- Paris (75), Lyon (69), Bordeaux (33), and Marseille (13).
The project evolved through two main stages:
- First modeling phase: daily Tmax and wind data from five meteorological stations per city, taken from the Météo-France daily climatological dataset (Données climatologiques de base – quotidiennes) via data.gouv.fr.
- Second modeling phase: reanalysis data covering temperature at various altitudes, air pressure, shortwave radiation, soil moisture, and wind, taken from the Copernicus Climate Data Store (ERA5).
- Predictive Modeling: Transitioned from Gradient Boosting to an optimized XGBoost architecture, incorporating atmospheric features like wind stagnation and persistence signals.
Note on Data Integrity: To ensure a robust predictive system and avoid data leakage, all temperature features (including 2m temperature and temperatures at various upper-atmospheric levels) are used exclusively in a lagged approach. This ensures the model only learns from historical data to predict future events.
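The lagged approach described in the note can be sketched as follows. This is a hedged illustration in plain Python, not the repository's code: the column names (`t2m`, `heatwave`), the lag choices, and the function name are hypothetical. With pandas, the same idea is usually expressed with `Series.shift`.

```python
def add_lagged_features(rows, columns, lags=(1, 2, 3)):
    """Replace same-day values of `columns` with values from previous days.

    `rows` is a list of dicts ordered by date. The returned rows keep every
    other key unchanged, drop the same-day value of each listed column, and
    add `<column>_lag<k>` keys. Rows without enough history are skipped.
    """
    max_lag = max(lags)
    lagged_rows = []
    for i, row in enumerate(rows):
        if i < max_lag:
            continue  # not enough past days to fill every lag
        new_row = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            for lag in lags:
                new_row[f"{col}_lag{lag}"] = rows[i - lag][col]
        lagged_rows.append(new_row)
    return lagged_rows


if __name__ == "__main__":
    # Hypothetical daily records; `heatwave` is the prediction target.
    days = [
        {"date": "2024-07-01", "t2m": 24.0, "heatwave": 0},
        {"date": "2024-07-02", "t2m": 29.0, "heatwave": 0},
        {"date": "2024-07-03", "t2m": 34.0, "heatwave": 0},
        {"date": "2024-07-04", "t2m": 37.0, "heatwave": 1},
    ]
    for row in add_lagged_features(days, ["t2m"], lags=(1, 2)):
        print(row)
```

Because the same-day temperature is removed from each row, a model trained on this output only sees information that would have been available before the day being predicted.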
The repository includes copernicus_api_script.py, which allows users to programmatically fetch updated climate data directly from the Copernicus Climate Data Store for further analysis or model retraining.
- Regionality Matters: Paris shows stronger urban amplification, while Marseille is shaped by coastal effects.
- Feature Engineering: Adding atmospheric stagnation and 48-hour rolling persistence features significantly improved the model's robustness.
- Value Proposition: The system is lightweight, runs on standard hardware, and is regionally customizable, providing a foundation for early warning systems relevant to public health services and urban planning.
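The stagnation and persistence features mentioned in the findings above might look roughly like the sketch below. The exact definitions used in the project are not stated here, so everything in this example is an assumption: stagnation is taken to mean wind speed below a hypothetical calm threshold of 2 m/s, and 48-hour persistence is taken to mean the mean of the previous two daily values.

```python
def trailing_mean(values, window):
    """Mean of the previous `window` values, excluding the current one.

    Returns None for positions without enough history, so the feature never
    uses information from the day being predicted.
    """
    result = []
    for i in range(len(values)):
        if i < window:
            result.append(None)
        else:
            result.append(sum(values[i - window:i]) / window)
    return result


def stagnation_flags(wind_speed, calm_threshold=2.0):
    """1 where wind speed is below `calm_threshold` (assumed m/s), else 0."""
    return [1 if speed < calm_threshold else 0 for speed in wind_speed]


if __name__ == "__main__":
    # Hypothetical daily series for one location.
    tx = [31.0, 33.0, 36.0, 38.0, 37.0]
    wind = [3.5, 1.8, 1.2, 1.5, 4.0]
    persistence_48h = trailing_mean(tx, window=2)  # two daily steps = 48 hours
    calm_share_48h = trailing_mean(stagnation_flags(wind), window=2)
    print(persistence_48h)
    print(calm_share_48h)
```

With hourly reanalysis data the window would be 48 steps instead of 2; the structure of the calculation stays the same.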
To evolve this into a production-ready system:
- Sequence Modeling: Benchmark LSTM or Transformer models to evaluate longer temporal dependencies.
- Spatial Expansion: Extend coverage to Spain, Italy, and Germany.
- Climate-Adaptive Thresholds: Use rolling percentiles to account for long-term climate trends.
- Probabilistic Outputs: Generate calibrated probabilities for better stakeholder risk assessment.
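As a rough illustration of the climate-adaptive threshold item in the roadmap above, the sketch below recomputes the percentile threshold from a trailing window of years instead of the full record. The window length, the data layout (a dict mapping year to a list of TX values), and the function names are assumptions for illustration only.

```python
def quantile(values, q):
    """Empirical q-quantile (0 <= q <= 1) with linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def rolling_threshold(tx_by_year, target_year, window_years=30, q=0.95):
    """Threshold for `target_year` from the `window_years` preceding years.

    Only years strictly before `target_year` are pooled, so the threshold
    never depends on the year it is applied to.
    """
    pool = []
    for year in range(target_year - window_years, target_year):
        pool.extend(tx_by_year.get(year, []))
    if not pool:
        raise ValueError("no data in the reference window")
    return quantile(pool, q)


if __name__ == "__main__":
    # Hypothetical, tiny sample: two TX values per year with a warming trend.
    tx_by_year = {2000: [20, 30], 2001: [22, 32], 2002: [24, 34]}
    print(rolling_threshold(tx_by_year, 2002, window_years=2))
    print(rolling_threshold(tx_by_year, 2003, window_years=2))
```

In this toy sample the threshold for 2003 is higher than the one for 2002, which is the behaviour a rolling percentile is meant to capture under a warming trend.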
- Clone the repository:
    git clone <repository-url>
    cd <repository-folder>
- Install dependencies:
    pip install -r requirements.txt
- Run the Streamlit app:
    streamlit run streamlit/streamlit_main.py
Note: This project was developed as part of a Data Science certification (Liora).