A small, focused library for joining ad-performance data with Open-Meteo historical weather, then correlating / lagging / regressing the two.
The library was extracted from a real analysis of 3 ad campaigns across 5 US
cities. The methodology, the per-city and per-show findings, and the resulting
recommendations are in the results/ gallery and the
docs/METHODOLOGY.md write-up.
The question the original analysis was built to answer:
Does weather move ad performance — and if so, which weather variable, for which campaign, in which city, at what lag?
The naive answer is "look at the correlation" — but the right answer conditions on three things that change the conclusion:
- Lag. Ad performance on day t is often driven by weather on day t−1, t−2, or even t−7. A day-0 correlation can completely miss a real effect.
- Stratification. A national correlation can hide opposite-signed city-level effects (one city's positive response cancels another's negative). The right unit of analysis is (show, city).
- Controls. Day-of-week seasonality and city-specific baselines dominate ad performance. Without them, the weather signal is buried.
This library handles all three.
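The stratification point is easy to see on toy data. The sketch below is illustrative only: it does not use the library, and the city slugs, slopes, and noise levels are made-up assumptions. Two cities respond to temperature with opposite signs, so the pooled correlation is close to zero while each per-city correlation is strong.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_days = 300

# Hypothetical cities with opposite-signed temperature responses.
frames = []
for city, slope in [("CITY_A", 2.0), ("CITY_B", -2.0)]:
    temp = rng.normal(15, 5, n_days)
    purchases = 100 + slope * (temp - 15) + rng.normal(0, 3, n_days)
    frames.append(pd.DataFrame({
        "city_slug": city,
        "temperature_2m_mean": temp,
        "purchases": purchases,
    }))
panel = pd.concat(frames, ignore_index=True)

# Pooled ("national") correlation: the two effects cancel.
pooled_r = panel["temperature_2m_mean"].corr(panel["purchases"])

# Stratified correlation: one Pearson r per city.
per_city_r = {
    city: group["temperature_2m_mean"].corr(group["purchases"])
    for city, group in panel.groupby("city_slug")
}

print(f"pooled r = {pooled_r:+.2f}")
for city, r in per_city_r.items():
    print(f"{city}: r = {r:+.2f}")
```

Reading only the pooled number here would suggest that temperature does not matter, which is the opposite of what the per-city numbers say.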
git clone https://github.com/Pouyasharp/poseidon-weather.git
cd poseidon-weather
pip install -r requirements.txt
# Run the synthetic example end-to-end
python3 examples/01_basic_pipeline.py

The example builds a 3-show × 5-city × 365-day synthetic ad panel with
planted weather effects (so you can verify the library actually finds
them), joins it with synthetic weather, runs correlation / lag /
regression, and writes 5 outputs to examples/figures/:
| File | What it shows |
|---|---|
| 01_correlation_heatmap.png | Pearson r of every weather var × purchases, starred at p<0.05 |
| 02_top_correlations.png | Top 8 weather variables by absolute Pearson r |
| 03_lag_curve.png | Pearson r vs lag (0..14 days) for one city |
| 04_regression_summary.txt | OLS of purchases on weather + DOW + city FE |
| 05_scatter.png | Scatter with linear fit for the strongest planted effect |
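For readers who want to see what a panel with a planted effect looks like before running the example, here is a minimal sketch. It is not the code in examples/01_basic_pipeline.py: the show names, city slugs, effect size, and noise level are all hypothetical, and only one effect (temperature on purchases for one show) is planted.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range("2024-01-01", periods=365, freq="D")
shows = ["SHOW_A", "SHOW_B", "SHOW_C"]            # hypothetical names
cities = [f"CITY_{i}" for i in range(1, 6)]       # hypothetical slugs

# One synthetic weather row per (city, date).
weather = pd.MultiIndex.from_product(
    [cities, dates], names=["city_slug", "date"]
).to_frame(index=False)
weather["temperature_2m_mean"] = rng.normal(15, 8, len(weather))

# One ad row per (show, city, date), joined to that city's weather.
ads = pd.MultiIndex.from_product(
    [shows, cities, dates], names=["show", "city_slug", "date"]
).to_frame(index=False)
panel = ads.merge(weather, on=["city_slug", "date"], how="left")

# Plant a positive temperature effect for SHOW_A only.
planted_slope = panel["show"].map({"SHOW_A": 0.8, "SHOW_B": 0.0, "SHOW_C": 0.0})
panel["purchases"] = (
    20
    + planted_slope * panel["temperature_2m_mean"]
    + rng.normal(0, 4, len(panel))
)

# A correct analysis should recover the effect for SHOW_A and nothing else.
recovered = {
    show: group["temperature_2m_mean"].corr(group["purchases"])
    for show, group in panel.groupby("show")
}
print(recovered)
```

Because the generating process is known, any analysis step can be checked against it: a strong positive r for SHOW_A and near-zero r for the other two.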
import pandas as pd
from poseidon import weather, join, correlate, lag, regression, viz
# 1. Fetch weather (cached to disk, so a second call reads from the cache)
from poseidon.weather import CityRequest, fetch_historical
from pathlib import Path
cache = Path("./data/weather")
city = CityRequest(city_slug="AURORA_N", latitude=47.0, longitude=-122.0)
records = fetch_historical(city, "2024-01-01", "2024-12-31", cache_dir=cache)
weather_df = weather.daily_records_to_dataframe(records)
# 2. Load your ad panel (date, city_slug, show, spend, purchases, impressions, link_clicks)
ads_df = pd.read_csv("my_ads.csv", parse_dates=["date"])
# 3. Join
panel = join.panel_with_weather(ads_df, weather_df)
# 4. Correlate
corr = correlate.weather_perf_correlations(panel, groupby=["show", "city_slug"])
print(corr.sort_values("r", key=abs, ascending=False).head(10))
# 5. Lag analysis
lag_df = lag.lag_correlations(
    panel,
    weather_vars=["temperature_2m_mean", "sunshine_duration", "precipitation_sum"],
    perf_vars=["purchases"],
    lags=[0, 1, 2, 3, 5, 7, 10, 14],
    groupby=["show", "city_slug"],
)
best = lag.best_lag_per_pair(lag_df, by=["show", "city_slug"])
# 6. Regression
fit = regression.ols_weather(
    panel,
    target="purchases",
    predictors=["temperature_2m_mean", "sunshine_duration", "precipitation_sum",
                "severe_flag", "daylight_hours"],
    add_dow=True, add_city_fe=True,
)
print(fit.summary())
# 7. Viz
fig = viz.lag_curve(
    lag_df[lag_df["city_slug"] == "AURORA_N"],
    "temperature_2m_mean", "purchases",
    save_path="my_lag.png",
)

See examples/01_basic_pipeline.py for the full end-to-end pattern.
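To make the `add_dow=True, add_city_fe=True` controls in step 6 concrete, the sketch below fits an OLS with day-of-week dummies and city fixed effects using only NumPy and pandas. It is an assumption about what such controls typically look like, not the implementation of `regression.ols_weather`; the data, city slugs, and coefficients are synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
dates = pd.date_range("2024-01-01", periods=364, freq="D")

# Synthetic panel: city baseline + weekend lift + temperature effect + noise.
frames = []
for city, baseline in [("CITY_A", 40.0), ("CITY_B", 90.0)]:
    temp = rng.normal(15, 6, len(dates))
    weekend = (dates.dayofweek >= 5).astype(float)
    purchases = baseline + 10 * weekend + 0.5 * temp + rng.normal(0, 2, len(dates))
    frames.append(pd.DataFrame({
        "date": dates,
        "city_slug": city,
        "temperature_2m_mean": temp,
        "purchases": purchases,
    }))
panel = pd.concat(frames, ignore_index=True)

# Design matrix: weather predictor + DOW dummies + city dummies.
# drop_first=True avoids perfect collinearity with the intercept.
X = pd.concat(
    [
        panel[["temperature_2m_mean"]],
        pd.get_dummies(panel["date"].dt.dayofweek, prefix="dow", drop_first=True),
        pd.get_dummies(panel["city_slug"], prefix="city", drop_first=True),
    ],
    axis=1,
).astype(float)
X.insert(0, "const", 1.0)

# Ordinary least squares via a least-squares solve.
beta, *_ = np.linalg.lstsq(X.to_numpy(), panel["purchases"].to_numpy(), rcond=None)
coefs = pd.Series(beta, index=X.columns)
print(coefs.round(2))
```

The city dummy absorbs the 50-purchase gap between the two baselines and the weekend dummies absorb the weekly cycle, which leaves the temperature coefficient close to the planted 0.5.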
poseidon/
├── weather.py # Open-Meteo fetcher (fetch_historical, fetch_many)
│ # + helpers: daily_records_to_dataframe, add_severe_flag,
│ # add_daylight_hours
├── join.py # panel_with_weather, daily_city_aggregate, compute_roas/ctr
├── correlate.py # pearson, weather_perf_correlations, overall_correlation_heatmap
├── lag.py # lag_correlations, best_lag_per_pair
├── regression.py # ols_weather, stepwise_weather, OLSResult dataclass
└── viz.py # lag_curve, top_correlations_bar, time_series_overlay, scatter_with_fit
Each module is self-contained (no hidden state, no globals, no init
files beyond __init__.py). All public functions take DataFrames and
return DataFrames / figures; the only side effect is the optional
save_path= argument on viz functions.
results/ contains the six figures from the original analysis of 3 ad
campaigns across 5 US cities (Jun 2024 – May 2026). The data is real
ad-account data and is not in the repo; the figures are the
deliverable.
| Figure | What it shows |
|---|---|
| results/FINAL_01_lag_correlation.png | Per-(show, city) lag correlations: where in the calendar the weather→purchase signal lives |
| results/FINAL_02_mesmerica_retarget_temp.png | Mesmerica retargeting cohort, ROAS vs temperature — the strongest single signal in the study |
| results/FINAL_03_mesmerica_roas_vs_temp.png | Mesmerica ROAS as a function of mean temperature, with confidence band |
| results/FINAL_04_per_city_correlations.png | Correlation heatmap per (city, show) — the right unit of analysis |
| results/FINAL_05_regression_progression.png | Stepwise OLS progression as more controls are added |
| results/FINAL_06_recommendations.png | The one-slide summary handed to the campaign owners |
Honest summary of the analysis (see docs/METHODOLOGY.md for the
full write-up):
- Mesmerica has a strong, positive temperature→ROAS effect that survives day-of-week and city fixed effects. The retargeting cohort is the main driver.
- DSOTM is a "data gap" — spend is too thin to support inference.
- Severe weather (WMO code ≥ 95, thunderstorms/hail) hurts all shows but the effect is small relative to the temperature signal.
- Lag analysis was a trap: 14-day "effects" were over-claimed in early versions; the right lag window is 0–3 days for the temperature signal.
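The severe-weather threshold above translates directly into a binary flag. The sketch below shows one way to derive it; `flag_severe` is a hypothetical helper (the library's own `add_severe_flag` may differ), and the `weather_code` column name is an assumption based on Open-Meteo's daily WMO code variable.

```python
import pandas as pd

# WMO codes 95 and above cover thunderstorms, including those with hail.
SEVERE_WMO_THRESHOLD = 95


def flag_severe(weather_df: pd.DataFrame, code_col: str = "weather_code") -> pd.DataFrame:
    """Return a copy with a 0/1 `severe_flag` column (hypothetical helper)."""
    out = weather_df.copy()
    out["severe_flag"] = (out[code_col] >= SEVERE_WMO_THRESHOLD).astype(int)
    return out


# Clear sky, light rain, thunderstorm, thunderstorm with heavy hail.
demo = pd.DataFrame({"weather_code": [0, 61, 95, 99]})
flagged = flag_severe(demo)
print(flagged)
```

Returning a copy keeps the function free of side effects, in line with the DataFrame-in, DataFrame-out style described for the modules.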
poseidon-weather/
├── README.md # this file
├── LICENSE # MIT
├── requirements.txt
├── .gitignore
├── poseidon/ # the importable library
│ ├── __init__.py
│ ├── weather.py
│ ├── join.py
│ ├── correlate.py
│ ├── lag.py
│ ├── regression.py
│ └── viz.py
├── examples/
│ ├── 01_basic_pipeline.py
│ └── figures/ # outputs from the example (re-runnable)
├── results/ # the 6 final figures from the real analysis
└── docs/
└── METHODOLOGY.md # analytical rationale + the negative results
The example runs on synthetic data for two reasons:
- The real ad-account data is sensitive. The original campaigns were run for paying clients; the ad data does not belong in a public repo.
- The planted effects let you verify the library. Run the example on synthetic data; the planted effects (positive temperature→ROAS for Mesmerica, negative sunshine at 3-day lag, negative severe-weather effect) should be recovered. If they aren't, the library is broken.
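A recovery check of this kind can be sketched in a few lines. The code below plants a negative sunshine effect at a 3-day lag and confirms that a lag scan finds it. It is a standalone illustration under made-up assumptions (city slugs, a `sunshine_hours` column, effect size), not the library's `lag.lag_correlations`; the one design point worth noting is that the shift is done per city so that one city's weather never leaks into another's rows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
dates = pd.date_range("2024-01-01", periods=365, freq="D")
PLANTED_LAG = 3

# Plant: purchases fall with sunshine from three days earlier.
frames = []
for city in ["CITY_A", "CITY_B"]:
    df = pd.DataFrame({
        "city_slug": city,
        "date": dates,
        "sunshine_hours": rng.normal(8, 3, len(dates)),
    })
    df["purchases"] = (
        60 - 2.0 * df["sunshine_hours"].shift(PLANTED_LAG)
        + rng.normal(0, 3, len(dates))
    )
    frames.append(df)
panel = pd.concat(frames, ignore_index=True)


def scan_lags(panel, weather_col, perf_col, lags):
    """Pearson r between perf and weather shifted by each lag, within city."""
    results = {}
    for lag in lags:
        shifted = panel.groupby("city_slug")[weather_col].shift(lag)
        results[lag] = panel[perf_col].corr(shifted)
    return results


lag_r = scan_lags(panel, "sunshine_hours", "purchases", range(0, 8))
best_lag = max(lag_r, key=lambda k: abs(lag_r[k]))
print(f"best lag = {best_lag}, r = {lag_r[best_lag]:+.2f}")
```

If the scan reported a different lag or a positive sign, that would point to a bug in the analysis code rather than a finding about weather.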
MIT. See LICENSE. Real ad data is not in this repo and never was;
the figures in results/ are the only artifact of the original analysis
that ships here.