poseidon-weather

A small, focused library for joining ad-performance data with Open-Meteo historical weather, then correlating / lagging / regressing the two.

The library was extracted from a real analysis of 3 ad campaigns across 5 US cities. The methodology, the per-city and per-show findings, and the resulting recommendations are in the results/ gallery and the docs/METHODOLOGY.md write-up.


Why this exists

The question the original analysis was built to answer:

Does weather move ad performance — and if so, which weather variable, for which campaign, in which city, at what lag?

The naive answer is "look at the correlation" — but the right answer conditions on three things that change the conclusion:

  1. Lag. Ad performance on day t is often driven by weather on day t−1, t−2, or even t−7, so a same-day (lag 0) correlation can miss a real effect entirely.
  2. Stratification. A national correlation can hide opposite-signed city-level effects (one city's positive response cancels another's negative). The right unit of analysis is (show, city).
  3. Controls. Day-of-week seasonality and city-specific baselines dominate ad performance. Without them, the weather signal is buried.

This library handles all three.


Quick start

git clone https://github.com/Pouyasharp/poseidon-weather.git
cd poseidon-weather
pip install -r requirements.txt

# Run the synthetic example end-to-end
python3 examples/01_basic_pipeline.py

The example builds a 3-show × 5-city × 365-day synthetic ad panel with planted weather effects (so you can verify the library actually finds them), joins it with synthetic weather, runs correlation / lag / regression, and writes 5 outputs to examples/figures/:

File                        What it shows
01_correlation_heatmap.png  Pearson r of every weather variable × purchases, starred at p < 0.05
02_top_correlations.png     Top 8 weather variables by absolute correlation
03_lag_curve.png            Pearson r vs lag (0..14 days) for one city
04_regression_summary.txt   OLS of purchases on weather + day-of-week (DOW) + city fixed effects (FE)
05_scatter.png              Scatter with linear fit for the strongest planted effect

Use the library on your own data

from pathlib import Path

import pandas as pd
from poseidon import weather, join, correlate, lag, regression, viz
from poseidon.weather import CityRequest, fetch_historical

# 1. Fetch weather (cached on disk, so a repeat call reads from cache_dir)
cache = Path("./data/weather")

city = CityRequest(city_slug="AURORA_N", latitude=47.0, longitude=-122.0)
records = fetch_historical(city, "2024-01-01", "2024-12-31", cache_dir=cache)
weather_df = weather.daily_records_to_dataframe(records)

# 2. Load your ad panel (date, city_slug, show, spend, purchases, impressions, link_clicks)
ads_df = pd.read_csv("my_ads.csv", parse_dates=["date"])

# 3. Join
panel = join.panel_with_weather(ads_df, weather_df)

# 4. Correlate
corr = correlate.weather_perf_correlations(panel, groupby=["show", "city_slug"])
print(corr.sort_values("r", key=abs, ascending=False).head(10))

# 5. Lag analysis
lag_df = lag.lag_correlations(
    panel,
    weather_vars=["temperature_2m_mean", "sunshine_duration", "precipitation_sum"],
    perf_vars=["purchases"],
    lags=[0, 1, 2, 3, 5, 7, 10, 14],
    groupby=["show", "city_slug"],
)
best = lag.best_lag_per_pair(lag_df, by=["show", "city_slug"])

# 6. Regression
fit = regression.ols_weather(
    panel,
    target="purchases",
    predictors=["temperature_2m_mean", "sunshine_duration", "precipitation_sum",
                "severe_flag", "daylight_hours"],
    add_dow=True, add_city_fe=True,
)
print(fit.summary())

# 7. Viz
fig = viz.lag_curve(
    lag_df[lag_df["city_slug"] == "AURORA_N"],
    "temperature_2m_mean", "purchases",
    save_path="my_lag.png",
)

See examples/01_basic_pipeline.py for the full end-to-end pattern.


The public API

poseidon/
├── weather.py      # Open-Meteo fetcher (fetch_historical, fetch_many)
│                   #  + helpers: daily_records_to_dataframe, add_severe_flag,
│                   #  add_daylight_hours
├── join.py         # panel_with_weather, daily_city_aggregate, compute_roas/ctr
├── correlate.py    # pearson, weather_perf_correlations, overall_correlation_heatmap
├── lag.py          # lag_correlations, best_lag_per_pair
├── regression.py   # ols_weather, stepwise_weather, OLSResult dataclass
└── viz.py          # lag_curve, top_correlations_bar, time_series_overlay, scatter_with_fit

Each module is self-contained: no hidden state, no globals, and no package initialisation beyond __init__.py. The analysis functions take DataFrames and return DataFrames or figures. The only side effects are the file writes you ask for: cache_dir on the weather fetcher and the optional save_path= argument on the viz functions.
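
For orientation, a request to Open-Meteo's historical archive endpoint looks roughly like the sketch below. This README does not show how weather.py builds its requests, so treat the helper name build_archive_url and the choice of timezone as assumptions; the endpoint and query parameter names follow Open-Meteo's public documentation. The sketch only builds the URL and does not touch the network.

```python
from urllib.parse import urlencode

ARCHIVE_ENDPOINT = "https://archive-api.open-meteo.com/v1/archive"

def build_archive_url(latitude, longitude, start_date, end_date, daily_vars):
    """Build a daily historical-weather request URL (hypothetical helper)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,   # ISO dates, e.g. "2024-01-01"
        "end_date": end_date,
        "daily": ",".join(daily_vars),
        "timezone": "auto",         # assumption: resolve daily values in local time
    }
    # safe="," keeps the comma-separated variable list readable in the URL.
    return f"{ARCHIVE_ENDPOINT}?{urlencode(params, safe=',')}"

url = build_archive_url(
    47.0, -122.0, "2024-01-01", "2024-12-31",
    ["temperature_2m_mean", "sunshine_duration", "precipitation_sum"],
)
print(url)
```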


The real analysis (results gallery)

results/ contains the six figures from the original analysis of 3 ad campaigns across 5 US cities (Jun 2024 – May 2026). The data is real ad-account data and is not in the repo; the figures are the deliverable.

Figure                                        What it shows
results/FINAL_01_lag_correlation.png          Per-(show, city) lag correlations: where in the calendar the weather→purchase signal lives
results/FINAL_02_mesmerica_retarget_temp.png  Mesmerica retargeting cohort, ROAS vs temperature (the strongest single signal in the study)
results/FINAL_03_mesmerica_roas_vs_temp.png   Mesmerica ROAS as a function of mean temperature, with confidence band
results/FINAL_04_per_city_correlations.png    Correlation heatmap per (city, show), the right unit of analysis
results/FINAL_05_regression_progression.png   Stepwise OLS progression as more controls are added
results/FINAL_06_recommendations.png          The one-slide summary handed to the campaign owners

Honest summary of the analysis (see docs/METHODOLOGY.md for the full write-up):

  • Mesmerica has a strong, positive temperature→ROAS effect that survives day-of-week and city fixed effects. The retargeting cohort is the main driver.
  • DSOTM is a "data gap" — spend is too thin to support inference.
  • Severe weather (WMO code ≥ 95, thunderstorms/hail) hurts all shows but the effect is small relative to the temperature signal.
  • Lag analysis was a trap: 14-day "effects" were over-claimed in early versions; the right lag window is 0–3 days for the temperature signal.
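
The severe-weather definition above (WMO code ≥ 95) is simple enough to sketch directly. This is an illustration rather than the library's add_severe_flag: the function name flag_severe_days and the weather_code key are assumptions about how the daily records are shaped.

```python
SEVERE_WMO_THRESHOLD = 95  # 95, 96 and 99 are the thunderstorm / hail codes

def flag_severe_days(days, code_key="weather_code", threshold=SEVERE_WMO_THRESHOLD):
    """Return copies of the daily records with a 0/1 severe_flag added."""
    return [{**day, "severe_flag": int(day[code_key] >= threshold)} for day in days]

sample = [
    {"date": "2024-07-01", "weather_code": 3},   # overcast
    {"date": "2024-07-02", "weather_code": 63},  # moderate rain
    {"date": "2024-07-03", "weather_code": 95},  # thunderstorm
    {"date": "2024-07-04", "weather_code": 99},  # thunderstorm with heavy hail
]
flagged = flag_severe_days(sample)
print([day["severe_flag"] for day in flagged])
```

Because thunderstorm days are rare, a flag like this usually has few positive rows per city, which is consistent with the small estimated effect reported above.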

Repository layout

poseidon-weather/
├── README.md                  # this file
├── LICENSE                    # MIT
├── requirements.txt
├── .gitignore
├── poseidon/                  # the importable library
│   ├── __init__.py
│   ├── weather.py
│   ├── join.py
│   ├── correlate.py
│   ├── lag.py
│   ├── regression.py
│   └── viz.py
├── examples/
│   ├── 01_basic_pipeline.py
│   └── figures/               # outputs from the example (re-runnable)
├── results/                   # the 6 final figures from the real analysis
└── docs/
    └── METHODOLOGY.md         # analytical rationale + the negative results

Why the example uses synthetic data

Two reasons:

  1. The real ad-account data is sensitive. The original campaigns were run for paying clients; the ad data does not belong in a public repo.
  2. The planted effects let you verify the library. Run the example on synthetic data; the planted effects (positive temperature→ROAS for Mesmerica, negative sunshine at 3-day lag, negative severe-weather effect) should be recovered. If they aren't, the library is broken.

License

MIT. See LICENSE. Real ad data is not in this repo and never was; the figures in results/ are the only artifact of the original analysis that ships here.
