Skip to content

Fix issue #28#32

Open
ismayc wants to merge 4 commits into
simonpcouch:mainfrom
ismayc:main
Open

Fix issue #28#32
ismayc wants to merge 4 commits into
simonpcouch:mainfrom
ismayc:main

Conversation

@ismayc

@ismayc ismayc commented May 4, 2026

Copy link
Copy Markdown
Contributor

It only took me a year! 😅

get_flights() previously returned time_hour with airport-local wall clocks implicitly tagged as UTC, while get_weather() returned true UTC instants tagged as GMT. The two columns therefore disagreed for any non-UTC origin and the canonical (origin, time_hour) join produced no rows. After this change both functions return time_hour in the airport's IANA tzone (looked up from get_airports()), matching nycflights13's convention: flights are forced into the local zone (wall clock preserved, instant corrected) and weather is converted into the local zone (instant preserved, wall clock + year/month/day/hour recomputed). New unit tests in test-8-time-hour-tz.R exercise both modes and lock in the cross-table join invariant.

ismayc and others added 2 commits May 4, 2026 16:51
…ch#28)

`get_flights()` previously returned `time_hour` with airport-local wall
clocks implicitly tagged as UTC, while `get_weather()` returned true UTC
instants tagged as GMT. The two columns therefore disagreed for any
non-UTC origin and the canonical `(origin, time_hour)` join produced no
rows. After this change both functions return `time_hour` in the
airport's IANA `tzone` (looked up from `get_airports()`), matching
`nycflights13`'s convention: flights are forced into the local zone
(wall clock preserved, instant corrected) and weather is converted into
the local zone (instant preserved, wall clock + year/month/day/hour
recomputed). New unit tests in `test-8-time-hour-tz.R` exercise both
modes and lock in the cross-table join invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tag `time_hour` with each origin airport's local time zone (simonpcouch#28)

@simonpcouch simonpcouch left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ughhh time zones! This is gnarly, thanks for giving this a go.

The bug is real, but I’m not sure time_hour should be made “origin-local” here. nyclights13 is able to do this because each of the origin airports share a time zone, but while a mixed-origin POSIXct column can store many instants it only has one timezone attribute for the whole column. So for get_flights(c("PDX", "DEN", "JFK"), ...), the column cannot actually be simultaneously Pacific, Mountain, and Eastern time.

That means this PR can preserve the right underlying instants for joins, but the printed clock times will be misleading for some origins. Say each airport has a 9am local departure:

PDX 9am PST = 17:00 UTC
DEN 9am MST = 16:00 UTC
JFK 9am EST = 14:00 UTC

If the combined column ends up carrying "America/Denver", those print as:

PDX 10:00
DEN 09:00
JFK 07:00

So the join works now, which is great, but the docs saying time_hour is "in the airport's local time zone" aren't really true when there's more than one timezone in play.

I think the cleaner contract is just: time_hour is the correct instant, in UTC. Flights would convert their local wall-clock times to proper UTC instants, and weather would keep its UTC observation instants as-is. The join key stays honest regardless of how many timezones are involved:

dplyr::inner_join(flights, weather, by = c("origin", "time_hour"))

(There's also a boundary bug with weather: the API is queried in UTC and months are filtered in UTC, but this PR then converts to local time afterward. For a western timezone like PDX, that means a January query can pick up rows that are locally December 31 (early UTC January hours) and miss the local evening of January 31 (those hours are UTC February and were never fetched). That'd need a padded UTC query range, then filtering on local-time components after conversion. This shouldn't matter if we switch back to UTC.)

Comment thread tests/testthat/test-8-time-hour-tz.R Outdated
hour = unname(utc_hour),
time_hour = ISOdatetime(2023, 1, 1, unname(utc_hour), 0, 0, tz = "GMT")
) %>%
anyflights:::adjust_time_hour_tz(airports, action = "convert")

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
anyflights:::adjust_time_hour_tz(airports, action = "convert")
adjust_time_hour_tz(airports, action = "convert")

testthat should make this function available in this namespace, I think

Comment thread tests/testthat/test-8-time-hour-tz.R Outdated

test_that("force mode preserves wall clock and corrects the instant per origin", {
# Simulates raw flights data: airport-local wall clock implicitly tagged UTC
flights <- tibble::tibble(

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing:

❯ checking for unstated dependencies in ‘tests’ ... WARNING
  '::' or ':::' import not declared from: ‘tibble’

Comment thread NEWS.md Outdated

* `time_hour` in both `get_flights()` and `get_weather()` is now tagged with
each origin airport's local IANA time zone (looked up from
`get_airports()$tzone`), matching the convention used by `nycflights13`.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`get_airports()$tzone`), matching the convention used by `nycflights13`.
`get_airports()$tzone`), matching the convention used by nycflights13.

Tidy style (which, admittedly, might not be uniformly applied across this repo): no formatting for package names

Comment thread R/utils.R Outdated
} else {
dplyr::mutate(
.x,
time_hour = lubridate::with_tz(time_hour, tzone = tz),

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Item from an AI agent that I find reasonable:

  1. R/utils.R:412 / R/utils.R:480 / R/utils.R:224: weather is still queried
    and filtered on UTC month boundaries before time_hour is converted to
    local time. For western time zones, get_weather("PDX", 2023, 1) can return
    local 2022-12-31 rows and miss the final local evening of 2023-01-31.
  ISOdatetime(2023, 1, 1, 0, 0, 0, tz = "GMT") |>
    lubridate::with_tz("America/Los_Angeles")
  # 2022-12-31 16:00:00 PST

I’d either query a padded UTC window and filter after recomputing local year/
month/day/hour, or query ASOS in the station-local timezone if that gives the
desired invariant.

Comment thread R/utils.R Outdated
dplyr::select(airports_data, faa, tzone),
by = c("origin" = "faa")
) %>%
dplyr::group_by(tzone) %>%

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another item that I find reasonable:

  1. R/utils.R:215: grouping by tzone before group_modify() reorders rows by
    timezone group. This means get_flights() no longer preserves the explicit
    arrange(year, month, day, dep_time) done at R/get_flights.R:125. It also
    means mixed-origin output does not actually have a per-row timezone; a
    POSIXct vector has one tzone attribute, so later-origin rows print in
    whatever timezone survives the bind.
 origin x time_hour
 DEN    3 2023-01-01 09:00:00
 PDX    1 2023-01-01 10:00:00
 JFK    2 2023-01-01 07:00:00

The instants may be right, but the documented “airport’s local time zone”
display is not true for multi-timezone outputs.

ismayc and others added 2 commits May 22, 2026 11:00
A POSIXct column carries one `tzone` attribute, so the previous "tag each
row with its origin's local tz" approach couldn't actually hold for
mixed-origin queries -- `bind_rows` collapses the per-group tzs and the
printed display silently lies for the losing rows. Reframe the contract:
`get_flights()$time_hour` is the correct UTC instant (BTS wall clocks
reinterpreted via each origin's IANA tz), `get_weather()$time_hour` already
publishes UTC instants, and the `(origin, time_hour)` join works honestly
across any combination of time zones.

Also removes the weather-side UTC-boundary bug (querying & filtering on
UTC months but then converting to local time pulled in adjacent local
months), since weather no longer gets a tz conversion. Tests cover DST
crossing within a single airport, round-trip back to local wall clocks,
multi-tz row ordering, and empty input. Drops `tibble::` usage in tests
(undeclared dependency WARNING) in favour of `dplyr::tibble`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch `time_hour` contract to UTC instants
@ismayc

ismayc commented May 22, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review! That makes a lot more sense, and I'm glad you caught the bugginess of trying to get everything on local. I made some updates to hopefully clear up these issues to be more in alignment with what your review added. It does look like some lingering bugs still with GA though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants