
Cache GA4 responses at monthly granularity to eliminate load-time latency #289

Description

@DominicBM

Problem

GA4 API calls are the primary performance bottleneck in the dashboard. Calls to run_property_report take 20–30 seconds to complete, which means every event sub-page (catalog views, exhibitions, click-throughs, etc.) loads with a visible "Loading…" state before data appears. Skeleton loading mitigates the perceived delay but does not reduce the actual latency.

The existing WarmComparisonCacheJob attempts to pre-warm caches but is ineffective in production because the dashboard uses Rails' MemoryStore, which is per-process and not shared across ECS tasks. A request hitting a different task than the one that ran the warm-up job sees a cold cache every time.

Proposed solution

Cache GA4 responses at monthly granularity using a shared, persistent cache store (Redis/ElastiCache). Because the dashboard displays all data at per-month resolution, a completed calendar month's data is immutable — it will never change — and can be cached indefinitely. Only the current (partial) month changes day-to-day, and it is excluded from the cached range entirely.

This removes GA4 latency from every historical query once the relevant months are in the cache. A user viewing Jan 2024 – Mar 2025 would then see all data served from cache, with no GA4 API call made at all.


Cache architecture

Cache store: Redis (ElastiCache)

Replace MemoryStore with RedisCacheStore. ElastiCache provides a Redis instance shared across all ECS tasks, so pre-warmed data is available to every request regardless of which task handles it. A single cache.t3.micro instance is sufficient for this workload.
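
A minimal sketch of what the store swap could look like, assuming the `redis` gem is in the Gemfile and the ElastiCache endpoint is exposed through an environment variable (`REDIS_URL` is an assumed name, and the timeout values are illustrative rather than tuned):

```ruby
# config/environments/production.rb
Rails.application.configure do
  config.cache_store = :redis_cache_store, {
    # Assumed env var holding the ElastiCache endpoint, e.g. redis://host:6379/0
    url: ENV.fetch("REDIS_URL"),
    # Illustrative timeouts in seconds; keep them short so a slow Redis
    # does not add to page latency.
    connect_timeout: 1,
    read_timeout: 1,
    write_timeout: 1,
    reconnect_attempts: 1,
    # Log cache errors instead of failing the request; the read then
    # behaves like a cache miss.
    error_handler: ->(method:, returning:, exception:) {
      Rails.logger.warn("cache #{method} failed: #{exception.class}: #{exception.message}")
    }
  }
end
```

No default `expires_in` is set here, so entries written without an explicit expiry stay until evicted, which matches the indefinite TTL proposed for completed months.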

Cache key structure

ga4:[hub_slug]:[metric_type]:[YYYY-MM]

Example: ga4:digital-commonwealth:catalog_views:2024-06

Each month's data for each hub/metric combination is stored as a separate cache entry. Completed months get an indefinite TTL (or very long — e.g. 5 years). The current calendar month is never cached.
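
For illustration, the key format and the "completed month" check could be sketched as below. `ga4_cache_key` and `month_complete?` are hypothetical helper names, not existing methods in the codebase, and the check uses the server's local date:

```ruby
require "date"

# Builds the per-month key in the ga4:[hub_slug]:[metric_type]:[YYYY-MM] format.
# `month` may be any Date that falls inside the target month.
def ga4_cache_key(hub_slug, metric_type, month)
  "ga4:#{hub_slug}:#{metric_type}:#{month.strftime('%Y-%m')}"
end

# True once the calendar month containing `month` has fully ended relative
# to `today`. Date.new(y, m, -1) is the last day of that month.
def month_complete?(month, today = Date.today)
  Date.new(month.year, month.month, -1) < today
end
```

`ga4_cache_key("digital-commonwealth", "catalog_views", Date.new(2024, 6, 1))` yields the example key above.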

Pre-warming job

Update WarmComparisonCacheJob to run on the 1st of each month and populate Redis with data for the just-completed month across all hubs and all metric types. With Redis, this job's output is immediately available to all tasks. First-time access for historical months (before the job has run for them) hits GA4 and populates the cache on demand; subsequent requests are served from cache.
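
The core of the updated job might look roughly like this. It is written as plain Ruby so it runs outside Rails: `FakeCache` stands in for `Rails.cache`, and `fetcher` is a hypothetical callable wrapping the GA4 request (neither reflects the job's actual internals):

```ruby
require "date"

# Minimal in-memory stand-in for Rails.cache, so the sketch runs on its own.
class FakeCache
  def initialize
    @data = {}
  end

  # Returns the stored value, or stores and returns the block's result on a miss.
  def fetch(key)
    return @data[key] if @data.key?(key)
    @data[key] = yield
  end

  def exist?(key)
    @data.key?(key)
  end
end

# Populates the cache for the calendar month that ended just before `today`,
# for every hub/metric combination. Returns the keys it touched.
def warm_previous_month(today:, hub_slugs:, metric_types:, cache:, fetcher:)
  month_start = Date.new(today.year, today.month, 1).prev_month
  month_end = Date.new(month_start.year, month_start.month, -1)

  hub_slugs.product(metric_types).map do |hub, metric|
    key = "ga4:#{hub}:#{metric}:#{month_start.strftime('%Y-%m')}"
    # Only calls GA4 when the key is missing, so re-running the job is cheap.
    cache.fetch(key) { fetcher.call(hub, metric, month_start, month_end) }
    key
  end
end
```

Because the write goes through `fetch`, an on-demand request that already cached the month makes the job a no-op for that key.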

Query changes

When handling a date-range request, decompose it into individual calendar months. For each month:

  • If the month is complete and the cache key exists → return cached data
  • If the month is complete and the cache key is missing → query GA4, cache the result, return it
  • If the month is the current (partial) month → not available (see UI changes below)

Aggregate the monthly results before returning to the caller. This is consistent with the current behavior since responses are already aggregated across the requested range.
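
A rough sketch of the decomposition and aggregation, under a few assumptions: a plain Hash stands in for the shared cache, `fetcher` is a hypothetical callable that returns one month's total from GA4, and the metric is additive (such as an event count) so monthly values can simply be summed:

```ruby
require "date"

# First-of-month dates for every calendar month the range touches.
def months_in_range(start_date, end_date)
  months = []
  cursor = Date.new(start_date.year, start_date.month, 1)
  while cursor <= end_date
    months << cursor
    cursor = cursor.next_month
  end
  months
end

# Sums monthly totals across the range. Completed months come from `cache`
# when present and from `fetcher` (then cached) when missing. The current
# partial month contributes nothing because it is not available.
def report_total(hub, metric, start_date, end_date, today:, cache:, fetcher:)
  months_in_range(start_date, end_date).sum do |month|
    month_end = Date.new(month.year, month.month, -1)
    next 0 if month_end >= today # current (partial) month

    key = "ga4:#{hub}:#{metric}:#{month.strftime('%Y-%m')}"
    cache[key] ||= fetcher.call(hub, metric, month, month_end)
  end
end
```

Metrics that are not additive across months (unique users, for example) would need a different aggregation step than the `sum` used here.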


UI changes

1. Remove current month from available date range

The end-date picker should cap at the last day of the previous calendar month. The current month is excluded because its data is incomplete and not cached.

  • End date picker: max selectable value = last day of previous month
  • If the user's current end date (from URL params or session) is in the current month, silently clamp it to the last completed month
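
The cap and the silent clamp could be computed server-side along these lines. Both helper names are hypothetical, and the server's local date is assumed to be the right notion of "today":

```ruby
require "date"

# Last day of the most recently completed calendar month.
def last_completed_month_end(today = Date.today)
  Date.new(today.year, today.month, 1) - 1
end

# Pulls an end date that falls in the current month (or later) back to the
# last completed month; earlier dates pass through unchanged.
def clamp_end_date(requested, today = Date.today)
  [requested, last_completed_month_end(today)].min
end
```

The same `last_completed_month_end` value can feed the date picker's maximum, so the picker and the clamp cannot drift apart.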

2. Update default date range

The current default (e.g. start: Jan 2024, end: current month) needs to be updated to end at the last completed month. Suggested default: a rolling 24-month window ending at the last completed month.

Example (if today is April 23, 2026): default range = April 2024 – March 2026.
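
As a sanity check on the window arithmetic, a small sketch (`default_date_range` is a hypothetical helper; the 24-month length is the suggestion above):

```ruby
require "date"

# Rolling window of `months` full calendar months ending at the last
# completed month. Returns [first day of window, last day of window].
def default_date_range(today = Date.today, months: 24)
  range_end = Date.new(today.year, today.month, 1) - 1
  # Date#<< steps back by whole months; the end month itself counts as one.
  range_start = Date.new(range_end.year, range_end.month, 1) << (months - 1)
  [range_start, range_end]
end
```

For `Date.new(2026, 4, 23)` this returns 2024-04-01 through 2026-03-31, which spans exactly 24 calendar months.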

3. Add data availability label

Add a small informational note near the date pickers:

"Analytics data is available through [Month YYYY]. Data is updated on the 1st of each month."

This sets expectations and explains why the current month is absent without requiring partners to figure it out themselves.

4. "All-time" view

The Totals card (which is already independent of the date filter) is unaffected. The "View all-time data" link on event pages should resolve to the full available range (earliest configured date through last completed month), not through the current month.


Migration / rollout

  1. Provision ElastiCache Redis instance (single-node, cache.t3.micro)
  2. Update Rails config/environments/production.rb to use RedisCacheStore
  3. Update cache key generation and monthly decomposition in GaResponseBuilder / GaCacheable
  4. Update WarmComparisonCacheJob to use the new key structure and run via cron on the 1st of each month
  5. Ship UI changes (date picker cap, default range, informational label)
  6. On first deploy, the cache is cold: GA4 calls proceed normally and populate Redis on demand. The warm-up job fills in the just-completed month on its next scheduled run, or can be triggered manually once after deploy.

Trade-offs

| | Current | Proposed |
|---|---|---|
| Event page load time | 20–30s | ~0s (cached) / 20–30s (first access only) |
| Current month data | Available (slow) | Not available |
| Cache shared across tasks | No (MemoryStore) | Yes (Redis) |
| Infrastructure cost | None | ~$15–25/month (ElastiCache t3.micro) |
| Data freshness | Real-time | Monthly (completed months only) |

The loss of current-month data is the only meaningful trade-off. Given that the dashboard is primarily used for trend analysis and reporting (not real-time monitoring), this is acceptable. Partners can always view raw GA4 data for the current month if needed.
