Skip to content

Filter ineligible queries from in-request-path shallow auto-create #1641

@altmannmarcelo

Description

@altmannmarcelo

Motivation

Readyset in auto-cache mode should just work for the overwhelming majority of users — point it at a workload, and it should silently start caching the queries that benefit from caching, without the user having to curate a list, tune knobs, or clean up after it.

Today, in-request-path shallow auto-creation (REA-6485) falls short of that bar: it caches every SELECT it sees, including the driver/ORM bootstrap traffic and introspection chatter that every connection issues but no one wants cached. The user's first impression is a cache full of SELECT @@version, pg_catalog lookups, and SELECT NOW() — entries that take up space, never serve real traffic, and make SHOW CACHES output noisy and confusing. To get a clean experience, users have to manually deny-list or drop these caches, which defeats the purpose of "auto."

What's getting cached that shouldn't be

  • information_schema / pg_catalog / mysql.* introspection issued by drivers and ORMs at connect time
  • Session-variable lookups (@@version, @user_var)
  • Non-deterministic functions (NOW(), RAND(), version(), current_user) — caching these produces wrong results, not just wasted space

Each of these also costs an upstream PREPARE round-trip and a CREATE CACHE RPC before we discover they're useless.

Solution

Add an eligibility walk over the sqlparser AST that runs before the upstream PREPARE / CREATE CACHE RPC and rejects the categories above. The walk operates on the sqlparser AST (not Readyset's SelectStatement) because shallow caches deliberately accept queries whose Readyset AST conversion would fail — pre-converting would defeat the feature for the queries that most need shallow caching.

Trigger selection in try_auto_create_shallow_cache becomes typed (AutoCreateTrigger::{Hint, InRequestPath}) instead of comparing against a string label, so a future rename can't silently disable the filter. Only InRequestPath consults the filter; explicit CREATE SHALLOW CACHE DDL and /*rs+ CREATE SHALLOW CACHE */ hints bypass it, so users can always force-create deliberately when they know better than the heuristic.

Rejected QueryIds are memoized in a sharded DashSet on QueryStatusCache to avoid re-walking the AST on every uncached request for the same ineligible query. Read only by the implicit auto-create path, so stuck entries can't block deliberate user creation. Soft-capped at 1,000,000 entries (~18 MB); on overflow the set is bulk-cleared, a warning is logged, and an overflow counter increments. Current size is exposed as a gauge.

Outcome

A new user enabling shallow auto-cache mode against a real workload should see a cache populated with their actual hot queries — not driver chatter — without lifting a finger. This is one step toward "auto-cache mode just works."

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions