Motivation
Readyset in auto-cache mode should just work for the overwhelming majority of users — point it at a workload, and it should silently start caching the queries that benefit from caching, without the user having to curate a list, tune knobs, or clean up after it.
Today, in-request-path shallow auto-creation (REA-6485) falls short of that bar: it caches every SELECT it sees, including the driver/ORM bootstrap traffic and introspection chatter that every connection issues but no one wants cached. The user's first impression is a cache full of SELECT @@version, pg_catalog lookups, and SELECT NOW() — entries that take up space, never serve real traffic, and make SHOW CACHES output noisy and confusing. To get a clean experience, users have to manually deny-list or drop these caches, which defeats the purpose of "auto."
What's getting cached that shouldn't be
information_schema / pg_catalog / mysql.* introspection issued by drivers and ORMs at connect time
- Session-variable lookups (
@@version, @user_var)
- Non-deterministic functions (
NOW(), RAND(), version(), current_user) — caching these produces wrong results, not just wasted space
Each of these also costs an upstream PREPARE round-trip and a CREATE CACHE RPC before we discover they're useless.
Solution
Add an eligibility walk over the sqlparser AST that runs before the upstream PREPARE / CREATE CACHE RPC and rejects the categories above. The walk operates on the sqlparser AST (not Readyset's SelectStatement) because shallow caches deliberately accept queries whose Readyset AST conversion would fail — pre-converting would defeat the feature for the queries that most need shallow caching.
Trigger selection in try_auto_create_shallow_cache becomes typed (AutoCreateTrigger::{Hint, InRequestPath}) instead of comparing against a string label, so a future rename can't silently disable the filter. Only InRequestPath consults the filter; explicit CREATE SHALLOW CACHE DDL and /*rs+ CREATE SHALLOW CACHE */ hints bypass it, so users can always force-create deliberately when they know better than the heuristic.
Rejected QueryIds are memoized in a sharded DashSet on QueryStatusCache to avoid re-walking the AST on every uncached request for the same ineligible query. Read only by the implicit auto-create path, so stuck entries can't block deliberate user creation. Soft-capped at 1,000,000 entries (~18 MB); on overflow the set is bulk-cleared, a warning is logged, and an overflow counter increments. Current size is exposed as a gauge.
Outcome
A new user enabling shallow auto-cache mode against a real workload should see a cache populated with their actual hot queries — not driver chatter — without lifting a finger. This is one step toward "auto-cache mode just works."
Motivation
Readyset in auto-cache mode should just work for the overwhelming majority of users — point it at a workload, and it should silently start caching the queries that benefit from caching, without the user having to curate a list, tune knobs, or clean up after it.
Today, in-request-path shallow auto-creation (REA-6485) falls short of that bar: it caches every SELECT it sees, including the driver/ORM bootstrap traffic and introspection chatter that every connection issues but no one wants cached. The user's first impression is a cache full of
SELECT @@version,pg_cataloglookups, andSELECT NOW()— entries that take up space, never serve real traffic, and makeSHOW CACHESoutput noisy and confusing. To get a clean experience, users have to manually deny-list or drop these caches, which defeats the purpose of "auto."What's getting cached that shouldn't be
information_schema/pg_catalog/mysql.*introspection issued by drivers and ORMs at connect time@@version,@user_var)NOW(),RAND(),version(),current_user) — caching these produces wrong results, not just wasted spaceEach of these also costs an upstream PREPARE round-trip and a
CREATE CACHERPC before we discover they're useless.Solution
Add an eligibility walk over the sqlparser AST that runs before the upstream PREPARE / CREATE CACHE RPC and rejects the categories above. The walk operates on the sqlparser AST (not Readyset's
SelectStatement) because shallow caches deliberately accept queries whose Readyset AST conversion would fail — pre-converting would defeat the feature for the queries that most need shallow caching.Trigger selection in
try_auto_create_shallow_cachebecomes typed (AutoCreateTrigger::{Hint, InRequestPath}) instead of comparing against a string label, so a future rename can't silently disable the filter. OnlyInRequestPathconsults the filter; explicitCREATE SHALLOW CACHEDDL and/*rs+ CREATE SHALLOW CACHE */hints bypass it, so users can always force-create deliberately when they know better than the heuristic.Rejected
QueryIds are memoized in a shardedDashSetonQueryStatusCacheto avoid re-walking the AST on every uncached request for the same ineligible query. Read only by the implicit auto-create path, so stuck entries can't block deliberate user creation. Soft-capped at 1,000,000 entries (~18 MB); on overflow the set is bulk-cleared, a warning is logged, and an overflow counter increments. Current size is exposed as a gauge.Outcome
A new user enabling shallow auto-cache mode against a real workload should see a cache populated with their actual hot queries — not driver chatter — without lifting a finger. This is one step toward "auto-cache mode just works."