docs(skills): query-warehouse Snowflake selectivity heuristic needs implementation detail #35

Description

@argen

Source: code-review of #31; nice-to-have, deferred at merge.

skills/query-warehouse/SKILL.md claims a conservative pre-run cost estimate for Snowflake, computed as the table size from INFORMATION_SCHEMA.TABLE_STORAGE_METRICS multiplied by a selectivity heuristic (in contrast to BigQuery, which has a native dry-run mode). The skill prose names the approach but doesn't spell out the selectivity formula clearly enough for an agent to apply it consistently across queries.
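For reference, a rough sketch of what the table-size lookup could look like. The helper name `table_size_query` is hypothetical and not something the skill defines. It assumes unquoted (upper-cased) identifiers, the `%s` placeholder style of a DB-API driver, and that `ACTIVE_BYTES` is the right size column to use. Note that `ACTIVE_BYTES` reports compressed storage, so treating it as bytes scanned is itself an approximation.

```python
def table_size_query(database: str, schema: str, table: str):
    """Build (sql, params) to look up a table's active storage bytes.

    Hypothetical helper: assumes unquoted identifiers, which Snowflake
    stores upper-cased, and a driver that accepts %s placeholders.
    """
    sql = (
        "SELECT ACTIVE_BYTES "
        "FROM INFORMATION_SCHEMA.TABLE_STORAGE_METRICS "
        "WHERE TABLE_CATALOG = %s "
        "AND TABLE_SCHEMA = %s "
        "AND TABLE_NAME = %s"
    )
    # Upper-case the names to match how unquoted identifiers are stored.
    params = (database.upper(), schema.upper(), table.upper())
    return sql, params


if __name__ == "__main__":
    sql, params = table_size_query("analytics", "public", "events")
    print(sql)
    print(params)
```

The returned pair would be handed to whatever cursor or MCP tool actually executes the query; that wiring is left out here because it depends on the server in use.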

Today the skill says "table size × selectivity heuristic — conservative, will sometimes refuse a query that would have been cheap" but doesn't define what selectivity to assume. Two cases will diverge:

  • A query with a WHERE clause on a clustered column → small scanned fraction (~1-10% of rows scanned).
  • A query without any filter → 100% scan.

Concrete proposal:

  • Define the selectivity heuristic explicitly: e.g., "no WHERE clause → 100%; WHERE on clustering key → 5%; WHERE on non-clustered column → 50% as upper bound."
  • Add a worked example in the skill: query → estimated bytes → upper-bound dollar → kill-cap check outcome.
  • Or: drop the Snowflake estimate entirely from this skill and route Snowflake users through a warehouse-layer billing alert + small probe query, since mcp-snowflake-server's dry-run support is thin.
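To make the first two proposals concrete, here is a minimal sketch of the heuristic and the worked example, using the percentages suggested above. Everything else is an assumption for illustration: the names `FilterKind`, `Estimate`, and `estimate` are hypothetical, and `USD_PER_TB` and `KILL_CAP_USD` are placeholder values, not Snowflake prices or the skill's actual cap. Snowflake bills warehouse time rather than bytes scanned, so the dollar figure is only a proxy upper bound.

```python
from dataclasses import dataclass
from enum import Enum


class FilterKind(Enum):
    NONE = "no WHERE clause"
    CLUSTERING_KEY = "WHERE on clustering key"
    NON_CLUSTERED = "WHERE on non-clustered column"


# Percentages proposed in this issue; integers keep the byte math exact.
SELECTIVITY_PCT = {
    FilterKind.NONE: 100,
    FilterKind.CLUSTERING_KEY: 5,
    FilterKind.NON_CLUSTERED: 50,
}

# HYPOTHETICAL values, not taken from the skill or from Snowflake pricing.
USD_PER_TB = 5.00
KILL_CAP_USD = 1.00
BYTES_PER_TB = 10**12  # decimal terabyte, also an assumption


@dataclass(frozen=True)
class Estimate:
    scanned_bytes: int
    upper_bound_usd: float
    allowed: bool


def estimate(table_bytes: int, kind: FilterKind,
             usd_per_tb: float = USD_PER_TB,
             kill_cap_usd: float = KILL_CAP_USD) -> Estimate:
    """query shape -> estimated bytes -> upper-bound dollars -> cap check."""
    # Ceiling division so the estimate never rounds down.
    scanned = -(-table_bytes * SELECTIVITY_PCT[kind] // 100)
    usd = scanned / BYTES_PER_TB * usd_per_tb
    return Estimate(scanned, usd, usd <= kill_cap_usd)


if __name__ == "__main__":
    table_bytes = 2 * BYTES_PER_TB  # a 2 TB table
    for kind in FilterKind:
        e = estimate(table_bytes, kind)
        verdict = "run" if e.allowed else "refuse"
        print(f"{kind.value}: {e.scanned_bytes} bytes, "
              f"~${e.upper_bound_usd:.2f} -> {verdict}")
```

Under these assumed numbers, a 2 TB table filtered on its clustering key is estimated at 100 GB and about $0.50, which passes the cap, while the unfiltered and non-clustered cases come out at about $10 and $5 and are refused. That refusal of possibly cheap queries is the conservative behaviour the skill already warns about.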

Files: skills/query-warehouse/SKILL.md — the Key Concepts and Shape 2 sections.

Risk: medium — until this is sharpened, the Snowflake half of the skill is documented but not operationally usable.
