Open-source scoring rubric for measuring how AI shopping agents (ChatGPT, Claude, Gemini, Mistral, DeepSeek) read e-commerce product catalogs. Calibrated against 270,000+ ground-truth captures from real AI agents.
License: CC0 1.0 Universal (public domain, no attribution required). Use it, fork it, embed it in a competing app. We open-sourced the methodology because the rubric is worth more to the ecosystem as an open standard than locked behind our paywall.
The reference implementation that runs this rubric on live Shopify stores ships as a closed app at aicatalogscore.com — that's where the dataset, the LLM rewrites, and the Score Guarantee live. The rubric here is the audit definition itself.
An AI shopping agent's recommendation depends on whether your product page has clean, machine-readable signals. We score 8 dimensions, 100 points total:
| # | Dimension | Max | What it measures |
|---|---|---|---|
| 1 | Title quality | 15 | Length 30-80 chars, product-type noun present, distinctive attribute, not ALL CAPS, no placeholder text |
| 2 | Description | 20 | 150+ words, ≥3 factual markers (units/ingredients/dimensions), ≥1 `<ul><li>` bullet list, ≥1 subheading, ≥2 use-case mentions, zero fluff terms |
| 3 | Images & alt text | 15 | ≥3 images, ≥80% with alt text length > 5 chars, alt text describes product not just brand |
| 4 | Variant structure | 10 | Variants present (not just "Default Title"), SKU set, barcode set on at least one variant |
| 5 | Metafields | 15 | Google Product Category set, vertical-aware "material bucket" set (skincare → key_ingredient; apparel → material; food → ingredients; etc.), vertical-aware "dimensions" + "care" buckets set |
| 6 | Category & tags | 10 | Shopify Standard Product Taxonomy category assigned, ≥5 tags, productType set |
| 7 | SEO | 10 | seo.title 30-60 chars and ≠ product title; seo.description present, 70-160 chars |
| 8 | Pricing & inventory | 5 | price set and > 0, compareAtPrice present on ≥1 variant, inventory tracking enabled |
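To make the thresholds concrete, here is a minimal Python sketch of dimensions 3 (Images & alt text) and 7 (SEO). The table gives each dimension's maximum but not how points split across its checks, so the sketch assumes an even split (5 points per check). The `Product` record and its field names are hypothetical, not a Shopify API shape, and the "not just brand" check is approximated as "alt text is not exactly the brand name".

```python
from dataclasses import dataclass, field


# Hypothetical, platform-neutral product record (illustrative field names).
@dataclass
class Product:
    title: str
    brand: str = ""
    seo_title: str = ""
    seo_description: str = ""
    image_alts: list[str] = field(default_factory=list)  # one entry per image; "" = no alt


def score_seo(p: Product) -> int:
    """Dimension 7 (max 10). Assumes 5 points per check."""
    points = 0
    # seo.title 30-60 chars and different from the product title.
    if 30 <= len(p.seo_title) <= 60 and p.seo_title.strip() != p.title.strip():
        points += 5
    # seo.description present and 70-160 chars.
    if 70 <= len(p.seo_description) <= 160:
        points += 5
    return points


def score_images(p: Product) -> int:
    """Dimension 3 (max 15). Assumes 5 points per check."""
    points = 0
    alts = p.image_alts
    if len(alts) >= 3:  # at least 3 images
        points += 5
    good = [a for a in alts if len(a.strip()) > 5]  # alt text longer than 5 chars
    if alts and len(good) / len(alts) >= 0.8:  # at least 80% of images qualify
        points += 5
    # Approximation (assumption): usable alt texts must not be just the brand name.
    brand = p.brand.strip().lower()
    if good and all(a.strip().lower() != brand for a in good):
        points += 5
    return points
```

A port would add the remaining six dimensions the same way, one function per table row.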
Grade bands:
- A+ : 95-100 (AI-ready, top 1%)
- A : 85-94 (likely recommended)
- B : 70-84 (sometimes recommended)
- C : 50-69 (rarely recommended)
- D : 30-49 (occasional discovery)
- F : 0-29 (effectively invisible)
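A minimal sketch of the band lookup. The bands are written as integer ranges; the sketch assumes each lower bound is inclusive, so a fractional score such as 94.5 lands in A.

```python
# Grade bands from the list above, checked from the highest floor down.
GRADE_BANDS = [
    (95, "A+"),
    (85, "A"),
    (70, "B"),
    (50, "C"),
    (30, "D"),
    (0, "F"),
]


def grade(score: float) -> str:
    """Map a 0-100 score to its letter band (lower bounds inclusive)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0-100, got {score}")
    # The final band has floor 0, so any in-range score matches something.
    return next(letter for floor, letter in GRADE_BANDS if score >= floor)
```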
The audit uses a different regex table for each merchant vertical, so each vertical's natural metafield naming and factual markers get credit. 10 verticals are modeled, plus a universal fallback:
apparel · beauty · home · electronics · fitness · food
pets · baby · outdoor · gifts · universal
Example: beauty/skincare scores `key_ingredient` as the material bucket, not `material` (a skincare product has ingredients, not a material). The food vertical scores `ingredients` plus allergen patterns. Electronics scores `housing_material` plus `battery_capacity`.
Full vocab tables: rubric/vertical-vocab.md
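A minimal sketch of the vertical-aware material-bucket check, covering only the four keys this README names. It assumes metafields have been flattened into a plain key-to-value dict (Shopify metafields are actually namespaced) and that the universal fallback key is `material`; both are assumptions, and the authoritative tables are in `rubric/vertical-vocab.md`.

```python
# Material-bucket keys named in this README; other verticals are omitted here.
MATERIAL_BUCKET = {
    "beauty": "key_ingredient",
    "apparel": "material",
    "food": "ingredients",
    "electronics": "housing_material",
}
FALLBACK_KEY = "material"  # assumption: key used by the universal fallback


def material_key(vertical: str) -> str:
    """Return the metafield key that counts as the material bucket."""
    return MATERIAL_BUCKET.get(vertical.lower(), FALLBACK_KEY)


def has_material_bucket(vertical: str, metafields: dict[str, str]) -> bool:
    """True if the vertical's material-bucket metafield is set and non-blank."""
    value = metafields.get(material_key(vertical), "")
    return bool(value.strip())
```

With this lookup, a skincare product that sets `material` but not `key_ingredient` gets no material-bucket credit, which is the behavior the example above describes.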
We didn't pick the 8 dimensions by hand. We observed them by running 2,400+ synthetic shopping queries per day across 6 AI agents (ChatGPT, Claude, Gemini, Perplexity, Mistral, DeepSeek) and statistically modeling which catalog signals correlated with a product being recommended in the answer.
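This README doesn't say which statistical model is used. As a simple stand-in, the sketch below computes the phi coefficient (Pearson correlation for two binary variables) between one catalog signal and whether the product was recommended, assuming each capture has already been reduced to a pair of booleans. The data layout is hypothetical.

```python
import math


def phi(signal: list[bool], recommended: list[bool]) -> float:
    """Phi coefficient between a binary catalog signal and a binary outcome."""
    if len(signal) != len(recommended):
        raise ValueError("inputs must be the same length")
    pairs = list(zip(signal, recommended))
    # 2x2 contingency counts: n11 = signal present and recommended, etc.
    n11 = sum(1 for s, r in pairs if s and r)
    n10 = sum(1 for s, r in pairs if s and not r)
    n01 = sum(1 for s, r in pairs if not s and r)
    n00 = sum(1 for s, r in pairs if not s and not r)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one variable is constant, so correlation is undefined
    return (n11 * n00 - n10 * n01) / denom
```

A value near +1 means the signal tends to appear on recommended products; near 0 means no linear association in the sample.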
The dataset (270k+ captures, growing) is published under MIT in our sister repo: ai-visibility-metrics.
This repo intentionally ships the rubric, not a Shopify SDK. If you want a live audit against your own store, install the hosted app at aicatalogscore.com — it's free up to 15 SKUs.
If you want to embed the rubric in your own tooling:
- Read `RUBRIC.md`: the full scoring spec
- Read `vertical-vocab.md`: the regex tables
- Implement against your platform (Shopify, WooCommerce, BigCommerce, custom)
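For a port, one straightforward structure is a function per dimension plus an aggregator that enforces each dimension's cap from the dimension table. A minimal sketch with hypothetical dimension keys; treating a dimension your platform can't evaluate as 0 points is a design choice of this sketch, not something the spec states.

```python
# Max points per dimension, copied from the dimension table (keys are hypothetical).
MAX_POINTS = {
    "title": 15,
    "description": 20,
    "images": 15,
    "variants": 10,
    "metafields": 15,
    "category_tags": 10,
    "seo": 10,
    "pricing_inventory": 5,
}
assert sum(MAX_POINTS.values()) == 100  # the rubric totals 100 points


def total_score(dimension_scores: dict[str, int]) -> int:
    """Sum per-dimension scores, rejecting unknown keys and out-of-range values."""
    unknown = set(dimension_scores) - set(MAX_POINTS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    total = 0
    for name, cap in MAX_POINTS.items():
        got = dimension_scores.get(name, 0)  # missing dimension earns no credit
        if not 0 <= got <= cap:
            raise ValueError(f"{name}: {got} outside 0..{cap}")
        total += got
    return total
```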
We'd love to see ports. Open a PR with a link to your implementation and we'll list it.
The rubric is intentionally a living document. As AI agents shift their ranking signals, the vocab tables and weight distributions need to be re-calibrated. We publish a new version every quarter.
PRs welcome for:
- New vertical vocab tables (currently 10 — there are gaps)
- Updated factual-marker regexes (e.g. new beauty ingredient terminology)
- Translations of the rubric to other languages
- Implementations against non-Shopify platforms
See CONTRIBUTING.md.
| Version | Released | What changed |
|---|---|---|
| v2.4 | 2026-05-14 | Vertical-aware scoring across all 10 verticals. Per-vertical material bucket. |
| v2 | 2026-05-14 | Calibration tightening. The average score of a stock Shopify catalog moved from 64 to 58 to discourage gaming. |
| v1 | 2026-04-12 | Initial release. 8 dimensions, generic vocab. |
Maintained by aicatalogscore.com.