VelkinaStudio/schema-guard

schema-guard

Extract every JSON-LD block from your HTML and fail CI when a Product, Article, or Organization is missing the fields Google requires for rich results.

One missing field in your structured data — a Product with no offers, an Article with no datePublished — quietly removes a page from rich-result eligibility. On a site with thousands of pages, nobody notices until the traffic graph dips weeks later. The existing checkers are web apps you paste into by hand or paid monitors that ping you after the fact. schema-guard runs at build time, reads your actual output HTML, and exits non-zero before the broken markup ships.

  • Zero runtime dependencies. Installing it pulls nothing else into your package-lock.json. The JSON-LD extraction is hand-rolled.
  • Per-type rules, not a generic JSON schema. It knows that a Product needs price information and an Article needs a publish date, because that is what Google's rich-result docs require.
  • Baseline mode. Adopt it on a codebase that already has issues, record them once, and fail CI only on errors you introduce after that.
  • Errors vs. warnings. A missing required field is an error and fails the build. A missing recommended field is a warning and does not.
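To make "hand-rolled extraction" concrete, here is a minimal sketch of how JSON-LD blocks can be pulled out of HTML without a parser dependency. It is not schema-guard's actual code: the function name `extractJsonLd`, the regular expression, and the shape of the returned objects are all assumptions made for illustration.

```javascript
// Sketch only: a dependency-free JSON-LD extractor, not schema-guard's source.
// Matches <script type="application/ld+json"> ... </script>, case-insensitively.
const LD_SCRIPT =
  /<script\b[^>]*\btype\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script\s*>/gi;

function extractJsonLd(html) {
  const blocks = [];
  let index = 0;
  for (const match of html.matchAll(LD_SCRIPT)) {
    // 1-based line number of the opening <script> tag, for reporting.
    const line = html.slice(0, match.index).split('\n').length;
    const block = { index: index++, line, nodes: [], error: null };
    try {
      const data = JSON.parse(match[1]);
      // A block may hold one node, an array of nodes, or an @graph wrapper.
      const top = Array.isArray(data) ? data : [data];
      for (const node of top) {
        if (node && Array.isArray(node['@graph'])) block.nodes.push(...node['@graph']);
        else block.nodes.push(node);
      }
    } catch (err) {
      // Invalid JSON is recorded on the block instead of being thrown.
      block.error = err.message;
    }
    blocks.push(block);
  }
  return blocks;
}

// Sample page: one plain block (with an escaped closing tag inside a string),
// one @graph block, and one block that is not valid JSON.
const sampleHtml = [
  '<html><head>',
  '<script type="application/ld+json">',
  '{"@context":"https://schema.org","@type":"Product","name":"Mug","description":"Safe: <\\/script>"}',
  '</script>',
  '<script type="application/ld+json">{"@graph":[{"@type":"Organization"},{"@type":"WebSite"}]}</script>',
  '<script type="application/ld+json">{not json}</script>',
  '</head></html>',
].join('\n');

const blocks = extractJsonLd(sampleHtml);
for (const b of blocks) {
  const summary = b.error ? 'invalid JSON' : b.nodes.map((n) => n['@type']).join(', ');
  console.log(`L${b.line} block#${b.index}: ${summary}`);
}
```

The escaped `<\/script>` inside the first block does not end the match early, because the pattern only stops at a literal `</script>`, and `JSON.parse` accepts `\/` as an escape for `/`. A regex-based approach like this is a trade-off: it avoids an HTML parser, but it would also match script tags inside HTML comments.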

Install

Run it directly with npx:

npx schema-guard "dist/**/*.html"

Or install it as a dev dependency:

npm install --save-dev schema-guard

Requires Node 18 or newer. No other dependencies are installed.

Usage

Point it at built HTML files (globs work) or a live URL:

schema-guard "dist/**/*.html"          # validate your build output
schema-guard index.html about.html     # specific files
schema-guard --url https://your.site/  # fetch and validate a live page
schema-guard "dist/**/*.html" --json   # machine-readable output

Real example

This repo ships two example pages that each contain one deliberate mistake, so you can see what a real catch looks like. Run the tool against them:

schema-guard examples/product-page.html examples/article-page.html

Output:

examples/product-page.html
  ERROR L15 block#0 <Product> [offers] Product has no "offers", "review", or "aggregateRating". At least one is required for a product rich result. Add an Offer with price and priceCurrency.
  WARN  L35 block#1 <Organization> [contactPoint] Organization has no "contactPoint". Recommended for customer service / contact knowledge-panel features.

examples/article-page.html
  ERROR L16 block#0 <BlogPosting> [datePublished] Article is missing "datePublished". Required for freshness and Top Stories eligibility; use ISO 8601, e.g. "2026-06-01T08:00:00+00:00".
  WARN  L16 block#0 <BlogPosting> [author] Article has no "author". Recommended; use a Person or Organization with a "name".
schema-guard: 2 error(s), 2 warning(s).

The Product example has a name, an image, and a brand, but no offers — so it would not show a product rich result, and schema-guard catches it. The article has a headline and image but no publish date. The process exits with code 1 because there are errors, which is what stops a CI job.

The complete Organization block in the same product page is not flagged for anything required — it only gets a recommended-field warning. That is the point: it tells you what is actually broken, not everything that is theoretically missing.

Baseline mode

If you adopt schema-guard on a site that already has structured-data errors, you do not want CI red on day one. Record the current errors as a baseline, then gate only on new ones.

# 1. Snapshot today's errors (run once; the tool commits nothing, so check the file in yourself if CI should read it)
schema-guard "dist/**/*.html" --update-baseline .schema-guard-baseline.json

# 2. In CI, fail only when a NEW error appears
schema-guard "dist/**/*.html" --baseline .schema-guard-baseline.json

With a baseline in place, pre-existing errors are marked (known) and the run passes. The moment a pull request adds a new one, it is marked (NEW) and the run fails:

examples/product-page.html
  ERROR L15 block#0 <Product> [offers] ... (known)
  ERROR L35 block#1 <Organization> [url] Organization is missing "url". Required so Search can associate the markup with your site. (NEW)
schema-guard: 3 error(s), 2 warning(s). 1 NEW error(s) not in baseline.

Exit code is 1 because of the one new error, even though the two known errors are tolerated. Fix the new one, the build goes green, and the old debt stays visible until you get to it.
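The known/NEW split can be thought of as a set difference between the current errors and the recorded ones. The sketch below shows one way to do that. The finding shape, the `fingerprint` format, and the choice to leave line numbers out of the fingerprint are assumptions for illustration, and the real baseline file format may differ.

```javascript
// Sketch only: the finding shape and the fingerprint format are assumptions.
function fingerprint(finding) {
  // Line numbers are left out so that an unrelated edit which shifts a block
  // down the page does not turn a known error into a new one.
  return [finding.file, finding.type, finding.field].join('|');
}

function markAgainstBaseline(findings, baselineKeys) {
  const known = new Set(baselineKeys);
  return findings
    .filter((f) => f.severity === 'error') // only errors are baselined
    .map((f) => ({ ...f, status: known.has(fingerprint(f)) ? 'known' : 'NEW' }));
}

// Baseline recorded earlier: the two planted errors from the example pages.
const baselineKeys = [
  'examples/product-page.html|Product|offers',
  'examples/article-page.html|BlogPosting|datePublished',
];

// Current run: the same two errors plus a new Organization error.
const currentFindings = [
  { file: 'examples/product-page.html', type: 'Product', field: 'offers', severity: 'error' },
  { file: 'examples/product-page.html', type: 'Organization', field: 'url', severity: 'error' },
  { file: 'examples/article-page.html', type: 'BlogPosting', field: 'datePublished', severity: 'error' },
  { file: 'examples/article-page.html', type: 'BlogPosting', field: 'author', severity: 'warning' },
];

const marked = markAgainstBaseline(currentFindings, baselineKeys);
const newCount = marked.filter((f) => f.status === 'NEW').length;
const baselineExitCode = newCount > 0 ? 1 : 0;
console.log(`${marked.length} error(s), ${newCount} NEW, exit code ${baselineExitCode}`);
```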

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | No failing findings. |
| 1 | Failing findings present (errors, or new errors in baseline mode). |
| 2 | Usage or runtime error (bad arguments, file not found, fetch failed). |

Options

| Option | Effect |
| --- | --- |
| `--url <u>` | Fetch and validate a live URL instead of files. |
| `--json` | Output machine-readable JSON. |
| `--baseline <file>` | Fail only on errors not present in the baseline file. |
| `--update-baseline <file>` | Write current errors to the baseline and exit 0. |
| `--strict` | Treat warnings as failures too. |
| `--no-warn` | Hide warnings from the report (still counted in JSON). |
| `-h, --help` | Show help. |
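Read together, the exit codes and options imply a small decision function. The sketch below is one plausible reading, not the tool's source. `exitCodeFor` and the `summary` and `options` shapes are hypothetical, and how `--strict` combines with `--baseline` is an assumption (here, warnings still fail the run in strict mode even when a baseline is in use). Exit code 2 is left out because it covers usage and runtime failures rather than findings.

```javascript
// Sketch only: an assumed reading of the exit-code and option tables.
function exitCodeFor(summary, options = {}) {
  const { errors, warnings, newErrors } = summary;

  // --update-baseline writes the file and always exits 0.
  if (options.updateBaseline) return 0;

  // With --baseline, only errors missing from the baseline count as failures.
  const failingErrors = options.baseline ? newErrors : errors;

  // With --strict, warnings count as failures too (assumed to apply
  // regardless of baseline mode).
  const failingWarnings = options.strict ? warnings : 0;

  return failingErrors + failingWarnings > 0 ? 1 : 0;
}

// The example run: 2 errors and 2 warnings, no baseline.
console.log(exitCodeFor({ errors: 2, warnings: 2, newErrors: 2 }));
// Same findings, all already recorded in a baseline.
console.log(exitCodeFor({ errors: 2, warnings: 2, newErrors: 0 }, { baseline: true }));
// Warnings only, with --strict.
console.log(exitCodeFor({ errors: 0, warnings: 2, newErrors: 0 }, { strict: true }));
```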

What it checks

schema-guard validates the types that drive the most common rich results:

| Type | Required (error if absent) | Recommended (warning) |
| --- | --- | --- |
| Product (+ subtypes) | name; at least one of offers, review, or aggregateRating; priceCurrency whenever a price is given | image, description, offers.availability |
| Article / NewsArticle / BlogPosting | headline, image, datePublished | author, dateModified, publisher |
| Organization (+ LocalBusiness, Corporation) | name, url, logo | sameAs, contactPoint |

Every node is also checked for valid JSON, a present @context, and a present @type. A type schema-guard does not cover is reported as a warning and skipped — it does not pretend to validate something it has no rules for.

The requirement tables live in plain data files under src/rules/. To extend coverage, add a src/rules/<type>.js that exports a rule set, register it in src/rules/index.js, and the validator picks it up — no logic to change. The field rules trace to Google Search Central's structured-data documentation; treat the severities as a starting point and adjust them in those files for your own use case.
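As an illustration of rules kept as data, here is a minimal sketch of what a rule set and a generic evaluator could look like. The object shape (`types`, `required`, `requiredAnyOf`, `recommended`) and the `validateNode` function are hypothetical. The real files under src/rules/ may be organized differently, so check them before writing your own. The field lists are taken from the table above, with the nested `offers.availability` and the price/priceCurrency checks left out to keep the sketch short.

```javascript
// Hypothetical rule-set shape; the real files under src/rules/ may differ.
const articleRules = {
  types: ['Article', 'NewsArticle', 'BlogPosting'],
  required: ['headline', 'image', 'datePublished'],
  recommended: ['author', 'dateModified', 'publisher'],
};

const productRules = {
  types: ['Product'],
  required: ['name'],
  // At least one field from each group must be present.
  requiredAnyOf: [['offers', 'review', 'aggregateRating']],
  recommended: ['image', 'description'],
};

// Stand-in for a registry such as src/rules/index.js.
const registry = [articleRules, productRules];

const isPresent = (v) =>
  v !== undefined && v !== null && v !== '' && !(Array.isArray(v) && v.length === 0);

// Generic evaluator: adding a type means adding data, not logic.
// Simplification: assumes @type is a single string, not an array.
function validateNode(node) {
  const rules = registry.find((r) => r.types.includes(node['@type']));
  if (!rules) return [{ severity: 'warning', field: '@type' }]; // uncovered type: warn and skip

  const findings = [];
  for (const field of rules.required) {
    if (!isPresent(node[field])) findings.push({ severity: 'error', field });
  }
  for (const group of rules.requiredAnyOf ?? []) {
    if (!group.some((f) => isPresent(node[f]))) findings.push({ severity: 'error', field: group[0] });
  }
  for (const field of rules.recommended) {
    if (!isPresent(node[field])) findings.push({ severity: 'warning', field });
  }
  return findings;
}

// Mirrors the example article: headline and image, but no publish date or author.
const articleFindings = validateNode({
  '@type': 'BlogPosting',
  headline: 'Hello',
  image: 'cover.png',
  dateModified: '2026-06-02',
  publisher: { '@type': 'Organization', name: 'Example' },
});
console.log(articleFindings);
```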

Use it programmatically

The validator is exported, so you can run it inside your own scripts:

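import { readFile } from 'node:fs/promises';
import { validateHtml, summarize } from 'schema-guard';

// Load the HTML to validate (the path is only an example).
const htmlString = await readFile('dist/page.html', 'utf8');
const findings = validateHtml(htmlString, 'page.html');
const { errors, warnings } = summarize(findings);
console.log(`${errors} error(s), ${warnings} warning(s)`);
if (errors > 0) process.exit(1);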

In CI

GitHub Actions, against your build output:

- run: npm run build
- run: npx schema-guard "dist/**/*.html" --baseline .schema-guard-baseline.json

Development

npm test          # node --test, no test runner to install

The tests cover extraction (multiple blocks, escaped </script>, @graph, invalid JSON) and validation (every required/recommended field per type, baseline diffing, and the planted errors in the example files).

License

MIT. See LICENSE.


Built by Velkina — https://velkina.com
