itiden/check-sitemap

check-sitemap

CLI tool that crawls an XML sitemap, loads every URL in a headless Chrome browser, and reports problems — HTTP errors, console errors, and HTML validation issues.

Install

bun install

This will also download a local Chrome binary via Puppeteer.

To run directly without installing globally:

bunx @itiden/check-sitemap https://example.com/sitemap.xml

Usage

bunx @itiden/check-sitemap <sitemap-url> [options]

Options

Flag                      Description                              Default
-c, --concurrency <n>     Number of concurrent page checks         3
-t, --timeout <ms>        Page load timeout in milliseconds        30000
-a, --auth <user:pass>    Basic auth credentials                   none
-l, --limit <n>           Only check the first N URLs              all
--no-validate-html        Skip HTML validation                     validation on
-v, --verbose             Show problem details inline during crawl false
-h, --help                Show help
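
To illustrate what the --concurrency and --limit options control, here is a minimal sketch of a bounded worker pool. It is an assumption about how such a limit can be enforced, not the tool's actual code; mapWithConcurrency and its parameters are hypothetical names.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight at once, and returns results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims the next index, awaits the worker, then repeats.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  // Start at least one runner, and never more runners than items.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

With this shape, --limit 10 would correspond to passing urls.slice(0, 10) as items, and -c 5 to passing 5 as limit.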

Examples

# Check a sitemap
bunx @itiden/check-sitemap https://example.com/sitemap.xml

# With basic auth and higher concurrency
bunx @itiden/check-sitemap https://staging.example.com/sitemap.xml --auth admin:secret -c 5

# Fast check — skip HTML validation, verbose output
bunx @itiden/check-sitemap https://example.com/sitemap.xml --no-validate-html -v

# Custom timeout for slow pages
bunx @itiden/check-sitemap https://example.com/sitemap.xml -t 60000

# Test with only the first 10 URLs
bunx @itiden/check-sitemap https://example.com/sitemap.xml --limit 10

What it checks

  1. Sitemap resolution — recursively follows <sitemapindex> children to collect all <url><loc> entries
  2. HTTP status — flags any response with status >= 400
  3. Console errors — captures console.error output and uncaught JS exceptions from the page
  4. HTML validation — runs html-validate with recommended rules against the rendered page source
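
The sitemap resolution step in item 1 can be sketched as a small recursive function. This is an illustration under stated assumptions rather than the tool's implementation: extractLocs, collectUrls, and the injected fetchXml callback are hypothetical names, and a regular expression stands in for a real XML parser.

```typescript
// Hypothetical callback type: fetches a sitemap URL and resolves to its XML text.
type FetchXml = (url: string) => Promise<string>;

// Pull the text out of every <loc> element. A regex is enough for a sketch;
// it does not handle CDATA sections or namespace-prefixed tags.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1].replace(/&amp;/g, "&"));
  }
  return locs;
}

// Recursively follow <sitemapindex> children and return all page URLs.
async function collectUrls(
  sitemapUrl: string,
  fetchXml: FetchXml,
  seen: Set<string> = new Set<string>(),
): Promise<string[]> {
  if (seen.has(sitemapUrl)) return []; // guard against self-referencing indexes
  seen.add(sitemapUrl);

  const xml = await fetchXml(sitemapUrl);
  const locs = extractLocs(xml);

  // A plain <urlset>: its <loc> values are the pages themselves.
  if (!/<sitemapindex[\s>]/.test(xml)) return locs;

  // A <sitemapindex>: each <loc> points at another sitemap to resolve.
  const pages: string[] = [];
  for (const child of locs) {
    pages.push(...(await collectUrls(child, fetchXml, seen)));
  }
  return pages;
}
```

In a runtime with a global fetch (Bun, or Node 18 and later), the callback could be as small as (url) => fetch(url).then((res) => res.text()).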

Output

Each page is logged with a status as it completes. At the end, a summary lists all pages with problems grouped by URL.

Exits with code 1 if any problems were found, 0 otherwise.
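
As a rough sketch of the summary and exit-code behavior described above, the following groups problems by URL and derives the exit code. The Problem shape and the function names are assumptions made for illustration; the tool's real output format may differ.

```typescript
// Hypothetical problem record; the three kinds mirror the checks listed earlier.
type ProblemKind = "http" | "console" | "html";

interface Problem {
  url: string;
  kind: ProblemKind;
  message: string;
}

// Group problems by URL, preserving the order in which URLs first appear.
function groupByUrl(problems: Problem[]): Map<string, Problem[]> {
  const grouped = new Map<string, Problem[]>();
  for (const problem of problems) {
    const bucket = grouped.get(problem.url) ?? [];
    bucket.push(problem);
    grouped.set(problem.url, bucket);
  }
  return grouped;
}

// Render one block per URL with its problems indented underneath.
function formatSummary(problems: Problem[]): string {
  const lines: string[] = [];
  groupByUrl(problems).forEach((items, url) => {
    lines.push(url);
    for (const item of items) {
      lines.push(`  [${item.kind}] ${item.message}`);
    }
  });
  return lines.join("\n");
}

// 1 if any problems were found, 0 otherwise.
function exitCodeFor(problems: Problem[]): 0 | 1 {
  return problems.length > 0 ? 1 : 0;
}
```

Because the exit code is non-zero on any problem, the command can gate a CI step without extra parsing of its output.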

Automated release with GitHub Releases

This repository includes three GitHub workflows:

  • .github/workflows/pr-labeler.yml auto-labels pull requests
  • .github/workflows/release-drafter.yml updates the upcoming release draft with merged PRs
  • .github/workflows/release.yml publishes to npm when you manually publish a GitHub Release

One-time setup

  1. In npm package settings for @itiden/check-sitemap, configure Trusted Publisher for GitHub Actions:
    • Owner: itiden
    • Repository: check-sitemap
    • Workflow filename: release.yml
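  2. Keep the id-token: write permission in .github/workflows/release.yml. Trusted Publishing authenticates through an OIDC token, and the workflow cannot request one without this permission.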

Release flow

  1. Merge PRs to main — each merged PR is added as a row in the draft release notes.
  2. When you are ready to release, bump the version in package.json (for example, to 0.1.1) and push to main.
  3. Open GitHub Releases and publish the draft (or create and publish a new release) with the tag v0.1.1 (0.1.1 also works).
  4. The publish workflow validates tag/version match and publishes to npm.
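
The tag/version check in step 4 could look something like the sketch below. This is a guess at the comparison based on the two accepted tag forms (v0.1.1 and 0.1.1); tagMatchesVersion is a hypothetical name, and the actual check in release.yml may be written differently.

```typescript
// Hypothetical check: a release tag matches when it equals the package.json
// version, with or without a leading "v".
function tagMatchesVersion(tag: string, version: string): boolean {
  const normalized = tag.startsWith("v") ? tag.slice(1) : tag;
  return normalized === version;
}
```

Under this assumption, tags v0.1.1 and 0.1.1 both match version 0.1.1, while v0.1.2 would fail the check and block the publish.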

No npm token secret is required when Trusted Publishing is active.
