29 changes: 29 additions & 0 deletions .github/workflows/content-lint.yml
@@ -0,0 +1,29 @@
# Content lint: gate pull requests on textbook authoring conventions.
#
# Runs lightweight checks on the Markdown content. Separate from the deploy
# workflow (jekyll.yml) on purpose: a content-convention miss should block a
# MERGE, but never take down the live site. This is also the home for future
# content gates (e.g. the #99 code-block standardization lint).
name: Content lint

on:
  pull_request:
  workflow_dispatch:

permissions:
  contents: read

jobs:
  seo-frontmatter:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.x"

      - name: Check per-page SEO front matter (description:)
        run: python scripts/check_seo_frontmatter.py
14 changes: 14 additions & 0 deletions scripts/README.md
@@ -22,13 +22,27 @@ Python utilities that bulk-edit or generate content across the textbook. They ar

| Script | Purpose | Apply flag |
|---|---|---|
| [`check_seo_frontmatter.py`](check_seo_frontmatter.py) | **CI gate** — fail if any published page is missing `description:` front matter (reminds about MP4-hero pages lacking a poster). Read-only. | _(none)_ |
| [`generate_og_posters.py`](generate_og_posters.py) | Generate static OG/social-card poster images from a page's hero `<video>` (MP4) via ffmpeg, and set `image:` front matter. | `--run` |
| [`fix_embedded_media.py`](fix_embedded_media.py) | Normalize `<video>` inline styles and wrap bare YouTube iframes responsively. | `--run` |
| [`update_lesson_nav.py`](update_lesson_nav.py) | Migrate old `.btn` lesson nav to card-style `<nav class="lesson-nav">` (rewrites `.md`→`.html`). | `--run` |
| [`fix_arduino_urls.py`](fix_arduino_urls.py) | Migrate old `arduino.cc` URLs to `docs.arduino.cc`. **Untested/brittle — use with care.** | `--apply` |

## Details

### `check_seo_frontmatter.py`

Enforces the per-page SEO convention: every published page must set `description:`.
The **Content lint** GitHub Actions workflow (`.github/workflows/content-lint.yml`) runs it
on every pull request. If any published page lacks `description:`, the script exits non-zero,
which fails the PR check but never the deploy. Pages marked `nav_exclude: true` or
`search_exclude: true`, plus the contributor docs and deprecated pages listed in the script's
`IGNORE` set, are exempt. `image:` is advisory: the script only prints a reminder when an
MP4-hero page has no poster yet. Read-only; takes no flags.

```bash
python scripts/check_seo_frontmatter.py # exit 0 = all good, 1 = a page is missing description:
```
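To make those rules concrete, here is a minimal standalone sketch of how a single file is classified. It reuses the script's own `FRONT_MATTER_RE`, `fm_has` and `fm_true` logic, but the `classify` helper and its labels are hypothetical, and it leaves out the directory walk and the `IGNORE`/`SKIP_DIRS` lists.

```python
import re

# Front-matter pattern and helpers as written in check_seo_frontmatter.py.
FRONT_MATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def fm_has(fm, key):
    """True if front matter has a non-empty value for `key`."""
    m = re.search(rf"^{key}:\s*(.+?)\s*$", fm, re.MULTILINE)
    return bool(m and m.group(1).strip() not in ("", '""', "''"))


def fm_true(fm, key):
    return bool(re.search(rf"^{key}:\s*true\s*$", fm, re.MULTILINE | re.IGNORECASE))


def classify(text):
    """Hypothetical helper: label one Markdown file the way the CI gate treats it."""
    m = FRONT_MATTER_RE.match(text)
    if not m or not re.search(r"^layout:", m.group(1), re.MULTILINE):
        return "not a page"          # no front matter, or no `layout:` key
    fm = m.group(1)
    if fm_true(fm, "nav_exclude") or fm_true(fm, "search_exclude"):
        return "draft (skipped)"     # exempt until published
    return "ok" if fm_has(fm, "description") else "missing description"
```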

### `generate_og_posters.py`

For pages whose **first/hero media is an MP4 `<video>`**, extracts a representative
120 changes: 120 additions & 0 deletions scripts/check_seo_frontmatter.py
@@ -0,0 +1,120 @@
"""
check_seo_frontmatter.py — CI gate: every published page must set `description:`.

Per-page `description:` (and, where possible, `image:`) drives search snippets and
social link-preview cards via jekyll-seo-tag. This script enforces the convention so
new content (and in-flight branches) can't quietly regress to the generic site card.
See the "SEO and social cards" section of website-dev.md.

Rules:
- A "page" = a .md file with YAML front matter containing `layout:`.
- Every page MUST have a non-empty `description:` — EXCEPT:
* pages marked draft via `nav_exclude: true` or `search_exclude: true`, and
* paths in IGNORE (contributor docs, deprecated pages).
A draft becomes subject to the rule as soon as it's published (nav_exclude removed).
- `image:` is ADVISORY: a page whose hero is an MP4 <video> but has no `image:`
yet gets a non-fatal reminder to run scripts/generate_og_posters.py. Pages may
legitimately have no image (they fall back to the site card).

Exit code: 1 if any required page is missing `description:` (fails the CI check);
0 otherwise. Advisory image reminders never affect the exit code.

Usage:
python scripts/check_seo_frontmatter.py
"""

import re
import sys
from pathlib import Path

DOCS_DIR = "."
SKIP_DIRS = {"_site", ".git", "node_modules", "vendor", ".jekyll-cache",
             "scripts", "_includes", "_layouts", "_data", "_sass", "assets"}

# Pages exempt from the description: requirement (contributor docs, deprecated).
IGNORE = {
    "website-dev.md", "website-install.md", "teaching-notes.md",
    "website-content-ideas.md", "README.md", "LICENSE.md", "CLAUDE.md",
    "404.md", "arduino/potentiometers-old.md",
}

FRONT_MATTER_RE = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)
HTML_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
MEDIA_TOKEN_RE = re.compile(r"<video\b|!\[|<img\b|<iframe\b", re.IGNORECASE)
MP4_SOURCE_RE = re.compile(r'<source\b[^>]*\.mp4"', re.IGNORECASE)


def front_matter(content):
    m = FRONT_MATTER_RE.match(content)
    return (m.group(1), content[m.end():]) if m else (None, content)


def fm_has(fm, key):
    """True if front matter has a non-empty value for `key`."""
    m = re.search(rf"^{key}:\s*(.+?)\s*$", fm, re.MULTILINE)
    return bool(m and m.group(1).strip() not in ("", '""', "''"))


def fm_true(fm, key):
    return bool(re.search(rf"^{key}:\s*true\s*$", fm, re.MULTILINE | re.IGNORECASE))


def hero_is_mp4(body):
    visible = HTML_COMMENT_RE.sub("", body)
    first = MEDIA_TOKEN_RE.search(visible)
    if not first or not visible[first.start():first.end()].lower().startswith("<video"):
        return False
    return bool(MP4_SOURCE_RE.search(visible, first.start()))

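def rel(p):
    # Normalize to forward slashes and drop a literal "./" prefix. (str.lstrip("./")
    # strips any leading "." and "/" characters, so it would turn ".github/x.md"
    # into "github/x.md".)
    s = str(p).replace("\\", "/")
    return s[2:] if s.startswith("./") else s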


def main():
    missing_desc = []
    image_reminders = []
    checked = 0

    for path in sorted(Path(DOCS_DIR).rglob("*.md")):
        if any(part in SKIP_DIRS for part in path.parts):
            continue
        relpath = rel(path)
        if relpath in IGNORE:
            continue

        fm, body = front_matter(path.read_text(encoding="utf-8"))
        if fm is None or not re.search(r"^layout:", fm, re.MULTILINE):
            continue  # not a page
        if fm_true(fm, "nav_exclude") or fm_true(fm, "search_exclude"):
            continue  # draft / hidden

        checked += 1
        if not fm_has(fm, "description"):
            missing_desc.append(relpath)
        if not fm_has(fm, "image") and hero_is_mp4(body):
            image_reminders.append(relpath)

    print(f"Checked {checked} published page(s).")

    if image_reminders:
        print(f"\nReminder ({len(image_reminders)}): MP4-hero page(s) with no `image:` "
              f"— run `python scripts/generate_og_posters.py --run`:")
        for p in image_reminders:
            print(f" - {p}")

    if missing_desc:
        print(f"\nERROR: {len(missing_desc)} published page(s) missing `description:` "
              f"front matter:")
        for p in missing_desc:
            print(f" - {p}")
        print("\nAdd a `description:` (see website-dev.md -> 'SEO and social cards'). "
              "Drafts can set `nav_exclude: true` to defer.")
        return 1

    print("\nAll published pages have `description:`. OK")
    return 0


if __name__ == "__main__":
    sys.exit(main())
28 changes: 28 additions & 0 deletions website-dev.md
@@ -111,6 +111,34 @@ To verify after a build, grep the output, e.g.:
grep -oiE '<meta (name|property)="(og:image|og:description|description)" content="[^"]*"' _site/arduino/led-fade.html
```
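Where `grep -E` isn't available, the same post-build check can be sketched with Python's standard-library `html.parser`. This is an illustrative sketch, not a project script: `SeoMetaParser` and `seo_meta` are hypothetical names, and only the three keys the `grep` above looks for are collected.

```python
from html.parser import HTMLParser

# The same three tags the grep command looks for.
WANTED = {"description", "og:description", "og:image"}


class SeoMetaParser(HTMLParser):
    """Collect content= values of <meta name=...> / <meta property=...> tags."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; a self-closing
        # <meta ... /> also arrives here via the default handle_startendtag.
        if tag != "meta":
            return
        a = dict(attrs)
        key = (a.get("name") or a.get("property") or "").lower()
        if key in WANTED and a.get("content") is not None:
            self.found[key] = a["content"]


def seo_meta(html):
    """Return {key: content} for the SEO meta tags present in `html`."""
    parser = SeoMetaParser()
    parser.feed(html)
    parser.close()
    return parser.found
```

For example, `seo_meta(open("_site/arduino/led-fade.html", encoding="utf-8").read())` returns a dict holding whichever of `description`, `og:description` and `og:image` the built page contains.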

### New pages and enforcement

This is **required**, not optional. A CI check (`scripts/check_seo_frontmatter.py`, run by
the **Content lint** workflow on every pull request) fails the PR if any published page is
missing `description:`. So when you author a new lesson, start from this minimal front matter:

```yaml
---
layout: default
title: "Your Lesson Title"
description: "One or two sentences (≤160 chars) on what the reader learns or builds."
# image: ← add per the rules above; for an MP4 hero, run the poster script (below) instead
parent: Your Section
nav_order: 1
---
```
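The CI check only tests that `description:` is present; it does not enforce the ≤160-character guideline in the template above. If you want to check length locally, a minimal sketch could look like this. The `description_length_ok` helper is hypothetical, and it assumes a one-line `description:` value, optionally wrapped in quotes.

```python
import re

MAX_DESCRIPTION_CHARS = 160  # the "≤160 chars" guideline from the template


def description_length_ok(front_matter, limit=MAX_DESCRIPTION_CHARS):
    """Return (ok, length) for a one-line `description:` value.

    Assumes the value sits on the `description:` line, optionally wrapped in
    matching quotes; multi-line YAML scalars are not handled.
    """
    m = re.search(r"^description:[ \t]*(.*?)[ \t]*$", front_matter, re.MULTILINE)
    if m is None:
        return (False, 0)
    value = m.group(1)
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]  # drop the surrounding quotes
    return (0 < len(value) <= limit, len(value))
```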

If a page isn't ready to publish, mark it `nav_exclude: true` (or `search_exclude: true`) and
the check skips it until you publish it. The `image:` key is advisory — the check only *reminds*
you when an MP4-hero page has no poster yet.
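For reference, the "MP4 hero" test behind that reminder works roughly as follows. HTML comments are ignored, the first media token in the body (`<video`, a Markdown image `![`, `<img`, or `<iframe`) must be a `<video`, and a `<source ... .mp4">` must appear at or after it. The sketch below mirrors `hero_is_mp4` from `scripts/check_seo_frontmatter.py` as a standalone function so you can predict whether a page will trigger the reminder; it is a copy for illustration, not an importable API.

```python
import re

# Patterns as written in scripts/check_seo_frontmatter.py.
HTML_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
MEDIA_TOKEN_RE = re.compile(r"<video\b|!\[|<img\b|<iframe\b", re.IGNORECASE)
MP4_SOURCE_RE = re.compile(r'<source\b[^>]*\.mp4"', re.IGNORECASE)


def hero_is_mp4(body):
    """True if the first visible media element is a <video> with an .mp4 <source>."""
    visible = HTML_COMMENT_RE.sub("", body)  # commented-out media doesn't count
    first = MEDIA_TOKEN_RE.search(visible)
    if not first or not first.group(0).lower().startswith("<video"):
        return False
    # Look for an .mp4 <source> anywhere from the first <video> onward.
    return bool(MP4_SOURCE_RE.search(visible, first.start()))
```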

For a new page whose hero is an **MP4 `<video>`**, generate its social poster (and have `image:`
set for you) with:

```bash
python scripts/generate_og_posters.py --run <module>/<your-page>.md
```
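How `generate_og_posters.py` picks and extracts its frame isn't shown in this diff. As a rough illustration of the kind of ffmpeg call involved, here is a hypothetical sketch that grabs one frame of an MP4 as a JPEG. The `poster_command`/`make_poster` names, the 1-second seek, and the `<name>-poster.jpg` output name are assumptions rather than the script's actual behavior, and the sketch stops at the image file (it does not touch `image:` front matter).

```python
import subprocess
from pathlib import Path


def poster_command(video_path, seconds=1.0):
    """Build an ffmpeg command that saves one frame of `video_path` as a JPEG.

    Hypothetical sketch: the seek time and `<name>-poster.jpg` naming are
    illustrative assumptions.
    """
    video = Path(video_path)
    poster = video.with_name(f"{video.stem}-poster.jpg")
    cmd = [
        "ffmpeg", "-y",          # overwrite an existing poster without prompting
        "-ss", str(seconds),     # seek to this timestamp before decoding
        "-i", str(video),
        "-frames:v", "1",        # write exactly one video frame
        str(poster),
    ]
    return cmd, poster


def make_poster(video_path, seconds=1.0):
    """Run the command; requires ffmpeg on PATH."""
    cmd, poster = poster_command(video_path, seconds)
    subprocess.run(cmd, check=True)
    return poster
```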

## Code highlighting
<!-- Code snippet highlighting: https://jekyllrb.com/docs/liquid/tags/#code-snippet-highlighting -->
