ci: guard the published page against stray external domains #49

Merged: tymofiy merged 1 commit into main from guard/external-domain-allowlist on Jun 28, 2026


@tymofiy (Owner) commented Jun 28, 2026

What

Adds the committed, no-secret half of the deny-scan defense (Option C in the proposal): a CI + pre-commit guard that fails if docs/index.html (published to GitHub Pages) links to any external host outside a small allowlist.

  • scripts/check_external_domains.py — extracts external hosts from the published page, fails on any not in ALLOWED_HOSTS (standards / repo / identity domains + the author's current product domain), with a built-in --self-test.
  • Wired into .github/workflows/ci.yml (self-test + real check) and mirrored in the lefthook.yml pre-commit hook (globbed to docs/index.html).
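
A minimal sketch of the shape such a guard might take, not the committed script. `_HOST_RE` is the pattern visible in the diff further down this thread; the `ALLOWED_HOSTS` entries are placeholders (the real list is not shown here), and the bodies of `violations` and `self_test` are assumptions about how the check could work.

```python
import re

# Placeholder allowlist: the real ALLOWED_HOSTS entries are not shown in this thread.
ALLOWED_HOSTS = {"www.w3.org", "github.com"}

# Same pattern as in the diff: only explicit http:// or https:// URLs are matched.
_HOST_RE = re.compile(r"https?://([A-Za-z0-9._-]+)")


def violations(html: str) -> list[str]:
    """Return the sorted, de-duplicated hosts in `html` that are not allowlisted."""
    # Lowercase and drop a trailing dot so "Example.com." compares as "example.com".
    hosts = {m.group(1).lower().rstrip(".") for m in _HOST_RE.finditer(html)}
    return sorted(hosts - ALLOWED_HOSTS)


def self_test() -> bool:
    """Check that a clean page passes and a planted external host is rejected."""
    clean = '<a href="https://github.com/owner/repo">repo</a>'
    planted = clean + '<a href="https://client.example/x">oops</a>'
    return violations(clean) == [] and violations(planted) == ["client.example"]
```

In this sketch a CI step would fail the build whenever `violations(...)` returns a non-empty list for the published page, and run `self_test()` first so a broken pattern cannot silently pass.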

Why

A stray external link — a client, vendor, or product domain — reaching the published page is a confidentiality leak. The existing client-name defense lives only in a gitignored local file, so CI never sees it. This catches the leak shape with zero secrets, and (unlike the local hook) covers fresh clones and fork PRs.

Scope / deliberately deferred

  • Does not try to catch arbitrary client names: the spec's own vocabulary (collection / provenance / gallery) makes a name-shape word list false-positive heavy, and arbitrary surnames match no shape. Catching those is the job of the private client-name list held in a CI secret (Option A), which is intentionally deferred.
  • The author's product domain is allowlisted to reflect the page's current state — not an endorsement of keeping the bio link. If that link is dropped to keep the spec brand-neutral, remove the host from ALLOWED_HOSTS so the guard re-tightens.

Verified locally: self-test passes, the current page passes clean (no red-lock), and a planted external domain is correctly rejected.

🤖 Generated with Claude Code

docs/index.html publishes to GitHub Pages, so an external link to a client,
vendor, or product domain reaching it is a confidentiality leak. The existing
client-name defense lives only in a gitignored local file, so CI never sees it.

This adds the committed, no-secret half:
- scripts/check_external_domains.py fails if the published page links to any
  external host outside a small allowlist (standards/repo/identity domains plus
  the author's current product domain), with a built-in --self-test.
- Wired into ci.yml (self-test + real check) and mirrored in the lefthook
  pre-commit hook (globbed to docs/index.html).

Catches the leak shape without the private name list, and unlike the local hook
it covers fresh clones and fork PRs. The private client-name layer (a CI secret)
is intentionally deferred.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 28, 2026 01:08

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce54a46bfb


)

PAGE = pathlib.Path(__file__).resolve().parent.parent / "docs" / "index.html"
_HOST_RE = re.compile(r"https?://([A-Za-z0-9._-]+)")

P2: Recognize scheme-relative external links

The guard promises to fail any non-allowlisted external host in docs/index.html, but this regex only matches URLs that explicitly include http:// or https://. In the CI path inspected in .github/workflows/ci.yml, the script is run directly, so a valid published link such as <a href="//client.example/path"> is still an external browser navigation while violations(...) returns [], letting CI and the pre-commit guard pass. Include scheme-relative URLs, or parse link attributes with a URL parser before applying the allowlist.
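
One way to close this gap, sketched under assumptions: read URL-bearing attributes with the standard-library `html.parser` and take the host from `urllib.parse.urlsplit`, which reports a network location for `//client.example/path` as well as for explicit `http(s)` URLs. The names `_LinkHosts`, `external_hosts`, and `_URL_ATTRS` are hypothetical and do not come from the committed script.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical attribute set; a real page may need more (for example srcset or poster).
_URL_ATTRS = {"href", "src", "action"}


class _LinkHosts(HTMLParser):
    """Collect hosts from URL-bearing attributes, including //host/path forms."""

    def __init__(self) -> None:
        super().__init__()
        self.hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in _URL_ATTRS or not value:
                continue
            try:
                host = urlsplit(value.strip()).hostname
            except ValueError:
                # Malformed URL (e.g. unbalanced IPv6 brackets): no host to record.
                continue
            if host:  # None for relative paths, fragments, and mailto: links
                self.hosts.add(host)


def external_hosts(html: str) -> set[str]:
    """Return the lowercased hosts referenced by link attributes in `html`."""
    parser = _LinkHosts()
    parser.feed(html)
    parser.close()
    return parser.hosts
```

Attribute parsing only sees URLs inside tags, so it would complement the existing regex scan rather than replace it. Simply making the scheme optional in the regex is the smaller change, but it would also match `//` comments in any inline script, which is why this sketch parses attributes instead.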

Copilot AI left a comment


Pull request overview

This PR adds a committed (no-secret) CI + pre-commit safeguard to prevent confidentiality leaks via the published GitHub Pages homepage (docs/index.html) by failing builds/commits when the page references external domains outside a small allowlist.

Changes:

  • Add scripts/check_external_domains.py to extract external hosts from docs/index.html and fail on any host not in ALLOWED_HOSTS (with a --self-test mode).
  • Wire the new check into GitHub Actions CI (.github/workflows/ci.yml) and mirror it in the lefthook.yml pre-commit hooks.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
scripts/check_external_domains.py | Introduces the external-domain allowlist scanner + self-test used by CI/hooks.
lefthook.yml | Adds a pre-commit hook intended to block non-allowlisted external domains in docs/index.html.
.github/workflows/ci.yml | Adds CI steps to run the self-test and enforce the external-domain allowlist check.

Comment on lines +81 to +84
    if not PAGE.exists():
        print(f"::error::{PAGE} not found")
        return 1
    bad = violations(PAGE.read_text(encoding="utf-8"))
Comment thread lefthook.yml
Comment on lines +59 to +63
    external-domains:
      glob: "docs/index.html"
      run: |
        if [ -x .venv/bin/python ]; then PY=.venv/bin/python; else PY=python3; fi
        "$PY" scripts/check_external_domains.py
@tymofiy tymofiy merged commit a5455f1 into main Jun 28, 2026
5 checks passed
@tymofiy tymofiy deleted the guard/external-domain-allowlist branch June 28, 2026 01:16