
feat: data-driven roles, self-validating instruction layer, and scheduled Arch CI #155

Merged: tenhishadow merged 20 commits into master from refactor/maintainability
on Jun 13, 2026.

@tenhishadow (Owner)

Summary

This PR brings three stacked refactorings of the dotfiles repo to master. The
default go-task boundary stays user-level and sudo-free throughout; privileged
code is touched only by changes explicitly noted as behaviour-preserving.

  1. Structural refactor: make the config data-driven and pin the report's
    managed paths to the role variables.
  2. AI-instruction layer: collapse to a single source and make it
    self-validating.
  3. Maintainability: collapse the managed-drop-in duplication, schedule the
    Arch container CI, and single-source the architecture docs.

go-task verify passes end-to-end locally (Docker up), including the Arch
container test:system (idempotency changed=0) and Super-Linter.

1. Structural refactor

  • browser_policies is now data-driven: Chromium/Firefox/Thunderbird/VS Code
    policy files are built from shared descriptors instead of four near-duplicate
    task paths.
  • Validation resolves path/name variables by name lookup instead of checking a
    hand-maintained parallel list of variables.
  • workstation_report.py managed paths are pinned to the Ansible role variables
    and kept honest by a new managed-paths:check drift guard.
  • Repeated Taskfile.yml blocks are collapsed into vars and YAML anchors.
  • Optional subsystems are gated at the include level, and playbook become,
    connection, and tags are normalized.

2. AI-instruction layer — single source + self-validating

Mechanical rules became enforced checks (all wired into go-task verify):

New target                       Enforces
lint:ansible-semantics           <Domain> | <Verb> <object> names + exact notify/handler match (structural YAML walk)
lint:english                     English-only repo text (non-Latin letters; UI glyphs allowed)
docs:instructions:check          No doc references a missing go-task target, role, playbook, or path
docs:agents / docs:agents:check  The nested-AGENTS.md ownership map is generated and drift-checked

  • Repo-wide rules now live in exactly one canonical place (root AGENTS.md);
    the full naming spec and validation matrix are no longer duplicated.
  • The documentation-sync contract was rewritten: edit the one canonical rule and
    the checks guarantee the rest, replacing the manual fan-out across README,
    AGENTS, copilot-instructions, and instructions/*.
  • .github/copilot-instructions.md is now under the repo's own 4,000-char
    Copilot limit (4238 → 3910 bytes), which it previously exceeded.

3. Maintainability

  • Drop-in collapse (behaviour-preserving): journald/timesyncd/sshd shared
    the same shape across three near-parallel task files and three j2 templates.
    They now use two shared format partials (dropin-systemd.conf.j2,
    dropin-sshd.conf.j2); journald and timesyncd are described by a
    system_dropins descriptor and rendered by one sys-dropins.yml flow. sshd
    keeps its dedicated flow (verify-include → write(validate) → validate-config
    ordering and absent-style cleanup are not shared) but renders through the
    shared partial.
    • Proven equivalent: localhost render of every drop-in is byte-for-byte
      identical before/after; host system:check shows the drop-in tasks
      unchanged with the SSHD Include line intact; the Arch container
      test:system idempotency run reports changed=0.
  • Scheduled CI: a weekly schedule reuses the existing task-all Arch
    container job (no copied steps) to catch rolling-release upstream breakage;
    the cross-OS apply matrix is skipped on scheduled runs and permissions stay
    contents: read.
  • Single-source docs: docs/architecture.md is the canonical architecture
    and safety-boundary description; README and AGENTS link to it.

Validation

  • go-task verify → green (Docker up): ansible-lint (production profile),
    yamllint, vint, stylua, the four new checks, actionlint, renovate, pre-commit,
    managed-paths:check, system:check, test:system, browser-policies check,
    Super-Linter.
  • Drop-in equivalence: diff -r /tmp/eq-before /tmp/eq-after is empty.
  • Container idempotency: test:system second run changed=0.

Release

Titled with the feat: prefix so that a squash merge bumps the minor version via release-please
(2.2.3 → 2.3.0). Merging this PR makes release-please open the release PR.
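For readers unfamiliar with the convention, here is a minimal Python sketch of Conventional Commits bumping for a post-1.0 project: a feat: title bumps the minor version, fix: bumps the patch, and a ! breaking marker bumps the major. The next_version helper is hypothetical and deliberately simplified; release-please's actual rules cover more cases (pre-1.0 versions, BREAKING CHANGE: footers, configurable types), and this sketch treats every other type as "no release".

```python
import re
from typing import Optional

# Header grammar assumed here: type, optional (scope), optional "!", then ": ".
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<breaking>!)?: ")


def next_version(current: str, title: str) -> Optional[str]:
    """Return the bumped version for a squash-merge title, or None (hypothetical)."""
    match = HEADER.match(title)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in current.split("."))
    if match.group("breaking"):
        return f"{major + 1}.0.0"
    if match.group("type") == "feat":
        return f"{major}.{minor + 1}.0"
    if match.group("type") == "fix":
        return f"{major}.{minor}.{patch + 1}"
    # Simplification: other types (chore, docs, ...) do not cut a release here.
    return None
```

Under this rule the PR title maps 2.2.3 to 2.3.0, matching the Release note.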

Notes

  • No secrets or machine-local runtime state added.
  • Independent host-review notes were kept in a local-only REVIEW.md
    (git-ignored, not part of this PR). Two follow-ups it flagged are pre-existing
    and out of scope here: go-task test:nvim fails on a clean checkout (missing
    terraform/terragrunt root-marker fixtures), and the reusable workflow
    taskfile.uv.yaml@main is pinned to a mutable branch.

🤖 Generated with Claude Code

tenhishadow and others added 20 commits June 13, 2026 16:58
The chromium, firefox, thunderbird and vscode task files each repeated the
same inline-Jinja builder (with list .append side effects) to turn family
targets into the {name, family, directory, path, content} spec consumed by
policy-files.yml; firefox and thunderbird were byte-identical bar a prefix
and a literal. validate.yml repeated the same append pattern three times to
assemble the path-target checks.

Describe the families as data in defaults (browser_policies_families) and add
a single build-policy-files.yml that derives the spec list from a descriptor
using map/combine/ternary and a namespace accumulator instead of .append.
main.yml keeps one thin include per family so the per-family tags and enable
flags stay intact, and validate.yml builds its path-target checks from the
same descriptors.

Rendered policy specs are byte-identical across the default, lockdown and
custom-extension input sets (verified by dumping browser_policy_file
to_nice_json before/after; diff -r empty).
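The descriptor-driven builder described in this commit can be sketched as a Python analogue (the role itself does this in Jinja with map/combine/ternary, not in Python): one function derives the {name, family, directory, path, content} specs from per-family data, so adding a family means adding data rather than another task path. The FAMILIES layout, its keys, the name format, and the example directories are assumptions for illustration, not the actual browser_policies_families structure.

```python
from typing import Any

# Hypothetical descriptor shape; the real browser_policies_families keys may differ.
FAMILIES: dict[str, dict[str, Any]] = {
    "firefox": {
        "directories": ["/etc/firefox/policies"],
        "filename": "policies.json",
    },
    "thunderbird": {
        "directories": ["/etc/thunderbird/policies"],
        "filename": "policies.json",
    },
}


def build_policy_files(family: str, content: dict[str, Any]) -> list[dict[str, Any]]:
    """Derive one policy-file spec per target directory of a family."""
    descriptor = FAMILIES[family]
    # A comprehension builds the list without in-place .append side effects,
    # which is the role the namespace accumulator plays on the Jinja side.
    return [
        {
            "name": f"{family}:{directory}",
            "family": family,
            "directory": directory,
            "path": f"{directory}/{descriptor['filename']}",
            "content": content,
        }
        for directory in descriptor["directories"]
    ]
```

Firefox and Thunderbird, which the commit notes were byte-identical bar a prefix and a literal, become two data entries sharing one code path.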
roles/system/tasks/validate.yml carried a second, hand-maintained copy of the
defaults inventory: 30 path variables and 15 name variables each written as a
{key, value} pair with the name repeated inside a {{ }} expression.
roles/dotfiles/tasks/validate.yml repeated the same "is string + starts with
/" pair across a dozen path variables inline.

Iterate over a plain list of variable names and resolve the value with
lookup('ansible.builtin.vars', item) instead. The name lists are byte-for-byte
the same set the old key/value pairs validated (varnames auto-discovery was
rejected because ^system_.*_(path|dir)$ would miss system_aur_build_root and
system_crontab_bin and reclassify system_sshd_dropin_include_path), so the
checks are unchanged in strength: a clean apply still validates, and forcing a
path variable relative or a required string empty still fails the assert.
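A Python analogue of the lookup pattern (in the role it is an assert looped over names with lookup('ansible.builtin.vars', item)): keep one list of names, resolve each value by name, and check it. ROLE_VARS, system_journald_dropin_dir, and all values below are made up; system_aur_build_root and system_crontab_bin are the names cited in the commit, with placeholder values. The last lines show why the regex auto-discovery was rejected.

```python
import re
from typing import Any

# Hypothetical stand-in for the role's variable namespace.
ROLE_VARS: dict[str, Any] = {
    "system_journald_dropin_dir": "/etc/systemd/journald.conf.d",
    "system_aur_build_root": "/var/tmp/aur",
    "system_crontab_bin": "/usr/bin/crontab",
}

# Names are listed once; values are resolved by lookup, never restated.
PATH_VARS = ["system_journald_dropin_dir", "system_aur_build_root", "system_crontab_bin"]


def invalid_paths(names: list[str], variables: dict[str, Any]) -> list[str]:
    """Return names whose value is missing or not an absolute path string."""
    return [
        name
        for name in names
        if not (isinstance(variables.get(name), str) and variables[name].startswith("/"))
    ]


# The rejected auto-discovery pattern: it misses two of the names above.
AUTO = re.compile(r"^system_.*_(path|dir)$")
```

The check keeps its strength: forcing one variable relative makes it fail, exactly as the commit describes for the assert.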
workstation_report.py hardcoded the managed system and browser-policy paths
that are also defined as role variables. A path change in Ansible silently
desynced the report. The report stays dependency-light (stdlib only), so the
literals remain, but they now live in named constants and a new consistency
check renders the same paths from the role variables (managed_paths.yml) and
asserts they match (check_managed_paths.py), wired into go-task verify via
managed-paths:check.

A deliberate path change in the role variables that is not mirrored in the
report now fails verify. Report stdout for doctor, dotfiles:plan,
system:report and browser-policies:report is unchanged (byte-identical).
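The drift guard can be pictured as a small stdlib-only comparison, sketched below under assumptions: REPORT_PATHS stands in for the named constants in workstation_report.py, the rendered mapping stands in for what Ansible renders from managed_paths.yml, and every key and path value is hypothetical. This is not the actual check_managed_paths.py.

```python
# Hypothetical constants mirroring the report's named path literals.
REPORT_PATHS = {
    "firefox_policy": "/etc/firefox/policies/policies.json",
    "sshd_dropin": "/etc/ssh/sshd_config.d/10-dotfiles.conf",
}


def drift(report: dict[str, str], rendered: dict[str, str]) -> list[str]:
    """Describe every key whose report constant disagrees with the role-rendered path."""
    problems = []
    # Union of keys catches paths present on only one side as well as mismatches.
    for key in sorted(report.keys() | rendered.keys()):
        expected, actual = rendered.get(key), report.get(key)
        if expected != actual:
            problems.append(f"{key}: report={actual!r} role={expected!r}")
    return problems


def main(rendered: dict[str, str]) -> int:
    """Print each mismatch and return a non-zero exit code if any exist."""
    problems = drift(REPORT_PATHS, rendered)
    for line in problems:
        print(line)
    return 1 if problems else 0
```

Keeping the literals in the report (rather than importing role data) preserves its stdlib-only design; the guard only has to fail verify when the two sides disagree.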
Several blocks were duplicated verbatim: the pacman install flags across the
three DEPS_*_ARCHLINUX vars, the browser_policies playbook invocation across
two tasks, the super-linter docker run across two image tags, the nvim
bootstrap commands across test:nvim and test:nvim:compat, and the XDG home
paths across the three nvim test env blocks.

Extract the shared pieces into a PACMAN_INSTALL prefix, a RUN_BROWSER_PLAYBOOK
var (used by both browser tasks), a SUPERLINTER_IMAGE_TAG var (overridden to
"latest" in superlinter:latest), YAML scalar anchors for the bootstrap
commands, and anchors for the XDG paths. go-task does not support YAML merge
keys, so the per-env extras stay inline; only the shared scalars are anchored.

Task names and resolved commands are unchanged: go-task <task> --dry is
byte-identical for every task, and the resolved env maps for the three nvim
test tasks are unchanged.

connection: local was set on the install and browser playbooks but not on
system, even though the inventory already pins ansible_connection: local for
this_host. The browser playbook also tagged browser_policies at both the play
and the role level, and the system play left its privilege model implicit.

Drop the redundant connection from install and browser so all three rely on
the inventory, make the system play's unprivileged model explicit with
become: false (the role still escalates per task), and keep the browser tag at
the role level only, matching how install and system tag their roles.

No task gains or loses become (the 55 role become directives are unchanged),
per-task tags are identical so tag filtering is unaffected, and the playbooks
still connect locally and escalate where required.

Subsystem gating was inconsistent: limits, docker, laptop, user-services and
aur were gated at the include with system_*_enabled, sysctl carried its
system_sysctl_enabled flag inside the task, and locale, console, login, cron,
journald, sshd and time had no off-switch at all.

Give every optional subsystem an include-level when: system_<x>_enabled, add
the seven missing flags (all default true) and validate them as booleans, and
drop the now-redundant system_sysctl_enabled check from sys-sysctl.yml. The
environment guards (containers, CI, systemd availability) stay inside each
subsystem since they are not enable checks.

At default values nothing changes: --list-tasks matches the baseline and the
system:check executed task set is identical (94 tasks); disabling a flag now
skips that include, and test:system stays green.
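The resulting gating contract can be sketched in Python (the role expresses it as when: system_<x>_enabled on each include plus boolean asserts): every flag defaults to true, overrides must be real booleans, and only enabled subsystems run. The flag names assume the system_<x>_enabled pattern applies verbatim; the list covers only the seven newly flagged subsystems plus sysctl, and enabled_includes is a hypothetical helper.

```python
from typing import Any

# The seven newly flagged subsystems plus sysctl; the role gates others too.
SUBSYSTEMS = ["locale", "console", "login", "cron", "journald", "sshd", "time", "sysctl"]


def enabled_includes(overrides: dict[str, Any]) -> list[str]:
    """Return the subsystem includes that would run, validating flags as booleans."""
    # Every flag defaults to true, so an empty override set changes nothing.
    flags = {f"system_{name}_enabled": True for name in SUBSYSTEMS}
    flags.update(overrides)
    for flag, value in flags.items():
        # Non-boolean values such as the string "yes" are rejected, not coerced.
        if not isinstance(value, bool):
            raise TypeError(f"{flag} must be a boolean, got {value!r}")
    return [name for name in SUBSYSTEMS if flags[f"system_{name}_enabled"]]
```

Environment guards (containers, CI, systemd availability) deliberately stay out of this function, matching the commit's split between enable checks and environment checks.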
dotfiles/.config/kitty/gruvbox-dark.conf was byte-identical to gruvbox_dark.conf
and was linked through a second mapping entry (kitty_gruvbox_dark_alias). No
kitty config includes the hyphenated name (kitty.conf includes theme.conf,
which does not reference it).

BEHAVIOR CHANGE: remove the duplicate payload and its mapping, and add the old
destination to dotfiles_cleanup_paths so existing installs drop the stale
~/.config/kitty/gruvbox-dark.conf symlink on the next apply. The canonical
gruvbox_dark.conf link is unchanged, so the theme still loads.

dotfiles:check reports exactly one change (removing the stale symlink); apply
is otherwise clean.

The go-task command vocabulary was spread across many docs, and the
documentation contract told contributors to keep README, AGENTS.md,
copilot-instructions and the instruction files manually in sync whenever
commands changed.

Designate the README Common Tasks table as the single source of truth for the
command catalog, and update the contract in README and AGENTS.md so other docs
reference commands by name instead of repeating the table. The AGENTS.md
Validation Matrix and copilot-instructions Suggested Validation now point at
the README table; their change-type-to-command guidance is kept because it is
path- and context-specific, not a duplicate catalog. No command tables are
removed and no files are deleted.

Super-Linter's pylint flags the missing docstring on main(); add one so
go-task verify (and CI Super-Linter) pass.

Add .claude/skills/dotfiles-repo as a thin Claude Code skill that points to the
authoritative sources (the nearest AGENTS.md for rules, the README Common Tasks
table for the go-task command catalog, and the AGENTS.md Validation Matrix for
which check to run) instead of duplicating them.

ansible-lint covers FQCN, var-naming, and modes but not two repo-specific
contracts: the <Domain> | <Verb> <object> name format and exact notify/handler
matching. Add a structural check that walks plays, tasks, handlers, and nested
block/rescue/always so module-parameter and loop-item `name:` keys are not
mistaken for task names. Wire it into verify so these rules are mechanical
instead of prose only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
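The structural walk described in this commit can be sketched in Python over already-parsed YAML (lists of dicts). This is an illustration, not the repo's lint:ansible-semantics script: the NAME_RE reading of <Domain> | <Verb> <object> (one capitalized domain word, a capitalized verb, a non-empty object) is an assumption, plays and handler listen: topics are left out for brevity, and the task names in the usage test are made up.

```python
import re
from collections.abc import Iterator
from typing import Any

# Assumed reading of "<Domain> | <Verb> <object>", e.g. "Sshd | Write drop-in".
NAME_RE = re.compile(r"^[A-Z][\w-]* \| [A-Z][a-z]+ \S.*$")


def walk_tasks(tasks: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield tasks, recursing only into block/rescue/always.

    Module parameters and loop items are never visited, so a name: key there
    (for example ansible.builtin.user's name argument) is not taken for a task name.
    """
    for task in tasks:
        yield task
        for key in ("block", "rescue", "always"):
            yield from walk_tasks(task.get(key, []))


def lint(tasks: list[dict[str, Any]], handlers: list[dict[str, Any]]) -> list[str]:
    """Report badly formatted names and notify targets without an exact handler match."""
    errors = []
    handler_names = {handler.get("name") for handler in handlers}
    for task in list(walk_tasks(tasks)) + handlers:
        name = task.get("name", "")
        if not NAME_RE.match(name):
            errors.append(f"bad name: {name!r}")
        notify = task.get("notify", [])
        # notify may be a single string or a list of handler names.
        targets = [notify] if isinstance(notify, str) else notify
        for target in targets:
            if target not in handler_names:
                errors.append(f"{name!r} notifies unknown handler {target!r}")
    return errors
```

Note that the user module's name: alice argument in the test below is never checked, which is exactly the false positive a naive grep for name: would produce.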
AGENTS.md requires repository text to stay in English. Add a regression guard
that flags non-ASCII letters (Unicode category L*) in tracked files while
allowing the non-ASCII symbol glyphs that terminal configs legitimately use
(Powerline icons, git ahead/behind arrows, box-drawing, zero-width spaces).
Wire it into verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
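The core test can be sketched with Python's standard unicodedata module; file discovery and any explicit allow-list in the real check are not shown and may differ. Filtering on category L* alone already lets the listed glyphs through: Powerline icons live in the Private Use Area (category Co), arrows are symbols (Sm or So), box-drawing characters are So, and zero-width spaces are Cf.

```python
import unicodedata


def non_ascii_letters(text: str) -> list[tuple[int, str]]:
    """Return (line number, character) for every non-ASCII letter (category L*)."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for char in line:
            # ASCII letters are fine; any other Unicode letter is flagged.
            if ord(char) > 127 and unicodedata.category(char).startswith("L"):
                hits.append((lineno, char))
    return hits
```

Symbols pass and letters are flagged by category rather than by a hand-maintained character list, so new glyphs in terminal configs rarely need an exception.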
Instruction docs (AGENTS.md files, Copilot instructions, README, docs) name
go-task targets, roles, playbooks, and repository paths that silently rot when
something is renamed or removed. Add a check that resolves every go-task
reference against Taskfile.yml and every repo-anchored path against the tree,
ignoring system paths and URLs. Wire it into verify. This absorbs the README
command-catalog freshness intent by guaranteeing the catalog never references a
non-existent target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
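The go-task half of this reference check can be sketched as below. The TASK_REF pattern assumes references are written as go-task followed by a colon-separated target name, and the known targets are passed in as a set (the real check resolves them from Taskfile.yml); the path half, which resolves repo-anchored paths against the tree, is omitted.

```python
import re

# Assumed reference syntax: "go-task <target>", with ':'-separated target segments.
TASK_REF = re.compile(r"\bgo-task\s+([A-Za-z0-9_-]+(?::[A-Za-z0-9_-]+)*)")


def missing_targets(doc_text: str, known_targets: set[str]) -> list[str]:
    """Return referenced go-task targets that do not exist, sorted and de-duplicated."""
    referenced = {match.group(1) for match in TASK_REF.finditer(doc_text)}
    return sorted(referenced - known_targets)
```

Placeholders such as go-task <task> do not match the pattern, so generic usage text is not reported as a missing target.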
The root AGENTS.md lists every nested AGENTS.md an editor must consult; that
list rots when a role or area gains or loses one. Generate the list from the
filesystem and add docs:agents / docs:agents:check mirroring docs:nvim-keymaps.
Together with the reference check (every listed path exists) the ownership map
can no longer drift in either direction. Wire the check into verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
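The generate-then-compare pattern can be sketched in Python: render the ownership list from the filesystem in a stable order, and treat the committed copy as current only if it matches the fresh render. The bullet format and the helper names agents_map and is_current are assumptions; the real docs:agents output may be formatted differently.

```python
from pathlib import Path


def agents_map(root: Path) -> str:
    """Render nested AGENTS.md paths (excluding the root one) as sorted bullets."""
    nested = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("AGENTS.md")
        if path.parent != root
    )
    return "".join(f"- {rel}\n" for rel in nested)


def is_current(committed: str, root: Path) -> bool:
    """Drift check: the committed map must equal a fresh render."""
    return committed == agents_map(root)
```

Sorting makes the generated text deterministic, so the check fails only when a nested AGENTS.md is actually added or removed.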
The .test report and check scripts generate __pycache__ when run (including in
go-task verify). Ignore it so the checks do not leave an untracked directory.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Repo-wide normative prose now lives in exactly one canonical place. The full
<Domain> | <Verb> <object> naming spec and the validation matrix appear once in
root AGENTS.md; nested AGENTS.md, the Copilot surface, and the path-scoped
instructions reference them instead of restating them. Mechanical rules
(naming, notify/handler, variable naming, English-only) are annotated as
enforced by their checks rather than lectured.

Rewrite the documentation-sync contract for the single-source model: edit the
one canonical rule and let go-task checks (lint:ansible-semantics, lint:english,
docs:agents:check, docs:instructions:check) guarantee the rest, replacing the
manual fan-out across README, AGENTS, copilot-instructions, and instructions/*.

This also brings .github/copilot-instructions.md under the repo's own
4,000-character Copilot limit (4238 -> 3910 bytes), which it previously
exceeded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The .test/*.py scripts are validated by Super-Linter (only .test/nvim/ is
excluded), so match the repo's black/ruff/isort/pylint style: wrap long lines
and drop the unused sys imports. Behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The layer model and safety boundaries were retold in three slightly diverging
places (README, docs/architecture.md, root AGENTS.md). Designate
docs/architecture.md as canonical, fix its own Thunderbird omission, and have
README "Project Evolution" and the AGENTS.md architecture section link to it
instead of re-narrating the layer model. Command quick-reference tables stay.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Arch is rolling, so upstream eventually breaks a package name or default
config. Add a weekly schedule trigger to the ansible workflow that reuses the
existing task-all Arch container job (go-task all -- --skip-tags pkg,aur) with
no copied steps. Skip the cross-OS apply matrix on scheduled runs to keep them
lean; permissions stay contents:read and checkout keeps persist-credentials
false. Document the schedule in the README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

journald, timesyncd, and sshd each had a near-parallel task file plus its own
j2 template for the same shape: render a settings map into a drop-in, reset
legacy main-file lines, and remove legacy drop-ins. Replace the three templates
with two shared format partials (dropin-systemd.conf.j2 for Key=Value under a
[Section], dropin-sshd.conf.j2 for Key value) keyed by render_style, and
describe journald and timesyncd as system_dropins descriptors rendered by one
sys-dropins.yml flow (reset via sys-dropin-reset.yml, ensure dir, write, remove
legacy). timesyncd's hardcoded NTP/FallbackNTP became system_timesyncd_settings
so it shares the systemd partial.

sshd keeps its dedicated flow (sys-sshd.yml) but now renders through the shared
sshd partial, because its verify-include -> write(validate) -> validate-config
ordering and absent-style main-file cleanup are not shared with the reset-style
systemd entries. sys-journald.yml is gone; time.yml keeps only its non-drop-in
timesyncd service tasks.

Behavior-preserving: rendered journald/timesyncd/sshd drop-ins are byte-for-byte
identical before/after (localhost render diff is empty), host system:check shows
the drop-in tasks unchanged with the sshd Include line intact, and the Arch
container test:system idempotency run reports changed=0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
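The two shared partials differ only in how a key and value are joined, which a Python analogue of the render_style switch makes visible (the repo does this in Jinja templates, not Python). The SYSTEM_DROPINS entries, their field names, and the setting values are hypothetical; the [Journal] and [Time] section names are the ones journald.conf and timesyncd.conf use.

```python
from collections.abc import Mapping


def render_dropin(settings: Mapping[str, str], render_style: str, section: str = "") -> str:
    """Render a settings map as a systemd-style or sshd-style drop-in."""
    if render_style == "systemd":
        # Key=Value lines under a [Section] header.
        lines = [f"[{section}]"] + [f"{key}={value}" for key, value in settings.items()]
    elif render_style == "sshd":
        # "Key value" lines, no section header.
        lines = [f"{key} {value}" for key, value in settings.items()]
    else:
        raise ValueError(f"unknown render_style: {render_style!r}")
    return "\n".join(lines) + "\n"


# Hypothetical descriptor entries; the real system_dropins fields may differ.
SYSTEM_DROPINS = [
    {"name": "journald", "section": "Journal",
     "settings": {"Storage": "persistent", "SystemMaxUse": "500M"}},
    {"name": "timesyncd", "section": "Time",
     "settings": {"NTP": "0.arch.pool.ntp.org", "FallbackNTP": "1.arch.pool.ntp.org"}},
]
```

Only the rendering is shared; as the commit explains, sshd's verify-include, validated write, and absent-style cleanup ordering stays in its own flow.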
github-actions (bot) added the documentation, dotfiles, ansible, ci, automation, tests,
ai-instructions, browser-policies, github, and inventory labels on Jun 13, 2026
github-actions (bot) added the security and system labels on Jun 13, 2026
tenhishadow merged commit ad44c03 into master on Jun 13, 2026
29 checks passed
tenhishadow deleted the refactor/maintainability branch on June 13, 2026 21:56