feat: data-driven roles, self-validating instruction layer, and scheduled Arch CI #155
Merged
The chromium, firefox, thunderbird and vscode task files each repeated the
same inline-Jinja builder (with list .append side effects) to turn family
targets into the {name, family, directory, path, content} spec consumed by
policy-files.yml; firefox and thunderbird were byte-identical bar a prefix
and a literal. validate.yml repeated the same append pattern three times to
assemble the path-target checks.
Describe the families as data in defaults (browser_policies_families) and add
a single build-policy-files.yml that derives the spec list from a descriptor
using map/combine/ternary and a namespace accumulator instead of .append.
main.yml keeps one thin include per family so the per-family tags and enable
flags stay intact, and validate.yml builds its path-target checks from the
same descriptors.
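The Python sketch below is a rough analogue of what build-policy-files.yml does. It assumes a descriptor with `family`, `directory`, and a `targets` list. Those field names and the `build_policy_files` helper are hypothetical; the real implementation is Jinja using map/combine inside Ansible. The sketch shows one descriptor producing the whole spec list without `.append` side effects:

```python
import json
from pathlib import PurePosixPath


def build_policy_files(descriptor: dict, policies: dict) -> list:
    """Return one {name, family, directory, path, content} spec per target.

    Hypothetical descriptor shape:
        {"family": str, "directory": str, "targets": [{"name": str, "file": str}]}
    """
    family = descriptor["family"]
    directory = descriptor["directory"]
    # In this sketch every target of a family shares the same rendered policy.
    content = json.dumps(policies, indent=2, sort_keys=True) + "\n"
    # A comprehension plays the role of map/combine: no list.append side effects.
    return [
        {
            "name": target["name"],
            "family": family,
            "directory": directory,
            "path": str(PurePosixPath(directory) / target["file"]),
            "content": content,
        }
        for target in descriptor["targets"]
    ]


# Hypothetical firefox descriptor; the policy values are illustrative only.
firefox = {
    "family": "firefox",
    "directory": "/etc/firefox/policies",
    "targets": [{"name": "firefox", "file": "policies.json"}],
}
specs = build_policy_files(firefox, {"policies": {"DisableTelemetry": True}})
```

Every family goes through the same function, so firefox and thunderbird differ only in their descriptor values.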
Rendered policy specs are byte-identical across the default, lockdown and
custom-extension input sets (verified by dumping browser_policy_file
to_nice_json before/after; diff -r empty).
roles/system/tasks/validate.yml carried a second, hand-maintained copy of the
defaults inventory: 30 path variables and 15 name variables each written as a
{key, value} pair with the name repeated inside a {{ }} expression.
roles/dotfiles/tasks/validate.yml repeated the same "is string + starts with
/" pair across a dozen path variables inline.
Iterate over a plain list of variable names and resolve the value with
lookup('ansible.builtin.vars', item) instead. The name lists are byte-for-byte
the same set the old key/value pairs validated (varnames auto-discovery was
rejected because ^system_.*_(path|dir)$ would miss system_aur_build_root and
system_crontab_bin and reclassify system_sshd_dropin_include_path), so the
checks are unchanged in strength: a clean apply still validates, and forcing a
path variable relative or a required string empty still fails the assert.
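The validation pattern can be sketched in Python. Here a plain dict stands in for the Ansible variable namespace that `lookup('ansible.builtin.vars', item)` reads. The `validate_vars` helper and `system_hostname` are hypothetical. `system_aur_build_root` and `system_crontab_bin` are names from this PR, used only as examples with made-up values:

```python
def validate_vars(hostvars: dict, path_names: list, string_names: list) -> list:
    """Return human-readable failures; an empty list means every check passed."""
    failures = []
    for name in path_names:
        # Stand-in for lookup('ansible.builtin.vars', name). A missing key yields
        # None here and fails the check (the real lookup errors out instead).
        value = hostvars.get(name)
        if not (isinstance(value, str) and value.startswith("/")):
            failures.append(f"{name} must be an absolute path, got {value!r}")
    for name in string_names:
        value = hostvars.get(name)
        if not (isinstance(value, str) and value.strip()):
            failures.append(f"{name} must be a non-empty string, got {value!r}")
    return failures


hostvars = {
    "system_aur_build_root": "/var/tmp/aur-build",
    "system_crontab_bin": "crontab",  # forced relative: should fail
    "system_hostname": "",  # required string left empty: should fail
}
result = validate_vars(
    hostvars, ["system_aur_build_root", "system_crontab_bin"], ["system_hostname"]
)
```

Because the name lists are explicit, a variable that a regex such as `^system_.*_(path|dir)$` would miss is still checked.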
workstation_report.py hardcoded the managed system and browser-policy paths that are also defined as role variables. A path change in Ansible silently desynced the report. The report stays dependency-light (stdlib only), so the literals remain, but they now live in named constants and a new consistency check renders the same paths from the role variables (managed_paths.yml) and asserts they match (check_managed_paths.py), wired into go-task verify via managed-paths:check. A deliberate path change in the role variables that is not mirrored in the report now fails verify. Report stdout for doctor, dotfiles:plan, system:report and browser-policies:report is unchanged (byte-identical).
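Below is a minimal sketch of the comparison check_managed_paths.py performs. It assumes the role variables have already been rendered (for example by managed_paths.yml) into a flat name-to-path mapping. The constant name, keys, and paths are hypothetical placeholders, not the repo's actual values:

```python
# Hypothetical literals as they might appear in workstation_report.py.
REPORT_MANAGED_PATHS = {
    "system_sysctl_dropin_path": "/etc/sysctl.d/90-workstation.conf",
    "browser_policies_firefox_path": "/etc/firefox/policies/policies.json",
}


def diff_managed_paths(report: dict, rendered: dict) -> dict:
    """Map each key whose value differs to (report value, role value).

    Keys present on only one side show up with None on the other side.
    """
    return {
        key: (report.get(key), rendered.get(key))
        for key in sorted(report.keys() | rendered.keys())
        if report.get(key) != rendered.get(key)
    }
```

A non-empty result is the point where such a check would exit non-zero and fail `go-task verify`.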
Several blocks were duplicated verbatim: the pacman install flags across the three DEPS_*_ARCHLINUX vars, the browser_policies playbook invocation across two tasks, the super-linter docker run across two image tags, the nvim bootstrap commands across test:nvim and test:nvim:compat, and the XDG home paths across the three nvim test env blocks. Extract the shared pieces into a PACMAN_INSTALL prefix, a RUN_BROWSER_PLAYBOOK var (used by both browser tasks), a SUPERLINTER_IMAGE_TAG var (overridden to "latest" in superlinter:latest), YAML scalar anchors for the bootstrap commands, and anchors for the XDG paths. go-task does not support YAML merge keys, so the per-env extras stay inline; only the shared scalars are anchored. Task names and resolved commands are unchanged: go-task <task> --dry is byte-identical for every task, and the resolved env maps for the three nvim test tasks are unchanged.
connection: local was set on the install and browser playbooks but not on system, even though the inventory already pins ansible_connection: local for this_host. The browser playbook also tagged browser_policies at both the play and the role level, and the system play left its privilege model implicit. Drop the redundant connection from install and browser so all three rely on the inventory, make the system play's unprivileged model explicit with become: false (the role still escalates per task), and keep the browser tag at the role level only, matching how install and system tag their roles. No task gains or loses become (the 55 role become directives are unchanged), per-task tags are identical so tag filtering is unaffected, and the playbooks still connect locally and escalate where required.
Subsystem gating was inconsistent: limits, docker, laptop, user-services and aur were gated at the include with system_*_enabled, sysctl carried its system_sysctl_enabled flag inside the task, and locale, console, login, cron, journald, sshd and time had no off-switch at all. Give every optional subsystem an include-level when: system_<x>_enabled, add the seven missing flags (all default true) and validate them as booleans, and drop the now-redundant system_sysctl_enabled check from sys-sysctl.yml. The environment guards (containers, CI, systemd availability) stay inside each subsystem since they are not enable checks. At default values nothing changes: --list-tasks matches the baseline and the system:check executed task set is identical (94 tasks); disabling a flag now skips that include, and test:system stays green.
dotfiles/.config/kitty/gruvbox-dark.conf was byte-identical to gruvbox_dark.conf and was linked through a second mapping entry (kitty_gruvbox_dark_alias). No kitty config includes the hyphenated name (kitty.conf includes theme.conf, which does not reference it). BEHAVIOR CHANGE: remove the duplicate payload and its mapping, and add the old destination to dotfiles_cleanup_paths so existing installs drop the stale ~/.config/kitty/gruvbox-dark.conf symlink on the next apply. The canonical gruvbox_dark.conf link is unchanged, so the theme still loads. dotfiles:check reports exactly one change (removing the stale symlink); apply is otherwise clean.
The go-task command vocabulary was spread across many docs, and the documentation contract told contributors to keep README, AGENTS.md, copilot-instructions and the instruction files manually in sync whenever commands changed. Designate the README Common Tasks table as the single source of truth for the command catalog, and update the contract in README and AGENTS.md so other docs reference commands by name instead of repeating the table. The AGENTS.md Validation Matrix and copilot-instructions Suggested Validation now point at the README table; their change-type-to-command guidance is kept as it is path/context specific, not a duplicate catalog. No command tables are removed and no files are deleted.
Super-Linter's pylint flags the missing docstring on main(); add one so go-task verify (and CI Super-Linter) pass.
Add .claude/skills/dotfiles-repo as a thin Claude Code skill that points to the authoritative sources (the nearest AGENTS.md for rules, the README Common Tasks table for the go-task command catalog, and the AGENTS.md Validation Matrix for which check to run) instead of duplicating them.
ansible-lint covers FQCN, var-naming, and modes but not two repo-specific contracts: the <Domain> | <Verb> <object> name format and exact notify/handler matching. Add a structural check that walks plays, tasks, handlers, and nested block/rescue/always so module-parameter and loop-item `name:` keys are not mistaken for task names. Wire it into verify so these rules are mechanical instead of prose only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
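Here is a sketch of the structural walk, assuming plays are already parsed into Python dicts (for example with PyYAML). Two parts are assumptions, not the repo's exact rules. `NAME_RE` only approximates the `<Domain> | <Verb> <object>` format. The sketch matches `notify` against handler `name:` only and ignores `listen:`. What it does show faithfully is that only the task-level `name` key is read, so a module's own `name:` parameter is never visited:

```python
import re

# Approximation of "<Domain> | <Verb> <object>": a capitalized domain, " | ",
# a capitalized verb, then a non-empty object.
NAME_RE = re.compile(r"^[A-Z][A-Za-z0-9 _./-]* \| [A-Z][a-z]+ \S.*$")
NESTED_KEYS = ("block", "rescue", "always")
SECTIONS = ("pre_tasks", "tasks", "post_tasks", "handlers")


def iter_tasks(tasks):
    """Yield every task dict, descending only into block/rescue/always."""
    for task in tasks or []:
        if not isinstance(task, dict):
            continue
        yield task
        for key in NESTED_KEYS:
            yield from iter_tasks(task.get(key))


def check_play(play: dict) -> list:
    """Return naming and notify/handler problems for one parsed play."""
    handler_names = {h.get("name") for h in iter_tasks(play.get("handlers"))}
    problems = []
    for section in SECTIONS:
        for task in iter_tasks(play.get(section)):
            # Only the task-level key is read, so module params such as
            # {"ansible.builtin.package": {"name": "git"}} are never checked.
            name = task.get("name")
            if name is not None and not (isinstance(name, str) and NAME_RE.match(name)):
                problems.append(f"bad name: {name!r}")
            notify = task.get("notify", [])
            for target in [notify] if isinstance(notify, str) else notify:
                if target not in handler_names:
                    problems.append(f"notify {target!r} has no matching handler")
    return problems
```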
AGENTS.md requires repository text to stay in English. Add a regression guard that flags non-ASCII letters (Unicode category L*) in tracked files while allowing the non-ASCII symbol glyphs that terminal configs legitimately use (Powerline icons, git ahead/behind arrows, box-drawing, zero-width spaces). Wire it into verify. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
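The core test can be sketched with the standard library's `unicodedata`. A character is flagged when it is outside ASCII and its Unicode category starts with `L`. The allowed symbols fall outside `L*` and pass:

- Powerline glyphs (private-use, `Co`)
- arrows (`Sm`)
- box-drawing characters (`So`)
- zero-width spaces (`Cf`)

The function name and the (line, column, char) result shape are hypothetical:

```python
import unicodedata


def non_ascii_letters(text: str) -> list:
    """Return (line, column, char) for every non-ASCII letter (category L*)."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            # Symbols such as Powerline glyphs (Co), arrows (Sm), box-drawing
            # characters (So) and zero-width spaces (Cf) are not letters.
            if ord(char) > 127 and unicodedata.category(char).startswith("L"):
                hits.append((lineno, col, char))
    return hits
```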
Instruction docs (AGENTS.md files, Copilot instructions, README, docs) name go-task targets, roles, playbooks, and repository paths that silently rot when something is renamed or removed. Add a check that resolves every go-task reference against Taskfile.yml and every repo-anchored path against the tree, ignoring system paths and URLs. Wire it into verify. This absorbs the README command-catalog freshness intent by guaranteeing the catalog never references a non-existent target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
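A simplified sketch of the reference check follows. It extracts `go-task <target>` mentions and repo-anchored paths from a document, then reports any that do not resolve. The regexes, the list of path prefixes, and `dangling_refs` are hypothetical. A negative lookbehind skips paths that are part of a URL or of an absolute system path such as `/etc/...`:

```python
import re
from pathlib import Path

# A go-task target starts with a word character, so flags like --dry are skipped.
TASK_REF = re.compile(r"\bgo-task\s+(\w[\w:.-]*)")
# Repo-anchored prefixes; the lookbehind drops matches inside URLs or absolute
# system paths (a preceding "/", ":", "." or word character).
PATH_REF = re.compile(r"(?<![\w/:.])((?:roles|playbooks|docs|\.github|\.test)/[\w./-]+)")


def dangling_refs(doc: str, tasks: set, repo_root: Path) -> list:
    """Return task and path references in doc that do not resolve."""
    missing = []
    for target in TASK_REF.findall(doc):
        target = target.rstrip(".")  # drop sentence-ending punctuation
        if target not in tasks:
            missing.append(f"task:{target}")
    for rel in PATH_REF.findall(doc):
        rel = rel.rstrip(".")
        if not (repo_root / rel).exists():
            missing.append(f"path:{rel}")
    return missing
```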
The root AGENTS.md lists every nested AGENTS.md an editor must consult; that list rots when a role or area gains or loses one. Generate the list from the filesystem and add docs:agents / docs:agents:check mirroring docs:nvim-keymaps. Together with the reference check (every listed path exists) the ownership map can no longer drift in either direction. Wire the check into verify. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
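The sketch below generates the ownership list and checks it for drift. It assumes the generated list sits between two marker comments in the root AGENTS.md. The markers and function names are hypothetical. A real check would probably limit itself to tracked files rather than walking the whole tree:

```python
from pathlib import Path

BEGIN = "<!-- agents:begin -->"  # hypothetical marker
END = "<!-- agents:end -->"  # hypothetical marker


def render_agents_list(repo_root: Path) -> str:
    """Render a sorted bullet list of every nested AGENTS.md (root excluded)."""
    nested = sorted(
        p.relative_to(repo_root).as_posix()
        for p in repo_root.rglob("AGENTS.md")
        if p.parent != repo_root
    )
    return "\n".join(f"- {rel}" for rel in nested)


def agents_list_is_current(repo_root: Path) -> bool:
    """Compare the block between the markers with a fresh rendering."""
    text = (repo_root / "AGENTS.md").read_text(encoding="utf-8")
    start = text.index(BEGIN) + len(BEGIN)
    end = text.index(END, start)
    return text[start:end].strip() == render_agents_list(repo_root)
```

The generator (the `docs:agents` side) would write `render_agents_list` between the markers, and the check (`docs:agents:check`) fails whenever the two disagree.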
The .test report and check scripts generate __pycache__ when run (including in go-task verify). Ignore it so the checks do not leave an untracked directory. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Repo-wide normative prose now lives in exactly one canonical place. The full <Domain> | <Verb> <object> naming spec and the validation matrix appear once in root AGENTS.md; nested AGENTS.md, the Copilot surface, and the path-scoped instructions reference them instead of restating them. Mechanical rules (naming, notify/handler, variable naming, English-only) are annotated as enforced by their checks rather than lectured. Rewrite the documentation-sync contract for the single-source model: edit the one canonical rule and let go-task checks (lint:ansible-semantics, lint:english, docs:agents:check, docs:instructions:check) guarantee the rest, replacing the manual fan-out across README, AGENTS, copilot-instructions, and instructions/*. This also brings .github/copilot-instructions.md under the repo's own 4,000-character Copilot limit (4238 -> 3910 bytes), which it previously exceeded. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The .test/*.py scripts are validated by Super-Linter (only .test/nvim/ is excluded), so match the repo's black/ruff/isort/pylint style: wrap long lines and drop the unused sys imports. Behavior is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The layer model and safety boundaries were retold in three slightly diverging places (README, docs/architecture.md, root AGENTS.md). Designate docs/architecture.md as canonical, fix its own Thunderbird omission, and have README "Project Evolution" and the AGENTS.md architecture section link to it instead of re-narrating the layer model. Command quick-reference tables stay. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Arch is rolling, so upstream eventually breaks a package name or default config. Add a weekly schedule trigger to the ansible workflow that reuses the existing task-all Arch container job (go-task all -- --skip-tags pkg,aur) with no copied steps. Skip the cross-OS apply matrix on scheduled runs to keep them lean; permissions stay contents:read and checkout keeps persist-credentials false. Document the schedule in the README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
journald, timesyncd, and sshd each had a near-parallel task file plus its own j2 template for the same shape: render a settings map into a drop-in, reset legacy main-file lines, and remove legacy drop-ins. Replace the three templates with two shared format partials (dropin-systemd.conf.j2 for Key=Value under a [Section], dropin-sshd.conf.j2 for Key value) keyed by render_style, and describe journald and timesyncd as system_dropins descriptors rendered by one sys-dropins.yml flow (reset via sys-dropin-reset.yml, ensure dir, write, remove legacy). timesyncd's hardcoded NTP/FallbackNTP became system_timesyncd_settings so it shares the systemd partial. sshd keeps its dedicated flow (sys-sshd.yml) but now renders through the shared sshd partial, because its verify-include -> write(validate) -> validate-config ordering and absent-style main-file cleanup are not shared with the reset-style systemd entries. sys-journald.yml is gone; time.yml keeps only its non-drop-in timesyncd service tasks. Behavior-preserving: rendered journald/timesyncd/sshd drop-ins are byte-for-byte identical before/after (localhost render diff is empty), host system:check shows the drop-in tasks unchanged with the sshd Include line intact, and the Arch container test:system idempotency run reports changed=0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
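The two format partials can be approximated in Python to show how one `render_style` switch covers all three daemons. The function and the sample settings are hypothetical; the real partials are the Jinja templates dropin-systemd.conf.j2 and dropin-sshd.conf.j2:

```python
from typing import Optional


def render_dropin(settings: dict, render_style: str, section: Optional[str] = None) -> str:
    """Render a settings map the way the two shared partials would."""
    if render_style == "systemd":
        # dropin-systemd.conf.j2 analogue: Key=Value lines under a [Section] header.
        if not section:
            raise ValueError("systemd-style drop-ins need a section, e.g. 'Journal'")
        lines = [f"[{section}]"] + [f"{key}={value}" for key, value in settings.items()]
    elif render_style == "sshd":
        # dropin-sshd.conf.j2 analogue: "Key value" lines, no section header.
        lines = [f"{key} {value}" for key, value in settings.items()]
    else:
        raise ValueError(f"unknown render_style: {render_style!r}")
    return "\n".join(lines) + "\n"
```

Under this model, journald (`[Journal]`) and timesyncd (`[Time]`) differ only in their section and settings map, which is why both fit the systemd partial.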
Summary
This PR brings three stacked refactorings of the dotfiles repo to master. The default `go-task` boundary stays user-level and sudo-free throughout; no privileged behaviour changes except where explicitly noted as behaviour-preserving.
self-validating.
Arch container CI, and single-source the architecture docs.
`go-task verify` passes end-to-end locally (Docker up), including the Arch container `test:system` (idempotency `changed=0`) and Super-Linter.

1. Structural refactor
- `browser_policies` is now data-driven: Chromium/Firefox/Thunderbird/VS Code policy files are built from shared descriptors instead of four near-duplicate task paths.
variable list.
- `workstation_report.py` managed paths are pinned to the Ansible role variables and kept honest by a new `managed-paths:check` drift guard.
- `Taskfile.yml` repeated blocks collapsed into vars/anchors.
- `become`, `connection`, and tags normalized.
2. AI-instruction layer — single source + self-validating
Mechanical rules became enforced checks (all wired into `go-task verify`):
- `lint:ansible-semantics`: `<Domain> | <Verb> <object>` names + exact `notify`/handler match (structural YAML walk)
- `lint:english`
- `docs:instructions:check`: `go-task` target, role, playbook, or path
- `docs:agents` / `:check`: `AGENTS.md` ownership map is generated and drift-checked

`AGENTS.md`); the full naming spec and validation matrix are no longer duplicated.
the checks guarantee the rest, replacing the manual fan-out across README, AGENTS, copilot-instructions, and `instructions/*`.
- `.github/copilot-instructions.md` is now under the repo's own 4,000-char Copilot limit (4238 → 3910 bytes), which it previously exceeded.
3. Maintainability
the same shape across three near-parallel task files and three j2 templates. They now use two shared format partials (`dropin-systemd.conf.j2`, `dropin-sshd.conf.j2`); journald and timesyncd are described by a `system_dropins` descriptor and rendered by one `sys-dropins.yml` flow. sshd keeps its dedicated flow (verify-include → write(validate) → validate-config ordering and absent-style cleanup are not shared) but renders through the shared partial.
identical before/after; host `system:check` shows the drop-in tasks unchanged with the sshd `Include` line intact; the Arch container `test:system` idempotency run reports `changed=0`.
- Weekly `schedule` reuses the existing `task-all` Arch container job (no copied steps) to catch rolling-release upstream breakage; the cross-OS apply matrix is skipped on scheduled runs and permissions stay `contents: read`.
- `docs/architecture.md` is the canonical architecture and safety-boundary description; README and AGENTS link to it.
Validation
- `go-task verify` → green (Docker up): ansible-lint (production profile), yamllint, vint, stylua, the four new checks, actionlint, renovate, pre-commit, `managed-paths:check`, `system:check`, `test:system`, browser-policies check, Super-Linter.
- `diff -r /tmp/eq-before /tmp/eq-after` empty.
- `test:system` second run `changed=0`.

Release
Titled `feat:` so a squash merge bumps a minor release via release-please (2.2.3 → 2.3.0). Merging this PR makes release-please open the release PR.
Notes
`REVIEW.md` (git-ignored, not part of this PR). Two follow-ups it flagged are pre-existing and out of scope here:
- `go-task test:nvim` fails on a clean checkout (missing terraform/terragrunt root-marker fixtures).
- The reusable workflow `taskfile.uv.yaml@main` is pinned to a mutable branch.

🤖 Generated with Claude Code