feat: data-driven roles, self-validating instruction layer, and scheduled Arch CI by tenhishadow · Pull Request #155 · tenhishadow/dotfiles

tenhishadow · 2026-06-13T21:45:05Z

Summary

This PR brings three stacked refactorings of the dotfiles repo to master. The
default go-task boundary stays user-level and sudo-free throughout; privileged
tasks are touched only where explicitly noted, and those changes are
behaviour-preserving.

1. Structural refactor: make the config data-driven and pin reports.
2. AI-instruction layer: collapse to a single source and make it
   self-validating.
3. Maintainability: collapse the managed-drop-in duplication, schedule the
   Arch container CI, and single-source the architecture docs.

go-task verify passes end-to-end locally (Docker up), including the Arch
container test:system (idempotency changed=0) and Super-Linter.

1. Structural refactor

- browser_policies is now data-driven: Chromium/Firefox/Thunderbird/VS Code
  policy files are built from shared descriptors instead of four near-duplicate
  task paths.
- Validation checks path/name variables by name lookup instead of a parallel
  variable list.
- workstation_report.py managed paths are pinned to the Ansible role variables
  and kept honest by a new managed-paths:check drift guard.
- Taskfile.yml repeated blocks collapsed into vars/anchors.
- Optional subsystems are gated at the include level; playbook become,
  connection, and tags normalized.

2. AI-instruction layer — single source + self-validating

Mechanical rules became enforced checks (all wired into go-task verify):

| New target | Enforces |
| --- | --- |
| `lint:ansible-semantics` | `<Domain> \| <Verb> <object>` names + exact `notify`/handler match (structural YAML walk) |
| `lint:english` | English-only repo text (flags non-ASCII letters; UI glyphs allowed) |
| `docs:instructions:check` | No doc references a missing `go-task` target, role, playbook, or path |
| `docs:agents` / `docs:agents:check` | The nested-`AGENTS.md` ownership map is generated and drift-checked |

- Repo-wide rules now live in exactly one canonical place (root AGENTS.md);
  the full naming spec and validation matrix are no longer duplicated.
- The documentation-sync contract was rewritten: edit the one canonical rule and
  the checks guarantee the rest, replacing the manual fan-out across README,
  AGENTS, copilot-instructions, and instructions/*.
- .github/copilot-instructions.md is now under the repo's own 4,000-char
  Copilot limit (4238 → 3910 bytes), which it previously exceeded.

3. Maintainability

- Drop-in collapse (behaviour-preserving): journald/timesyncd/sshd shared
  the same shape across three near-parallel task files and three j2 templates.
  They now use two shared format partials (dropin-systemd.conf.j2,
  dropin-sshd.conf.j2); journald and timesyncd are described by a
  system_dropins descriptor and rendered by one sys-dropins.yml flow. sshd
  keeps its dedicated flow (verify-include → write(validate) → validate-config
  ordering and absent-style cleanup are not shared) but renders through the
  shared partial.
  - Proven equivalent: localhost render of every drop-in is byte-for-byte
    identical before/after; host system:check shows the drop-in tasks
    unchanged with the SSHD Include line intact; the Arch container
    test:system idempotency run reports changed=0.
- Scheduled CI: a weekly schedule reuses the existing task-all Arch
  container job (no copied steps) to catch rolling-release upstream breakage;
  the cross-OS apply matrix is skipped on scheduled runs and permissions stay
  contents: read.
- Single-source docs: docs/architecture.md is the canonical architecture
  and safety-boundary description; README and AGENTS link to it.

Validation

- go-task verify → green (Docker up): ansible-lint (production profile),
  yamllint, vint, stylua, the four new checks, actionlint, renovate, pre-commit,
  managed-paths:check, system:check, test:system, browser-policies check,
  Super-Linter.
- Drop-in equivalence: diff -r /tmp/eq-before /tmp/eq-after empty.
- Container idempotency: test:system second run changed=0.

Release

Titled feat: so a squash merge bumps a minor release via release-please
(2.2.3 → 2.3.0). Merging this PR makes release-please open the release PR.
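
As a rough illustration of why a `feat:` title yields 2.2.3 → 2.3.0, the sketch below applies a simplified Conventional Commits rule. It is not release-please's implementation; the function name `next_version` and the exact prefix handling are assumptions made for this example.

```python
def next_version(current, title):
    """Apply a simplified Conventional Commits bump to MAJOR.MINOR.PATCH."""
    major, minor, patch = (int(part) for part in current.split("."))
    prefix = title.split(":", 1)[0]            # e.g. "feat", "fix(report)", "feat!"
    kind = prefix.rstrip("!").split("(", 1)[0]  # drop "!" marker and "(scope)"
    if prefix.endswith("!"):
        return f"{major + 1}.0.0"               # breaking change: major bump
    if kind == "feat":
        return f"{major}.{minor + 1}.0"         # feature: minor bump
    if kind == "fix":
        return f"{major}.{minor}.{patch + 1}"   # fix: patch bump
    return current                              # other types: no bump here


print(next_version("2.2.3", "feat: data-driven roles"))
```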

Notes

- No secrets or machine-local runtime state added.
- Independent host-review notes were kept in a local-only REVIEW.md
  (git-ignored, not part of this PR). Two follow-ups it flagged are pre-existing
  and out of scope here: go-task test:nvim fails on a clean checkout (missing
  terraform/terragrunt root-marker fixtures), and the reusable workflow
  taskfile.uv.yaml@main is pinned to a mutable branch.

🤖 Generated with Claude Code

The chromium, firefox, thunderbird and vscode task files each repeated the same inline-Jinja builder (with list .append side effects) to turn family targets into the {name, family, directory, path, content} spec consumed by policy-files.yml; firefox and thunderbird were byte-identical bar a prefix and a literal. validate.yml repeated the same append pattern three times to assemble the path-target checks. Describe the families as data in defaults (browser_policies_families) and add a single build-policy-files.yml that derives the spec list from a descriptor using map/combine/ternary and a namespace accumulator instead of .append. main.yml keeps one thin include per family so the per-family tags and enable flags stay intact, and validate.yml builds its path-target checks from the same descriptors. Rendered policy specs are byte-identical across the default, lockdown and custom-extension input sets (verified by dumping browser_policy_file to_nice_json before/after; diff -r empty).
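
The descriptor-to-spec derivation can be pictured in plain Python. This is only a sketch of the idea, not the role's Jinja: the descriptor layout, directories, filenames, and policy content below are all hypothetical, and only the output keys (name, family, directory, path, content) come from the commit message.

```python
from pathlib import PurePosixPath

# Hypothetical family descriptors; the real browser_policies_families
# structure in the role defaults may look different.
FAMILIES = {
    "firefox": {
        "targets": [{"name": "firefox", "directory": "/etc/firefox/policies"}],
        "filename": "policies.json",
        "content": {"policies": {"DisableTelemetry": True}},
    },
    "chromium": {
        "targets": [
            {"name": "chromium", "directory": "/etc/chromium/policies/managed"},
            {"name": "chrome", "directory": "/etc/opt/chrome/policies/managed"},
        ],
        "filename": "managed.json",
        "content": {"MetricsReportingEnabled": False},
    },
}


def build_policy_files(family, descriptor):
    """Derive one spec per target without appending to shared state."""
    return [
        {
            "name": target["name"],
            "family": family,
            "directory": target["directory"],
            "path": str(PurePosixPath(target["directory"]) / descriptor["filename"]),
            "content": descriptor["content"],
        }
        for target in descriptor["targets"]
    ]


specs = [
    spec
    for family, descriptor in FAMILIES.items()
    for spec in build_policy_files(family, descriptor)
]
```

Because every family goes through the same builder, adding a family means adding data rather than another near-duplicate task path.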

roles/system/tasks/validate.yml carried a second, hand-maintained copy of the defaults inventory: 30 path variables and 15 name variables each written as a {key, value} pair with the name repeated inside a {{ }} expression. roles/dotfiles/tasks/validate.yml repeated the same "is string + starts with /" pair across a dozen path variables inline. Iterate over a plain list of variable names and resolve the value with lookup('ansible.builtin.vars', item) instead. The name lists are byte-for-byte the same set the old key/value pairs validated (varnames auto-discovery was rejected because ^system_.*_(path|dir)$ would miss system_aur_build_root and system_crontab_bin and reclassify system_sshd_dropin_include_path), so the checks are unchanged in strength: a clean apply still validates, and forcing a path variable relative or a required string empty still fails the assert.
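
A small Python sketch of both points: validating by name lookup, and why the rejected auto-discovery pattern would select the wrong set. The variable names and the regex come from the commit message; the values and the helper name `invalid_path_vars` are made up for illustration.

```python
import re

# Hypothetical values; only the variable names come from the commit message.
ROLE_VARS = {
    "system_aur_build_root": "/var/tmp/aur",
    "system_crontab_bin": "/usr/bin/crontab",
    "system_sshd_dropin_include_path": "/etc/ssh/sshd_config.d/*.conf",
}

# Explicit name list, analogous to iterating names and resolving each value.
PATH_VAR_NAMES = ["system_aur_build_root", "system_crontab_bin"]


def invalid_path_vars(names, variables):
    """Return the names whose value is not an absolute-path string."""
    return [
        name
        for name in names
        if not (
            isinstance(variables.get(name), str)
            and variables[name].startswith("/")
        )
    ]


# The rejected auto-discovery pattern from the commit message.
AUTO_DISCOVERY = re.compile(r"^system_.*_(path|dir)$")
discovered = sorted(name for name in ROLE_VARS if AUTO_DISCOVERY.match(name))

print(invalid_path_vars(PATH_VAR_NAMES, ROLE_VARS))
print(discovered)
```

The pattern picks up only `system_sshd_dropin_include_path` and skips the `_root` and `_bin` names, which is the mismatch the commit message describes.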

workstation_report.py hardcoded the managed system and browser-policy paths that are also defined as role variables. A path change in Ansible silently desynced the report. The report stays dependency-light (stdlib only), so the literals remain, but they now live in named constants and a new consistency check renders the same paths from the role variables (managed_paths.yml) and asserts they match (check_managed_paths.py), wired into go-task verify via managed-paths:check. A deliberate path change in the role variables that is not mirrored in the report now fails verify. Report stdout for doctor, dotfiles:plan, system:report and browser-policies:report is unchanged (byte-identical).
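
The drift guard amounts to comparing two maps and failing on any difference. The sketch below shows that comparison only; the constant name, keys, and paths are hypothetical stand-ins, and the real check_managed_paths.py may be structured differently.

```python
# Hypothetical constants standing in for the literals in workstation_report.py.
REPORT_MANAGED_PATHS = {
    "journald_dropin": "/etc/systemd/journald.conf.d/10-managed.conf",
    "sshd_dropin": "/etc/ssh/sshd_config.d/10-managed.conf",
}


def find_drift(report_paths, rendered_paths):
    """Return {key: (report_value, rendered_value)} for every mismatch."""
    keys = sorted(set(report_paths) | set(rendered_paths))
    return {
        key: (report_paths.get(key), rendered_paths.get(key))
        for key in keys
        if report_paths.get(key) != rendered_paths.get(key)
    }


# Simulate a role-variable change that was not mirrored in the report.
rendered = dict(REPORT_MANAGED_PATHS, sshd_dropin="/etc/ssh/sshd_config.d/20-managed.conf")
drift = find_drift(REPORT_MANAGED_PATHS, rendered)
```

A non-empty result would translate to a non-zero exit code, which is what makes verify fail on an unmirrored path change.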

Several blocks were duplicated verbatim: the pacman install flags across the three DEPS_*_ARCHLINUX vars, the browser_policies playbook invocation across two tasks, the super-linter docker run across two image tags, the nvim bootstrap commands across test:nvim and test:nvim:compat, and the XDG home paths across the three nvim test env blocks. Extract the shared pieces into a PACMAN_INSTALL prefix, a RUN_BROWSER_PLAYBOOK var (used by both browser tasks), a SUPERLINTER_IMAGE_TAG var (overridden to "latest" in superlinter:latest), YAML scalar anchors for the bootstrap commands, and anchors for the XDG paths. go-task does not support YAML merge keys, so the per-env extras stay inline; only the shared scalars are anchored. Task names and resolved commands are unchanged: go-task <task> --dry is byte-identical for every task, and the resolved env maps for the three nvim test tasks are unchanged.

connection: local was set on the install and browser playbooks but not on system, even though the inventory already pins ansible_connection: local for this_host. The browser playbook also tagged browser_policies at both the play and the role level, and the system play left its privilege model implicit. Drop the redundant connection from install and browser so all three rely on the inventory, make the system play's unprivileged model explicit with become: false (the role still escalates per task), and keep the browser tag at the role level only, matching how install and system tag their roles. No task gains or loses become (the 55 role become directives are unchanged), per-task tags are identical so tag filtering is unaffected, and the playbooks still connect locally and escalate where required.

Subsystem gating was inconsistent: limits, docker, laptop, user-services and aur were gated at the include with system_*_enabled, sysctl carried its system_sysctl_enabled flag inside the task, and locale, console, login, cron, journald, sshd and time had no off-switch at all. Give every optional subsystem an include-level when: system_<x>_enabled, add the seven missing flags (all default true) and validate them as booleans, and drop the now-redundant system_sysctl_enabled check from sys-sysctl.yml. The environment guards (containers, CI, systemd availability) stay inside each subsystem since they are not enable checks. At default values nothing changes: --list-tasks matches the baseline and the system:check executed task set is identical (94 tasks); disabling a flag now skips that include, and test:system stays green.
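
The include-level gating and the boolean validation can be modelled roughly as follows. The subsystem names and the `system_<x>_enabled` flag pattern come from the commit message; the function name and the default-to-true lookup are assumptions for this sketch.

```python
# The seven subsystems that gained an enable flag, per the commit message.
SUBSYSTEMS = ["locale", "console", "login", "cron", "journald", "sshd", "time"]


def enabled_subsystems(variables):
    """Validate each flag as a boolean and return the subsystems to include."""
    selected = []
    for name in SUBSYSTEMS:
        flag = variables.get(f"system_{name}_enabled", True)  # default true
        if not isinstance(flag, bool):
            raise TypeError(f"system_{name}_enabled must be a boolean")
        if flag:
            selected.append(name)
    return selected


print(enabled_subsystems({"system_sshd_enabled": False}))
```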

dotfiles/.config/kitty/gruvbox-dark.conf was byte-identical to gruvbox_dark.conf and was linked through a second mapping entry (kitty_gruvbox_dark_alias). No kitty config includes the hyphenated name (kitty.conf includes theme.conf, which does not reference it). BEHAVIOR CHANGE: remove the duplicate payload and its mapping, and add the old destination to dotfiles_cleanup_paths so existing installs drop the stale ~/.config/kitty/gruvbox-dark.conf symlink on the next apply. The canonical gruvbox_dark.conf link is unchanged, so the theme still loads. dotfiles:check reports exactly one change (removing the stale symlink); apply is otherwise clean.

The go-task command vocabulary was spread across many docs, and the documentation contract told contributors to keep README, AGENTS.md, copilot-instructions and the instruction files manually in sync whenever commands changed. Designate the README Common Tasks table as the single source of truth for the command catalog, and update the contract in README and AGENTS.md so other docs reference commands by name instead of repeating the table. The AGENTS.md Validation Matrix and copilot-instructions Suggested Validation now point at the README table; their change-type-to-command guidance is kept as it is path/context specific, not a duplicate catalog. No command tables are removed and no files are deleted.

Super-Linter's pylint flags the missing docstring on main(); add one so go-task verify (and CI Super-Linter) pass.

Add .claude/skills/dotfiles-repo as a thin Claude Code skill that points to the authoritative sources (the nearest AGENTS.md for rules, the README Common Tasks table for the go-task command catalog, and the AGENTS.md Validation Matrix for which check to run) instead of duplicating them.

ansible-lint covers FQCN, var-naming, and modes but not two repo-specific contracts: the <Domain> | <Verb> <object> name format and exact notify/handler matching. Add a structural check that walks plays, tasks, handlers, and nested block/rescue/always so module-parameter and loop-item `name:` keys are not mistaken for task names. Wire it into verify so these rules are mechanical instead of prose only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
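
A minimal sketch of such a structural walk, operating on already-parsed task dictionaries. The name regex is an assumed approximation of the `<Domain> | <Verb> <object>` format, and the function names are hypothetical; the repo's actual check may differ.

```python
import re

# Assumed approximation of "<Domain> | <Verb> <object>".
NAME_RE = re.compile(r"^[A-Z][\w ./-]* \| [A-Z][a-z]+ \S.*$")
BLOCK_KEYS = ("block", "rescue", "always")


def iter_tasks(tasks):
    """Yield task dicts, descending only into block/rescue/always."""
    for task in tasks:
        yield task
        for key in BLOCK_KEYS:
            yield from iter_tasks(task.get(key, []))


def check(tasks, handlers):
    """Return naming and notify/handler errors in walk order."""
    errors = []
    handler_names = {handler["name"] for handler in handlers}
    for task in iter_tasks(tasks):
        name = task.get("name", "")
        if not NAME_RE.match(name):
            errors.append(f"bad name: {name!r}")
        notify = task.get("notify", [])
        for target in [notify] if isinstance(notify, str) else notify:
            if target not in handler_names:
                errors.append(f"unknown handler: {target!r}")
    return errors


tasks = [
    {
        "name": "SSHD | Write drop-in",
        "ansible.builtin.template": {"src": "x.j2", "dest": "/tmp/x"},
        "notify": "SSHD | Restart service",
    },
    {
        "name": "System | Manage services",
        "block": [
            # The module parameter "name: sshd" is never visited as a task name.
            {"name": "restart it", "ansible.builtin.service": {"name": "sshd"}},
        ],
    },
]
handlers = [{"name": "SSHD | Reload service"}]
errors = check(tasks, handlers)
```

Only keys at task level are inspected, so the `name` inside the module arguments is ignored while the nested task's own name is still checked.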

AGENTS.md requires repository text to stay in English. Add a regression guard that flags non-ASCII letters (Unicode category L*) in tracked files while allowing the non-ASCII symbol glyphs that terminal configs legitimately use (Powerline icons, git ahead/behind arrows, box-drawing, zero-width spaces). Wire it into verify. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
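
The core of such a guard can be expressed with the standard library's `unicodedata` module. This is a sketch of the classification rule only (non-ASCII and category `L*`); the function name is hypothetical and the real check's file handling is not shown.

```python
import unicodedata


def non_ascii_letters(text):
    """Return non-ASCII characters whose Unicode category is a letter (L*)."""
    return [
        ch
        for ch in text
        if ord(ch) > 127 and unicodedata.category(ch).startswith("L")
    ]


# Arrows, box-drawing, a private-use Powerline glyph, and a zero-width space
# are symbols or format characters, so they are not flagged.
glyphs = "\u2191\u2193 \u2500 \ue0b0 \u200b"
print(non_ascii_letters(glyphs))
print(non_ascii_letters("caf\u00e9"))
```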

Instruction docs (AGENTS.md files, Copilot instructions, README, docs) name go-task targets, roles, playbooks, and repository paths that silently rot when something is renamed or removed. Add a check that resolves every go-task reference against Taskfile.yml and every repo-anchored path against the tree, ignoring system paths and URLs. Wire it into verify. This absorbs the README command-catalog freshness intent by guaranteeing the catalog never references a non-existent target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
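
The go-task half of that check reduces to extracting referenced targets and subtracting the known ones. In this sketch the regex and function name are assumptions, and the set of known targets is passed in directly rather than parsed from Taskfile.yml.

```python
import re

# Assumed reference syntax: "go-task <target>", targets made of
# word characters, colons, and hyphens.
TASK_REF = re.compile(r"go-task ([a-z][\w:-]*)")


def missing_task_refs(doc_text, known_targets):
    """Return referenced go-task targets that are not defined."""
    referenced = set(TASK_REF.findall(doc_text))
    return sorted(referenced - set(known_targets))


doc = "Run go-task verify, then go-task system:check or go-task old:target."
print(missing_task_refs(doc, {"verify", "system:check"}))
```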

The root AGENTS.md lists every nested AGENTS.md an editor must consult; that list rots when a role or area gains or loses one. Generate the list from the filesystem and add docs:agents / docs:agents:check mirroring docs:nvim-keymaps. Together with the reference check (every listed path exists) the ownership map can no longer drift in either direction. Wire the check into verify. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
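
Generating the list from the filesystem and comparing it to the committed one could look like this. It is a sketch under assumptions: the function names are hypothetical, and the real generator presumably also writes the list into root AGENTS.md.

```python
from pathlib import Path


def nested_agents(root):
    """Return repo-relative paths of every AGENTS.md below the root one."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("AGENTS.md")
        if path.parent != root  # skip the root AGENTS.md itself
    )


def is_current(root, listed):
    """Drift check: the committed list must equal the generated one."""
    return sorted(listed) == nested_agents(root)
```

The generate and check steps share one function, so the check cannot disagree with the generator about what counts as a nested file.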

The .test report and check scripts generate __pycache__ when run (including in go-task verify). Ignore it so the checks do not leave an untracked directory. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Repo-wide normative prose now lives in exactly one canonical place. The full <Domain> | <Verb> <object> naming spec and the validation matrix appear once in root AGENTS.md; nested AGENTS.md, the Copilot surface, and the path-scoped instructions reference them instead of restating them. Mechanical rules (naming, notify/handler, variable naming, English-only) are annotated as enforced by their checks rather than lectured. Rewrite the documentation-sync contract for the single-source model: edit the one canonical rule and let go-task checks (lint:ansible-semantics, lint:english, docs:agents:check, docs:instructions:check) guarantee the rest, replacing the manual fan-out across README, AGENTS, copilot-instructions, and instructions/*. This also brings .github/copilot-instructions.md under the repo's own 4,000-character Copilot limit (4238 -> 3910 bytes), which it previously exceeded. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The .test/*.py scripts are validated by Super-Linter (only .test/nvim/ is excluded), so match the repo's black/ruff/isort/pylint style: wrap long lines and drop the unused sys imports. Behavior is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The layer model and safety boundaries were retold in three slightly diverging places (README, docs/architecture.md, root AGENTS.md). Designate docs/architecture.md as canonical, fix its own Thunderbird omission, and have README "Project Evolution" and the AGENTS.md architecture section link to it instead of re-narrating the layer model. Command quick-reference tables stay. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Arch is rolling, so upstream eventually breaks a package name or default config. Add a weekly schedule trigger to the ansible workflow that reuses the existing task-all Arch container job (go-task all -- --skip-tags pkg,aur) with no copied steps. Skip the cross-OS apply matrix on scheduled runs to keep them lean; permissions stay contents:read and checkout keeps persist-credentials false. Document the schedule in the README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

journald, timesyncd, and sshd each had a near-parallel task file plus its own j2 template for the same shape: render a settings map into a drop-in, reset legacy main-file lines, and remove legacy drop-ins. Replace the three templates with two shared format partials (dropin-systemd.conf.j2 for Key=Value under a [Section], dropin-sshd.conf.j2 for Key value) keyed by render_style, and describe journald and timesyncd as system_dropins descriptors rendered by one sys-dropins.yml flow (reset via sys-dropin-reset.yml, ensure dir, write, remove legacy). timesyncd's hardcoded NTP/FallbackNTP became system_timesyncd_settings so it shares the systemd partial. sshd keeps its dedicated flow (sys-sshd.yml) but now renders through the shared sshd partial, because its verify-include -> write(validate) -> validate-config ordering and absent-style main-file cleanup are not shared with the reset-style systemd entries. sys-journald.yml is gone; time.yml keeps only its non-drop-in timesyncd service tasks. Behavior-preserving: rendered journald/timesyncd/sshd drop-ins are byte-for-byte identical before/after (localhost render diff is empty), host system:check shows the drop-in tasks unchanged with the sshd Include line intact, and the Arch container test:system idempotency run reports changed=0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
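
The two shared formats differ only in how a settings map is serialised. The Python sketch below mirrors that split; it is not the Jinja partials themselves, and the function signature and sample settings are illustrative assumptions.

```python
def render_dropin(settings, render_style, section=None):
    """Render a settings map as systemd (Key=Value) or sshd (Key value)."""
    if render_style == "systemd":
        lines = [f"[{section}]"] + [f"{key}={value}" for key, value in settings.items()]
    elif render_style == "sshd":
        lines = [f"{key} {value}" for key, value in settings.items()]
    else:
        raise ValueError(f"unknown render_style: {render_style}")
    return "\n".join(lines) + "\n"


print(render_dropin({"Storage": "persistent"}, "systemd", section="Journal"), end="")
print(render_dropin({"PasswordAuthentication": "no"}, "sshd"), end="")
```

Keeping the format choice in one `render_style` switch is what lets journald and timesyncd share a flow while sshd reuses only the rendering step.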

tenhishadow and others added 20 commits June 13, 2026 16:58

chore(report): add docstring to the managed-path check

92e9f8b

chore: ignore Python bytecode caches

68b6852

github-actions bot added the labels security (Security-sensitive settings, secrets hygiene, SSHD, sysctl, or policy surfaces) and system (Opt-in Arch Linux workstation system role) on Jun 13, 2026

tenhishadow merged commit ad44c03 into master Jun 13, 2026
29 checks passed

tenhishadow deleted the refactor/maintainability branch June 13, 2026 21:56

tenhishadow mentioned this pull request Jun 13, 2026

chore(master): release 2.3.0 #156

Merged
