nix profile add works inside cells, browser state shares one profile across cell login + chromium + MCP, and RDP-into-cell works on first launch #37

Open
DimmKirr wants to merge 13 commits into main from feature/wip

@DimmKirr
Owner

Changes

nix profile add now works inside a running cell

  • feat(nixhome/fragments): 04-nix-daemon.sh defaults DEVCELL_NIX_DAEMON to true so the daemon spawns at boot, sets NIX_REMOTE=daemon, and the previously-gated fixups (chmod 1777 /tmp, setuid sudo, /nix/var/nix/* state dirs) now always run — nix profile add nixpkgs#scc, nix shell nixpkgs#hello, and nix-env -iA nixpkgs.foo all succeed from inside any cell out of the box; previously every invocation tripped could not set permissions on '/nix/var/nix/profiles/per-user' to 755; opt out with DEVCELL_NIX_DAEMON=false
  • feat(nixhome/base): bundle gnutar; tar is back on $PATH after the Debian-base slimdown left it missing, so curl … | tar xz, pip install, and binary-release extraction all work without the 7z workaround

Unified browser-state layout (~/.chrome/<app>/ + ~/.playwright/)

  • feat(cmd/chrome): cell login <url> writes Chromium profile to ~/.devcell/<session>/.chrome/<app>/ and Playwright cookie snapshot to ~/.devcell/<session>/.playwright/storage-state.json; fingerprint metadata moves to ~/.devcell/<session>/.playwright/fingerprint.json — one canonical location for browser auth state instead of three scattered per-app dirs (.chrome/<app>/, .chrome-<app>/, .playwright-<app>/) that drifted out of sync
  • feat(nixhome/scraping): patchright-mcp wrapper now reads $HOME/.playwright/storage-state.json and falls back to $HOME/.chrome/${APP_NAME}; chromium wrapper also opens $HOME/.chrome/${APP_NAME} — interactive chromium and MCP automation share one cookie jar so a cell login session is immediately visible to MCP without going through the JSON snapshot
  • feat(nixhome/fragments): 20-homedir.sh exports CHROMIUM_PROFILE_PATH and PLAYWRIGHT_MCP_USER_DATA_DIR both pointing at $HOME/.chrome/${APP_NAME} — eliminates the dash-vs-slash mismatch where cell login wrote to one path and the in-container chromium read from another
  • feat(nixhome/fragments): 22-chromium-singleton.sh (new) sweeps stale SingletonLock / SingletonCookie / SingletonSocket files from all ~/.chrome/*/ profile dirs at every container start — no more Failed to acquire SingletonLock errors or hung launches after docker kill / OOM / container restart; PIDs from prior container generations are dead by namespace isolation so the sweep is unconditionally safe
  • feat(cmd/chrome): after cell login, SIGTERM patchright in every sibling running cell that shares the same cell-home bind mount — Claude's MCP client respawns it on the next browser tool call against the freshly written storage-state.json, so sibling cells stop running with the pre-login cookie jar still in their long-lived BrowserContext
  • feat(cmd/chrome): force a PRAGMA wal_checkpoint(TRUNCATE) on the Cookies sqlite db between Phase 1 exit and Phase 2 launch, and warn ⚠ Cookies database wasn't modified during this session — did you actually log in? when its mtime predates the login session start — surfaces silent "closed the window without authenticating" without aborting extraction

RDP / Chromium GUI works on first launch

  • fix(nixhome/desktop): fonts.fontconfig.defaultFonts populated for serif / sans-serif / monospace / emoji — CSS font-family: sans-serif resolves to a real face (IBM Plex Sans / Cascadia Code / Noto Color Emoji) instead of an empty alias chain, so chromium screenshots and Playwright renders stop shipping blank text
  • fix(nixhome/image): chromium-via-playwright/MCP renders text — homeRoot stages /opt/devcell/.nix-profile → home-manager-path so fontconfig's <dir>$HOME/.nix-profile/share/fonts</dir> resolves; bridges /etc/fonts/conf.d → /opt/devcell/.config/fontconfig/conf.d so pkgs.fontconfig's default <include> resolves; fc-list goes from 1 face (DejaVu only) to 5,971
  • fix(nixhome/image): OCI image config.Env bakes FONTCONFIG_FILE, FONTCONFIG_PATH, LIBGL_*, GALLIUM_DRIVER, VK_ICD_FILENAMES — chromium spawned by the MCP server (not a user shell) inherits font + GPU-software-rendering setup that previously only existed in the shell rc fragment
  • fix(nixhome/fragments): 06-nix-ldpath.sh + 05-shell-rc.sh skip the Debian-era LD_LIBRARY_PATH closure injection when /etc/devcell/.image-built-with-nix2container is present — pure-image binaries with correct DT_RUNPATH stop having different library versions injected ahead of their RPATH; uv stops crashing with undefined symbol: _rjem_malloc (jemalloc mismatch) and x11vnc stops crashing with GLIBC_2.42 not found (libgpg-error mismatch)
  • fix(nixhome/fragments): 50-gui.sh wraps every gosu service launch with env -u LD_LIBRARY_PATH -u _DEVCELL_LD_SET — Xvfb / fluxbox / x11vnc / feh / xrdp / xsetroot all start cleanly on pure images even when an inherited closure path lingers from a parent shell; RDP into the cell reaches the real desktop instead of failing the vnc-any backend

cell build correctness

  • fix(cmd/root): runner.Stack / runner.Modules / runner.PerSessionImage resolution moved to PersistentPreRun so every subcommand picks up the project's [cell].stack before its RunE fires — cell build from a project with stack = "ultimate" tags devcell-user:ultimate-pure instead of always devcell-user:base-pure regardless of which stack actually built
  • fix(nixhome/image): system + user nix.conf ship build-users-group = (empty), so in-container cell build / home-manager switch no longer fail with the group 'nixbld' specified in 'build-users-group' does not exist (the pure image is single-user by design and doesn't stage nixbld1..10 accounts)
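
The stack-tag fix above is an ordering fix: the stack has to be resolved before any subcommand body reads it. The sketch below shows that ordering without cobra. The types `projectConfig` and `runnerState` and the functions `preRun` and `buildTag` are hypothetical stand-ins, and only the `devcell-user:<stack>-pure` tag shape comes from the PR text.

```go
package main

import "fmt"

// projectConfig stands in for the parsed [cell] table (hypothetical type).
type projectConfig struct {
	Stack string
}

// runnerState stands in for the runner settings subcommands read (hypothetical type).
type runnerState struct {
	Stack string
}

// preRun copies the configured stack into the runner state. Running it in a
// persistent pre-run hook means every subcommand sees the resolved value.
func preRun(r *runnerState, cfg *projectConfig) {
	if cfg != nil && cfg.Stack != "" {
		r.Stack = cfg.Stack
	}
}

// buildTag derives the local image tag for a pure build of the active stack.
func buildTag(r *runnerState) string {
	return "devcell-user:" + r.Stack + "-pure"
}

func main() {
	r := &runnerState{Stack: "base"} // default when no project config exists
	preRun(r, &projectConfig{Stack: "ultimate"})
	fmt.Println(buildTag(r)) // prints devcell-user:ultimate-pure
}
```

If `preRun` is skipped, or runs after the subcommand body, `buildTag` falls back to the default and produces the `base-pure` tag described above as the bug.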

Build provenance (docker inspect shows real date + rev)

  • feat(nixhome/image): OCI image manifest carries org.opencontainers.image.created / .revision labels and a real Created field via nix2container's created parameter, threaded in from DEVCELL_BUILD_DATE / DEVCELL_BUILD_REV env read with builtins.getEnv under --impure; docker inspect <image> shows the real wall-clock build time and git rev instead of 0001-01-01 / unknown
  • fix(nixhome/image): metadataJson derivation no longer interpolates per-build timestamp/rev into /etc/devcell/metadata.json — eliminates the ~3.9 GB customization-layer re-push that happened on every cell build even when no source changed; per-build provenance now lives only in the tiny OCI manifest blob, layer SHAs stay content-stable
  • feat(internal/runner): cell <agent> injects -e DEVCELL_BUILD_DATE=<value> / -e DEVCELL_BUILD_REV=<value> by reading OCI labels via a single docker image inspect (replaces the old docker run --rm cat /etc/devcell/metadata.json that spawned a throwaway container) — startup faster (~ms vs container spawn) and the entrypoint's "User image: …" boot log shows real timestamps
  • feat(internal/runner/pure_build): nix build argv gains --impure and cmd.Env carries DEVCELL_BUILD_DATE (RFC3339 now) + DEVCELL_BUILD_REV (resolved via git rev-parse HEAD with -dirty suffix when worktree unclean, env override wins) — the OCI labels above actually get populated at build time
  • chore(nixhome/entrypoint): boot log "User image: …" prefers runner-injected DEVCELL_BUILD_DATE / DEVCELL_BUILD_REV env over the placeholder JSON fields — users see real <commit> built <date> (tag: …) instead of nix2container 1970-01-01T00:00:00Z
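
Reading provenance from OCI labels, as described above, comes down to decoding the JSON that `docker image inspect` prints. The following is a minimal sketch, assuming the usual array-shaped output with labels under `Config.Labels`. The names `buildMeta` and `metaFromInspect` are hypothetical, and the sample JSON is invented for the demo.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// inspectEntry models only the part of `docker image inspect` output used here.
type inspectEntry struct {
	Config struct {
		Labels map[string]string `json:"Labels"`
	} `json:"Config"`
}

// buildMeta holds the two provenance values the runner injects (hypothetical type).
type buildMeta struct {
	Date string
	Rev  string
}

// metaFromInspect extracts the created / revision labels from inspect output.
// Missing labels yield empty strings rather than an error.
func metaFromInspect(raw []byte) (buildMeta, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return buildMeta{}, err
	}
	if len(entries) == 0 {
		return buildMeta{}, errors.New("inspect output contained no images")
	}
	labels := entries[0].Config.Labels
	return buildMeta{
		Date: labels["org.opencontainers.image.created"],
		Rev:  labels["org.opencontainers.image.revision"],
	}, nil
}

func main() {
	// Invented sample; a real caller would pass the stdout of `docker image inspect <image>`.
	sample := []byte(`[{"Config":{"Labels":{
		"org.opencontainers.image.created":"2026-05-19T06:15:00Z",
		"org.opencontainers.image.revision":"abc123-dirty"}}}]`)
	meta, err := metaFromInspect(sample)
	if err != nil {
		fmt.Println("inspect:", err)
		return
	}
	fmt.Printf("-e DEVCELL_BUILD_DATE=%s -e DEVCELL_BUILD_REV=%s\n", meta.Date, meta.Rev)
}
```

Keeping the parser a pure function over bytes is what lets the tests listed under Tests run without shelling out to docker.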

Tests

  • test(cmd): TestPersistentPreRun_SetsRunnerStackFromConfig + TestPersistentPreRun_NoConfig_LeavesDefaults pin the cell build stack-tag fix
  • test(cmd): TestSavePlaywrightFingerprint_* (2 cases) + TestReadPlaywrightFingerprint_ReadsFromPlaywrightSubdir + TestChromePaths_PlaywrightSubdir pin the new fingerprint / storage-state path layout + auto-mkdir behaviour
  • test(cmd): TestKickMcpInCellsSharingCellHome_* covers the post-login sibling-MCP kick with fake docker plumbing (no docker shell-out from tests)
  • test(cmd): TestFlushCookieDb_* covers the sqlite WAL checkpoint + freshness signal (cookies-missing, fresh, stale variants)
  • test(internal/runner): TestImageMetadataFromInspect_* (3 cases) + TestImageVersions_Format (4 sub-cases) cover the new label-based metadata path via exported pure helpers (ImageMetadataFromInspectExport, FormatImageVersionUserExport)

DimmKirr added 13 commits May 19, 2026 06:15
…e across `cell login` + chromium + MCP, and RDP-into-cell works on first launch

…s (no-PAM build); `--debian` → `--impure` (back-compat alias retained); `[cell].hostname` overrides container hostname; Atlassian MCP server bundled; Taskfile refactored to variant-first taxonomy

- fix(ci): `.github/workflows/build.{dev,release}.yml` add `oci-mediatypes=true` to the bake output spec, so `skopeo copy` no longer aborts with `unsupported docker v2s2 media type: application/vnd.docker.image.rootfs.diff.tar.zstd`; the CI publish to GHCR and the mirror to ECR Public are green again
- fix(nixhome/image): sudo built with `withPam = false` — `sudo` inside cells stops aborting with `unable to initialize PAM: Critical error - immediate abort`; `task nix:validate`, security.nix wordlist symlinks, and every home-manager activation that shells out via sudo now succeed without PAM stack setup
- fix(nixhome/image): symlinks `${pkgs.coreutils}/bin/env` → `/usr/bin/env` — `#!/usr/bin/env <interp>` shebangs in third-party scripts (Claude Code plugin hooks at `~/.claude/plugins/**/*.py`, etc.) execute without `bad interpreter: No such file or directory`
- feat(cmd): `--impure` is the new canonical flag for the legacy Dockerfile build path (DIMM-213 vocab rename); `--debian` is retained as a deprecated alias that is stripped from forwarded args and routes to the same code path, so `cell claude --impure` and `cell build --impure` work and existing `--debian` invocations keep working for one release
- feat(internal/runner): `PickImageTag(impure bool)` parameter renamed from `debian`; `StackImageTagImpure()` added (returns `<reg>:v<ver>-<stack>-impure`); `StackImageTagDebian()` retained as a deprecated alias that forwards to Impure — registry-side tag scheme now reads `-impure` instead of `-debian`, paving the way for full `--debian` removal next release
- feat(internal/cfg): new `[cell].hostname` TOML key + `DEVCELL_HOSTNAME` env override via `ResolvedHostname(computed)` — operators can pin a stable container hostname instead of the default `cell-<basename>-<cellID>` computed value (useful for hostname-based service discovery, logs grep, certificate SANs)
- feat(nixhome/project-management): bundle Atlassian remote HTTP MCP server (`https://mcp.atlassian.com/v1/mcp/authv2`) for Jira / Confluence / Rovo — `cell claude` users can query and update Atlassian issues via OAuth 2.1 (3LO) on first use; opencode/codex/gemini still need an `npx -y mcp-remote …` stdio wrap (filed as follow-up)
- refactor(taskfile): variant-first, noun-first taxonomy with aggregates over both build paths — `image:impure:*` (Dockerfile/bake) and `image:pure:*` (nix2container) become symmetric subtrees; `image:{build,push}` aggregate both; new variant-agnostic `image:mirror` (cross-registry copy) and `image:manifest` (multi-arch stitch); back-compat aliases preserved for every renamed task (`image:build:base/ultimate/pure`, `bake:local`, `image:dev`, `image:push:debian`, `swag:generate`, `web:docs`, `nix:validate`, `test`, `install`)
- refactor(taskfile): `cell:build` now writes `./bin/cell` instead of `~/.local/bin/cell`; `cell:install` cp-copies `./bin/cell` → `~/.local/bin/` (was a no-op `mkdir` + echo) — separates "build" from "install"; existing `task install` muscle memory still works
- refactor(taskfile): deletes `image:build:user-local:dev` + `:force` (duplicate `image:dev` alias was a silent foot-gun; hardcoded `core-local` env predated DIMM-203 tag scheme)
- test(cmd): `TestStripCellFlags_ImpureBoolFlagStripped` (new) pins `--impure` strip; `TestStripCellFlags_DebianAliasStillStripped` (renamed from `_DebianBoolFlagStripped`) pins back-compat alias strip; `TestPickImageTag_FlippedDirection` updated to assert `impure` vocabulary in error messages
- test(internal/runner): `TestStackImageTagImpure_*` (2 cases, was `_Debian_*`) pin the new `-impure` registry suffix; `TestStackImageTagDebian_AliasForwardsToImpure` + `TestStackImageTagDebian_DeprecatedAliasReturnsImpureSuffix` pin the deprecated alias contract; `TestPickImageTag_ImpureTrueReturnsBareTag` (renamed from `_DebianTrueReturnsBareTag`) pins the bare-tag direction
- test(internal/cfg): `TestResolvedHostname_*` (3 cases) pin the env > toml > computed precedence; `TestLoadFile_HostnameTOMLKey` pins the TOML decode path
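
The env > toml > computed precedence pinned by `TestResolvedHostname_*` can be sketched as a three-way fallback. The PR's `ResolvedHostname(computed)` takes a single argument. The free function below, with its three string parameters, is a simplified hypothetical variant for illustration, and the example hostnames are made up.

```go
package main

import (
	"fmt"
	"os"
)

// resolvedHostname picks the first non-empty value in precedence order:
// environment override, then the TOML key, then the computed default.
// Hypothetical signature; the real method reads env and config itself.
func resolvedHostname(envValue, tomlValue, computed string) string {
	if envValue != "" {
		return envValue
	}
	if tomlValue != "" {
		return tomlValue
	}
	return computed
}

func main() {
	computed := "cell-myproject-1" // made-up instance of cell-<basename>-<cellID>
	fromToml := "build-box"        // made-up [cell].hostname value
	fmt.Println(resolvedHostname(os.Getenv("DEVCELL_HOSTNAME"), fromToml, computed))
}
```

With `DEVCELL_HOSTNAME` unset, the TOML value wins. Clearing both leaves the computed default in place.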
…file build with `go: go.mod requires go >= 1.25.0 (running go 1.24.13)`

- fix(images): bump `images/Dockerfile` builder stage from `golang:1.24-alpine` → `golang:1.25-alpine` — `go.mod` declared `go 1.25.0` so the 1.24 base image's `GOTOOLCHAIN=local` (set by alpine's `go` package) refused to auto-fetch a newer toolchain and aborted with the version-mismatch error; the impure CI pipeline (`docker-build` matrix → `task image:impure:push`) now compiles the cell binary cleanly on both arches
…every dev push and tagged release — `cell claude` users no longer need a local `cell build` after a fresh clone

- feat(ci): new `pure-build` matrix job in `build.dev.yml` and `build.release.yml` (amd64×base, amd64×ultimate, arm64×base, arm64×ultimate) — runs in parallel with `docker-build` (both depend only on `secrets`); each leg installs Nix via `DeterminateSystems/nix-installer-action`, then calls `task image:pure:push:<stack>` which uses nix2container's `.copyTo` to push OCI-compliant images (zstd layers under OCI manifest — no v2s2 mismatch); each leg then calls `task image:mirror` to copy GHCR → ECR Public
- feat(ci): new `pure-manifest` job stitches the per-arch tags (`v0.0.0-amd64-{base,ultimate}-pure`, `v0.0.0-arm64-{base,ultimate}-pure`) into multi-arch manifests (`v0.0.0-{base,ultimate}-pure`, `latest-…`, `dev-…` for dev / version-tagged for release) on both GHCR and ECR Public via `task image:manifest` — `docker pull public.ecr.aws/w1l3v2k8/devcell:v0.0.0-ultimate-pure` resolves to the correct arch automatically
- feat(ci): caches `/nix/store` between runs via `nix-community/cache-nix-action@v6` with primary key `nix-<arch>-<stack>-<hash(flake.lock + nixhome/**/*.nix)>` and partial-restore fallback `nix-<arch>-<stack>-` — cold first run takes ~30-45 min (full substitution from `cache.nixos.org`); subsequent runs hit the cache and drop to ~5-10 min depending on what changed; cache is GC-capped at 5 GB and purged after 7 days so GHA cache storage stays bounded
- refactor(taskfile): `image:pure:build:stack` + `image:pure:push:stack` gain `SUDO` and `NIX` Taskfile vars (defaults: empty + `nix` on PATH) — CI overrides `SUDO=""` `NIX=nix` to use single-user nix without sudo; devcell containers with daemon off can still pass `SUDO=sudo` to invoke the multi-user store; previously these tasks hardcoded `sudo /opt/devcell/.local/state/nix/profiles/profile/bin/nix` which only worked inside the devcell container
- fix(nixhome/fragments): hoist dangling-symlink cleanup before the
  baked-installs loop so pure images (no /opt/mise) still remove
  stale symlinks from prior impure generations — was the root cause
  of terraform/opentofu shims silently absent on fresh pure cells
- fix(nixhome/fragments): gate cross-bind loop on `[ -d "$baked/installs" ]`
  so pure images skip the impure-only symlinking step cleanly
- fix(nixhome/fragments): reshim failures now log a warning instead of
  being silenced with `2>/dev/null || true` — surfaces future breakage
- fix(images/Dockerfile): add `mkdir + ln + mise reshim` with
  `MISE_DATA_DIR=/opt/devcell/.local/share/mise` for all four
  profile stages — bakes level-2 shim dir into image at build time
- fix(nixhome/image): stage `/etc/devcell/tool-versions` from
  `homeConfig.config.devcell.mise.tools` at nix2container build time
  — activation scripts don't run in pure builds, so the file was
  absent and `mise install -y` never ran at boot
- feat(nixhome/modules): add two-level `home.sessionPath` for mise
  shims — user `~/.local/share/mise/shims` (L1) before image-baked
  `/opt/devcell/.local/share/mise/shims` (L2) so user installs win
- chore(taskfile): document `--retry-times 5 --retry-delay 5s` for
  skopeo copyTo — GHCR 504s on large blobs no longer abort the push
- test(mise): new `test/mise_test.go` — L1 wiring checks (Dockerfile
  reshim step, sessionPath order, entrypoint cleanup, image.nix
  staging) + L2 container checks (declared tools on PATH,
  baked shim dir, terraform/tofu regression pins, PATH precedence)
- fix(nixhome/image): stage permissive /etc/pam.d/sudo pointing at
  pam_permit.so — sudoers.so policy plugin's pam_start succeeds;
  pre-DIMM-216 every `sudo` call died with "unable to initialize
  PAM: Critical error - immediate abort" before any policy check ran
- fix(nixhome/fragments): detect empty tool-install dirs and
  invalidate .tv-global.sha / .tv-workspace.sha so `mise install -y`
  re-runs after a manual `rm -rf installs/<tool>/*` — previously
  sha matched and declared tools stayed missing from PATH (DIMM-215)
- test(sudo): new sudo_test.go — L1 pins PAM stub presence and all
  four phases; L2 container checks `sudo whoami` returns root and no
  PAM error string
- test(mise): TestMise_EntrypointInvalidatesShaOnStaleState pins sha
  invalidation order and conditionality (DIMM-215)
…ll build --debug` reports image identity/age/cache stats, and macOS Sequoia stops SIGKILLing `cell` after every rebuild

- fix(nixhome/packages): stage `/etc/devcell/tool-versions` and `/etc/pam.d/sudo` in `packages/image.nix` (the file `flake.nix:241` actually imports) — pure cells now have `sudo whoami → root`, and declared mise tools (terraform/tofu/node/go) install on first boot. Three prior commits (`e6acce0` sudo no-PAM, `5cefc1f` DIMM-214 tool-versions, `4e200c0` DIMM-215+216) landed in an orphaned top-level `nixhome/image.nix`; no built image ever carried those fixes until now.
- fix(nixhome/packages): `/etc/sudoers` adds `env_keep += "SSL_CERT_FILE NIX_SSL_CERT_FILE NIX_PATH NIX_CONFIG NIX_REMOTE NIX_USER_CONF_FILES LOCALE_ARCHIVE"` — `sudo nix profile add nixpkgs#htop` now works without `--preserve-env=` flags; the SSL-cert path survives `env_reset`.
- chore(nixhome): delete orphaned top-level `nixhome/image.nix` — no file in the flake referenced it; deleting eliminates the "two `image.nix` files with parallel structure" footgun that hid three phantom-fix commits.
- feat(cmd/build): `cell build --debug` appends a post-build summary line — image tag, ID (short), OCI created timestamp, size, total layer count, and new-vs-cached blob counts parsed from skopeo's per-blob log. Answers "did this rebuild actually produce new content, or was it an all-cache reuse?" without a separate `docker inspect` call.
- feat(internal/runner): `LayerCounter` (`io.Writer` decorator) tallies skopeo's `Copying blob ... done|skipped` lines, deduping by blob ID across the two skopeo passes (nix→registry, registry→daemon) so a single new layer isn't double-counted — feeds the `--debug` summary.
- feat(internal/runner): `InspectImageDebug` + `parseImageInspectDebug` extract image metadata in one `docker image inspect` call; handles both array and single-object output shapes.
- fix(infra): `task cell:build` and `task cell:install` run `xattr -c` + `codesign --force --sign -` on macOS hosts — eliminates the macOS Sequoia (15.x) launch-constraint SIGKILL (`zsh: killed cell --version`, exit 137) that fires after every rebuild of an ad-hoc-signed Go binary; uname-gated, no-op on Linux.
- test(test): new `TestImageNix_CanonicalPathReferencedByFlake` guard asserts `nixhome/flake.nix` imports `./packages/image.nix` — locks the canonical image.nix path so future L1 file-content tests can't silently pass against an orphan; the test header carries a post-mortem comment describing the 3-commit phantom-fix saga.
- test(test): repoint `TestMise_ImageNixStagesToolVersions`, `TestSudo_ImageNixStagesPamStub`, `TestSudo_PamStubCoversAllPamPhases` from the orphan `image.nix` to `packages/image.nix` — the tests were passing against unused content.
- test(test): new `TestSudo_SudoersPreservesNixEnv` (L1 pins the `env_keep` line and required vars) and `TestSudo_PreservesNixEnv` (L2 runs `sudo env` in a real container and asserts the vars survived) — block regressions on env propagation across sudo.
- test(test): new `sudo_integration_test.go::TestSudo_NixInstallHtop` behind `-tags=integration` — exercises the full `sudo nix profile add nixpkgs#htop` round-trip end-to-end (network-dependent, ~30s); opted out of default runs.
- test(internal/runner): `build_debug_test.go` covers `LayerCounter` parsing (done/skipped classification, dedup across skopeo passes, partial-line buffering) and `parseImageInspectDebug` for both docker-inspect JSON shapes.
- chore(deps): transitive `go.sum` updates from `go mod tidy` — no `go.mod` changes, no new top-level dependencies.
…in every project root it's launched from — scratch SVGs now land under `.devcell/inkscape/` scoped to the current project

- fix(graphics): set `INKS_WORKSPACE=.devcell/inkscape` in the `inkscape-mcp` env block at `nixhome/modules/graphics.nix:50` — upstream's `config.py` runs `Path(os.getenv("INKS_WORKSPACE", "inkspace")).resolve().mkdir(parents=True, exist_ok=True)` at module import (typo'd `"inkspace"`, relative to cwd), so every MCP spawn pre-created a stray top-level directory; the override redirects scratch artifacts into the project-local `.devcell/` tree where other cell-managed state already lives.
…tes inside cells; `sudo nix profile add` no longer drops SSL/cert env on impure cells; nix-declared MCP servers (inkscape-mcp et al.) reach claude/codex/opencode/gemini in pure cells

- feat(nix-ld): `mise install` of `node@26`, `go@1.26`, `terraform`, and any precompiled tarball now runs the resulting binary in pure (nix2container) AND impure (Debian) cells — the kernel can resolve the binary's hardcoded `/lib/ld-linux-<arch>.so.<n>` interpreter via the nix-ld shim, and the shim resolves shared libs from a curated nix-closure dir. Pre-fix `mise install` extracted node successfully then died on the verify step with `node -v ... No such file or directory (os error 2)`.
- fix(sudoers): `sudo` in impure (Debian-base) cells now preserves `SSL_CERT_FILE` / `NIX_SSL_CERT_FILE` / `NIX_PATH` / `LOCALE_ARCHIVE` across the privilege boundary — `sudo nix profile add nixpkgs#htop` no longer fails with "SSL peer certificate ... was not OK" against `cache.nixos.org`.
- fix(image): pure (nix2container) images now stage each LLM agent's nix-managed MCP config to `/etc/claude-code/nix-mcp-servers.json`, `/etc/codex/nix-mcp-servers.toml`, `/etc/opencode/nix-mcp-servers.json` (+ `nix-providers.json`), `/etc/gemini/nix-mcp-servers.json` at image build — pre-fix the per-agent `home.activation.setupManaged*` script wrote those files via `sudo cp` at home-manager switch, which the pure path skips entirely, so `fragments/30-*.sh` short-circuited at "no nix file" and `~/.claude.json` (etc.) never received nix-declared MCP servers like inkscape-mcp.
- fix(graphics): `inkscape-mcp`'s `INKS_WORKSPACE` override now resolves to `.devcell/inkspace/` (matching upstream's literal `inkspace` naming, kept under the project's `.devcell/` tree) — supersedes the prior `.devcell/inkscape` value from `e5b13b2`. Net effect after rebuild: no more stray `inkspace/` directories at project roots; scratch SVGs land in `.devcell/inkspace/` per project.
- refactor(nix-ld): switched non-nix binary library bootstrap from `LD_LIBRARY_PATH` to `NIX_LD_LIBRARY_PATH` — the closure libs are now visible only to non-nix binaries via the nix-ld shim; nix-built tools (`gpg`, `uv`, `x11vnc`, `curl`, …) follow their RPATH untouched. Fixes the long-running collisions where `gpg` failed with `GLIBC_2.42 not found (libgpg-error-1.59)`, `uv` with `undefined symbol: _rjem_malloc`, and `x11vnc` with the same glibc skew.
- fix(image): `NIX_LD_LIBRARY_PATH` is now a single `/opt/devcell/.nix-ld-libs` directory with one symlink per `.so*` from the closure, not a 25 KB colon-separated path list — basic commands (`grep`, `sleep`, `mkdir`, `gosu`) no longer crash with "Argument list too long" once the var propagates through a few fork/exec generations and the total environment crosses the kernel's `ARG_MAX` (~2 MB).
- fix(image): `/usr/bin/env` shipped in pure cells via `pkgs.dockerTools.usrBinEnv` — Claude Code hook plugins, PyPI/npm CLIs, and any script with the canonical `#!/usr/bin/env <interp>` shebang stop failing with `bad interpreter: No such file or directory`.
- chore(image): `nix-ld` package + arch-conditional `/lib/ld-linux-<arch>.so.<n>` interpreter symlink baked into pure-image `homeRoot`; `NIX_LD` + `NIX_LD_LIBRARY_PATH` baked into OCI `Env` (computed via IFD over a closure-walk side-derivation); stable `~/.nix-ld-loader` and `~/.nix-ld-shim` `home.file` symlinks for the impure path to point `ENV NIX_LD` at without a `/nix/store` hash.
- chore(llm): each LLM module (claude.nix, codex.nix, opencode.nix, gemini.nix) now exposes its generated MCP-config derivation as a read-only option (`devcell.managed<Agent>.nixMcpConfigFile`, plus `nixProvidersFile` for opencode) — image.nix consumes those options for pure-image staging instead of duplicating the per-agent JSON/TOML transformer logic.
- test(image): 3 new L1 file-content tests in `test/managed_mcp_staging_test.go` (`TestImageNix_StagesAllAgentMcpConfigs`, `TestImageNix_StagesMcpConfigsFromExposedOptions`, `TestGraphicsNix_InkscapeWorkspaceUnderDevcell`) pin the four `/etc/<agent>/nix-mcp-servers.*` staging targets, the read-only-option consumption pattern, and the `.devcell/`-prefixed `INKS_WORKSPACE` value.
- test(nix-ld): 5 new L1 file-content tests in `test/mise_node_install_test.go` pin the contract — `pkgs.nix-ld` in `nixhome/modules/base.nix`, interpreter symlink staging in `image.nix`, `NIX_LD=`/`NIX_LD_LIBRARY_PATH=` in OCI Env, `ENV NIX_LD` in impure Dockerfile, fragment migrated from `LD_LIBRARY_PATH` to `NIX_LD_LIBRARY_PATH`.
- test(sudo): 2 new L1 tests in `test/sudo_test.go` (`TestSudo_DockerfileStagesEnvKeep`, `TestSudo_DockerfileSetsNixSSLEnv`) pin the impure Dockerfile's `env_keep` line and `ENV SSL_CERT_FILE`/`NIX_SSL_CERT_FILE`/`LOCALE_ARCHIVE` directives — guards parity with the pure-image sudoers config so `TestSudo_PreservesNixEnv` (L2) can rely on the env being inherited by docker exec sessions.
- test(image): 1 new L1 test in `test/usr_bin_env_test.go` (`TestImageNix_StagesUsrBinEnv`) pins `pkgs.dockerTools.usrBinEnv` in pure-image `copyToRoot` so the `/usr/bin/env` shim can't be silently dropped on a future cleanup.
- test(env): `TestEnv_NixLdLibraryPath` renamed to `TestEnv_NixLdLibs` and rewritten to assert `/opt/devcell/.nix-ld-libs/` is a directory with the expected GUI library symlinks (gtk-3, cairo, pango, nss, nspr, asound, dbus, xkbcommon); `TestEnv_LdLibraryPathSession` renamed to `TestEnv_NixLdLibraryPathSession` and updated to assert the session env carries `NIX_LD_LIBRARY_PATH=/opt/devcell/.nix-ld-libs` instead of the old colon-list `LD_LIBRARY_PATH`.
- test(tests): added `readImagesDockerfile` helper alongside `readNixhomeFile` in `test/mise_test.go` — shared `images/Dockerfile` reader for L1 file-content tests across `sudo_test.go`, `mise_node_install_test.go`, and `managed_mcp_staging_test.go`.
- chore(repo): `.node-version` (single line `26`) staged at repo root — a test repro artifact from in-cell `mise install` debugging. Recommend `git restore --staged .node-version && rm .node-version` before committing; devcell's own tooling versions live in `nixhome/modules/mise.nix` via `devcell.mise.tools`, not in a project-level `.node-version`.
…kouts so CI no longer fails on a missing `docs` package import

- fix(tests): TestCell_Shell now runs `swag init -g cmd/serve.go -o docs --parseDependency --parseInternal` before `go build ./cmd` — `go test ./test/...` on a fresh clone (CI runners, contributor onboarding) no longer fails with `cmd/serve.go:11:2: no required module provides package github.com/DimmKirr/devcell/docs`. `docs/` stays gitignored; the test bypasses `task cell:build` (which already wires `deps: [swagger:generate]`), so it now self-bootstraps the swag-generated package the same way Task does.
…, python 3.13.2) now reach the session-user PATH via `bash -lc`, and the runtime test contract aligns with what the image actually ships (drops orphan `/opt/{npm,python}-tools` and the legacy `user-image-version` stamp)

- fix(entrypoint): `05-shell-rc.sh` always creates `~/.profile` and `~/.bashrc` for the session user, even when `/opt/devcell/` only has `.zshrc`/`.zshenv` (home-manager doesn't enable `programs.bash`) — `bash -lc`, `mise activate`, IDE exec hooks, and `sudo -i` now read a rc file with the full PATH instead of falling back to system default, so declared mise tools (go, terraform, opentofu, node) stop silently disappearing on every cell launch.
- fix(entrypoint): `05-shell-rc.sh` PATH adds `/opt/devcell/.local/share/mise/shims` (level-2 baked dir) after `$HOME/.local/share/mise/shims` — declared tools stay on PATH even when the user shim dir is empty (fresh cell, or after `mise reshim` silently failed), and user installs still win on conflict because level-1 precedes level-2.
- fix(entrypoint): `05-shell-rc.sh` drops dead PATH entries `/opt/python-tools/.venv/bin` and `/opt/npm-tools/node_modules/.bin` left over from the pre-nix patchright/codex install path — these directories haven't been created by either image variant since DIMM-94/DIMM-96 moved those tools to nix, so the references only polluted PATH with non-existent paths.
- feat(python): enable `devcell.mise.tools.python = "3.13.2"` in `python.nix` — `python` (mise shim, pinned to 3.13.2) is now available alongside nix-provided `python3` (fallback); projects with a `.tool-versions` or `.python-version` get the right interpreter via mise without manual install.
- test(mise): `declaredMiseTools` parser regex now anchors to line start (`(?m)^\s*…`) and skips commented declarations — the previous regex matched `# devcell.mise.tools.python = …` and made `TestMise_TwoLevelShims_BakedDirExists/python` falsely demand a python shim before python was actually enabled.
- test(mise): new `TestMise_ShellRcHasBakedShimDir`, `TestMise_ShellRc_UserShimsBeforeBaked`, and `TestMise_DeclaredMiseToolsParser_SkipsCommented` lock in the spec for level-1/level-2 PATH order in the shell-rc fragment and the parser regex shape — future edits that drop the baked dir or break the regex fail at L1.
- test(runtime): `TestEnv_BasePermissions` no longer asserts `/opt/npm-tools` or `/opt/python-tools` exist (the image doesn't create them anymore, and the PATH references that demanded them are now gone in this same commit) — failing assertions that test against ghost dirs become passing assertions against the real contract.
- test(runtime): `TestEnv_ImageVersionStamps` reads `/etc/devcell/metadata.json` (canonical per DIMM-84) instead of the legacy `/etc/devcell/{base,user}-image-version` files — `user-image-version` was never written by any build path, `base-image-version` is impure-only, and `metadata.json` is staged by both pure and impure variants, so the test now reflects what the image actually ships.
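
The parser fix in the `test(mise)` bullet above can be sketched as follows. The PR only states that the regex is anchored with `(?m)^\s*…`; the rest of the pattern, the `declaredMiseTools` signature, and the assumption that tools are declared one per line in dotted form (`devcell.mise.tools.<name> = "<version>"`) are illustrative guesses, not the repository's actual helper.

```go
package main

import "regexp"

// declaredToolRe matches uncommented declarations of the form
//
//	devcell.mise.tools.<name> = "<version>";
//
// The (?m) flag makes ^ match at every line start, so a line whose first
// non-space character is '#' can never match.
var declaredToolRe = regexp.MustCompile(
	`(?m)^\s*devcell\.mise\.tools\.([A-Za-z0-9_-]+)\s*=\s*"([^"]*)"`)

// declaredMiseTools returns tool name -> version for every uncommented
// declaration found in the given Nix source text.
func declaredMiseTools(nixSource string) map[string]string {
	tools := make(map[string]string)
	for _, m := range declaredToolRe.FindAllStringSubmatch(nixSource, -1) {
		tools[m[1]] = m[2]
	}
	return tools
}
```

Without the line-start anchor, the same pattern would match in the middle of a line, including after a `#`, which is how a commented-out `python` declaration could end up demanding a shim.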
…al chrome.runtime / WebGL fingerprint on arm64 (Mesa/llvmpipe), and `go test ./test/...` can now route the whole suite to either the pure or impure image with `DEVCELL_TEST_VARIANT={pure,impure}`

- fix(scraping): `nixhome/modules/scraping/default.nix` installs `window.chrome` via `Object.defineProperty(writable:false, configurable:false)` with a try/catch fallback to plain assignment — arm64 detection-suite tests that saw `window.chrome.runtime missing` after the init script ran (Chromium late-injecting its own window.chrome over our mock) now keep the runtime mock through page load.
- fix(scraping): stealth init.js wraps `HTMLCanvasElement.prototype.getContext` to also define `getParameter` via `Object.defineProperty` on the returned WebGL/WebGL2 context instance — covers the arm64 Mesa/llvmpipe case where the prototype-level `WebGLRenderingContext.prototype.getParameter` patch was shadowed by an own-property on the context, leaking `"Google Inc. (Mesa)"` / `"ANGLE (Mesa, llvmpipe…)"` instead of the `'Intel Inc.'` / `'Intel Iris OpenGL Engine'` spoof. Prototype patch stays in place as the primary path; instance patch is belt-and-suspenders.
- feat(tests): `test/helpers_test.go` now reads `DEVCELL_TEST_VARIANT` (`pure` / `impure`, default `impure`) and resolves the image tag accordingly — running `DEVCELL_TEST_VARIANT=pure go test ./test/...` routes every test in the suite to `devcell-user:ultimate-pure` (or `DEVCELL_TEST_PURE_IMAGE` override), so pure-image regressions stop shipping undetected. Back-compat: default behavior is unchanged for current CI.
- feat(tests): new `pureImage(t)` and `impureImage(t)` accessors in `helpers_test.go` — tests asserting variant-specific behavior (e.g. base-image-version stamp present only on impure) can opt into a specific variant and `t.Skip` gracefully when the needed local image isn't loaded, instead of failing or silently testing the wrong variant.
- refactor(tests): `image()` in `helpers_test.go` now auto-discovers locally-built tags (`devcell-user:ultimate-pure`, `ghcr.io/dimmkirr/devcell:ultimate-local`) before falling back to a scratch bake — no more manually exporting `DEVCELL_TEST_IMAGE` after every `task image:*:build`, the local iteration loop just works.
- test(scraping): new `test/stealth_test.go` pins the two defensive patterns (chrome.runtime via Object.defineProperty + WebGL instance patch via HTMLCanvasElement.prototype.getContext) so reverts to plain assignment or prototype-only patching fail at L1.
- test(tests): new `test/variant_test.go` table-tests the variant selection logic across 8 cases (env override, local tag present/absent, unknown variant, empty-as-impure back-compat) — no docker dependency, runs in milliseconds.
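
A minimal sketch of variant selection as described in the bullets above, assuming a pure function that takes its inputs as arguments so it can be table-tested without Docker. The name `resolveImage`, its signature, and the mapping of `ghcr.io/dimmkirr/devcell:ultimate-local` to the impure variant are assumptions; the PR confirms only the two tag names, the `DEVCELL_TEST_VARIANT` values, and the empty-means-impure default.

```go
package main

import "fmt"

const (
	pureTag   = "devcell-user:ultimate-pure"
	impureTag = "ghcr.io/dimmkirr/devcell:ultimate-local" // assumed impure tag
)

// resolveImage picks the image tag for a test run.
//
//	variant:  value of DEVCELL_TEST_VARIANT; "" is treated as "impure".
//	override: explicit image (for example DEVCELL_TEST_PURE_IMAGE); wins if set.
//	hasLocal: reports whether a tag is present in the local image store.
//
// ok=false means no usable image was found, so the caller can skip the
// test or fall back to building one.
func resolveImage(variant, override string, hasLocal func(tag string) bool) (tag string, ok bool, err error) {
	switch variant {
	case "", "impure":
		tag = impureTag
	case "pure":
		tag = pureTag
	default:
		return "", false, fmt.Errorf("unknown DEVCELL_TEST_VARIANT %q (want pure or impure)", variant)
	}
	if override != "" {
		return override, true, nil
	}
	if hasLocal(tag) {
		return tag, true, nil
	}
	return "", false, nil
}
```

Passing `hasLocal` as a function is the design choice that keeps the logic testable in milliseconds: the real helper can back it with a Docker lookup, while the table test supplies a stub.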
… hosts — warm boots no longer pay for `mise reshim` and recursive chowns over the persistent `~/.local/share/mise` (~17k entries) when `~/.tool-versions` is unchanged

- fix(entrypoint): move `mise reshim` and the two `chown -R` invocations inside the existing `.tv-global.sha` gate in `nixhome/modules/fragments/10-mise.sh` — warm cell launches skip the recursive walks the sha already proved redundant, cutting steady-state boot time by ~45s on macOS bind mounts; cold/changed boots still pay the full cost once and update the sha
- test(mise): add two L1 structural tests (`TestMise_ReshimGatedByShaOnWarmBoot`, `TestMise_ChownsGatedByShaOnWarmBoot`) asserting reshim and both chowns sit after the sha-gate opens — guards against regressions that re-introduce the per-boot walk
- chore(desktop): add `vollkorn` serif font to `nixhome/modules/desktop/default.nix` — extra body-text option available in desktop-enabled cells