`nix profile add` works inside cells, browser state shares one profile across `cell login` + chromium + MCP, and RDP-into-cell works on first launch #37
Open
DimmKirr wants to merge 13 commits into
…e across `cell login` + chromium + MCP, and RDP-into-cell works on first launch
## `nix profile add` now works inside a running cell
- feat(nixhome/fragments): `04-nix-daemon.sh` defaults `DEVCELL_NIX_DAEMON` to `true` so the daemon spawns at boot, sets `NIX_REMOTE=daemon`, and the previously-gated fixups (chmod 1777 /tmp, setuid sudo, `/nix/var/nix/*` state dirs) now always run — `nix profile add nixpkgs#scc`, `nix shell nixpkgs#hello`, and `nix-env -iA nixpkgs.foo` all succeed from inside any cell out of the box; previously every invocation tripped `could not set permissions on '/nix/var/nix/profiles/per-user' to 755`; opt out with `DEVCELL_NIX_DAEMON=false`
- feat(nixhome/base): bundle `gnutar` — `tar` is back on `$PATH` after the Debian-base slimdown left it missing, so `curl … | tar xz`, `pip install`, and binary-release extraction all work without the `7z` workaround
## Unified browser-state layout (`~/.chrome/<app>/` + `~/.playwright/`)
- feat(cmd/chrome): `cell login <url>` writes the Chromium profile to `~/.devcell/<session>/.chrome/<app>/` and the Playwright cookie snapshot to `~/.devcell/<session>/.playwright/storage-state.json`; fingerprint metadata moves to `~/.devcell/<session>/.playwright/fingerprint.json`. Browser auth state now has one canonical location instead of three scattered per-app dirs (`.chrome/<app>/`, `.chrome-<app>/`, `.playwright-<app>/`) that drifted out of sync
- feat(nixhome/scraping): patchright-mcp wrapper now reads `$HOME/.playwright/storage-state.json` and falls back to `$HOME/.chrome/${APP_NAME}`; `chromium` wrapper also opens `$HOME/.chrome/${APP_NAME}` — interactive chromium and MCP automation share one cookie jar so a `cell login` session is immediately visible to MCP without going through the JSON snapshot
- feat(nixhome/fragments): `20-homedir.sh` exports `CHROMIUM_PROFILE_PATH` and `PLAYWRIGHT_MCP_USER_DATA_DIR` both pointing at `$HOME/.chrome/${APP_NAME}` — eliminates the dash-vs-slash mismatch where cell login wrote to one path and the in-container chromium read from another
- feat(nixhome/fragments): `22-chromium-singleton.sh` (new) sweeps stale `SingletonLock` / `SingletonCookie` / `SingletonSocket` files from all `~/.chrome/*/` profile dirs at every container start — no more `Failed to acquire SingletonLock` errors or hung launches after `docker kill` / OOM / container restart; PIDs from prior container generations are dead by namespace isolation so the sweep is unconditionally safe
- feat(cmd/chrome): after `cell login`, SIGTERM patchright in every sibling running cell that shares the same cell-home bind mount — Claude's MCP client respawns it on the next browser tool call against the freshly written storage-state.json, so sibling cells stop running with the pre-login cookie jar still in their long-lived `BrowserContext`
- feat(cmd/chrome): force a `PRAGMA wal_checkpoint(TRUNCATE)` on the Cookies sqlite db between Phase 1 exit and Phase 2 launch, and warn `⚠ Cookies database wasn't modified during this session — did you actually log in?` when its mtime predates the login session start. This surfaces the silent "closed the window without authenticating" case without aborting extraction
## RDP / Chromium GUI works on first launch
- fix(nixhome/desktop): `fonts.fontconfig.defaultFonts` populated for serif / sans-serif / monospace / emoji — CSS `font-family: sans-serif` resolves to a real face (IBM Plex Sans / Cascadia Code / Noto Color Emoji) instead of an empty alias chain, so chromium screenshots and Playwright renders stop shipping blank text
- fix(nixhome/image): chromium-via-playwright/MCP renders text — `homeRoot` stages `/opt/devcell/.nix-profile → home-manager-path` so fontconfig's `<dir>$HOME/.nix-profile/share/fonts</dir>` resolves; bridges `/etc/fonts/conf.d → /opt/devcell/.config/fontconfig/conf.d` so pkgs.fontconfig's default `<include>` resolves; fc-list goes from 1 face (DejaVu only) to 5,971
- fix(nixhome/image): OCI image `config.Env` bakes `FONTCONFIG_FILE`, `FONTCONFIG_PATH`, `LIBGL_*`, `GALLIUM_DRIVER`, `VK_ICD_FILENAMES` — chromium spawned by the MCP server (not a user shell) inherits font + GPU-software-rendering setup that previously only existed in the shell rc fragment
- fix(nixhome/fragments): `06-nix-ldpath.sh` + `05-shell-rc.sh` skip the Debian-era `LD_LIBRARY_PATH` closure injection when `/etc/devcell/.image-built-with-nix2container` is present — pure-image binaries with correct `DT_RUNPATH` stop having different library versions injected ahead of their RPATH; uv stops crashing with `undefined symbol: _rjem_malloc` (jemalloc mismatch) and x11vnc stops crashing with `GLIBC_2.42 not found` (libgpg-error mismatch)
- fix(nixhome/fragments): `50-gui.sh` wraps every gosu service launch with `env -u LD_LIBRARY_PATH -u _DEVCELL_LD_SET` — Xvfb / fluxbox / x11vnc / feh / xrdp / xsetroot all start cleanly on pure images even when an inherited closure path lingers from a parent shell; RDP into the cell reaches the real desktop instead of failing the vnc-any backend
## `cell build` correctness
- fix(cmd/root): `runner.Stack` / `runner.Modules` / `runner.PerSessionImage` resolution moved to `PersistentPreRun` so every subcommand picks up the project's `[cell].stack` before its `RunE` fires — `cell build` from a project with `stack = "ultimate"` tags `devcell-user:ultimate-pure` instead of always `devcell-user:base-pure` regardless of which stack actually built
- fix(nixhome/image): system + user `nix.conf` ship `build-users-group =` (empty) — in-container `cell build` / `home-manager switch` no longer error `the group 'nixbld' specified in 'build-users-group' does not exist` (the pure image is single-user by design and doesn't stage nixbld1..10 accounts)
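
The stack-tag fix is about ordering: the stack has to be resolved before any subcommand body computes a tag. The sketch below models that with a plain function standing in for cobra's `PersistentPreRun` hook, so it runs without dependencies. `runner`, `projectConfig`, `applyConfig`, `pureTag`, and `execute` are hypothetical names, and `base` as the default stack is inferred from the `devcell-user:base-pure` fallback described above.

```go
package main

import "fmt"

// runner holds process-wide settings that subcommands read
// (in the PR: runner.Stack, runner.Modules, runner.PerSessionImage).
type runner struct {
	Stack string
}

// projectConfig stands in for the parsed project config; only [cell].stack matters here.
type projectConfig struct {
	Stack string // "" when the project does not set [cell].stack
}

// applyConfig is the work that moved into the pre-run hook: copy the
// project's stack onto the runner, keeping the default when unset.
func applyConfig(r *runner, cfg *projectConfig) {
	if cfg != nil && cfg.Stack != "" {
		r.Stack = cfg.Stack
	}
}

// pureTag is the local tag a pure build carries for the resolved stack.
func pureTag(r runner) string {
	return "devcell-user:" + r.Stack + "-pure"
}

// execute mimics the hook order: the pre-run step fires before the
// subcommand body, so the body always sees the resolved stack.
func execute(cfg *projectConfig, body func(runner) string) string {
	r := runner{Stack: "base"}
	applyConfig(&r, cfg)
	return body(r)
}

func main() {
	fmt.Println(execute(&projectConfig{Stack: "ultimate"}, pureTag))
	fmt.Println(execute(nil, pureTag))
}
```
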
## Build provenance (`docker inspect` shows real date + rev)
- feat(nixhome/image): OCI image manifest carries `org.opencontainers.image.created` / `.revision` labels and a real `Created` field via nix2container's `created` parameter, threaded in from `DEVCELL_BUILD_DATE` / `DEVCELL_BUILD_REV` env read with `builtins.getEnv` under `--impure` — `docker inspect <image>` shows the real wall-clock build time and git rev instead of `0001-01-01` / `unknown`
- fix(nixhome/image): `metadataJson` derivation no longer interpolates per-build timestamp/rev into `/etc/devcell/metadata.json` — eliminates the ~3.9 GB customization-layer re-push that happened on every `cell build` even when no source changed; per-build provenance now lives only in the tiny OCI manifest blob, layer SHAs stay content-stable
- feat(internal/runner): `cell <agent>` injects `-e DEVCELL_BUILD_DATE=<value>` / `-e DEVCELL_BUILD_REV=<value>` by reading OCI labels via a single `docker image inspect`, replacing the old `docker run --rm cat /etc/devcell/metadata.json` that spawned a throwaway container. Startup is faster (milliseconds instead of a container spawn), and the entrypoint's "User image: …" boot log shows real timestamps
- feat(internal/runner/pure_build): `nix build` argv gains `--impure` and `cmd.Env` carries `DEVCELL_BUILD_DATE` (RFC3339 now) + `DEVCELL_BUILD_REV` (resolved via `git rev-parse HEAD` with `-dirty` suffix when worktree unclean, env override wins) — the OCI labels above actually get populated at build time
- chore(nixhome/entrypoint): boot log "User image: …" prefers runner-injected `DEVCELL_BUILD_DATE` / `DEVCELL_BUILD_REV` env over the placeholder JSON fields — users see real `<commit> built <date> (tag: …)` instead of `nix2container 1970-01-01T00:00:00Z`
## Tests
- test(cmd): `TestPersistentPreRun_SetsRunnerStackFromConfig` + `TestPersistentPreRun_NoConfig_LeavesDefaults` pin the `cell build` stack-tag fix
- test(cmd): `TestSavePlaywrightFingerprint_*` (2 cases) + `TestReadPlaywrightFingerprint_ReadsFromPlaywrightSubdir` + `TestChromePaths_PlaywrightSubdir` pin the new fingerprint / storage-state path layout + auto-mkdir behaviour
- test(cmd): `TestKickMcpInCellsSharingCellHome_*` covers the post-login sibling-MCP kick with fake docker plumbing (no docker shell-out from tests)
- test(cmd): `TestFlushCookieDb_*` covers the sqlite WAL checkpoint + freshness signal (cookies-missing, fresh, stale variants)
- test(internal/runner): `TestImageMetadataFromInspect_*` (3 cases) + `TestImageVersions_Format` (4 sub-cases) cover the new label-based metadata path via exported pure helpers (`ImageMetadataFromInspectExport`, `FormatImageVersionUserExport`)
…s (no-PAM build); `--debian` → `--impure` (back-compat alias retained); `[cell].hostname` overrides container hostname; Atlassian MCP server bundled; Taskfile refactored to variant-first taxonomy
- fix(ci): `.github/workflows/build.{dev,release}.yml` add `oci-mediatypes=true` to the bake output spec — `skopeo copy` no longer aborts with `unsupported docker v2s2 media type: application/vnd.docker.image.rootfs.diff.tar.zstd`; CI publishes to GHCR + mirror to ECR Public are green again
- fix(nixhome/image): sudo built with `withPam = false` — `sudo` inside cells stops aborting with `unable to initialize PAM: Critical error - immediate abort`; `task nix:validate`, security.nix wordlist symlinks, and every home-manager activation that shells out via sudo now succeed without PAM stack setup
- fix(nixhome/image): symlinks `${pkgs.coreutils}/bin/env` → `/usr/bin/env` — `#!/usr/bin/env <interp>` shebangs in third-party scripts (Claude Code plugin hooks at `~/.claude/plugins/**/*.py`, etc.) execute without `bad interpreter: No such file or directory`
- feat(cmd): `--impure` is the new canonical flag for the legacy Dockerfile build path (DIMM-213 vocab rename); `--debian` is retained as a deprecated alias that is stripped from forwarded args and routes to the same code path. `cell claude --impure` and `cell build --impure` work, and existing `--debian` invocations keep working for one release
- feat(internal/runner): `PickImageTag(impure bool)` parameter renamed from `debian`; `StackImageTagImpure()` added (returns `<reg>:v<ver>-<stack>-impure`); `StackImageTagDebian()` retained as a deprecated alias that forwards to Impure — registry-side tag scheme now reads `-impure` instead of `-debian`, paving the way for full `--debian` removal next release
- feat(internal/cfg): new `[cell].hostname` TOML key + `DEVCELL_HOSTNAME` env override via `ResolvedHostname(computed)` — operators can pin a stable container hostname instead of the default `cell-<basename>-<cellID>` computed value (useful for hostname-based service discovery, logs grep, certificate SANs)
- feat(nixhome/project-management): bundle Atlassian remote HTTP MCP server (`https://mcp.atlassian.com/v1/mcp/authv2`) for Jira / Confluence / Rovo — `cell claude` users can query and update Atlassian issues via OAuth 2.1 (3LO) on first use; opencode/codex/gemini still need an `npx -y mcp-remote …` stdio wrap (filed as follow-up)
- refactor(taskfile): variant-first, noun-first taxonomy with aggregates over both build paths — `image:impure:*` (Dockerfile/bake) and `image:pure:*` (nix2container) become symmetric subtrees; `image:{build,push}` aggregate both; new variant-agnostic `image:mirror` (cross-registry copy) and `image:manifest` (multi-arch stitch); back-compat aliases preserved for every renamed task (`image:build:base/ultimate/pure`, `bake:local`, `image:dev`, `image:push:debian`, `swag:generate`, `web:docs`, `nix:validate`, `test`, `install`)
- refactor(taskfile): `cell:build` now writes `./bin/cell` instead of `~/.local/bin/cell`; `cell:install` copies `./bin/cell` → `~/.local/bin/` with `cp` (it was a no-op `mkdir` + echo). This separates "build" from "install"; existing `task install` muscle memory still works
- refactor(taskfile): deletes `image:build:user-local:dev` + `:force` (duplicate `image:dev` alias was a silent foot-gun; hardcoded `core-local` env predated DIMM-203 tag scheme)
- test(cmd): `TestStripCellFlags_ImpureBoolFlagStripped` (new) pins `--impure` strip; `TestStripCellFlags_DebianAliasStillStripped` (renamed from `_DebianBoolFlagStripped`) pins back-compat alias strip; `TestPickImageTag_FlippedDirection` updated to assert `impure` vocabulary in error messages
- test(internal/runner): `TestStackImageTagImpure_*` (2 cases, was `_Debian_*`) pin the new `-impure` registry suffix; `TestStackImageTagDebian_AliasForwardsToImpure` + `TestStackImageTagDebian_DeprecatedAliasReturnsImpureSuffix` pin the deprecated alias contract; `TestPickImageTag_ImpureTrueReturnsBareTag` (renamed from `_DebianTrueReturnsBareTag`) pins the bare-tag direction
- test(internal/cfg): `TestResolvedHostname_*` (3 cases) pin the env > toml > computed precedence; `TestLoadFile_HostnameTOMLKey` pins the TOML decode path
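
The env > toml > computed precedence pinned by `TestResolvedHostname_*` reduces to a short fallthrough. A minimal sketch, assuming an empty string counts as unset at each level; `resolvedHostname` and `computedHostname` are hypothetical stand-ins, and the real `ResolvedHostname(computed)` takes only the computed value.

```go
package main

import (
	"fmt"
	"os"
)

// resolvedHostname applies the documented precedence: the DEVCELL_HOSTNAME
// env var, then [cell].hostname from TOML, then the computed default.
func resolvedHostname(envValue, tomlValue, computed string) string {
	if envValue != "" {
		return envValue
	}
	if tomlValue != "" {
		return tomlValue
	}
	return computed
}

// computedHostname builds the default "cell-<basename>-<cellID>" name.
func computedHostname(basename, cellID string) string {
	return fmt.Sprintf("cell-%s-%s", basename, cellID)
}

func main() {
	// "myproject" and "abc123" are example inputs only.
	computed := computedHostname("myproject", "abc123")
	fmt.Println(resolvedHostname(os.Getenv("DEVCELL_HOSTNAME"), "", computed))
}
```
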
…file build with `go: go.mod requires go >= 1.25.0 (running go 1.24.13)`
- fix(images): bump `images/Dockerfile` builder stage from `golang:1.24-alpine` → `golang:1.25-alpine`. `go.mod` declared `go 1.25.0`, so the 1.24 base image's `GOTOOLCHAIN=local` (set by alpine's `go` package) refused to auto-fetch a newer toolchain and aborted with the version-mismatch error; the impure CI pipeline (`docker-build` matrix → `task image:impure:push`) now compiles the cell binary cleanly on both arches
…every dev push and tagged release — `cell claude` users no longer need a local `cell build` after a fresh clone
- feat(ci): new `pure-build` matrix job in `build.dev.yml` and `build.release.yml` (amd64×base, amd64×ultimate, arm64×base, arm64×ultimate) — runs in parallel with `docker-build` (both depend only on `secrets`); each leg installs Nix via `DeterminateSystems/nix-installer-action`, then calls `task image:pure:push:<stack>` which uses nix2container's `.copyTo` to push OCI-compliant images (zstd layers under OCI manifest — no v2s2 mismatch); each leg then calls `task image:mirror` to copy GHCR → ECR Public
- feat(ci): new `pure-manifest` job stitches the per-arch tags (`v0.0.0-amd64-{base,ultimate}-pure`, `v0.0.0-arm64-{base,ultimate}-pure`) into multi-arch manifests (`v0.0.0-{base,ultimate}-pure`, `latest-…`, `dev-…` for dev / version-tagged for release) on both GHCR and ECR Public via `task image:manifest` — `docker pull public.ecr.aws/w1l3v2k8/devcell:v0.0.0-ultimate-pure` resolves to the correct arch automatically
- feat(ci): caches `/nix/store` between runs via `nix-community/cache-nix-action@v6` with primary key `nix-<arch>-<stack>-<hash(flake.lock + nixhome/**/*.nix)>` and partial-restore fallback `nix-<arch>-<stack>-` — cold first run takes ~30-45 min (full substitution from `cache.nixos.org`); subsequent runs hit the cache and drop to ~5-10 min depending on what changed; cache is GC-capped at 5 GB and purged after 7 days so GHA cache storage stays bounded
- refactor(taskfile): `image:pure:build:stack` + `image:pure:push:stack` gain `SUDO` and `NIX` Taskfile vars (defaults: empty + `nix` on PATH) — CI overrides `SUDO=""` `NIX=nix` to use single-user nix without sudo; devcell containers with daemon off can still pass `SUDO=sudo` to invoke the multi-user store; previously these tasks hardcoded `sudo /opt/devcell/.local/state/nix/profiles/profile/bin/nix` which only worked inside the devcell container
- fix(nixhome/fragments): hoist dangling-symlink cleanup before the baked-installs loop so pure images (no /opt/mise) still remove stale symlinks from prior impure generations; this was the root cause of terraform/opentofu shims silently absent on fresh pure cells
- fix(nixhome/fragments): gate cross-bind loop on `[ -d "$baked/installs" ]` so pure images skip the impure-only symlinking step cleanly
- fix(nixhome/fragments): reshim failures now log a warning instead of being silenced with `2>/dev/null || true`, surfacing future breakage
- fix(images/Dockerfile): add `mkdir + ln + mise reshim` with `MISE_DATA_DIR=/opt/devcell/.local/share/mise` for all four profile stages, baking the level-2 shim dir into the image at build time
- fix(nixhome/image): stage `/etc/devcell/tool-versions` from `homeConfig.config.devcell.mise.tools` at nix2container build time; activation scripts don't run in pure builds, so the file was absent and `mise install -y` never ran at boot
- feat(nixhome/modules): add two-level `home.sessionPath` for mise shims: user `~/.local/share/mise/shims` (L1) before image-baked `/opt/devcell/.local/share/mise/shims` (L2) so user installs win
- chore(taskfile): document `--retry-times 5 --retry-delay 5s` for skopeo copyTo so GHCR 504s on large blobs no longer abort the push
- test(mise): new `test/mise_test.go`: L1 wiring checks (Dockerfile reshim step, sessionPath order, entrypoint cleanup, image.nix staging) + L2 container checks (declared tools on PATH, baked shim dir, terraform/tofu regression pins, PATH precedence)
- fix(nixhome/image): stage permissive /etc/pam.d/sudo pointing at pam_permit.so, so the sudoers.so policy plugin's pam_start succeeds; pre-DIMM-216 every `sudo` call died with "unable to initialize PAM: Critical error - immediate abort" before any policy check ran
- fix(nixhome/fragments): detect empty tool-install dirs and invalidate .tv-global.sha / .tv-workspace.sha so `mise install -y` re-runs after a manual `rm -rf installs/<tool>/*`; previously the sha matched and declared tools stayed missing from PATH (DIMM-215)
- test(sudo): new sudo_test.go: L1 pins PAM stub presence and all four phases; L2 container checks that `sudo whoami` returns root and that no PAM error string appears
- test(mise): TestMise_EntrypointInvalidatesShaOnStaleState pins sha invalidation order and conditionality (DIMM-215)
…ll build --debug` reports image identity/age/cache stats, and macOS Sequoia stops SIGKILLing `cell` after every rebuild
- fix(nixhome/packages): stage `/etc/devcell/tool-versions` and `/etc/pam.d/sudo` in `packages/image.nix` (the file `flake.nix:241` actually imports): pure cells now have `sudo whoami → root`, and declared mise tools (terraform/tofu/node/go) install on first boot. Three prior commits (`e6acce0` sudo no-PAM, `5cefc1f` DIMM-214 tool-versions, `4e200c0` DIMM-215+216) landed in an orphaned top-level `nixhome/image.nix`; no built image ever carried those fixes until now.
- fix(nixhome/packages): `/etc/sudoers` adds `env_keep += "SSL_CERT_FILE NIX_SSL_CERT_FILE NIX_PATH NIX_CONFIG NIX_REMOTE NIX_USER_CONF_FILES LOCALE_ARCHIVE"`, so `sudo nix profile add nixpkgs#htop` now works without `--preserve-env=` flags and the SSL-cert path survives `env_reset`.
- chore(nixhome): delete orphaned top-level `nixhome/image.nix`: no file in the flake referenced it, and deleting it eliminates the "two `image.nix` files with parallel structure" footgun that hid three phantom-fix commits.
- feat(cmd/build): `cell build --debug` appends a post-build summary line: image tag, ID (short), OCI created timestamp, size, total layer count, and new-vs-cached blob counts parsed from skopeo's per-blob log. It answers "did this rebuild actually produce new content, or was it an all-cache reuse?" without a separate `docker inspect` call.
- feat(internal/runner): `LayerCounter` (`io.Writer` decorator) tallies skopeo's `Copying blob ... done|skipped` lines, deduping by blob ID across the two skopeo passes (nix→registry, registry→daemon) so a single new layer isn't double-counted; it feeds the `--debug` summary.
- feat(internal/runner): `InspectImageDebug` + `parseImageInspectDebug` extract image metadata in one `docker image inspect` call and handle both array and single-object output shapes.
- fix(infra): `task cell:build` and `task cell:install` run `xattr -c` + `codesign --force --sign -` on macOS hosts, eliminating the macOS Sequoia (15.x) launch-constraint SIGKILL (`zsh: killed cell --version`, exit 137) that fires after every rebuild of an ad-hoc-signed Go binary; uname-gated, no-op on Linux.
- test(test): new `TestImageNix_CanonicalPathReferencedByFlake` guard asserts `nixhome/flake.nix` imports `./packages/image.nix`, locking the canonical image.nix path so future L1 file-content tests can't silently pass against an orphan; the test header carries a post-mortem comment describing the 3-commit phantom-fix saga.
- test(test): repoint `TestMise_ImageNixStagesToolVersions`, `TestSudo_ImageNixStagesPamStub`, `TestSudo_PamStubCoversAllPamPhases` from the orphan `image.nix` to `packages/image.nix`; the tests were passing against unused content.
- test(test): new `TestSudo_SudoersPreservesNixEnv` (L1 pins the `env_keep` line and required vars) and `TestSudo_PreservesNixEnv` (L2 runs `sudo env` in a real container and asserts the vars survived) block regressions on env propagation across sudo.
- test(test): new `sudo_integration_test.go::TestSudo_NixInstallHtop` behind `-tags=integration` exercises the full `sudo nix profile add nixpkgs#htop` round-trip end-to-end (network-dependent, ~30s); opted out of default runs.
- test(internal/runner): `build_debug_test.go` covers `LayerCounter` parsing (done/skipped classification, dedup across skopeo passes, partial-line buffering) and `parseImageInspectDebug` for both docker-inspect JSON shapes.
- chore(deps): transitive `go.sum` updates from `go mod tidy`; no `go.mod` changes, no new top-level dependencies.
…in every project root it's launched from — scratch SVGs now land under `.devcell/inkscape/` scoped to the current project
- fix(graphics): set `INKS_WORKSPACE=.devcell/inkscape` in the `inkscape-mcp` env block at `nixhome/modules/graphics.nix:50` — upstream's `config.py` runs `Path(os.getenv("INKS_WORKSPACE", "inkspace")).resolve().mkdir(parents=True, exist_ok=True)` at module import (typo'd `"inkspace"`, relative to cwd), so every MCP spawn pre-created a stray top-level directory; the override redirects scratch artifacts into the project-local `.devcell/` tree where other cell-managed state already lives.
…tes inside cells; `sudo nix profile add` no longer drops SSL/cert env on impure cells; nix-declared MCP servers (inkscape-mcp et al.) reach claude/codex/opencode/gemini in pure cells
- feat(nix-ld): `mise install` of `node@26`, `go@1.26`, `terraform`, and any precompiled tarball now runs the resulting binary in pure (nix2container) AND impure (Debian) cells: the kernel can resolve the binary's hardcoded `/lib/ld-linux-<arch>.so.<n>` interpreter via the nix-ld shim, and the shim resolves shared libs from a curated nix-closure dir. Pre-fix, `mise install` extracted node successfully, then died on the verify step with `node -v ... No such file or directory (os error 2)`.
- fix(sudoers): `sudo` in impure (Debian-base) cells now preserves `SSL_CERT_FILE` / `NIX_SSL_CERT_FILE` / `NIX_PATH` / `LOCALE_ARCHIVE` across the privilege boundary, so `sudo nix profile add nixpkgs#htop` no longer fails with "SSL peer certificate ... was not OK" against `cache.nixos.org`.
- fix(image): pure (nix2container) images now stage each LLM agent's nix-managed MCP config to `/etc/claude-code/nix-mcp-servers.json`, `/etc/codex/nix-mcp-servers.toml`, `/etc/opencode/nix-mcp-servers.json` (+ `nix-providers.json`), `/etc/gemini/nix-mcp-servers.json` at image build. Pre-fix, the per-agent `home.activation.setupManaged*` script wrote those files via `sudo cp` at home-manager switch, which the pure path skips entirely, so `fragments/30-*.sh` short-circuited at "no nix file" and `~/.claude.json` (etc.) never received nix-declared MCP servers like inkscape-mcp.
- fix(graphics): `inkscape-mcp`'s `INKS_WORKSPACE` override now resolves to `.devcell/inkspace/` (matching upstream's literal `inkspace` naming, kept under the project's `.devcell/` tree), superseding the prior `.devcell/inkscape` value from `e5b13b2`. Net effect after rebuild: no more stray `inkspace/` directories at project roots; scratch SVGs land in `.devcell/inkspace/` per project.
- refactor(nix-ld): switched non-nix binary library bootstrap from `LD_LIBRARY_PATH` to `NIX_LD_LIBRARY_PATH`: the closure libs are now visible only to non-nix binaries via the nix-ld shim, and nix-built tools (`gpg`, `uv`, `x11vnc`, `curl`, …) follow their RPATH untouched. Fixes the long-running collisions where `gpg` failed with `GLIBC_2.42 not found (libgpg-error-1.59)`, `uv` with `undefined symbol: _rjem_malloc`, and `x11vnc` with the same glibc skew.
- fix(image): `NIX_LD_LIBRARY_PATH` is now a single `/opt/devcell/.nix-ld-libs` directory with one symlink per `.so*` from the closure, not a 25 KB colon-separated path list; basic commands (`grep`, `sleep`, `mkdir`, `gosu`) no longer crash with "Argument list too long" once the var propagates through a few fork/exec generations and the total environment crosses the kernel's `ARG_MAX` (~2 MB).
- fix(image): `/usr/bin/env` shipped in pure cells via `pkgs.dockerTools.usrBinEnv`: Claude Code hook plugins, PyPI/npm CLIs, and any script with the canonical `#!/usr/bin/env <interp>` shebang stop failing with `bad interpreter: No such file or directory`.
- chore(image): `nix-ld` package + arch-conditional `/lib/ld-linux-<arch>.so.<n>` interpreter symlink baked into pure-image `homeRoot`; `NIX_LD` + `NIX_LD_LIBRARY_PATH` baked into OCI `Env` (computed via IFD over a closure-walk side-derivation); stable `~/.nix-ld-loader` and `~/.nix-ld-shim` `home.file` symlinks for the impure path to point `ENV NIX_LD` at without a `/nix/store` hash.
- chore(llm): each LLM module (claude.nix, codex.nix, opencode.nix, gemini.nix) now exposes its generated MCP-config derivation as a read-only option (`devcell.managed<Agent>.nixMcpConfigFile`, plus `nixProvidersFile` for opencode); image.nix consumes those options for pure-image staging instead of duplicating the per-agent JSON/TOML transformer logic.
- test(image): 3 new L1 file-content tests in `test/managed_mcp_staging_test.go` (`TestImageNix_StagesAllAgentMcpConfigs`, `TestImageNix_StagesMcpConfigsFromExposedOptions`, `TestGraphicsNix_InkscapeWorkspaceUnderDevcell`) pin the four `/etc/<agent>/nix-mcp-servers.*` staging targets, the read-only-option consumption pattern, and the `.devcell/`-prefixed `INKS_WORKSPACE` value.
- test(nix-ld): 5 new L1 file-content tests in `test/mise_node_install_test.go` pin the contract: `pkgs.nix-ld` in `nixhome/modules/base.nix`, interpreter symlink staging in `image.nix`, `NIX_LD=`/`NIX_LD_LIBRARY_PATH=` in OCI Env, `ENV NIX_LD` in impure Dockerfile, fragment migrated from `LD_LIBRARY_PATH` to `NIX_LD_LIBRARY_PATH`.
- test(sudo): 2 new L1 tests in `test/sudo_test.go` (`TestSudo_DockerfileStagesEnvKeep`, `TestSudo_DockerfileSetsNixSSLEnv`) pin the impure Dockerfile's `env_keep` line and `ENV SSL_CERT_FILE`/`NIX_SSL_CERT_FILE`/`LOCALE_ARCHIVE` directives, guarding parity with the pure-image sudoers config so `TestSudo_PreservesNixEnv` (L2) can rely on the env being inherited by docker exec sessions.
- test(image): 1 new L1 test in `test/usr_bin_env_test.go` (`TestImageNix_StagesUsrBinEnv`) pins `pkgs.dockerTools.usrBinEnv` in pure-image `copyToRoot` so the `/usr/bin/env` shim can't be silently dropped on a future cleanup.
- test(env): `TestEnv_NixLdLibraryPath` renamed to `TestEnv_NixLdLibs` and rewritten to assert `/opt/devcell/.nix-ld-libs/` is a directory with the expected GUI library symlinks (gtk-3, cairo, pango, nss, nspr, asound, dbus, xkbcommon); `TestEnv_LdLibraryPathSession` renamed to `TestEnv_NixLdLibraryPathSession` and updated to assert the session env carries `NIX_LD_LIBRARY_PATH=/opt/devcell/.nix-ld-libs` instead of the old colon-list `LD_LIBRARY_PATH`.
- test(tests): added `readImagesDockerfile` helper alongside `readNixhomeFile` in `test/mise_test.go`, a shared `images/Dockerfile` reader for L1 file-content tests across `sudo_test.go`, `mise_node_install_test.go`, and `managed_mcp_staging_test.go`.
- chore(repo): `.node-version` (single line `26`) staged at repo root, a test repro artifact from in-cell `mise install` debugging. Recommend `git restore --staged .node-version && rm .node-version` before committing; devcell's own tooling versions live in `nixhome/modules/mise.nix` via `devcell.mise.tools`, not in a project-level `.node-version`.
…kouts so CI no longer fails on a missing `docs` package import
- fix(tests): TestCell_Shell now runs `swag init -g cmd/serve.go -o docs --parseDependency --parseInternal` before `go build ./cmd`, so `go test ./test/...` on a fresh clone (CI runners, contributor onboarding) no longer fails with `cmd/serve.go:11:2: no required module provides package github.com/DimmKirr/devcell/docs`. `docs/` stays gitignored; the test bypasses `task cell:build` (which already wires `deps: [swagger:generate]`), so it now self-bootstraps the swag-generated package the same way Task does.
…, python 3.13.2) now reach the session-user PATH via `bash -lc`, and the runtime test contract aligns with what the image actually ships (drops orphan `/opt/{npm,python}-tools` and the legacy `user-image-version` stamp)
- fix(entrypoint): `05-shell-rc.sh` always creates `~/.profile` and `~/.bashrc` for the session user, even when `/opt/devcell/` only has `.zshrc`/`.zshenv` (home-manager doesn't enable `programs.bash`) — `bash -lc`, `mise activate`, IDE exec hooks, and `sudo -i` now read a rc file with the full PATH instead of falling back to system default, so declared mise tools (go, terraform, opentofu, node) stop silently disappearing on every cell launch.
- fix(entrypoint): `05-shell-rc.sh` PATH adds `/opt/devcell/.local/share/mise/shims` (level-2 baked dir) after `$HOME/.local/share/mise/shims` — declared tools stay on PATH even when the user shim dir is empty (fresh cell, or after `mise reshim` silently failed), and user installs still win on conflict because level-1 precedes level-2.
- fix(entrypoint): `05-shell-rc.sh` drops dead PATH entries `/opt/python-tools/.venv/bin` and `/opt/npm-tools/node_modules/.bin` left over from the pre-nix patchright/codex install path — these directories haven't been created by either image variant since DIMM-94/DIMM-96 moved those tools to nix, so the references only polluted PATH with non-existent paths.
- feat(python): enable `devcell.mise.tools.python = "3.13.2"` in `python.nix`: `python` (mise shim, pinned to 3.13.2) is now available alongside the nix-provided `python3` (fallback); projects with a `.tool-versions` or `.python-version` get the right interpreter via mise without a manual install.
- test(mise): `declaredMiseTools` parser regex now anchors to line start (`(?m)^\s*…`) and skips commented declarations — the previous regex matched `# devcell.mise.tools.python = …` and made `TestMise_TwoLevelShims_BakedDirExists/python` falsely demand a python shim before python was actually enabled.
- test(mise): new `TestMise_ShellRcHasBakedShimDir`, `TestMise_ShellRc_UserShimsBeforeBaked`, and `TestMise_DeclaredMiseToolsParser_SkipsCommented` lock in the spec for level-1/level-2 PATH order in the shell-rc fragment and the parser regex shape — future edits that drop the baked dir or break the regex fail at L1.
- test(runtime): `TestEnv_BasePermissions` no longer asserts `/opt/npm-tools` or `/opt/python-tools` exist (the image doesn't create them anymore, and the PATH references that demanded them are now gone in this same commit) — failing assertions that test against ghost dirs become passing assertions against the real contract.
- test(runtime): `TestEnv_ImageVersionStamps` reads `/etc/devcell/metadata.json` (canonical per DIMM-84) instead of the legacy `/etc/devcell/{base,user}-image-version` files — `user-image-version` was never written by any build path, `base-image-version` is impure-only, and `metadata.json` is staged by both pure and impure variants, so the test now reflects what the image actually ships.
…al chrome.runtime / WebGL fingerprint on arm64 (Mesa/llvmpipe), and `go test ./test/...` can now route the whole suite to either the pure or impure image with `DEVCELL_TEST_VARIANT={pure,impure}`
- fix(scraping): `nixhome/modules/scraping/default.nix` installs `window.chrome` via `Object.defineProperty(writable:false, configurable:false)` with a try/catch fallback to plain assignment — arm64 detection-suite tests that saw `window.chrome.runtime missing` after the init script ran (Chromium late-injecting its own window.chrome over our mock) now keep the runtime mock through page load.
- fix(scraping): stealth init.js wraps `HTMLCanvasElement.prototype.getContext` to also Object.defineProperty `getParameter` on the returned WebGL/WebGL2 context instance — covers the arm64 Mesa/llvmpipe case where the prototype-level `WebGLRenderingContext.prototype.getParameter` patch was shadowed by an own-property on the context, leaking `"Google Inc. (Mesa)"` / `"ANGLE (Mesa, llvmpipe…)"` instead of the `'Intel Inc.'` / `'Intel Iris OpenGL Engine'` spoof. Prototype patch stays in place as the primary path; instance patch is belt-and-suspenders.
- feat(tests): `test/helpers_test.go` now reads `DEVCELL_TEST_VARIANT` (`pure` / `impure`, default `impure`) and resolves the image tag accordingly — running `DEVCELL_TEST_VARIANT=pure go test ./test/...` routes every test in the suite to `devcell-user:ultimate-pure` (or `DEVCELL_TEST_PURE_IMAGE` override), so pure-image regressions stop shipping undetected. Back-compat: default behavior is unchanged for current CI.
- feat(tests): new `pureImage(t)` and `impureImage(t)` accessors in `helpers_test.go` — tests asserting variant-specific behavior (e.g. base-image-version stamp present only on impure) can opt into a specific variant and `t.Skip` gracefully when the needed local image isn't loaded, instead of failing or silently testing the wrong variant.
- refactor(tests): `image()` in `helpers_test.go` now auto-discovers locally-built tags (`devcell-user:ultimate-pure`, `ghcr.io/dimmkirr/devcell:ultimate-local`) before falling back to a scratch bake — no more manually exporting `DEVCELL_TEST_IMAGE` after every `task image:*:build`, the local iteration loop just works.
- test(scraping): new `test/stealth_test.go` pins the two defensive patterns (chrome.runtime via Object.defineProperty + WebGL instance patch via HTMLCanvasElement.prototype.getContext) so reverts to plain assignment or prototype-only patching fail at L1.
- test(tests): new `test/variant_test.go` table-tests the variant selection logic across 8 cases (env override, local tag present/absent, unknown variant, empty-as-impure back-compat) — no docker dependency, runs in milliseconds.
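The variant selection in `helpers_test.go` becomes pure logic once the environment and the set of locally loaded tags are passed in. That is what lets a table test run without docker. This is a minimal sketch under several assumptions:

- `resolveImage` is a hypothetical name.
- `DEVCELL_TEST_IMAGE` is assumed to remain an explicit override for the impure variant.
- An empty result stands for "fall back to a scratch bake".
- Rejecting unknown variants with an error is one possible policy; the PR only says that case is tested.

```go
package main

import (
	"fmt"
)

// Local tags named in the PR text; everything else here is illustrative.
const (
	pureLocalTag   = "devcell-user:ultimate-pure"
	impureLocalTag = "ghcr.io/dimmkirr/devcell:ultimate-local"
)

// resolveImage is a hypothetical, docker-free version of the selection logic.
// getenv reads environment variables; hasLocal reports whether a tag is
// already loaded locally. An empty result means "no image found, fall back
// to a scratch bake". Unknown variants return an error (assumed policy).
func resolveImage(getenv func(string) string, hasLocal func(string) bool) (string, error) {
	variant := getenv("DEVCELL_TEST_VARIANT")
	switch variant {
	case "", "impure": // empty keeps the old default
		if img := getenv("DEVCELL_TEST_IMAGE"); img != "" {
			return img, nil
		}
		if hasLocal(impureLocalTag) {
			return impureLocalTag, nil
		}
		return "", nil
	case "pure":
		if img := getenv("DEVCELL_TEST_PURE_IMAGE"); img != "" {
			return img, nil
		}
		if hasLocal(pureLocalTag) {
			return pureLocalTag, nil
		}
		return "", nil
	default:
		return "", fmt.Errorf("unknown DEVCELL_TEST_VARIANT %q (want pure or impure)", variant)
	}
}

func main() {
	env := map[string]string{"DEVCELL_TEST_VARIANT": "pure"}
	local := map[string]bool{pureLocalTag: true}
	img, err := resolveImage(
		func(k string) string { return env[k] },
		func(tag string) bool { return local[tag] },
	)
	fmt.Println(img, err) // devcell-user:ultimate-pure <nil>
}
```

Injecting `getenv` and `hasLocal` as functions keeps each table row a pair of small maps, with no `t.Setenv` or docker calls.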
… hosts: warm boots no longer pay for `mise reshim` and recursive chowns over the persistent `~/.local/share/mise` (~17k entries) when `~/.tool-versions` is unchanged
- fix(entrypoint): move `mise reshim` and the two `chown -R` invocations inside the existing `.tv-global.sha` gate in `nixhome/modules/fragments/10-mise.sh`; warm cell launches skip the recursive walks the sha already proved redundant, cutting steady-state boot time by ~45s on macOS bind mounts, while cold/changed boots still pay the full cost once and update the sha
- test(mise): add two L1 structural tests (`TestMise_ReshimGatedByShaOnWarmBoot`, `TestMise_ChownsGatedByShaOnWarmBoot`) asserting that reshim and both chowns sit after the sha gate opens, guarding against regressions that re-introduce the per-boot walk
- chore(desktop): add the `vollkorn` serif font to `nixhome/modules/desktop/default.nix`, an extra body-text option in desktop-enabled cells
Changes
## `nix profile add` now works inside a running cell
- `04-nix-daemon.sh` defaults `DEVCELL_NIX_DAEMON` to `true` so the daemon spawns at boot, sets `NIX_REMOTE=daemon`, and the previously-gated fixups (chmod 1777 /tmp, setuid sudo, `/nix/var/nix/*` state dirs) now always run: `nix profile add nixpkgs#scc`, `nix shell nixpkgs#hello`, and `nix-env -iA nixpkgs.foo` all succeed from inside any cell out of the box; previously every invocation tripped `could not set permissions on '/nix/var/nix/profiles/per-user' to 755`; opt out with `DEVCELL_NIX_DAEMON=false`
- bundle `gnutar`: `tar` is back on `$PATH` after the Debian-base slimdown left it missing, so `curl … | tar xz`, `pip install`, and binary-release extraction all work without the `7z` workaround

## Unified browser-state layout (`~/.chrome/<app>/` + `~/.playwright/`)
- `cell login <url>` writes the Chromium profile to `~/.devcell/<session>/.chrome/<app>/` and the Playwright cookie snapshot to `~/.devcell/<session>/.playwright/storage-state.json`; fingerprint metadata moves to `~/.devcell/<session>/.playwright/fingerprint.json`, giving browser auth state one canonical location instead of three scattered per-app dirs (`.chrome/<app>/`, `.chrome-<app>/`, `.playwright-<app>/`) that drifted out of sync
- patchright-mcp wrapper now reads `$HOME/.playwright/storage-state.json` and falls back to `$HOME/.chrome/${APP_NAME}`; the `chromium` wrapper also opens `$HOME/.chrome/${APP_NAME}`, so interactive chromium and MCP automation share one cookie jar and a `cell login` session is immediately visible to MCP without going through the JSON snapshot
- `20-homedir.sh` exports `CHROMIUM_PROFILE_PATH` and `PLAYWRIGHT_MCP_USER_DATA_DIR`, both pointing at `$HOME/.chrome/${APP_NAME}`, eliminating the dash-vs-slash mismatch where `cell login` wrote to one path and the in-container chromium read from another
- `22-chromium-singleton.sh` (new) sweeps stale `SingletonLock` / `SingletonCookie` / `SingletonSocket` files from all `~/.chrome/*/` profile dirs at every container start: no more `Failed to acquire SingletonLock` errors or hung launches after `docker kill` / OOM / container restart; PIDs from prior container generations are dead by namespace isolation, so the sweep is unconditionally safe
- After `cell login`, SIGTERM patchright in every sibling running cell that shares the same cell-home bind mount; Claude's MCP client respawns it on the next browser tool call against the freshly written `storage-state.json`, so sibling cells stop running with the pre-login cookie jar still in their long-lived `BrowserContext`
- Run `PRAGMA wal_checkpoint(TRUNCATE)` on the Cookies sqlite db between Phase 1 exit and Phase 2 launch, and warn `⚠ Cookies database wasn't modified during this session — did you actually log in?` when its mtime predates the login session start; this surfaces a silent "closed the window without authenticating" without aborting extraction

## RDP / Chromium GUI works on first launch
- `fonts.fontconfig.defaultFonts` populated for serif / sans-serif / monospace / emoji: CSS `font-family: sans-serif` resolves to a real face (IBM Plex Sans / Cascadia Code / Noto Color Emoji) instead of an empty alias chain, so chromium screenshots and Playwright renders stop shipping blank text
- `homeRoot` stages `/opt/devcell/.nix-profile → home-manager-path` so fontconfig's `<dir>$HOME/.nix-profile/share/fonts</dir>` resolves, and bridges `/etc/fonts/conf.d → /opt/devcell/.config/fontconfig/conf.d` so pkgs.fontconfig's default `<include>` resolves; `fc-list` goes from 1 face (DejaVu only) to 5,971
- `config.Env` bakes `FONTCONFIG_FILE`, `FONTCONFIG_PATH`, `LIBGL_*`, `GALLIUM_DRIVER`, `VK_ICD_FILENAMES`, so chromium spawned by the MCP server (not a user shell) inherits the font and GPU-software-rendering setup that previously only existed in the shell rc fragment
- `06-nix-ldpath.sh` + `05-shell-rc.sh` skip the Debian-era `LD_LIBRARY_PATH` closure injection when `/etc/devcell/.image-built-with-nix2container` is present: pure-image binaries with a correct `DT_RUNPATH` stop having different library versions injected ahead of their RPATH; uv stops crashing with `undefined symbol: _rjem_malloc` (jemalloc mismatch) and x11vnc stops crashing with `GLIBC_2.42 not found` (libgpg-error mismatch)
- `50-gui.sh` wraps every gosu service launch with `env -u LD_LIBRARY_PATH -u _DEVCELL_LD_SET`: Xvfb / fluxbox / x11vnc / feh / xrdp / xsetroot all start cleanly on pure images even when an inherited closure path lingers from a parent shell, and RDP into the cell reaches the real desktop instead of failing the vnc-any backend

## `cell build` correctness
- `runner.Stack` / `runner.Modules` / `runner.PerSessionImage` resolution moved to `PersistentPreRun` so every subcommand picks up the project's `[cell].stack` before its `RunE` fires: `cell build` from a project with `stack = "ultimate"` tags `devcell-user:ultimate-pure` instead of always `devcell-user:base-pure` regardless of which stack actually built
- `nix.conf` files ship `build-users-group =` (empty), so in-container `cell build` / `home-manager switch` no longer errors with `the group 'nixbld' specified in 'build-users-group' does not exist` (the pure image is single-user by design and doesn't stage nixbld1..10 accounts)

## Build provenance (`docker inspect` shows real date + rev)
- Images carry `org.opencontainers.image.created` / `.revision` labels and a real `Created` field via nix2container's `created` parameter, threaded in from `DEVCELL_BUILD_DATE` / `DEVCELL_BUILD_REV` env read with `builtins.getEnv` under `--impure`: `docker inspect <image>` shows the real wall-clock build time and git rev instead of `0001-01-01` / `unknown`
- `metadataJson` derivation no longer interpolates the per-build timestamp/rev into `/etc/devcell/metadata.json`, eliminating the ~3.9 GB customization-layer re-push that happened on every `cell build` even when no source changed; per-build provenance now lives only in the tiny OCI manifest blob, and layer SHAs stay content-stable
- `cell <agent>` injects `-e DEVCELL_BUILD_DATE=<value>` / `-e DEVCELL_BUILD_REV=<value>` by reading OCI labels via a single `docker image inspect` (replacing the old `docker run --rm cat /etc/devcell/metadata.json`, which spawned a throwaway container): startup is faster (~ms vs a container spawn) and the entrypoint's "User image: …" boot log shows real timestamps
- `nix build` argv gains `--impure`, and `cmd.Env` carries `DEVCELL_BUILD_DATE` (RFC3339 now) + `DEVCELL_BUILD_REV` (resolved via `git rev-parse HEAD` with a `-dirty` suffix when the worktree is unclean; an env override wins), so the OCI labels above actually get populated at build time
- … `DEVCELL_BUILD_DATE` / `DEVCELL_BUILD_REV` env over the placeholder JSON fields: users see a real `<commit> built <date> (tag: …)` instead of `nix2container 1970-01-01T00:00:00Z`

## Tests
- `TestPersistentPreRun_SetsRunnerStackFromConfig` + `TestPersistentPreRun_NoConfig_LeavesDefaults` pin the `cell build` stack-tag fix
- `TestSavePlaywrightFingerprint_*` (2 cases) + `TestReadPlaywrightFingerprint_ReadsFromPlaywrightSubdir` + `TestChromePaths_PlaywrightSubdir` pin the new fingerprint / storage-state path layout and auto-mkdir behaviour
- `TestKickMcpInCellsSharingCellHome_*` covers the post-login sibling-MCP kick with fake docker plumbing (no docker shell-out from tests)
- `TestFlushCookieDb_*` covers the sqlite WAL checkpoint + freshness signal (cookies-missing, fresh, stale variants)
- `TestImageMetadataFromInspect_*` (3 cases) + `TestImageVersions_Format` (4 sub-cases) cover the new label-based metadata path via exported pure helpers (`ImageMetadataFromInspectExport`, `FormatImageVersionUserExport`)
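`TestKickMcpInCellsSharingCellHome_*` runs against fake docker plumbing, which implies a seam between the kick logic and the docker CLI. Here is one way to shape that seam, as a sketch only. The `dockerAPI` interface, `kickSiblingMCP`, the `fakeDocker` type, and the `pkill -TERM -f patchright` argv are all hypothetical. Cells where the exec fails (for example, no patchright running) are simply skipped.

```go
package main

import (
	"fmt"
	"sort"
)

// dockerAPI is a hypothetical seam over the docker CLI: production code would
// implement it with docker ps / docker exec, tests with an in-memory fake.
type dockerAPI interface {
	CellsMounting(hostDir string) ([]string, error) // running container IDs
	Exec(id string, argv ...string) error
}

// kickSiblingMCP asks every running cell that bind-mounts cellHome to stop
// patchright; the MCP client is expected to respawn it on the next browser
// tool call. It returns the IDs where the exec succeeded.
func kickSiblingMCP(d dockerAPI, cellHome string) ([]string, error) {
	ids, err := d.CellsMounting(cellHome)
	if err != nil {
		return nil, err
	}
	var kicked []string
	for _, id := range ids {
		if err := d.Exec(id, "pkill", "-TERM", "-f", "patchright"); err == nil {
			kicked = append(kicked, id)
		}
	}
	return kicked, nil
}

// fakeDocker records exec calls instead of shelling out.
type fakeDocker struct {
	mounts map[string][]string // container ID -> host dirs it mounts
	execs  []string
}

func (f *fakeDocker) CellsMounting(hostDir string) ([]string, error) {
	var ids []string
	for id, dirs := range f.mounts {
		for _, d := range dirs {
			if d == hostDir {
				ids = append(ids, id)
			}
		}
	}
	sort.Strings(ids) // map iteration order is random
	return ids, nil
}

func (f *fakeDocker) Exec(id string, argv ...string) error {
	f.execs = append(f.execs, id)
	return nil
}

func main() {
	f := &fakeDocker{mounts: map[string][]string{
		"cell-a": {"/home/me/.devcell/s1"},
		"cell-b": {"/home/me/.devcell/s1"},
		"cell-c": {"/home/me/.devcell/other"},
	}}
	kicked, _ := kickSiblingMCP(f, "/home/me/.devcell/s1")
	fmt.Println(kicked) // [cell-a cell-b]
}
```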
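The two `LD_LIBRARY_PATH` fixes amount to two rules: don't inject on nix2container images, and strip inherited values before launching services. Below is a Go rendering of both. It assumes two hypothetical helpers: `withoutEnv`, equivalent to `env -u NAME`, and `shouldInjectLDPath`, keyed on the marker file the `06-nix-ldpath.sh` bullet names. The `Xvfb :1` command is only constructed, never started.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

const nix2containerMarker = "/etc/devcell/.image-built-with-nix2container"

// withoutEnv mirrors `env -u NAME ...`: it returns a copy of env with every
// NAME=... entry for the given names dropped. Matching on "NAME=" avoids
// removing variables that merely share a prefix.
func withoutEnv(env []string, names ...string) []string {
	out := make([]string, 0, len(env))
next:
	for _, kv := range env {
		for _, n := range names {
			if strings.HasPrefix(kv, n+"=") {
				continue next
			}
		}
		out = append(out, kv)
	}
	return out
}

// shouldInjectLDPath reports whether the closure injection applies, i.e. the
// nix2container marker is absent (any Stat error is treated as absent).
func shouldInjectLDPath(markerPath string) bool {
	_, err := os.Stat(markerPath)
	return err != nil
}

func main() {
	cmd := exec.Command("Xvfb", ":1") // hypothetical service launch; not started
	cmd.Env = withoutEnv(os.Environ(), "LD_LIBRARY_PATH", "_DEVCELL_LD_SET")
	fmt.Println(shouldInjectLDPath(nix2containerMarker), len(cmd.Env))
}
```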
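Reading provenance from labels instead of a throwaway container means parsing the JSON array that `docker image inspect <image>` prints. The sketch below decodes only `Created` and `Config.Labels`. Its names (`metadataFromInspect`, `imageMeta`, `dockerEnvArgs`) are hypothetical. Falling back to the top-level `Created` when the created label is absent is an assumption. A real caller would obtain `raw` from a single `docker image inspect` invocation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// inspectEntry models only the fields needed from `docker image inspect`.
type inspectEntry struct {
	Created string `json:"Created"`
	Config  struct {
		Labels map[string]string `json:"Labels"`
	} `json:"Config"`
}

// imageMeta is a hypothetical result type.
type imageMeta struct {
	Date, Rev string
}

// metadataFromInspect prefers the OCI labels; the date falls back to Created.
func metadataFromInspect(raw []byte) (imageMeta, error) {
	var entries []inspectEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return imageMeta{}, err
	}
	if len(entries) == 0 {
		return imageMeta{}, errors.New("docker image inspect returned no entries")
	}
	e := entries[0]
	m := imageMeta{
		Date: e.Config.Labels["org.opencontainers.image.created"],
		Rev:  e.Config.Labels["org.opencontainers.image.revision"],
	}
	if m.Date == "" {
		m.Date = e.Created
	}
	return m, nil
}

// dockerEnvArgs turns the metadata into -e flags for docker run.
func dockerEnvArgs(m imageMeta) []string {
	var args []string
	if m.Date != "" {
		args = append(args, "-e", "DEVCELL_BUILD_DATE="+m.Date)
	}
	if m.Rev != "" {
		args = append(args, "-e", "DEVCELL_BUILD_REV="+m.Rev)
	}
	return args
}

func main() {
	raw := []byte(`[{"Created":"2025-01-02T03:04:05Z","Config":{"Labels":{
		"org.opencontainers.image.created":"2025-01-02T03:04:05Z",
		"org.opencontainers.image.revision":"abc1234-dirty"}}}]`)
	m, err := metadataFromInspect(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(dockerEnvArgs(m)) // [-e DEVCELL_BUILD_DATE=2025-01-02T03:04:05Z -e DEVCELL_BUILD_REV=abc1234-dirty]
}
```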