
fix(vm_workload_showroom): remap .ssh ownership for ansible-runner container (rootless podman uid mapping) #65

Merged: andrew-jones merged 4 commits into main from fix-showroom-ssh-podman-uid-mapping on Apr 23, 2026

@prakhar1985 prakhar1985 commented Apr 22, 2026


What breaks

When showroom_ansible_runner_api: true is set, every lab that uses the ZT runner to execute solve/validate playbooks fails on every module with:

fatal: [node]: UNREACHABLE! => {
  "msg": "Failed to connect to the host via ssh: Can't open user config file /app/.ssh/config: Permission denied"
}

Solver and validator cannot connect to any lab node. Every click of Check or Solve in the showroom UI fails immediately.


Commits in this PR

1. Fix .ssh ownership for ansible-runner container (rootless podman uid mapping)

The .ssh directory is created owned by showroom:showroom (uid=1888, mode 700) and volume-mounted into the ansible-runner-api container at /app/.ssh.

The container runs as uid=1001 (default user in the UBI9 image). In rootless podman, the showroom user's user namespace maps container UIDs to host UIDs as follows:

| container uid | host uid |
|---------------|----------|
| 0 | 1888 (showroom) |
| 1 | 231072 (first subuid) |
| 1001 | 232072 (first subuid + 1000) |

The container process (uid=232072 on host) cannot read files owned by uid=1888 with 700 permissions. Every SSH attempt by Ansible fails before a connection is even opened.
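
The arithmetic behind the mapping above can be checked with a small sketch. The function name and the `subuid_count` default of 65536 are assumptions for illustration; only the uid values 1888, 231072 and 1001 come from this PR.

```python
def container_to_host_uid(container_uid: int,
                          user_uid: int = 1888,
                          subuid_start: int = 231072,
                          subuid_count: int = 65536) -> int:
    """Map a container uid to a host uid under a rootless podman mapping.

    subuid_count is an assumed size for the subordinate uid range.
    """
    if container_uid < 0:
        raise ValueError("uid must be non-negative")
    if container_uid == 0:
        # Container root is the unprivileged user who launched podman.
        return user_uid
    if container_uid > subuid_count:
        raise ValueError("uid falls outside the subordinate uid range")
    # Container uid 1 maps to the first subuid, uid 2 to the next, and so on.
    return subuid_start + container_uid - 1


for uid in (0, 1, 1001):
    print(uid, container_to_host_uid(uid))
```

With the defaults this prints `0 1888`, `1 231072` and `1001 232072`, matching the table: the process running as uid=1001 in the container is uid=232072 on the host, which is neither the owner nor in the group of a directory owned by 1888 with mode 700.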

Fix: after writing the SSH config, run podman unshare chown -R 1001:1001 on the .ssh directory. podman unshare executes inside the showroom user's user namespace, so uid=1001 inside translates to the correct host uid (232072) that the container process actually runs as.
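
As a rough illustration of the fix, the sketch below builds the `podman unshare chown` invocation as an argument list, so no shell is involved. The helper names and the `/home/showroom/.ssh` path are hypothetical; the role itself performs this step as an Ansible task, not with this code.

```python
import subprocess


def unshare_chown_argv(path: str, uid: int = 1001, gid: int = 1001) -> list:
    """Return the argv for a recursive chown inside the user namespace."""
    return ["podman", "unshare", "chown", "-R", f"{uid}:{gid}", path]


def remap_ssh_ownership(ssh_dir: str, dry_run: bool = True) -> list:
    """Build the command and run it only when dry_run is False."""
    argv = unshare_chown_argv(ssh_dir)
    if not dry_run:
        # Must run as the rootless user that owns the containers.
        subprocess.run(argv, check=True)
    return argv


# Hypothetical path, shown in dry-run mode so nothing is executed.
print(" ".join(remap_ssh_ownership("/home/showroom/.ssh")))
```

Because the chown runs inside the namespace, the `1001:1001` argument is interpreted with the same mapping the container uses, so no host uid has to be computed or hard-coded.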

Gated on showroom_ansible_runner_api — no-op for deployments not using the runner.


2. Fix volumes: indent crash and environment: Python dict rendering in ansible_runner_api_service.j2

The "add userdata to runner api" commit introduced two bugs in the template. Both crash podman-compose at startup with:

yaml.parser.ParserError: while parsing a block collection
  in "container-compose.yml", line 63, column 7
expected <block end>, but found '?'

Bug 1 — volumes: wrong indentation:
The {# dns_search #} Jinja2 comment left trailing whitespace that caused volumes: to be indented at the same level as the ports: list items (6 spaces instead of 4). YAML saw it as a non-sequence item inside a sequence and failed to parse.
Fix: remove the comment line entirely.

Bug 2 — environment: rendered as Python dict repr:
Jinja2 renders {{ f_user_data | combine({...}) }} as a Python dict repr such as "{'key': 'value', 'list': ['item']}". That string is not a valid YAML environment: mapping, so podman-compose cannot parse it.
Fix: iterate the dict and emit proper YAML key: "value" pairs, converting any list values to comma-separated strings.
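
The difference between the dict repr and the per-key output can be sketched in plain Python. This is not the template code from the PR; the function name, the 6-space indent and the sample keys are assumptions chosen for illustration.

```python
import json


def render_environment(env: dict, indent: int = 6) -> str:
    """Render a dict as YAML `key: "value"` lines for an environment block."""
    pad = " " * indent
    lines = []
    for key, value in env.items():
        if isinstance(value, (list, tuple)):
            # Environment variables are strings, so join list items.
            value = ",".join(str(item) for item in value)
        # json.dumps produces a double-quoted string, which YAML accepts.
        lines.append(f"{pad}{key}: {json.dumps(str(value))}")
    return "\n".join(lines)


# Hypothetical sample data.
user_data = {"guid": "fg8gg", "nodes": ["node1", "node2"]}

print(str(user_data))  # the dict repr that broke parsing
print(render_environment(user_data))
```

Quoting every value keeps entries such as numbers or `yes`/`no` from being re-typed by the YAML parser, which matters when the consumer expects strings.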


Why this regressed in v1.6.8

This was not a problem in v1.6.6 because with showroom_ssh_method: password the sshkey block in 22-showroom-users-security.yml never ran, so no .ssh directory was explicitly created by the showroom role. The container either had an empty mount or no mount at all, and labs used password auth — the permission issue never triggered.

v1.6.7 (PR #63) replaced ansible_runner_api_service.j2 with service_runtime_automation.j2 and dropped the .ssh mount entirely — fully breaking any lab that needs SSH key auth in the runner.

v1.6.8 (PR #64) restored the .ssh mount and added a new task in 20-showroom-user-setup.yml to explicitly create the .ssh directory when showroom_ansible_runner_api: true — but created it owned by showroom:showroom which the container cannot read.


Tested on

  • ocpvdev01.rhdp.net ZeroTouch single-pod deployments, GUIDs fg8gg and 7xl28
  • After all three fixes: showroom starts cleanly, all modules connect to lab nodes, validation checks run correctly

cc @andrjone @miteshget

…ntainer

The ansible-runner-api container runs as uid=1001 inside rootless podman.
In the showroom user's user namespace, uid=1001 maps to a higher host uid
(showroom's first subuid + 1000, typically ~232072), not the showroom user
uid (1888).

The .ssh directory is created owned by showroom:showroom (700), which means
the container process gets 'Permission denied' on /app/.ssh/config and
cannot SSH to lab nodes. Solver and validator both fail immediately with:

  Failed to connect to the host via ssh:
  Can't open user config file /app/.ssh/config: Permission denied

Fix: after writing the SSH config, run podman unshare chown -R 1001:1001
on the .ssh directory. podman unshare executes in the showroom user's
user namespace so uid=1001 inside the namespace translates to the correct
host uid that the container process owns, giving it read access.

Only runs when showroom_ansible_runner_api is enabled.
@prakhar1985 prakhar1985 force-pushed the fix-showroom-ssh-podman-uid-mapping branch 2 times, most recently from ad5fa25 to 717914e Compare April 22, 2026 09:01
…onment

Three fixes to the environment: block:

1. Merge f_user_data into runner env vars so lab credentials
   (satellite_password, bastion_ssh_password, guid, etc.) are passed to
   the zt-runner and injected as Ansible extravars into solve/validate
   playbooks. showroom_runtime_automation_environment_variables can still
   override individual keys.

2. Remove | upper filter from key names — playbooks reference vars in
   lowercase (satellite_password, not SATELLITE_PASSWORD). Uppercasing
   broke ansible extravar injection silently.

3. Handle list values by joining with comma so env vars remain strings.
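
The merge precedence and key-case behaviour described in items 1 and 2 can be sketched as follows. The function name and sample values are hypothetical, and the dict merge stands in for the template's combine filter; list handling is left out here.

```python
def build_runner_env(user_data: dict, overrides: dict) -> dict:
    """Merge lab user data with explicit overrides, keeping key case.

    Later entries win, so overrides replace matching user_data keys.
    """
    merged = {**user_data, **overrides}
    return {key: str(value) for key, value in merged.items()}


# Hypothetical values for illustration only.
user_data = {"guid": "fg8gg", "satellite_password": "from-userdata"}
overrides = {"satellite_password": "from-override"}

env = build_runner_env(user_data, overrides)
print(env["satellite_password"])    # the override wins
print("SATELLITE_PASSWORD" in env)  # False: keys are not uppercased
```

A playbook that references `satellite_password` only finds the value when the key keeps its original case, which is why uppercasing the keys made the extravar injection fail without an error.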
@prakhar1985 prakhar1985 force-pushed the fix-showroom-ssh-podman-uid-mapping branch from 717914e to 38c227a Compare April 22, 2026 09:02
@prakhar1985 prakhar1985 force-pushed the fix-showroom-ssh-podman-uid-mapping branch from 1c61e70 to ec4823d Compare April 22, 2026 09:36
…tty v2.7.4

- Switch podman unshare chown from shell to command (lint: command-instead-of-shell)
- Add changed_when: true (lint: no-changed-when)
- Set wetty image tag to v2.7.4 instead of v3.0

Made-with: Cursor
@andrew-jones andrew-jones merged commit 4ee741f into main Apr 23, 2026
1 check passed
@andrew-jones andrew-jones deleted the fix-showroom-ssh-podman-uid-mapping branch April 23, 2026 01:42