Problem
--host-dirs mounts a path from the worker's local filesystem into the VM. This works great for persistent state, but it creates an implicit worker-affinity constraint that Orchard does not currently enforce.
When a worker reboots, --restart-policy OnFailure correctly restarts the VM — but if the worker is slow to reconnect (or another worker is idle first), Orchard may reschedule the VM onto a different worker where the host-dir path does not exist. The VM then fails with:
tart command failed: "Error Domain=VZErrorDomain Code=2 \"A directory sharing device configuration is invalid.\"
UserInfo={..., NSLocalizedFailureReason=A directory sharing device configuration is invalid.,
NSUnderlyingError=... {Error Domain=NSPOSIXErrorDomain Code=2 \"No such file or directory\"}}"
Because the path is worker-local, the VM is now stuck in a crash-loop on the wrong worker with no automated recovery path.
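The failure condition reduces to a host-dir source path that exists on one worker but not on another. A minimal sketch of that check, assuming a hypothetical helper `missing_host_dirs` (not part of Orchard or Tart) that a worker could run before starting a VM:

```python
import os


def missing_host_dirs(paths):
    """Return the host-dir source paths that are not directories on this machine.

    Hypothetical helper for illustration only; `paths` is a list of
    worker-local source paths such as those given to --host-dirs.
    """
    return [p for p in paths if not os.path.isdir(p)]
```

A non-empty result on the new worker corresponds to the "No such file or directory" error above: the VM cannot start there, however many times it is retried.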
Impact
Any production use of --host-dirs for persistent state (agent workspaces, databases, checkpoints) is broken by this: the restart policy that is supposed to provide HA instead causes data loss or an outage whenever a worker reboots.
Proposed solutions (pick one or combine)
- Worker affinity / node selector: allow orchard create vm --worker <name> (or a label selector) so a VM is always scheduled on, and only restarted on, a specific worker. This is the simplest fix and mirrors how Kubernetes nodeName / nodeSelector works.
- Sticky scheduling: when a VM has --host-dirs and was last running on worker X, treat worker X as the preferred (or required) restart target. Only reschedule elsewhere if worker X is permanently removed from the cluster.
- Cluster-wide host-dir paths: allow --host-dirs to reference a network-mounted path (NFS, SMB) that is identical across all workers, so worker identity doesn't matter. This is more of a documentation/integration story than a code change.
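The first two options above could be combined in a single placement decision. The following is a rough sketch under stated assumptions, not Orchard's actual scheduler: the `Worker` and `VM` types, the `required_worker` field (standing in for the proposed --worker flag), and `pick_worker` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    name: str
    online: bool = True
    removed: bool = False  # permanently removed from the cluster


@dataclass
class VM:
    name: str
    host_dirs: tuple = ()                  # non-empty means worker-local state
    required_worker: Optional[str] = None  # hypothetical --worker <name>
    last_worker: Optional[str] = None      # worker the VM last ran on


def pick_worker(vm, workers):
    """Return the name of the worker to (re)start vm on, or None to keep waiting."""
    by_name = {w.name: w for w in workers}

    # An explicit pin wins; otherwise a VM with host dirs sticks to the
    # worker it last ran on.
    pinned = vm.required_worker or (vm.last_worker if vm.host_dirs else None)
    if pinned is not None:
        target = by_name.get(pinned)
        gone = target is None or target.removed
        if vm.required_worker is not None or not gone:
            # Wait for the pinned worker instead of starting elsewhere.
            return pinned if (not gone and target.online) else None
        # The sticky target was permanently removed: fall through.

    # No constraint applies: take the first available worker.
    for w in workers:
        if w.online and not w.removed:
            return w.name
    return None
```

With this rule, a VM with host dirs whose last worker is rebooting stays unscheduled until that worker reconnects, rather than landing on an idle worker that lacks the path. Whether a sticky target should be required or merely preferred is a policy choice; the sketch treats it as required until the worker is removed.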
Current workaround
Operators must pause all other workers before creating a VM, wait until it shows running on the intended worker, and then resume the paused workers. This is manual and error-prone, and it offers no protection against rescheduling after creation when an OnFailure restart occurs.
Environment
- Orchard controller + workers on Mac Mini (macOS)
- --restart-policy OnFailure + --host-dirs on Linux guest VMs (Ubuntu 24.04 aarch64 via Virtualization.framework)