This document explains how Orva isolates user-supplied code from the host and from other functions. It is descriptive, not prescriptive — every claim below is grounded in a specific file/line that you can audit.
Orva is single-tenant: one operator, many functions, possibly many end-users hitting those functions. The platform's security goals, ordered by importance:
- A function cannot read or write another function's data
- A function cannot read or write the host filesystem outside its
own
/codemount and its private/tmp - A function cannot escalate to host root, even though it appears to run as UID 0 inside its sandbox (more on this below)
- A function cannot exhaust host resources beyond its declared memory / CPU / pid limits
- A function cannot make arbitrary network calls when network mode
is
none(default — isolated net namespace, loopback only). Functions that need outbound HTTPS opt in by settingnetwork_mode: egress
Orva is not designed to defend against:
- A malicious operator running on the same host (they own the keys)
- A kernel-level zero-day (no seccomp filter is bulletproof against
kernel exploits — see
nsjailupstream's stance) - Side-channel attacks (Spectre / Rowhammer / cache timing) — these are out of scope for any container-grade isolation
host kernel
└─ docker container "orva" ← UID 0 inside container; needs CAP_SYS_ADMIN
└─ orvad (Go server) to construct sandboxes
└─ nsjail process ← unshare(CLONE_NEWUSER) drops effective caps
└─ user namespace
├─ chroot to runtime rootfs (read-only)
├─ /code bind-mount ← read-only, function-private
├─ tmpfs /tmp ← private, wiped on worker exit
├─ cgroup v2: memory + CPU + pids
├─ seccomp filter ← ~150 syscalls blocked
└─ user code (node / python)
The gap between layers is enforced by the Linux kernel, not by Orva code. Orva configures the boundaries and trusts the kernel to keep them.
User code sees getuid() == 0. It is NOT root on the host.
The mechanism is CLONE_NEWUSER + UID mapping. nsjail's -Mo flag
(internal/sandbox/sandbox.go:154) creates a new user namespace where
inside-UID 0 is mapped to host-UID 65534 (nobody). All capability
checks happen against the user-namespace's UID, but actions that cross
the namespace boundary (writing to a file owned by host-root,
mounting a filesystem outside the chroot, sending a signal to a host
PID) are denied because the function has zero effective capabilities
outside its namespace.
Verify it yourself:
# Deploy a fn that introspects /proc/self/status
KEY=$(docker exec orva cat /var/lib/orva/.admin-key)
curl -X POST -H "X-Orva-API-Key: $KEY" -H 'content-type: application/json' \
http://localhost:8443/api/v1/functions \
-d '{"name":"whoami","runtime":"node","memory_mb":128,"cpus":1}'
FID=$(curl -s -H "X-Orva-API-Key: $KEY" http://localhost:8443/api/v1/functions \
| jq -r '.functions[] | select(.name=="whoami") | .id')
curl -X POST -H "X-Orva-API-Key: $KEY" -H 'content-type: application/json' \
http://localhost:8443/api/v1/functions/$FID/deploy-inline \
-d '{"code":"const fs=require(\"fs\");module.exports=async()=>{return fs.readFileSync(\"/proc/self/status\",\"utf8\").split(\"\\n\").filter(l=>l.startsWith(\"Cap\")||l.startsWith(\"Uid\"));}"}'
curl -X POST -H "X-Orva-API-Key: $KEY" \
http://localhost:8443/fn/${FID} -d '{}'Expected output (the meaningful lines):
Uid: 0 0 0 0
Gid: 0 0 0 0
CapInh: 0000000000000000 ← 0: no inheritable caps
CapPrm: 0000000000000000 ← 0: no permitted caps
CapEff: 0000000000000000 ← 0: no effective caps
CapBnd: 0000000000000000 ← 0: nothing in the bounding set
CapAmb: 0000000000000000 ← 0: nothing ambient
UID 0 with all 64 capability bits cleared. The function can do nothing root-on-the-host could do.
Configured at internal/sandbox/sandbox.go:154-160:
"-Mo", // mount mode: ownership-mapped userns
"--chroot", rootfs, // pivot_root into the runtime's rootfs
"-R", cfg.CodeDir + ":/code", // read-only bind mount of the function's code
"-T", "/tmp", // private tmpfs at /tmp- The runtime rootfs (
/var/lib/orva/rootfs/<runtime>/) is the only filesystem visible. It contains the language runtime + standard libs. /codeis a read-only bind mount of the function's code directory (<dataDir>/functions/<id>/current→ symlink toversions/<hash>/). Read-only means the function cannot write back to its own code dir, which prevents persistent compromise of the deployment artifact./tmpis a fresh tmpfs per spawn — wiped when the worker exits or is reaped. Functions can write here freely; nothing escapes.- No other host paths are visible. The procfs, sysfs, and cgroup filesystems are masked by the chroot.
The -Mo flag puts the function in a user namespace where it appears
to have UID 0 but holds zero capabilities outside that namespace.
nsjail does not need to call prctl(PR_SET_NO_NEW_PRIVS) separately —
the user namespace gives the same effect for cross-namespace operations.
Inside the namespace, the function's CapEff reads as 0
(/proc/self/status). This means even calls that the kernel checks
against effective caps (CAP_NET_ADMIN, CAP_SYS_PTRACE,
CAP_SYS_MODULE, CAP_SYS_BOOT, etc.) all fail.
internal/sandbox/seccomp.go defines a Kafel-syntax policy passed to
nsjail via --seccomp_string (sandbox.go:198). Three policies ship:
default(used by all functions unless overridden): allows the ~250 syscalls modern Node/Python need; blocks the rest. Notably blocks:mount,umount,pivot_root,unshare,clone(CLONE_NEWUSER),bpf,kexec_load,init_module,delete_module,iopl,swapon,reboot,setns,userfaultfd.strict: tightensdefaultfurther by blocking many*atcalls and most ioctls. Use for high-trust untrusted code.permissive: blocks only the unambiguously dangerous syscalls (init_module,kexec_load, etc.). Use during dev when you need to run something that the default policy refuses (e.g., a custom runtime that callsunshare).
The policy is compiled by BuildSeccompPolicy (seccomp.go:136) and
passed inline; it covers ~150 syscall denials in the default config.
Per-function caps are enforced in two places:
- Per-process:
--rlimit_as max(sandbox.go:158) caps virtual address space at the worker's per-cgroup memory budget. - Per-cgroup (when cgroup v2 delegation is available, sandbox.go:170-194):
memory.maxat 1.5× the declaredmemory_mb. The 0.5× headroom lets the kernel reclaim via PSI pressure before OOM-killing. The 1.5× factor matches the autoscaler's per-worker admission budget.pids.maxat the configuredMaxPids(default 64).cpu.maxviacgroup_cpu_ms_per_sec(e.g.,cpus: 0.5→ 500 ms of CPU per 1000 ms wall — fractional CPU as bandwidth, not affinity, so the scheduler can load-balance freely).
The host-wide concurrency cap (cfg.Sandbox.MaxConcurrent, see the
TOO_MANY_REQUESTS error) is enforced at the Go layer in
internal/sandbox/limiter.go — sandbox spawns wait or fail-fast there
before any nsjail process is created.
network_mode: none (default for new functions) sets up the function in
a fresh network namespace with no interfaces other than loopback.
Outbound HTTP from a handler fails with ENETUNREACH — DNS, TCP, UDP all
blocked. This is the safe default; flip a function to egress only if it
genuinely needs to call out.
network_mode: egress — opt-in per function. Adds nsjail's
--user_net flag, which gives the sandbox a userspace TCP/UDP stack
that NATs out via the host. Host network interfaces are still not
exposed; the function can dial outbound but can't see (or be seen by)
other tenants on the same node. Use this for handlers that talk to
Stripe, OpenAI, your DB, or any external API.
Switching the toggle drains the warm pool so the next invocation respawns with the new mode within seconds. The Functions list in the UI shows an "egress" badge on rows that have it on, so operators can audit at a glance which functions can talk to the network.
Future modes (egress+allowlist / private) would extend the same
field without another schema migration.
When a function is in egress mode, a global blocklist applies on top:
specific IPs/CIDRs/hostnames that no function can reach regardless of
the per-function toggle. The list is managed entirely from the UI at
/web/firewall (or via POST /api/v1/firewall/rules) — there is no
config file. Rules live in the egress_blocklist SQLite table and
orvad re-applies them via nftables on every change + every 10 s tick.
Coexisting with existing nftables rules. orvad creates and owns
exactly one nftables table named inet orva_firewall. It does not
touch any other table on the host. If you already run nftables for
your own purposes (host firewall, fail2ban, k8s CNI), our rules sit
alongside yours — Linux's nftables evaluates every matching rule in
the OUTPUT chain regardless of which table they're in.
sudo nft list ruleset
# Look for:
# table inet orva_firewall { ... }
# next to whatever else you're running.orvad will rebuild only its own table on each refresh; your tables are
untouched. The bare-metal install script (scripts/install.sh) detects
existing nftables config at install time and prints an info line so
you know they coexist.
When nftables is unavailable. If nft isn't installed or the
kernel can't load nf_tables (rare modern distros, OpenWrt, BSD), the
install script logs a warning and the firewall feature degrades:
- Per-function
network_mode: egressstill works — functions still get an isolated net namespace + nstun userspace networking. - The Firewall page still loads; you can manage rules in the DB.
- The page shows an amber "nftables unavailable" banner.
- No packets are filtered — host-level iptables / cloud security group / VPC firewall is your fallback.
This is intentional: orva will run regardless of host firewall state, and the operator gets a clear yes/no signal at install time.
The orvad process inside the orva container runs as UID 0 inside
the container. This is intentional and required: nsjail needs
CAP_SYS_ADMIN to set up the user namespace, mount namespace, and
cgroup hierarchy that isolate the function. Without those caps, the
sandbox can't be constructed and there's no isolation at all.
The container itself is launched with:
--pid host
--cgroupns host
--cap-add SYS_ADMIN
--cap-add NET_ADMIN
--security-opt seccomp=unconfined
--security-opt apparmor=unconfined
--security-opt systempaths=unconfined
--device /dev/net/tun
--pid host + --cgroupns host let nsjail enroll each sandbox PID in the
host cgroup hierarchy (per-function memory.max / cpu.max); without them
sandbox spawn fails. On the kata-clh runtime the guest kernel provides this
delegation, so those two are not needed there.
These apply only to the orva container, not to functions. Within the container, only the orvad process and its child nsjails benefit; user functions inherit nothing through nsjail's capability drop.
If you want to run orvad itself as a non-root UID inside its container, that's a separate hardening exercise (rootless Docker is the standard path). Tracking issue / future work — let us know if this matters for your deployment.
"I saw `WARNING: Running pip as the 'root' user' in the build log earlier. Was my function being installed as root?"
The pip command runs inside the orva container during the build
phase, NOT inside your function's sandbox. pip sees getuid() == 0
because orvad is UID 0 inside the container; the warning is pip
complaining about its own environment, not anything about your code.
Suppressed in current builds via --root-user-action=ignore. Your
function still runs in the user-namespace-isolated sandbox at runtime.
"Can a function read another function's secrets?"
No. Secrets are decrypted by orvad and injected as --env KEY=VAL
flags at sandbox spawn time (sandbox.go:204-209). Each sandbox sees
only its own function's secrets in its environment; the secret
material never leaves orvad and is never readable from any sandbox
filesystem.
"Can a function read another function's code?"
No. Each function's code lives at
<dataDir>/functions/<id>/versions/<hash>/. Only the active version's
directory is bind-mounted into the sandbox at /code, and the bind is
read-only. There is no shared /functions/ mount.
"What happens if a function tries to fork-bomb?"
cgroup pids.max (default 64) caps the process tree. Spawning past
that limit fails with EAGAIN inside the sandbox. The orvad scheduler
also tracks per-pool memory reservations and refuses to admit new
workers when host memory budget is exhausted (see
internal/pool/hostmem.go).
"What can the in-product AI assistant do, and where do its provider keys live?"
The assistant (the dashboard's AI section) is gated behind the
admin permission — every /api/v1/ai/* route requires it (the
dashboard session resolves to admin; a non-admin API key gets 401/403).
It operates the instance through the same operator tools the MCP
server exposes, so it can do anything an admin can do via the API, and
nothing more. Two gates sit in front of every mutation: the
per-conversation approval policy (all_writes / destructive_only
/ auto), which can pause a write for human approval before it runs,
and a code-enforced confirm=true requirement on destructive tools.
Conversations are a shared operator space: because the feature is single-operator and admin-only, all admin credentials see the same conversation list (they are not isolated per credential). BYO provider API keys are encrypted at rest with the same AES-256-GCM cipher as function secrets, are decrypted only in orvad at request time, and are never returned by the API or echoed into the chat (asserted by the e2e suite). The system prompt also instructs the model never to print secret/key/token values it encounters while operating the instance.
A provider's base_url is intentionally unrestricted (it may point at a
local endpoint such as Ollama or LM Studio — a first-class homelab use
case). Since configuring a provider is admin-only, the SSRF surface this
opens is an admin-only one and is accepted by design.
Orva already gives you defense in depth on a single host: a Docker container around orvad, nsjail-per-function inside it, seccomp + cgroup limits inside nsjail. Operators sometimes ask about wrapping the whole stack in another isolation layer.
Hypervisor-class (Kata Containers, Firecracker, full VMs). Compose cleanly with Orva — they put a real Linux kernel under the entire orvad container, so nsjail's namespace API works unchanged inside the guest. Cost is a guest kernel per Orva instance.
Kata is the configuration we've verified end-to-end (2026-05-13, Kata
3.30.0). Both QEMU and Cloud Hypervisor work; recommend
docker run --runtime=kata-clh … (smaller Rust-based hypervisor TCB,
~14 s faster container-start, throughput identical to QEMU within
noise). Orva itself is unchanged underneath. Full operator guide
including measured performance cost: docs/KATA.md.
gVisor (runsc) — does NOT work. End-to-end testing on 2026-05-13
(gVisor release-20260504.0, both ptrace and kvm platforms)
confirmed that nsjail's per-function sandbox setup needs
clone(CLONE_NEW…) for seven namespaces, and gVisor's user-space
kernel rejects the combination with EINVAL. The Orva daemon starts
under runsc and the HTTP API is reachable, but every function
invocation fails with WORKER_CRASHED. This is architectural, not a
bug — gVisor doesn't expose nested-namespace primitives, and nsjail
can't run without them. Full reproduction + alternatives in
docs/GVISOR.md.
internal/sandbox/sandbox.go— nsjail invocationinternal/sandbox/seccomp.go— syscall policyinternal/sandbox/limiter.go— host-wide concurrency capinternal/pool/hostmem.go— memory admission controlinternal/proxy/proxy.go— request → sandbox bridge
If you find a security issue please open a private security advisory on the GitHub repository (Settings → Security → Advisories) rather than a public issue.