Home
arielgoes edited this page May 11, 2026
OperAID is an open-source testbed for evaluating LLM agents as autonomous operators of 5G Core networks deployed on Kubernetes. It implements a closed-loop pipeline:
Fault Injection → Agentic Diagnosis → Remediation → Execution-Based Verification
The framework targets Open5GS by default (via openverso-charts) but is generalized through deployment profiles so any Kubernetes workload can be tested.
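The four pipeline stages can be pictured as one function that calls each stage in order. The sketch below is illustrative only and is not OperAID's code: `run_episode`, `EpisodeResult`, and the in-memory `cluster` dictionary are hypothetical names, and the stage callbacks are toy stand-ins for real Kubernetes operations.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class EpisodeResult:
    """Outcome of one closed-loop run (hypothetical structure)."""
    scenario: str
    diagnosis: str
    action: str
    verified: bool


def run_episode(
    scenario: str,
    inject: Callable[[str], None],
    diagnose: Callable[[str], str],
    remediate: Callable[[str], str],
    verify: Callable[[str], bool],
) -> EpisodeResult:
    inject(scenario)                # 1. fault injection
    diagnosis = diagnose(scenario)  # 2. agentic diagnosis
    action = remediate(diagnosis)   # 3. remediation
    verified = verify(scenario)     # 4. execution-based verification
    return EpisodeResult(scenario, diagnosis, action, verified)


# Toy in-memory "cluster" standing in for a real deployment (assumption).
cluster: Dict[str, int] = {"upf_replicas": 1}


def inject(scenario: str) -> None:
    cluster["upf_replicas"] = 0  # mimics an S3-style fault: UPF scaled to 0


def diagnose(scenario: str) -> str:
    return "upf scaled to zero" if cluster["upf_replicas"] == 0 else "healthy"


def remediate(diagnosis: str) -> str:
    if diagnosis == "upf scaled to zero":
        cluster["upf_replicas"] = 1
        return "scale upf to 1"
    return "no-op"


def verify(scenario: str) -> bool:
    # Success is judged from the resulting state, not from the agent's claim.
    return cluster["upf_replicas"] >= 1


result = run_episode("S3", inject, diagnose, remediate, verify)
print(result)
```

Passing the stages as callbacks keeps the loop independent of the workload, which is similar in spirit to what the deployment profiles are described as doing.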
- Quick Start: install, env setup, first run
- Architecture: components and pipeline overview
- Fault Scenarios: S1 NetworkPolicy, S2 ConfigMap, S3 UPF Scale
- Deployment Profiles: the JSON contract that drives everything
- Diagnosis Engine: multi-turn LLM agent, prompts, retry logic
- Diagnostic Tools: built-in `kubectl_*` tools and custom-tool extension
- Safety & Guardrails: command allowlists, dangerous-pattern filters, namespace pinning
- Running Experiments: `run_experiment.sh` flow and CLI flags
- Suite Configuration: YAML suites, experiment matrices
- Results & Outputs: directory layout, `summary.csv`, `suite_statistics.json`
- Visualization: paper figures and stats regeneration
- Configuration Reference: `config.env` and environment variables
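To show how the three guardrails named above (command allowlist, dangerous-pattern filter, namespace pinning) could work together, here is a minimal sketch. It is an assumption-laden illustration, not the project's implementation: `ALLOWED_VERBS`, `DANGEROUS_PATTERNS`, `PINNED_NAMESPACE`, and `check_command` are hypothetical names, and the actual rules are defined on the Safety & Guardrails page.

```python
import re
import shlex
from typing import List, Optional, Tuple

# All three values below are hypothetical examples, not OperAID defaults.
ALLOWED_VERBS = {"get", "describe", "logs"}
DANGEROUS_PATTERNS = [
    re.compile(p) for p in (r"[;&|`$]", r"--all-namespaces", r"\bdelete\b")
]
PINNED_NAMESPACE = "open5gs"


def namespace_of(tokens: List[str]) -> Optional[str]:
    """Return the namespace given via -n/--namespace, or None if absent."""
    for i, tok in enumerate(tokens):
        if tok in ("-n", "--namespace") and i + 1 < len(tokens):
            return tokens[i + 1]
        if tok.startswith("--namespace="):
            return tok.split("=", 1)[1]
    return None


def check_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a proposed kubectl command string."""
    # 1. Dangerous-pattern filter on the raw string (shell metacharacters etc.).
    for pattern in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return False, f"dangerous pattern: {pattern.pattern}"
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        return False, f"unparseable command: {exc}"
    # 2. Allowlist: only kubectl with an approved verb.
    if len(tokens) < 2 or tokens[0] != "kubectl":
        return False, "not a kubectl command"
    if tokens[1] not in ALLOWED_VERBS:
        return False, f"verb not allowed: {tokens[1]}"
    # 3. Namespace pinning: an explicit, matching namespace is required.
    if namespace_of(tokens) != PINNED_NAMESPACE:
        return False, "namespace missing or not pinned namespace"
    return True, "ok"


for cmd in (
    "kubectl get pods -n open5gs",
    "kubectl delete pod upf -n open5gs",
    "kubectl get pods -n kube-system",
):
    print(cmd, "->", check_command(cmd))
```

The sketch rejects by default: a command with no namespace flag fails the pinning check rather than falling through to the cluster's default namespace.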
| Metric | Value |
|---|---|
| Overall LLM success rate | 36.0% |
| Average with tools | 70.7% |
| Average without tools | 7.1% |
| Best small model (3B active params) | Qwen3.5-35b-a3b — 93.3% with tools |
Tool access raises average success from 7.1% to 70.7% (+63.6 pp). The hardest scenario is S1 (NetworkPolicy) at 16.0%; the easiest is S3 (UPF scaled to 0) at 49.3%.
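The +63.6 pp figure is an absolute difference in percentage points, not a relative change. A short check using only the two averages reported above (the variable names are illustrative):

```python
# Averages taken from the results table above.
with_tools = 70.7     # average success rate with tools, in percent
without_tools = 7.1   # average success rate without tools, in percent

# Absolute gain in percentage points (pp).
gain_pp = round(with_tools - without_tools, 1)

# Relative improvement, as a multiple of the no-tools baseline.
ratio = round(with_tools / without_tools, 1)

print(f"gain: +{gain_pp} pp, ratio: {ratio}x")
```

Expressed as a ratio, success with tools is roughly ten times the no-tools baseline.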
@inproceedings{operaid2026,
title={OperAID: Benchmarking LLM Agents for Autonomous Kubernetes Fault Remediation},
author={de Castro, Ariel G. and Vandikas, Konstantinos and Ferlin-Reiter, Simone and Chiesa, Marco and Rothenberg, Christian E.},
booktitle={IEEE NetSoft Trust 6G-Net Workshop},
year={2026}
}