Secured AI-Ops lab for chat-driven Linux diagnostics and remediation through a policy-based executor, human approval workflow, and full audit logging.
AI-Ops MVP provides a controlled execution layer between a conversational AI assistant and Linux target servers.
It is designed for safe experimentation with AI-assisted operations:
- no direct SSH from the assistant
- secured executor wrapper
- policy-based command classification
- automatic read-only diagnostics
- human approval for remediation
- friendly Approve / Deny workflow
- JSONL audit trail
- reusable Markdown knowledge base and runbooks
This project is intended as a lab/MVP foundation before moving toward stricter production policies.
AI-Ops MVP is a secured lab architecture for letting a chat-based AI assistant diagnose and remediate Linux server issues through a controlled executor.
The assistant does not connect directly to target servers.
All target commands must go through:
/opt/aiops/run_action.py
The executor enforces command policies, approval requirements, sudo restrictions, and audit logging.
This project demonstrates a safe AI-Ops workflow:
- Natural language request through Hermes or another chat gateway.
- AI assistant diagnoses the target system.
- Read-only commands can run automatically when allowed by policy.
- State-changing commands require explicit human approval.
- All commands are executed through a secured wrapper.
- Every action is audited.
Logical roles:
| Role | Logical name | Description |
|---|---|---|
| AI gateway | hermes | Host running Hermes and the AI-Ops executor |
| Target server | srv1 | Managed Linux target server |
Runtime path on the AI gateway:
/opt/aiops
Repository path:
/opt/aiops/repo
Knowledge files:
/opt/aiops/repo/knowledge
The Hermes host runs:
- Hermes chat gateway.
- AI-Ops executor wrapper.
- Secured local execution user.
- SSH key used only by the executor.
- Audit log.
Main runtime files:
/opt/aiops/config.yml
/opt/aiops/executor.py
/opt/aiops/run_action.py
/opt/aiops/audit.jsonl
/opt/aiops/runtime/
The target server contains:
- Restricted `aiops` user.
- Authorized public SSH key from the Hermes host.
- Limited sudoers rules for lab diagnostics and remediation.
The assistant must never use direct SSH.
The only valid command path is:
echo '{"target":"srv1","command":"hostname","requester":"manual","reason":"test"}' | sudo -u aiopsexec /opt/aiops/run_action.py
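Scripts that submit requests can wrap this pipeline instead of assembling the JSON by hand. The following is a minimal Python sketch and is not part of the repository: `build_request` and `submit` are hypothetical helper names. Only the four JSON fields, the `aiopsexec` user, and the wrapper path come from this README. It also assumes the wrapper prints a single JSON object on stdout, as the response examples in this README suggest.

```python
import json
import subprocess

WRAPPER = "/opt/aiops/run_action.py"  # wrapper path from this README
EXEC_USER = "aiopsexec"               # local execution user from this README


def build_request(target, command, requester, reason):
    """Serialize a request with the four fields used in the README example."""
    return json.dumps({
        "target": target,
        "command": command,
        "requester": requester,
        "reason": reason,
    })


def submit(payload, timeout=60):
    """Pipe the payload to the wrapper as aiopsexec and parse its stdout.

    Assumption: the wrapper writes one JSON object to stdout.
    """
    proc = subprocess.run(
        ["sudo", "-u", EXEC_USER, WRAPPER],
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return json.loads(proc.stdout)
```

Calling `submit(build_request("srv1", "hostname", "manual", "test"))` on the Hermes host would send the same request as the shell pipeline above. Building the payload with `json.dumps` avoids shell-quoting problems when a command contains quotes.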
Security layers:
- Hermes user can only call the wrapper as `aiopsexec`.
- `aiopsexec` can only use the configured executor.
- The executor applies policy rules.
- The target user has restricted sudo permissions.
- All actions are logged to `/opt/aiops/audit.jsonl`.
Repository layout:
- README.md
- prepare-hermes-host.sh
- setup-aiops-mvp.sh
- files/
  - executor.py
  - run_action.py
- knowledge/
  - AI_OPS_RULES.md
  - EXECUTOR_USAGE.md
  - HERMES_BOOTSTRAP_PROMPT.md
  - INCIDENT_WORKFLOW.md
  - SYSTEMD_TROUBLESHOOTING.md
  - ENVIRONMENT.md
  - RUNBOOK_NGINX_BAD_CONFIG.md
  - RUNBOOK_BROKEN_DEMO_203_EXEC.md
Run on the future Hermes host as root:
cd /opt/aiops/repo
./prepare-hermes-host.sh
This script:
- Creates the `hermes` system user.
- Creates `/opt/hermes`.
- Enables systemd linger for the `hermes` user.
- Installs base packages.
- Prepares local directories.
It does not install Hermes itself.
After this step, install Hermes using the official Hermes installer as the hermes user.
Example:
su - hermes
# Run the official Hermes installer from the Hermes documentation.
Run as root on the Hermes host:
cd /opt/aiops/repo
./setup-aiops-mvp.sh hermes <TARGET_IP>
This creates:
- `/opt/aiops`
- `aiopsexec` local user
- SSH key pair for target access
- `/opt/aiops/config.yml`
- `/opt/aiops/executor.py`
- `/opt/aiops/run_action.py`
- `/opt/aiops/runtime`
- `/opt/aiops/audit.jsonl`
- sudoers rule allowing `hermes` to call the wrapper as `aiopsexec`
At the end, copy the displayed public SSH key.
Run as root on the target server:
cd /opt/aiops/repo
./setup-aiops-mvp.sh srv1 "ssh-ed25519 <PUBLIC_KEY> aiops-hermes"
This creates:
- `aiops` user
- authorized SSH key
- restricted lab sudoers rules
- sudoers validation with `visudo`
From the Hermes host:
echo '{"target":"srv1","command":"hostname","requester":"manual","reason":"fresh setup test"}' | sudo -u aiopsexec /opt/aiops/run_action.py
Expected result:
{
"status": "executed",
"risk": "auto_allowed",
"target": "srv1",
"command": "hostname"
}
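This check can be automated by comparing the wrapper output with the expected fields. The sketch below assumes the output is a JSON object that may carry extra keys beyond the four shown. `matches_expected` is a hypothetical helper, not something shipped in the repository.

```python
import json

# Expected fields copied from the README example.
EXPECTED = {
    "status": "executed",
    "risk": "auto_allowed",
    "target": "srv1",
    "command": "hostname",
}


def matches_expected(raw_output, expected=EXPECTED):
    """Return True if every expected key is present with the expected value.

    Extra keys in the response are ignored.
    """
    response = json.loads(raw_output)
    return all(response.get(key) == value for key, value in expected.items())
```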
The MVP uses lab mode:
unknown_default: approval_required
Meaning:
- Known safe read-only commands are `auto_allowed`.
- Explicitly dangerous commands are `forbidden`.
- Everything else requires human approval.
For production, use a stricter mode:
unknown_default: reject
Production sudoers rules should be restricted per command, service, and path.
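The three-way classification and the effect of `unknown_default` can be illustrated with a short Python sketch. This is not the executor's actual logic, which this README does not show: the rule sets, the token matching, and the `classify` function are all assumptions made for illustration. Only the risk names and the two `unknown_default` values come from this README.

```python
import re

# Illustrative rule sets only; the real rules live in the executor's config.
AUTO_ALLOWED = {"hostname", "uptime", "df", "free"}
FORBIDDEN = {"mkfs", "shutdown", "reboot"}
VALID_DEFAULTS = {"approval_required", "reject"}
SHELL_META = set(";|&$`<>\n")  # chaining, substitution, redirection


def classify(command, unknown_default="approval_required"):
    """Return auto_allowed, forbidden, or the configured unknown_default."""
    if unknown_default not in VALID_DEFAULTS:
        raise ValueError(f"unsupported unknown_default: {unknown_default}")
    # A forbidden program name anywhere in the line blocks the whole command.
    tokens = re.findall(r"[\w./-]+", command)
    if any(token.rsplit("/", 1)[-1] in FORBIDDEN for token in tokens):
        return "forbidden"
    # Never auto-allow a line that uses shell metacharacters.
    if SHELL_META & set(command):
        return unknown_default
    words = command.split()
    if words and words[0] in AUTO_ALLOWED:
        return "auto_allowed"
    return unknown_default
```

With these illustrative rules, `classify("sudo systemctl restart nginx")` returns `approval_required` in lab mode, while `classify("ls -l", unknown_default="reject")` returns `reject`, which is the stricter behavior suggested for production.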
When a command requires approval, the wrapper returns:
{
"status": "approval_requested",
"risk": "approval_required",
"message": "Approval required. Reply Approve or Deny."
}
Hermes should show:
Approval required
Target: srv1
Command: <command>
Reason: <reason>
Reply: Approve or Deny
If the user replies:
Approve
Hermes must execute:
echo "Approve" | sudo -u aiopsexec /opt/aiops/run_action.py
If the user replies:
Deny
Hermes must execute:
echo "Deny" | sudo -u aiopsexec /opt/aiops/run_action.py
The user should not manually provide an approval_id.
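On the gateway side, the approval exchange reduces to two small steps: rendering the prompt and mapping the reply to the word piped to the wrapper. The sketch below is a hypothetical illustration, not Hermes code. Both function names are invented, and the case-insensitive reply matching is an assumption.

```python
def format_approval_prompt(target, command, reason):
    """Render the approval prompt in the layout shown in this README."""
    return "\n".join([
        "Approval required",
        f"Target: {target}",
        f"Command: {command}",
        f"Reason: {reason}",
        "Reply: Approve or Deny",
    ])


def wrapper_input_for(reply):
    """Map a chat reply to the word piped to the wrapper, or None.

    Assumption: replies are matched case-insensitively after trimming.
    """
    normalized = reply.strip().lower()
    if normalized == "approve":
        return "Approve"
    if normalized == "deny":
        return "Deny"
    return None  # not a decision; the gateway should ask again
```

Returning `None` for anything other than the two decisions keeps an ambiguous reply such as "yes" from being treated as an approval.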
A simple git pull only updates the repository files under:
/opt/aiops/repo
It does not update the runtime files used by AI-Ops:
/opt/aiops/executor.py
/opt/aiops/run_action.py
/opt/aiops/runtime/
Standard update workflow:
cd /opt/aiops/repo
git pull
./setup-aiops-mvp.sh update-aiops
./setup-aiops-mvp.sh print-bootstrap
Then copy and paste the print-bootstrap output into Telegram or the active Hermes chat.
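Because `git pull` leaves the runtime copies untouched, it can help to check whether they have drifted from the repository before running the update. The following Python sketch compares file contents. It assumes the runtime scripts are deployed from the repository's `files/` directory, which this README does not state explicitly, and `find_drift` is a hypothetical helper.

```python
import filecmp
from pathlib import Path

# Assumption: runtime scripts are copied from the repository's files/ directory.
REPO_FILES = Path("/opt/aiops/repo/files")
RUNTIME = Path("/opt/aiops")
TRACKED = ["executor.py", "run_action.py"]


def find_drift(repo_dir=REPO_FILES, runtime_dir=RUNTIME, names=TRACKED):
    """Return names whose runtime copy is missing or differs from the repo.

    Raises FileNotFoundError if the repository copy itself is missing
    while a runtime copy exists.
    """
    drifted = []
    for name in names:
        source = Path(repo_dir) / name
        deployed = Path(runtime_dir) / name
        # shallow=False forces a byte-by-byte comparison of the contents.
        if not deployed.exists() or not filecmp.cmp(source, deployed, shallow=False):
            drifted.append(name)
    return drifted
```

A non-empty result means the runtime is out of date and `./setup-aiops-mvp.sh update-aiops` still needs to be run.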
Run:
./setup-aiops-mvp.sh update-aiops
This updates only:
- `/opt/aiops/executor.py`
- `/opt/aiops/run_action.py`
- `/opt/aiops/runtime/`
It does not modify:
- `/opt/aiops/config.yml`
- SSH keys
- known_hosts
- system users
- sudoers files
Run:
./setup-aiops-mvp.sh print-bootstrap
Copy the output into Telegram or the active Hermes chat.
This reloads:
- bootstrap prompt
- AI-Ops rules
- executor rules
- incident workflow
- systemd troubleshooting rules
- runbooks
- friendly approval workflow
After installation or update, send the bootstrap prompt to Hermes:
./setup-aiops-mvp.sh print-bootstrap
Hermes must confirm that:
- AI-Ops rules are loaded.
- Direct SSH is forbidden.
- Executor wrapper is required.
- Remediation requires approval.
- Friendly Approve/Deny workflow is active.
echo '{"target":"srv1","command":"uptime","requester":"manual","reason":"healthcheck"}' | sudo -u aiopsexec /opt/aiops/run_action.py
Expected:
risk: auto_allowed
echo '{"target":"srv1","command":"sudo systemctl restart nginx","requester":"manual","reason":"restart service"}' | sudo -u aiopsexec /opt/aiops/run_action.py
Expected:
status: approval_requested
risk: approval_required
Approve:
echo "Approve" | sudo -u aiopsexec /opt/aiops/run_action.py
Deny:
echo "Deny" | sudo -u aiopsexec /opt/aiops/run_action.py
All executor decisions are logged to:
/opt/aiops/audit.jsonl
Example:
tail -n 20 /opt/aiops/audit.jsonl
The audit log records:
- target
- command
- requester
- reason
- approval ID
- risk
- status
- exit code
- stdout
- stderr
- timestamp
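Since the log is JSONL, each line can be parsed independently, which makes quick summaries straightforward. The sketch below counts entries by one field. It assumes the audit entries use the same key names as the wrapper responses shown in this README, such as `status` and `risk`. The exact keys for the other recorded fields are not documented here, and both helper names are hypothetical.

```python
import json
from collections import Counter


def read_audit(lines):
    """Parse JSONL audit lines into dictionaries, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def count_by(entries, key):
    """Count entries by one field; entries without the key count as "unknown"."""
    return Counter(entry.get(key, "unknown") for entry in entries)


# Example usage on the Hermes host (path from this README):
#   with open("/opt/aiops/audit.jsonl") as handle:
#       entries = read_audit(handle)
#   print(count_by(entries, "status"))
```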
Included validated lab runbooks:
- `RUNBOOK_NGINX_BAD_CONFIG.md`
- `RUNBOOK_BROKEN_DEMO_203_EXEC.md`
These documents describe known incident patterns and safe remediation flows.
This is a lab MVP.
Known limitations:
- Approval buttons are not yet integrated directly into Telegram.
- Hermes must reload knowledge through chat after repository updates.
- Lab sudoers permissions are intentionally permissive.
- Production mode requires stricter policies and sudoers rules.
- Multi-target inventory is basic.
- No Teams approval workflow yet.
Planned improvements:
- Native Telegram or Teams approval buttons.
- Multi-target inventory.
- Production policy mode.
- Per-service runbooks.
- Better structured incident reports.
- Optional Ansible playbook integration.
- OpenTelemetry or SIEM export for audit logs.