AI-Ops MVP

Secured AI-Ops lab for chat-driven Linux diagnostics and remediation through a policy-based executor, human approval workflow, and full audit logging.

Description

AI-Ops MVP provides a controlled execution layer between a conversational AI assistant and Linux target servers.

It is designed for safe experimentation with AI-assisted operations:

no direct SSH from the assistant
secured executor wrapper
policy-based command classification
automatic read-only diagnostics
human approval for remediation
friendly Approve / Deny workflow
JSONL audit trail
reusable Markdown knowledge base and runbooks

This project is intended as a lab/MVP foundation before moving toward stricter production policies.

AI-Ops MVP is a secured lab architecture for letting a chat-based AI assistant diagnose and remediate Linux server issues through a controlled executor.

The assistant does not connect directly to target servers.

All target commands must go through:

/opt/aiops/run_action.py

The executor enforces command policies, approval requirements, sudo restrictions, and audit logging.

Goals

This project demonstrates a safe AI-Ops workflow:

Natural language request through Hermes or another chat gateway.
AI assistant diagnoses the target system.
Read-only commands can run automatically when allowed by policy.
State-changing commands require explicit human approval.
All commands are executed through a secured wrapper.
Every action is audited.

Architecture

Logical roles:

Role            Logical name    Description
AI gateway      hermes          Host running Hermes and the AI-Ops executor
Target server   srv1            Managed Linux target server

Runtime path on the AI gateway:

/opt/aiops

Repository path:

/opt/aiops/repo

Knowledge files:

/opt/aiops/repo/knowledge

Components

Hermes host

The Hermes host runs:

Hermes chat gateway.
AI-Ops executor wrapper.
Secured local execution user.
SSH key used only by the executor.
Audit log.

Main runtime files:

/opt/aiops/config.yml
/opt/aiops/executor.py
/opt/aiops/run_action.py
/opt/aiops/audit.jsonl
/opt/aiops/runtime/

Target server

The target server contains:

Restricted aiops user.
Authorized public SSH key from the Hermes host.
Limited sudoers rules for lab diagnostics and remediation.

Security model

The assistant must never use direct SSH.

The only valid command path is:

echo '{"target":"srv1","command":"hostname","requester":"manual","reason":"test"}' | sudo -u aiopsexec /opt/aiops/run_action.py

Security layers:

Hermes user can only call the wrapper as aiopsexec.
aiopsexec can only use the configured executor.
The executor applies policy rules.
The target user has restricted sudo permissions.
All actions are logged to /opt/aiops/audit.jsonl.
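
For scripted callers, the same request can be built and piped programmatically. The following is a minimal Python sketch, not code from this repository: the helper names build_request, wrapper_argv, and call_wrapper are hypothetical, and it assumes the wrapper prints a single JSON object on stdout.

```python
import json
import subprocess

WRAPPER = "/opt/aiops/run_action.py"  # wrapper path given in this README
EXEC_USER = "aiopsexec"               # local execution user given in this README


def build_request(target: str, command: str, requester: str, reason: str) -> str:
    """Serialize one request with the four fields used in the example above."""
    return json.dumps(
        {"target": target, "command": command, "requester": requester, "reason": reason}
    )


def wrapper_argv() -> list:
    """Command line that runs the wrapper as the execution user."""
    return ["sudo", "-u", EXEC_USER, WRAPPER]


def call_wrapper(target, command, requester, reason, timeout=60) -> dict:
    """Send the request on stdin and parse the reply.

    Assumption: the wrapper writes exactly one JSON object to stdout.
    """
    proc = subprocess.run(
        wrapper_argv(),
        input=build_request(target, command, requester, reason),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return json.loads(proc.stdout)
```

Passing the payload on stdin rather than through a shell string avoids quoting problems when the command itself contains quotes.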

Repository layout

README.md
prepare-hermes-host.sh
setup-aiops-mvp.sh
files/
  executor.py
  run_action.py
knowledge/
  AI_OPS_RULES.md
  EXECUTOR_USAGE.md
  HERMES_BOOTSTRAP_PROMPT.md
  INCIDENT_WORKFLOW.md
  SYSTEMD_TROUBLESHOOTING.md
  ENVIRONMENT.md
  RUNBOOK_NGINX_BAD_CONFIG.md
  RUNBOOK_BROKEN_DEMO_203_EXEC.md

Fresh installation workflow

1. Prepare the Hermes host

Run on the future Hermes host as root:

cd /opt/aiops/repo
./prepare-hermes-host.sh

This script:

Creates the hermes system user.
Creates /opt/hermes.
Enables systemd linger for the hermes user.
Installs base packages.
Prepares local directories.

It does not install Hermes itself.

After this step, install Hermes using the official Hermes installer as the hermes user.

Example:

su - hermes
# Run the official Hermes installer from the Hermes documentation.

2. Install AI-Ops runtime on Hermes

Run as root on the Hermes host:

cd /opt/aiops/repo
./setup-aiops-mvp.sh hermes <TARGET_IP>

Replace <TARGET_IP> with the IP address of the target server (srv1).

This creates:

/opt/aiops
aiopsexec local user
SSH key pair for target access
/opt/aiops/config.yml
/opt/aiops/executor.py
/opt/aiops/run_action.py
/opt/aiops/runtime
/opt/aiops/audit.jsonl
sudoers rule allowing hermes to call the wrapper as aiopsexec

When the script finishes, it displays the public SSH key. Copy it; step 3 installs it on the target server.

3. Install target-side AI-Ops account

Run as root on the target server:

cd /opt/aiops/repo
./setup-aiops-mvp.sh srv1 "ssh-ed25519 <PUBLIC_KEY> aiops-hermes"

This creates:

aiops user
authorized SSH key
restricted lab sudoers rules
sudoers validation with visudo

Validation

From the Hermes host:

echo '{"target":"srv1","command":"hostname","requester":"manual","reason":"fresh setup test"}' | sudo -u aiopsexec /opt/aiops/run_action.py

Expected result:

{
  "status": "executed",
  "risk": "auto_allowed",
  "target": "srv1",
  "command": "hostname"
}
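
To script this check rather than compare the output by eye, the reply can be tested field by field. This is a small Python sketch with a hypothetical helper name (mismatches). It assumes the reply may carry more fields than the four shown above, so it only compares the expected ones.

```python
import json

# Expected fields taken from the validation example in this README.
EXPECTED = {
    "status": "executed",
    "risk": "auto_allowed",
    "target": "srv1",
    "command": "hostname",
}


def mismatches(reply_text: str, expected: dict = EXPECTED) -> dict:
    """Return {field: (expected, actual)} for every expected field that differs."""
    reply = json.loads(reply_text)
    return {
        key: (want, reply.get(key))
        for key, want in expected.items()
        if reply.get(key) != want
    }
```

An empty result means the fresh setup behaves as described. Any entry names the field that needs investigation.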

Policy model

The MVP uses lab mode:

unknown_default: approval_required

Meaning:

Known safe read-only commands are auto_allowed.
Explicitly dangerous commands are forbidden.
Everything else requires human approval.

For production, use a stricter mode:

unknown_default: reject

Production sudoers rules should be restricted per command, service, and path.
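
The three-way decision described above can be sketched in a few lines of Python. This is an illustration only, not the logic in executor.py: the AUTO_ALLOWED and FORBIDDEN sets are hypothetical, and matching on the first word of the command is a deliberate simplification.

```python
import shlex

# Hypothetical rule sets for illustration; the real rules belong to the executor.
AUTO_ALLOWED = {"hostname", "uptime", "df", "free"}
FORBIDDEN = {"rm", "mkfs", "dd", "shutdown", "reboot"}
SHELL_METACHARS = set(";|&`$<>")


def classify(command: str, unknown_default: str = "approval_required") -> str:
    """Return "auto_allowed", "forbidden", or the configured unknown_default."""
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes: treat as unknown
        return unknown_default
    if not argv:
        return unknown_default

    # Look past a leading "sudo" to find the program actually being run.
    uses_sudo = argv[0] == "sudo"
    program = argv[1] if uses_sudo and len(argv) > 1 else argv[0]

    if program in FORBIDDEN:
        return "forbidden"

    # Never auto-allow privileged commands or anything that chains or redirects.
    has_metachars = any(ch in SHELL_METACHARS for ch in command)
    if program in AUTO_ALLOWED and not uses_sudo and not has_metachars:
        return "auto_allowed"

    return unknown_default
```

With the lab default, classify("sudo systemctl restart nginx") falls through to approval_required. Passing unknown_default="reject" models the stricter production mode.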

Friendly approval workflow

When a command requires approval, the wrapper returns:

{
  "status": "approval_requested",
  "risk": "approval_required",
  "message": "Approval required. Reply Approve or Deny."
}

Hermes should show:

Approval required
Target: srv1
Command: <command>
Reason: <reason>
Reply: Approve or Deny

If the user replies:

Approve

Hermes must execute:

echo "Approve" | sudo -u aiopsexec /opt/aiops/run_action.py

If the user replies:

Deny

Hermes must execute:

echo "Deny" | sudo -u aiopsexec /opt/aiops/run_action.py

The user does not need to provide an approval_id manually; replying Approve or Deny is enough.
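
One way a bare Approve or Deny reply can work without an approval_id is for the wrapper to remember the single pending request. The Python sketch below illustrates that idea under stated assumptions. The pending-file mechanism, the function names, and the "approved", "denied", and "error" statuses are hypothetical and are not taken from run_action.py.

```python
import json
import uuid
from pathlib import Path


def request_approval(pending_file: Path, request: dict) -> dict:
    """Store the request that awaits a decision and return the approval prompt."""
    pending = {"approval_id": uuid.uuid4().hex, "request": request}
    pending_file.write_text(json.dumps(pending), encoding="utf-8")
    return {
        "status": "approval_requested",
        "risk": "approval_required",
        "message": "Approval required. Reply Approve or Deny.",
    }


def resolve(pending_file: Path, reply: str) -> dict:
    """Match a bare Approve/Deny reply to the stored pending request."""
    decision = reply.strip().lower()
    if decision not in ("approve", "deny"):
        return {"status": "error", "message": "Reply Approve or Deny."}
    if not pending_file.exists():
        return {"status": "error", "message": "No pending approval."}

    pending = json.loads(pending_file.read_text(encoding="utf-8"))
    pending_file.unlink()  # one-shot: the same decision cannot be replayed

    return {
        "status": "approved" if decision == "approve" else "denied",
        "approval_id": pending["approval_id"],
        "request": pending["request"],
    }
```

Keeping the approval ID internal still lets it appear in the audit trail while the chat user only ever types one word.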

Updating an existing AI-Ops installation

A plain git pull updates only the repository files under:

/opt/aiops/repo

It does not update the runtime files used by AI-Ops:

/opt/aiops/executor.py
/opt/aiops/run_action.py
/opt/aiops/runtime/

Standard update workflow:

cd /opt/aiops/repo
git pull
./setup-aiops-mvp.sh update-aiops
./setup-aiops-mvp.sh print-bootstrap

Then copy and paste the print-bootstrap output into Telegram or the active Hermes chat.

Runtime update

Run:

./setup-aiops-mvp.sh update-aiops

This updates only:

/opt/aiops/executor.py
/opt/aiops/run_action.py
/opt/aiops/runtime/

It does not modify:

/opt/aiops/config.yml
SSH keys
known_hosts
system users
sudoers files
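
Because the repository and the runtime can drift apart after a git pull, a quick comparison shows whether update-aiops still needs to run. The Python sketch below assumes the runtime scripts are byte-identical copies of files/executor.py and files/run_action.py, which this README does not state explicitly. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    """Hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_files(repo_files: Path, runtime: Path,
                names=("executor.py", "run_action.py")) -> list:
    """List runtime scripts that are missing or differ from the repository copy."""
    stale = []
    for name in names:
        src, dst = repo_files / name, runtime / name
        if not dst.exists() or sha256(src) != sha256(dst):
            stale.append(name)
    return stale


if __name__ == "__main__":
    # Paths from this README; reading the runtime files may require root.
    print(stale_files(Path("/opt/aiops/repo/files"), Path("/opt/aiops")))
```

An empty list suggests the runtime already matches the repository. Any listed name indicates the update step has not been applied yet.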

Reload Hermes knowledge

Run:

./setup-aiops-mvp.sh print-bootstrap

Copy the output into Telegram or the active Hermes chat.

This reloads:

bootstrap prompt
AI-Ops rules
executor rules
incident workflow
systemd troubleshooting rules
runbooks
friendly approval workflow

Knowledge bootstrap

After installation or update, send the bootstrap prompt to Hermes:

./setup-aiops-mvp.sh print-bootstrap

Hermes must confirm that:

AI-Ops rules are loaded.
Direct SSH is forbidden.
Executor wrapper is required.
Remediation requires approval.
Friendly Approve/Deny workflow is active.

Example: read-only command

echo '{"target":"srv1","command":"uptime","requester":"manual","reason":"healthcheck"}' | sudo -u aiopsexec /opt/aiops/run_action.py

Expected:

risk: auto_allowed

Example: approval-required command

echo '{"target":"srv1","command":"sudo systemctl restart nginx","requester":"manual","reason":"restart service"}' | sudo -u aiopsexec /opt/aiops/run_action.py

Expected:

status: approval_requested
risk: approval_required

Approve:

echo "Approve" | sudo -u aiopsexec /opt/aiops/run_action.py

Deny:

echo "Deny" | sudo -u aiopsexec /opt/aiops/run_action.py

Audit log

All executor decisions are logged to:

/opt/aiops/audit.jsonl

Example:

tail -n 20 /opt/aiops/audit.jsonl

The audit log records:

target
command
requester
reason
approval ID
risk
status
exit code
stdout
stderr
timestamp
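
Since the audit trail is JSONL (one JSON object per line), it can be appended to and filtered with a few lines of Python. The sketch below is illustrative: the key names passed by the caller, such as exit_code or approval_id, are assumptions about how the fields listed above are spelled in the real log.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit(path: Path, **fields) -> dict:
    """Append one record as a single JSON line, stamped with a UTC timestamp."""
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **fields}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def read_audit(path: Path, **match) -> list:
    """Return the records whose fields equal every key=value pair in match."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if all(record.get(key) == value for key, value in match.items()):
                records.append(record)
    return records
```

For example, read_audit(Path("/opt/aiops/audit.jsonl"), target="srv1", status="executed") would list every executed action on srv1, provided the log uses those key names.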

Runbooks

Included validated lab runbooks:

RUNBOOK_NGINX_BAD_CONFIG.md
RUNBOOK_BROKEN_DEMO_203_EXEC.md

These documents describe known incident patterns and safe remediation flows.

Current limitations

This is a lab MVP.

Known limitations:

Approval buttons are not yet integrated directly into Telegram.
Hermes must reload knowledge through chat after repository updates.
Lab sudoers permissions are intentionally permissive.
Production mode requires stricter policies and sudoers rules.
Multi-target inventory is basic.
No Teams approval workflow yet.

Roadmap

Planned improvements:

Native Telegram or Teams approval buttons.
Multi-target inventory.
Production policy mode.
Per-service runbooks.
Better structured incident reports.
Optional Ansible playbook integration.
OpenTelemetry or SIEM export for audit logs.
