
erille/ai-ops

AI-Ops MVP

Secured AI-Ops lab for chat-driven Linux diagnostics and remediation through a policy-based executor, human approval workflow, and full audit logging.

Description

AI-Ops MVP provides a controlled execution layer between a conversational AI assistant and Linux target servers.

It is designed for safe experimentation with AI-assisted operations:

  • no direct SSH from the assistant
  • secured executor wrapper
  • policy-based command classification
  • automatic read-only diagnostics
  • human approval for remediation
  • friendly Approve / Deny workflow
  • JSONL audit trail
  • reusable Markdown knowledge base and runbooks

This project is intended as a lab/MVP foundation before moving toward stricter production policies.

The assistant never connects directly to target servers.

All target commands must go through:

/opt/aiops/run_action.py

The executor enforces command policies, approval requirements, sudo restrictions, and audit logging.

Goals

This project demonstrates a safe AI-Ops workflow:

  • Natural language request through Hermes or another chat gateway.
  • AI assistant diagnoses the target system.
  • Read-only commands can run automatically when allowed by policy.
  • State-changing commands require explicit human approval.
  • All commands are executed through a secured wrapper.
  • Every action is audited.

Architecture

Logical roles:

| Role | Logical name | Description |
| --- | --- | --- |
| AI gateway | hermes | Host running Hermes and the AI-Ops executor |
| Target server | srv1 | Managed Linux target server |

Runtime path on the AI gateway:

/opt/aiops

Repository path:

/opt/aiops/repo

Knowledge files:

/opt/aiops/repo/knowledge

Components

Hermes host

The Hermes host runs:

  • Hermes chat gateway.
  • AI-Ops executor wrapper.
  • Secured local execution user.
  • SSH key used only by the executor.
  • Audit log.

Main runtime files:

/opt/aiops/config.yml
/opt/aiops/executor.py
/opt/aiops/run_action.py
/opt/aiops/audit.jsonl
/opt/aiops/runtime/

Target server

The target server contains:

  • Restricted aiops user.
  • Authorized public SSH key from the Hermes host.
  • Limited sudoers rules for lab diagnostics and remediation.

Security model

The assistant must never use direct SSH.

The only valid command path is:

echo '{"target":"srv1","command":"hostname","requester":"manual","reason":"test"}' | sudo -u aiopsexec /opt/aiops/run_action.py

Security layers:

  1. Hermes user can only call the wrapper as aiopsexec.
  2. aiopsexec can only use the configured executor.
  3. The executor applies policy rules.
  4. The target user has restricted sudo permissions.
  5. All actions are logged to /opt/aiops/audit.jsonl.
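The command path above can also be driven from a script. The following is a minimal sketch, assuming Python 3 on the Hermes host; the helper names `build_request` and `wrapper_argv` are hypothetical and not part of this repository. It only builds the JSON payload and the argument vector shown above and does not execute anything on its own.

```python
import json

WRAPPER = "/opt/aiops/run_action.py"  # wrapper path from this README
EXEC_USER = "aiopsexec"               # local execution user from this README


def build_request(target, command, requester, reason):
    """Return the JSON payload the wrapper reads on stdin (hypothetical helper)."""
    return json.dumps(
        {
            "target": target,
            "command": command,
            "requester": requester,
            "reason": reason,
        }
    )


def wrapper_argv():
    """Argument vector equivalent to: sudo -u aiopsexec /opt/aiops/run_action.py"""
    return ["sudo", "-u", EXEC_USER, WRAPPER]


if __name__ == "__main__":
    payload = build_request("srv1", "hostname", "manual", "test")
    print(payload)
    print(" ".join(wrapper_argv()))
    # To actually send the request (only on a configured Hermes host):
    #   import subprocess
    #   subprocess.run(wrapper_argv(), input=payload, text=True, capture_output=True)
```

Building the payload with `json.dumps` rather than by string concatenation keeps quotes inside the command or reason from breaking the request.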

Repository layout

README.md
prepare-hermes-host.sh
setup-aiops-mvp.sh
files/
  executor.py
  run_action.py
knowledge/
  AI_OPS_RULES.md
  EXECUTOR_USAGE.md
  HERMES_BOOTSTRAP_PROMPT.md
  INCIDENT_WORKFLOW.md
  SYSTEMD_TROUBLESHOOTING.md
  ENVIRONMENT.md
  RUNBOOK_NGINX_BAD_CONFIG.md
  RUNBOOK_BROKEN_DEMO_203_EXEC.md

Fresh installation workflow

1. Prepare the Hermes host

Run on the future Hermes host as root:

cd /opt/aiops/repo
./prepare-hermes-host.sh

This script:

  • Creates the hermes system user.
  • Creates /opt/hermes.
  • Enables systemd linger for the hermes user.
  • Installs base packages.
  • Prepares local directories.

It does not install Hermes itself.

After this step, install Hermes using the official Hermes installer as the hermes user.

Example:

su - hermes
# Run the official Hermes installer from the Hermes documentation.

2. Install AI-Ops runtime on Hermes

Run as root on the Hermes host:

cd /opt/aiops/repo
./setup-aiops-mvp.sh hermes <TARGET_IP>

This creates:

  • /opt/aiops
  • aiopsexec local user
  • SSH key pair for target access
  • /opt/aiops/config.yml
  • /opt/aiops/executor.py
  • /opt/aiops/run_action.py
  • /opt/aiops/runtime
  • /opt/aiops/audit.jsonl
  • sudoers rule allowing hermes to call the wrapper as aiopsexec

At the end, copy the displayed public SSH key.

3. Install target-side AI-Ops account

Run as root on the target server:

cd /opt/aiops/repo
./setup-aiops-mvp.sh srv1 "ssh-ed25519 <PUBLIC_KEY> aiops-hermes"

This creates:

  • aiops user
  • authorized SSH key
  • restricted lab sudoers rules
  • sudoers validation with visudo

Validation

From the Hermes host:

echo '{"target":"srv1","command":"hostname","requester":"manual","reason":"fresh setup test"}' | sudo -u aiopsexec /opt/aiops/run_action.py

Expected result:

{
  "status": "executed",
  "risk": "auto_allowed",
  "target": "srv1",
  "command": "hostname"
}
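This check can be automated. The sketch below assumes the wrapper prints a single JSON object on stdout and that extra fields beyond the four shown above may be present; the function name `mismatches` is hypothetical.

```python
import json

# Fields shown in the expected result above.
EXPECTED = {
    "status": "executed",
    "risk": "auto_allowed",
    "target": "srv1",
    "command": "hostname",
}


def mismatches(raw_output, expected=EXPECTED):
    """Return {field: (expected, actual)} for every expected field that differs.

    Fields in the response that are not listed in `expected` are ignored.
    """
    response = json.loads(raw_output)
    return {
        key: (value, response.get(key))
        for key, value in expected.items()
        if response.get(key) != value
    }
```

An empty dictionary means the validation passed; any other result names the fields that differ.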

Policy model

The MVP uses lab mode:

unknown_default: approval_required

Meaning:

  • Known safe read-only commands are auto_allowed.
  • Explicitly dangerous commands are forbidden.
  • Everything else requires human approval.

For production, use a stricter mode:

unknown_default: reject

Production sudoers rules should be restricted per command, service, and path.
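The three-way decision described above can be sketched as follows. This is an illustration only, not the executor's actual implementation: the two rule sets are hypothetical placeholders, and matching on the first word of the command is a deliberate simplification.

```python
# Hypothetical rule sets for illustration; the real rules live in the
# executor and its configuration and are not reproduced in this README.
AUTO_ALLOWED = {"hostname", "uptime"}
FORBIDDEN = {"mkfs", "shutdown"}


def classify(command, unknown_default="approval_required"):
    """Classify a command by its first word.

    Forbidden rules are checked first, then the read-only allowlist.
    Anything else falls back to `unknown_default`:
    "approval_required" in lab mode, "reject" in production mode.
    """
    words = command.split()
    program = words[0] if words else ""
    if program in FORBIDDEN:
        return "forbidden"
    if program in AUTO_ALLOWED:
        return "auto_allowed"
    return unknown_default


if __name__ == "__main__":
    for cmd in ("hostname", "sudo systemctl restart nginx", "mkfs /dev/sda1"):
        print(cmd, "->", classify(cmd))
```

Checking the forbidden set before the allowlist means that a command listed in both by mistake is still blocked.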

Friendly approval workflow

When a command requires approval, the wrapper returns:

{
  "status": "approval_requested",
  "risk": "approval_required",
  "message": "Approval required. Reply Approve or Deny."
}

Hermes should show:

Approval required
Target: srv1
Command: <command>
Reason: <reason>
Reply: Approve or Deny

If the user replies:

Approve

Hermes must execute:

echo "Approve" | sudo -u aiopsexec /opt/aiops/run_action.py

If the user replies:

Deny

Hermes must execute:

echo "Deny" | sudo -u aiopsexec /opt/aiops/run_action.py

The user should not manually provide an approval_id.
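The chat side of this workflow can be sketched as two small helpers. Both function names are hypothetical, and accepting replies regardless of case or surrounding whitespace is an assumption, not documented behavior of Hermes or the wrapper.

```python
def format_approval_prompt(target, command, reason):
    """Render the approval message in the layout shown above."""
    return "\n".join(
        [
            "Approval required",
            f"Target: {target}",
            f"Command: {command}",
            f"Reason: {reason}",
            "Reply: Approve or Deny",
        ]
    )


def parse_reply(text):
    """Map a user reply to the word piped to the wrapper, or None.

    Assumption: case and surrounding whitespace are ignored.
    """
    normalized = text.strip().lower()
    if normalized == "approve":
        return "Approve"
    if normalized == "deny":
        return "Deny"
    return None


if __name__ == "__main__":
    print(format_approval_prompt("srv1", "sudo systemctl restart nginx", "restart service"))
    print(parse_reply(" approve "))
```

Returning `None` for anything other than the two accepted words lets the caller ask again instead of forwarding an ambiguous reply to the wrapper.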

Updating an existing AI-Ops installation

A simple git pull only updates the repository files under:

/opt/aiops/repo

It does not update the runtime files used by AI-Ops:

/opt/aiops/executor.py
/opt/aiops/run_action.py
/opt/aiops/runtime/

Standard update workflow:

cd /opt/aiops/repo
git pull
./setup-aiops-mvp.sh update-aiops
./setup-aiops-mvp.sh print-bootstrap

Then copy and paste the print-bootstrap output into Telegram or the active Hermes chat.

Runtime update

Run:

./setup-aiops-mvp.sh update-aiops

This updates only:

  • /opt/aiops/executor.py
  • /opt/aiops/run_action.py
  • /opt/aiops/runtime/

It does not modify:

  • /opt/aiops/config.yml
  • SSH keys
  • known_hosts
  • system users
  • sudoers files
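Because `git pull` alone leaves the runtime copies untouched, it can be useful to check whether they have drifted from the repository. The sketch below assumes that `update-aiops` copies `files/executor.py` and `files/run_action.py` verbatim into `/opt/aiops`; that is an assumption about the script, and the helper `stale_files` is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_files(repo_files_dir, runtime_dir, names=("executor.py", "run_action.py")):
    """List the names whose runtime copy is missing or differs from the repo copy."""
    stale = []
    for name in names:
        src = Path(repo_files_dir) / name
        dst = Path(runtime_dir) / name
        if not src.is_file():
            continue  # nothing to compare against in the repository
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            stale.append(name)
    return stale


if __name__ == "__main__":
    # Paths from this README; reading them may require sufficient privileges.
    print(stale_files("/opt/aiops/repo/files", "/opt/aiops"))
```

A non-empty result suggests that `./setup-aiops-mvp.sh update-aiops` has not been run since the last pull.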

Reload Hermes knowledge

Run:

./setup-aiops-mvp.sh print-bootstrap

Copy the output into Telegram or the active Hermes chat.

This reloads:

  • bootstrap prompt
  • AI-Ops rules
  • executor rules
  • incident workflow
  • systemd troubleshooting rules
  • runbooks
  • friendly approval workflow

Knowledge bootstrap

After installation or update, send the bootstrap prompt to Hermes:

./setup-aiops-mvp.sh print-bootstrap

Hermes must confirm that:

  • AI-Ops rules are loaded.
  • Direct SSH is forbidden.
  • Executor wrapper is required.
  • Remediation requires approval.
  • Friendly Approve/Deny workflow is active.

Example: read-only command

echo '{"target":"srv1","command":"uptime","requester":"manual","reason":"healthcheck"}' | sudo -u aiopsexec /opt/aiops/run_action.py

Expected:

risk: auto_allowed

Example: approval-required command

echo '{"target":"srv1","command":"sudo systemctl restart nginx","requester":"manual","reason":"restart service"}' | sudo -u aiopsexec /opt/aiops/run_action.py

Expected:

status: approval_requested
risk: approval_required

Approve:

echo "Approve" | sudo -u aiopsexec /opt/aiops/run_action.py

Deny:

echo "Deny" | sudo -u aiopsexec /opt/aiops/run_action.py

Audit log

All executor decisions are logged to:

/opt/aiops/audit.jsonl

Example:

tail -n 20 /opt/aiops/audit.jsonl

The audit log records:

  • target
  • command
  • requester
  • reason
  • approval ID
  • risk
  • status
  • exit code
  • stdout
  • stderr
  • timestamp
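Since the log is JSONL (one JSON object per line), it can be filtered with a few lines of Python. This is a minimal sketch: the key names used in the example (`target`, `status`) are assumed to match the field list above, and both helper names are hypothetical.

```python
import json


def parse_audit_lines(lines):
    """Yield one dict per JSONL line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def filter_by(records, **criteria):
    """Return the records whose fields equal all given key=value criteria."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    # In-memory sample; on a Hermes host, pass open("/opt/aiops/audit.jsonl") instead.
    sample = [
        '{"target": "srv1", "command": "hostname", "status": "executed"}',
        '{"target": "srv1", "command": "sudo systemctl restart nginx", "status": "approval_requested"}',
    ]
    for record in filter_by(parse_audit_lines(sample), status="approval_requested"):
        print(record["command"])
```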

Runbooks

Included validated lab runbooks:

  • RUNBOOK_NGINX_BAD_CONFIG.md
  • RUNBOOK_BROKEN_DEMO_203_EXEC.md

These documents describe known incident patterns and safe remediation flows.

Current limitations

This is a lab MVP.

Known limitations:

  • Approval buttons are not yet integrated directly into Telegram.
  • Hermes must reload knowledge through chat after repository updates.
  • Lab sudoers permissions are intentionally permissive.
  • Production mode requires stricter policies and sudoers rules.
  • Multi-target inventory is basic.
  • No Teams approval workflow yet.

Roadmap

Planned improvements:

  • Native Telegram or Teams approval buttons.
  • Multi-target inventory.
  • Production policy mode.
  • Per-service runbooks.
  • Better structured incident reports.
  • Optional Ansible playbook integration.
  • OpenTelemetry or SIEM export for audit logs.
