MMT-RCA — Montimage Root Cause Analysis Platform

MMT-RCA is a generic, multi-method anomaly detection and root cause analysis platform. It connects to a monitoring data source (MQTT, Kafka, or HTTP) through a config file, so no code changes are required per integration.

Architecture

Data Source (MQTT/Kafka/HTTP)
        │
        ▼
   [Collector]  ── reads config/mmt-rca.yml
        │           maps raw messages → observations
        ▼
  [Analysis API]  ── FastAPI (port 8000)
        │
        ├── Statistical anomaly detector (z-score)
        ├── Isolation Forest detector (unsupervised ML)
        ├── Similarity engine (adjusted cosine vs. knowledge base)
        ├── SHAP attribution (which attributes drove the result)
        └── LLM synthesis (Ollama + llama3.1 → root cause narrative)
        │
        ▼
  [PostgreSQL + TimescaleDB + pgvector]
  [Redis]  ── real-time pub/sub
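
To make the statistical detector in the diagram more concrete, here is a minimal sketch of z-score flagging against a recorded baseline. The function name `zscore_anomalies`, the threshold of 3.0, and the baseline layout are assumptions made for this example; the platform's actual detector may differ.

```python
from statistics import mean, pstdev


def zscore_anomalies(baseline, observation, threshold=3.0):
    """Return {attribute: z-score} for attributes beyond the threshold.

    baseline: dict mapping attribute name -> list of historical values
    observation: dict mapping attribute name -> current value
    (hypothetical data layout, chosen for illustration only)
    """
    flagged = {}
    for name, value in observation.items():
        history = baseline.get(name)
        if not history or len(history) < 2:
            continue  # not enough data to estimate the spread
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            continue  # constant baseline: z-score is undefined
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged[name] = z
    return flagged


baseline = {
    "cpu": [0.30, 0.35, 0.32, 0.28, 0.31],
    "ms_delay": [40, 45, 38, 42, 41],
}
observation = {"cpu": 0.33, "ms_delay": 2850}
print(zscore_anomalies(baseline, observation))
```

With this sample data only `ms_delay` is flagged, because `cpu` stays within one standard deviation of its baseline.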

Quick Start

Prerequisites

Docker + Docker Compose
8 GB RAM minimum (for llama3.1; use llama3.2:3b for lighter machines)

1. Configure environment

cp .env.example .env
# Edit .env if needed (DB password, model choice)

On macOS: Run Ollama natively for GPU (Metal) acceleration:

brew install ollama
ollama serve          # runs on localhost:11434
ollama pull llama3.1

Then set in .env:

OLLAMA_URL=http://host.docker.internal:11434

Then remove the ollama and ollama-init services from docker-compose.yml, or override them with docker-compose.dev.yml.

2. Start the stack

make up
# or: docker compose up -d

The first start downloads the llama3.1 model (~4.7 GB). Monitor with:

make logs          # all services
docker compose logs -f ollama-init   # model download progress

3. Verify health

curl http://localhost:8000/health
# {"status":"ok","db":true,"ollama":true,"ollama_model":"llama3.1"}
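
Because the first start can take a while, a script may want to wait for the health endpoint instead of checking it once. The sketch below polls `/health` and treats the service as ready when `status`, `db`, and `ollama` all look healthy. That readiness rule, the helper names, and the timeouts are assumptions for illustration.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(health):
    """True when the API, the database, and Ollama all report healthy."""
    return (
        health.get("status") == "ok"
        and health.get("db") is True
        and health.get("ollama") is True
    )


def wait_until_ready(url="http://localhost:8000/health", timeout_s=300, interval_s=5):
    """Poll the health endpoint until it reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # service not reachable yet, or the body was not valid JSON
        time.sleep(interval_s)
    return False
```

Calling `wait_until_ready()` after `make up` returns `True` once the stack is usable, or `False` if the timeout is reached first.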

4. Test the analysis endpoint (no MQTT needed)

curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "default",
    "observation": {
      "timestamp": "2024-01-15T14:31:42Z",
      "source_id": "gateway-01",
      "attributes": {
        "cpu": 0.97,
        "ram": 0.04,
        "nb_conn": 450,
        "ms_delay": 2850,
        "recv_rate": 0.02
      }
    }
  }'

Response:

{
  "event_id": "...",
  "event_type": "UNKNOWN",
  "anomaly_score": 0.82,
  "best_match": null,
  "top_k_matches": [],
  "contributing_attributes": {"ms_delay": 0.71, "recv_rate": 0.18, ...},
  "rca_narrative": "High message delay and near-zero receive rate suggest upstream network congestion or link failure.",
  "rca_confidence": "MEDIUM",
  "rca_actions": ["Check upstream ISP status", "Inspect gateway-01 network interface", "..."],
  "detector_results": [...]
}
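
The same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names (`build_analyze_request`, `analyze`, `top_contributors`) are hypothetical, and the payload and response fields are taken from the example above.

```python
import json
import urllib.request


def build_analyze_request(project_id, source_id, timestamp, attributes,
                          base_url="http://localhost:8000"):
    """Build the POST /analyze request for a single observation."""
    payload = {
        "project_id": project_id,
        "observation": {
            "timestamp": timestamp,
            "source_id": source_id,
            "attributes": attributes,
        },
    }
    return urllib.request.Request(
        f"{base_url}/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(request, timeout_s=120):
    """Send the request and return the decoded RCA report."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)


def top_contributors(report, n=3):
    """Return the n attributes with the largest contribution, highest first."""
    contrib = report.get("contributing_attributes", {})
    return sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]
```

A typical call would be `report = analyze(build_analyze_request("default", "gateway-01", "2024-01-15T14:31:42Z", {"cpu": 0.97}))`, followed by `top_contributors(report)` to see which attributes drove the result. The generous timeout is a guess that allows for LLM synthesis time.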

Building the Knowledge Base

Step 1 — Record a normal baseline

# Start a learning session (type NORMAL)
SESSION=$(curl -s -X POST http://localhost:8000/learning/sessions \
  -H "Content-Type: application/json" \
  -d '{"project_id":"default","label":"Normal operation","event_type":"NORMAL"}' \
  | jq -r .id)

# Add observations manually, or let the collector run during normal operation
# Then stop the session — this triggers feature extraction and KB entry creation
curl -X POST http://localhost:8000/learning/sessions/$SESSION/stop

Step 2 — Record known incident patterns

# Trigger/simulate the incident on the monitored system, then:
SESSION=$(curl -s -X POST http://localhost:8000/learning/sessions \
  -H "Content-Type: application/json" \
  -d '{"project_id":"default","label":"DoS attack — HTTP flood","event_type":"INCIDENT",
       "description":"Multiple requests from several sources. Root cause: potential DDoS."}' \
  | jq -r .id)

# Wait while the incident runs, then stop:
curl -X POST http://localhost:8000/learning/sessions/$SESSION/stop
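
When recording sessions from a script, it is easy to forget the final stop call, which is the step that builds the KB entry. One way to guard against that is a context manager that always stops the session it started. This is a sketch only: `http_post` and `learning_session` are hypothetical helpers, and the only response field assumed is `id`, as used by the `jq` filter above.

```python
import json
import urllib.request
from contextlib import contextmanager

BASE_URL = "http://localhost:8000"  # default from the Quick Start


def http_post(path, body=None):
    """POST optional JSON to the analysis API; return decoded JSON or None."""
    headers = {"Content-Type": "application/json"} if body is not None else {}
    data = json.dumps(body).encode("utf-8") if body is not None else b""
    req = urllib.request.Request(BASE_URL + path, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


@contextmanager
def learning_session(project_id, label, event_type, description=None, post=http_post):
    """Start a learning session and stop it when the with-block exits."""
    body = {"project_id": project_id, "label": label, "event_type": event_type}
    if description is not None:
        body["description"] = description
    session_id = post("/learning/sessions", body)["id"]
    try:
        yield session_id
    finally:
        # Stopping the session is what triggers KB entry creation.
        post(f"/learning/sessions/{session_id}/stop")
```

For example, `with learning_session("default", "Normal operation", "NORMAL"): time.sleep(600)` records ten minutes of baseline. Note that this sketch also stops the session if the block raises, which may or may not be what you want for a partially recorded incident.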

Step 3 — Monitor in real time

Configure config/mmt-rca.yml with your MQTT/Kafka source, then:

make restart-collector

The collector sends every observation to the analysis service, which now matches each one against the knowledge base entries recorded in Steps 1 and 2.
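
The matching step can be pictured with a small sketch. It assumes that "adjusted cosine" means centering both vectors on the NORMAL baseline before computing cosine similarity, which is one common reading of the term and not confirmed by this README. The function names, the KB entry layout, and the 0.8 cut-off are likewise assumptions.

```python
import math


def adjusted_cosine(a, b, baseline):
    """Cosine similarity of a and b after subtracting the baseline from each."""
    keys = sorted(set(a) & set(b) & set(baseline))
    da = [a[k] - baseline[k] for k in keys]
    db = [b[k] - baseline[k] for k in keys]
    norm_a = math.sqrt(sum(x * x for x in da))
    norm_b = math.sqrt(sum(x * x for x in db))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no deviation from baseline: nothing to compare
    return sum(x * y for x, y in zip(da, db)) / (norm_a * norm_b)


def best_match(observation, kb_entries, baseline, min_score=0.8):
    """Return (score, label) of the closest KB entry, or None below min_score."""
    scored = [
        (adjusted_cosine(observation, entry["features"], baseline), entry["label"])
        for entry in kb_entries
    ]
    scored.sort(reverse=True)
    if scored and scored[0][0] >= min_score:
        return scored[0]
    return None


baseline = {"cpu": 0.3, "ms_delay": 40}
kb = [
    {"label": "CPU saturation", "features": {"cpu": 0.95, "ms_delay": 40}},
    {"label": "Network delay", "features": {"cpu": 0.3, "ms_delay": 3000}},
]
print(best_match({"cpu": 0.9, "ms_delay": 40}, kb, baseline))
```

In this sketch attributes with large raw ranges (such as `ms_delay`) dominate the score, so a real implementation would likely normalize each attribute first.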

Integrating a Data Source

Edit config/mmt-rca.yml:

project: my-client

inputs:
  - name: iot_sensors
    adapter: mqtt
    broker: "mqtt.client.example.com:1883"
    topics:
      - "sensors/+/data"
    feature_map:
      "$.cpu_pct": "cpu"
      "$.free_mem_mb": "mem_free"
      "$.rx_bytes_s": "recv_rate"
      "$.latency_ms": "ms_delay"
    group_by: "$.device_id"
    window_seconds: 30

Then run make restart-collector to apply the change. No code changes are needed.
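
To illustrate what `feature_map` and `group_by` express, the sketch below maps one raw message to an observation. It supports only simple `$.a.b` paths, treats the `group_by` value as the observation's `source_id`, and ignores `window_seconds`; all three are simplifying assumptions, and `resolve` and `to_observation` are hypothetical names rather than the collector's real code.

```python
def resolve(path, message):
    """Resolve a simple '$.a.b' path against a nested dict; None if missing."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    node = message
    for part in path[2:].split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def to_observation(message, feature_map, group_by):
    """Rename raw fields to attribute names and pick the grouping key."""
    attributes = {}
    for path, name in feature_map.items():
        value = resolve(path, message)
        if value is not None:
            attributes[name] = value  # fields absent from the message are skipped
    return {"source_id": resolve(group_by, message), "attributes": attributes}


feature_map = {
    "$.cpu_pct": "cpu",
    "$.free_mem_mb": "mem_free",
    "$.rx_bytes_s": "recv_rate",
    "$.latency_ms": "ms_delay",
}
raw = {"device_id": "sensor-7", "cpu_pct": 41.5, "latency_ms": 120}
print(to_observation(raw, feature_map, "$.device_id"))
```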

API Reference

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Service health + Ollama status |
| POST | `/analyze` | Analyze one observation → RCA report |
| GET | `/events/{project_id}` | List detected events (paginated) |
| GET | `/events/{project_id}/{id}` | Get single event detail |
| POST | `/projects` | Create a project |
| POST | `/learning/sessions` | Start a learning session |
| POST | `/learning/sessions/{id}/stop` | Stop + build KB entry |
| GET | `/learning/kb/{project_id}` | List knowledge base entries |

Makefile Targets

make up              # start all services
make up-dev          # start with hot reload
make down            # stop all services
make logs            # tail all logs
make db-shell        # open psql
make ollama-pull     # manually pull a model
make restart-analysis   # restart the analysis service
make restart-collector  # restart the collector (picks up config/mmt-rca.yml changes)
make clean           # remove containers + build cache

Choosing a Model

| Model | Size | Speed | Quality | Recommended for |
|---|---|---|---|---|
| `llama3.1` (8B) | 4.7 GB | Medium | High | Production |
| `llama3.2:3b` | 2.0 GB | Fast | Medium | Development / low RAM |
| `phi3:mini` | 2.3 GB | Fast | Medium | Edge deployment |
| `mistral:7b` | 4.1 GB | Medium | High | Alternative to llama3.1 |

To change the model, set OLLAMA_MODEL in .env (for example OLLAMA_MODEL=llama3.2:3b), then run make ollama-pull.

