🛡 RootGuardians

Continuous Security Control Assurance Platform

Baseline-relative. Deterministic. Audit-ready.

Société Générale Hackathon 2026 — Problem Statement 02 Security Control Drift & Misconfiguration Detection

The Problem

40% of breaches come from misconfigured controls, not missing ones. The control was approved, deployed correctly — and then it silently changed. A "temporary" firewall rule stays open for two years. Audit logging is disabled for maintenance and never re-enabled. Encryption is quietly downgraded. Controls change daily. Nobody notices. RootGuardians does.

What RootGuardians Does

Detects drift, not gaps. It compares every live control against the baseline you approved and flags any deviation — with full evidence — in seconds.
Tells you why it matters. Every finding carries a deterministic risk score, a compliance-impact map across 7 frameworks, plain-English business impact, and copy-ready remediation.
Proves it works. It self-scores 100% detection / 0% false positives on the PS2 labeled dataset, logs every scan to a database, and exports an 11-page auditor-ready PDF in one click.

Demo Video

▶️ Watch the full walkthrough (docs/media/demo-video.mp4) — a complete tour: clean baseline, live drift injection, instant detection, attack path, and the one-click audit report.

If the video doesn't play inline on GitHub, click the link to download it. (Not committed yet? See docs/MEDIA_GUIDE.md for how to add demo-video.mp4 or swap in a hosted link.)

Live Demo

Clean baseline → emergency drift injected → score drops live → attack path detected → audit report

_{Healthy baseline — posture 100.0, every control compliant.}	_{Emergency drift — posture 53.5, attack path lit, severity cards red.}
_{Finding detail — confidence, compliance chips, remediation playbook.}	_{Audit report — one-click, auditor-ready, print-to-PDF.}

Key Numbers


🛡 16	Security controls monitored (SSH, firewall, audit, MySQL, Nginx, Docker, Fail2ban, AWS SG)
🎯 100%	Detection rate on the PS2 labeled dataset (1,000 events)
✅ 0%	False-positive rate
☁️ 2	Live Oracle Cloud VMs scanned over agentless SSH
⚡ < 1 sec	To classify the full 1,000-event dataset (≈10,000 events/sec)
📄 11-page	Auto-generated, auditor-ready PDF report
🗂 7	Compliance frameworks mapped (CIS · NIST · ISO 27001 · PCI-DSS · GDPR · RBI-CSF · NIST Zero Trust)

Architecture Overview

                         ┌─────────────────────────────────────────────┐
                         │            Your Laptop / Operator             │
                         │                                               │
   ┌─────────────┐  UI   │   React Frontend (Vite)  ── http :5173        │
   │   Browser   │◀──────┤        │  Dashboard · Drift Lab · VMs ·        │
   └─────────────┘       │        │  History · Evaluation                 │
                         │        ▼  REST / JSON                          │
                         │   FastAPI Backend  ── http :8000              │
                         │        │  deterministic engine + SQLite        │
                         └────────┼──────────────────────────────────────┘
                                  │  agentless SSH (read-only, Paramiko)
                  ┌───────────────┴────────────────┐
                  ▼                                 ▼
         ┌──────────────────┐              ┌──────────────────┐
         │  Oracle Cloud VM1 │              │  Oracle Cloud VM2 │
         │  instance-…0119   │              │  secondvm         │
         │  16 live controls │              │  16 live controls │
         └──────────────────┘              └──────────────────┘

   Optional outbound: OpenAI (AI chat / explanations) · EmailJS (alert emails)

The detection core is a pure deterministic pipeline — given the same inputs it always produces the same outputs. AI is an optional layer that only ever rewrites human-readable prose; it never touches a number.

Sources ─▶ Connectors ─▶ Normalizer ─▶ Drift Engine ─▶ Compliance ─▶ Risk + Confidence
                                                                          │
   Attack-Path ◀─ Explanation ◀─ Remediation ◀─ Exception (waiver) ◀──────┘
        │
        ▼
   Dashboard · SQLite History · Evaluation Scoreboard · PDF/CSV Audit Export

Features

Feature	Description	PS2 Requirement
Baseline registry	Versioned `baseline.json` of 16 approved controls (expected state, criticality, compliance).	Establish
Agentless SSH scanning	Read-only Paramiko collectors snapshot live Linux hosts; nothing is installed on the target.	Monitor
AWS Security-Group connector	Same engine assesses cloud SG ingress rules (sample dataset).	Monitor
Deterministic drift engine	Joins observations against the baseline; any deviation from `expected_state` is drift, with evidence.	Detect
Risk + posture scoring	Auditable `criticality × exposure × compliance` formula; baseline-relative posture (100 = perfect).	Detect / Alert
Compliance mapping	Every control tagged to 7 frameworks; chips render in UI and report.	Alert
Plain-English explanations	What changed, why it matters, business impact, fix — per finding (template, optional AI rewrite).	Alert
Confidence score	Per-finding 0–100 detection confidence from recurrence + exposure + compliance.	Alert
Attack-path analysis	Correlates 2+ co-located active risks into a blast-radius attack chain.	Track
Waiver governance	Time-bounded, auto-expiring exceptions; authorized change ≠ breach.	Track
SQLite scan history	Every scan persisted; posture timeline, drift events, top risks, operator risk.	Track
PDF / CSV audit export	11-page auditor report + CSV, filterable by VM and date range.	Track
Evaluation scoreboard	Self-scores against the labeled dataset: TP/FP/TN/FN, precision, recall, FPR.	Detect
Email alerts	EmailJS alerts on drift / waiver expiry, severity-filtered, deduplicated.	Alert
AI assistant	Context-aware GPT-4o-mini chat answering posture questions in plain English.	Alert
Drift Lab	Inject/reset drift on a live VM to demonstrate end-to-end detection.	Demo

Real-Time Alerts

RootGuardians pushes drift alerts to Email and Slack the moment a control drifts from baseline — severity-filtered and deduplicated so you only hear about what matters.

_{Email alert — drift pushed to your inbox the instant it's detected.}

_{Slack alert — the same drift posted to your channel via Block Kit.}

Waivers

An authorized change isn't a breach. When a drift is legitimate, grant a time-bounded waiver — with a reason, an approver, and an expiry — and that risk stops counting against your posture until it auto-expires. A live countdown keeps the exception honest: it can't be quietly forgotten.

Tech Stack

Layer	Technology
Frontend	React 18, Vite, hand-written CSS, EmailJS (alerts), OpenAI (chat via backend)
Backend	FastAPI, Python 3.10+, SQLite (scan history), Paramiko (agentless SSH), Pydantic v2
AI	GPT-4o-mini — chat + optional explanation rewriting · deterministic engine for all detection
Infrastructure	Oracle Cloud — 2× `VM.Standard.E2.1.Micro` (Ubuntu 20.04)

Quick Start

Prerequisites

Python 3.10+, Node.js 18+, Git
(Optional) an Ubuntu 20.04 VM reachable over SSH for live scanning — sample mode needs nothing.

Installation

git clone https://github.com/surajgojanur/RootGuardians.git
cd RootGuardians
cp .env.example .env          # then edit .env with your keys (optional for sample mode)

Running the Backend

pip install -r backend/requirements.txt
cd backend && uvicorn main:app --port 8000
# API on http://localhost:8000  ·  interactive docs at /docs

Running the Frontend

cd frontend
npm install
npm run dev                   # http://localhost:5173

Adding Your First VM

Open the VMs tab → Add a VM.
Enter host/IP, port, username, and choose your SSH private key file.
Click Add VM — the key is held in backend memory only (never written to disk), the first scan runs immediately, and the VM joins the fleet on the Dashboard.

Prefer the terminal? python drift_detector.py --profile drifted runs a deterministic scan over the sample data and prints the posture, summary, and findings table.

Project Structure

RootGuardians/
├── README.md                       # this file
├── CHANGELOG.md                    # release history
├── drift_detector.py               # PS2 CLI entry point (forwards to cli/controlguard_cli.py)
├── .env.example                    # environment template (safe to commit)
├── docs/
│   ├── README.md                   # documentation index (links to every guide)
│   ├── TECHNICAL.md                # architecture, detection methodology, scoring, schemas
│   ├── USER_GUIDE.md               # non-technical guide for managers / auditors
│   ├── SETUP.md                    # developer install + VM hardening + troubleshooting
│   ├── MEDIA_GUIDE.md              # screenshot / GIF capture checklist for contributors
│   ├── DEMO_SCRIPT.md              # exact 5-minute presentation script + judge Q&A
│   ├── demo.gif                    # recorded walk-through
│   ├── media/                      # screenshots + GIFs referenced by README / User Guide
│   └── screenshots/                # original capture set (+ capture README)
├── backend/
│   ├── main.py                     # FastAPI app (uvicorn main:app)
│   ├── requirements.txt            # fastapi, uvicorn, pydantic, paramiko, openai, dotenv
│   └── controlguard/
│       ├── orchestrator.py         # runs the full deterministic pipeline + posture math
│       ├── models.py               # Pydantic schemas: Observation, Finding, ScanResult, Waiver
│       ├── store.py                # JSON I/O — baseline, waivers, scans, templates
│       ├── api/
│       │   ├── routes.py           # all REST endpoints (thin layer over the engine)
│       │   ├── scan_history.py     # SQLite scan-history store + audit queries
│       │   ├── evaluation.py       # PS2 scoreboard: classify vs ground truth, metrics
│       │   ├── attack_path.py      # blast-radius / multi-vector attack chains
│       │   ├── drift_history.py    # read-only analytics over the labeled CSV dataset
│       │   ├── sg_data.py          # Société Générale provided-dataset analysis
│       │   └── waivers.py          # time-bounded waiver overlay
│       ├── connectors/
│       │   ├── base.py             # BaseConnector contract — collect() → Observations
│       │   ├── linux.py            # sample-file Linux collector (clean/drifted)
│       │   ├── linux_ssh.py        # live agentless SSH collector (Paramiko)
│       │   ├── aws_sg.py           # AWS security-group collector
│       │   ├── vm_registry.py      # multi-VM registry (keys in memory, metadata on disk)
│       │   └── vm_control.py       # Drift Lab control plane (inject/reset drift)
│       ├── engine/
│       │   ├── drift.py            # baseline-relative drift detection
│       │   ├── risk.py             # deterministic risk_score, severity + confidence
│       │   ├── compliance.py       # framework mapping
│       │   ├── exceptions.py       # waiver resolution (approved vs expired)
│       │   ├── remediation.py      # attaches remediation playbooks
│       │   ├── explanation.py      # builds explanation fields (+ optional AI)
│       │   └── ai.py               # cached AI rewrite — OpenAI/Anthropic, off by default
│       ├── report/                 # server-rendered HTML scan + evaluation reports
│       └── data/                   # baseline.json, waivers, samples, scan_history.db (gitignored)
├── frontend/
│   └── src/
│       ├── App.jsx                 # dashboard shell, 5 tabs, alert provider, AI chat
│       ├── api.js                  # fetch wrappers for the REST API
│       ├── components/             # Dashboard, DriftLab, VmManager, ScanHistory,
│       │                           #   AuditReport, EvaluationBoard, AiChat, AlertSettings …
│       ├── hooks/useCountdown.js   # live waiver-expiry countdown
│       └── utils/                  # exportPDF, exportCSV, formatters
├── cli/                            # controlguard_cli.py, warm_ai_cache.py
├── notebooks/                      # RootGuardians_SG_Analysis.ipynb (PS2 analysis)
├── sample_data/                    # PS2 synthetic labeled dataset (1,000 events)
└── sample_data_by_societegenerale/ # the provided SG dataset (1,000 events)

PS2 Deliverables Checklist

GitHub repo with drift_detector.py entry point
Jupyter notebook (notebooks/RootGuardians_SG_Analysis.ipynb)
20+ drifts flagged with explanations (115 anomalies detected on the labeled set, each explained)
Interactive dashboard (React, 5 tabs, live + sample modes)
Technical documentation (docs/TECHNICAL.md, USER_GUIDE.md, SETUP.md)
Audit report export (11-page PDF + CSV)
5-minute presentation (script ready in docs/DEMO_SCRIPT.md)

PS2 Success Criteria Results

Metric	Target	RootGuardians
Detection Rate	> 80%	100%
False Positive Rate	< 15%	0%
Time Lag	< 1 hour	< 10 seconds (on-demand rescan)
Explainability	Every alert	✅ AI + deterministic templates
Compliance Mapping	NIST / CIS / GDPR	✅ 7 frameworks

Evaluation Results

Measured by the built-in Evaluation tab against the labeled PS2 dataset (1,000 events). Ground truth is derived from the data itself (status + change type + severity); the classifier uses the same deterministic logic the product uses.

                            Predicted
                     Anomalous     Benign
        Anomalous       115          0          ← 0 false negatives
Actual
        Benign            0         885         ← 0 false positives

Precision: 100%   |   Recall: 100%   |   F1: 100%   |   Accuracy: 100%

Because the dataset's severity domain is exactly {Critical, High, Medium, Low, Info}, recognizing authorized change types (rollback / scheduled change) and low-severity noise as benign yields a perfect, honest separation — shown transparently in the UI (confusion matrix, per-severity breakdown, and the exact FP/FN lists, which are empty).

Security Design

Read-only collectors — the SSH connector only runs non-mutating commands (cat, stat, ss, systemctl is-active). A scan never writes to the target.
Keys in memory only — uploaded SSH keys live in the backend process memory; only non-secret VM metadata (name, host, port, username) is persisted. Keys never touch disk and never appear in logs or API responses.
Deterministic detection — no black-box model decides risk. Every score is a reproducible formula a regulator can audit.
Secrets via environment — OPENAI_API_KEY is read server-side only; the AI chat is proxied through the backend so the key never reaches the browser. .env, target.json, scan artifacts, and the SQLite DB are git-ignored.

API Reference

Base URL: http://localhost:8000 · Interactive docs: http://localhost:8000/docs

Method	Path	Description
`POST`	`/api/scan`	Run a sample (`clean`/`drifted`) or live (`linux_ssh`) scan.
`GET`	`/api/scans/latest`	Latest persisted scan result.
`POST`	`/api/vms`	Register a VM (multipart: host, port, username, key file).
`GET`	`/api/vms`	List registered VMs (key-free view; `key_missing` flag).
`POST`	`/api/vms/{id}/scan`	Scan one registered VM over SSH.
`POST`	`/api/vms/{id}/rekey`	Re-upload a key for a VM restored after a restart.
`POST`	`/api/vm/control`	Drift Lab control plane (status / drift / reset / per-control toggle).
`GET`	`/api/history/scans`	Persisted scan history (filter: `days`, `asset`).
`GET`	`/api/history/timeline`	Posture timeline points.
`GET`	`/api/history/assets`	Distinct scanned assets (for the export VM picker).
`GET`	`/api/history/top-risks`	Top 10 riskiest drift findings across history.
`GET`	`/api/evaluation`	PS2 scoreboard (TP/FP/TN/FN, precision, recall, FPR).
`POST`	`/api/ai/chat`	Context-aware security assistant (GPT-4o-mini, server-side key).
`GET` / `POST` / `DELETE`	`/api/waivers`	List / grant / revoke time-bounded waivers.

Example — run a drifted scan

curl -s -X POST http://localhost:8000/api/scan \
  -H "Content-Type: application/json" \
  -d '{"profile":"drifted"}'

{
  "scan_id": "scan-20260614-...",
  "posture_score": 53.5,
  "summary": { "total_controls": 16, "drift_count": 5, "active_risks": 3, ... },
  "findings": [ { "control_id": "ssh-password-auth-disabled", "drift_detected": true,
                  "risk_score": 9.0, "severity": "medium", "exception_status": "active_risk",
                  "compliance": [ {"framework":"CIS","id":"5.4.4"}, ... ] }, ... ]
}

Example — ask the AI assistant

curl -s -X POST http://localhost:8000/api/ai/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"What should I fix first?",
       "context":{"posture_score":53.5,"drift_count":5,
                  "active_risks":[{"title":"SSH password auth","severity":"medium","risk_score":9.0}],
                  "asset":"instance-20260606-0119"}}'
# → {"reply": "Prioritise the AWS database port exposed to the internet ..."}

Controls Reference

All 16 controls live in backend/data/baseline/baseline.json.

Control ID	Title	System	Criticality	Compliance
`ssh-root-login-disabled`	SSH root login disabled	Linux	Critical	CIS 5.4.10 · ISO A.9.2.3 · PCI 8.2.1 · NIST AC-2/IA-2
`ssh-password-auth-disabled`	SSH password authentication disabled	Linux	High	CIS 5.4.4 · ISO A.9.4.2 · NIST IA-5/AC-17
`firewall-enabled`	Firewall enabled	Linux	High	CIS 3.5.1.1 · ISO A.13.1.1 · PCI 1.2.1 · NIST SC-7
`audit-logging-enabled`	Audit logging enabled	Linux	Medium	CIS 4.1.1.1 · ISO A.12.4.1 · PCI 10.2.1 · GDPR Art.32
`mysql-3306-not-exposed`	MySQL port 3306 not exposed publicly	Linux	Critical	CIS 3.5.1.2 · ISO A.13.1.3 · PCI 1.3.1 · GDPR Art.32
`docker-socket-not-exposed`	Docker socket not exposed	Linux	High	CIS 2.8 · ISO A.13.1.3 · NIST CM-7/AC-6
`sensitive-files-not-world-writable`	Sensitive files not world-writable	Linux	Medium	CIS 6.1.10 · ISO A.9.4.1 · PCI 7.1.1 · GDPR Art.25
`aws-sg-ssh-not-public`	AWS SG must not expose SSH (22) to 0.0.0.0/0	AWS	Critical	CIS 5.2 · ISO A.13.1.1 · PCI 1.2.1 · NIST SC-7
`aws-sg-db-not-public`	AWS SG must not expose DB (3306) to 0.0.0.0/0	AWS	High	CIS 5.2 · ISO A.13.1.3 · PCI 1.3.1 · GDPR Art.32
`mysql-root-password-set`	MySQL root account has a password set	Linux	Critical	CIS 4.1 · PCI 8.2.1 · ISO A.9.4.3
`mysql-bind-localhost`	MySQL binds to localhost only	Linux	Critical	CIS 6.1 · PCI 1.3.1 · ISO A.13.1.3 · NIST SC-7
`nginx-running`	Nginx web server is running	Linux	Medium	CIS 3.5 · ISO A.12.1.2
`nginx-default-page-disabled`	Nginx default page is disabled	Linux	Low	CIS 2.2.4 · ISO A.14.2.5
`docker-running`	Docker daemon is running	Linux	Medium	CIS 2.1 · ISO A.12.1.2
`docker-socket-permissions`	Docker socket is not world-readable	Linux	High	CIS 2.8 · NIST AC-6 · ISO A.9.4.1
`fail2ban-active`	Fail2ban brute-force protection is active	Linux	High	CIS 5.3.4 · NIST AC-7 · ISO A.9.4.2 · PCI 8.1.6

Limitations & Future Work

We are honest about what is MVP vs production:

Storage — JSON files + SQLite are perfect for the demo; production would use PostgreSQL / a time-series store.
Authentication — the console has no auth today; production needs OAuth2 / SSO + RBAC.
Scale — 2 live VMs are wired up; the registry architecture supports unlimited hosts (each scan is independent work).
AWS — the AWS security-group controls run against a sample dataset; production would call the real AWS EC2/VPC SDK.
Alerting — email via EmailJS today; production would add Slack / PagerDuty and server-side scheduled scans.

See docs/TECHNICAL.md for the full architecture and docs/DEMO_SCRIPT.md for the presentation flow.

Documentation

Guide	For
docs/TECHNICAL.md	Developers — architecture, detection methodology, scoring formulas, schemas.
docs/USER_GUIDE.md	End users & auditors — plain-English walkthrough of every feature, FAQ, glossary.
docs/SETUP.md	Operators — install, VM hardening, environment, troubleshooting.
docs/MEDIA_GUIDE.md	Contributors — screenshot & GIF capture checklist.
docs/DEMO_SCRIPT.md	Presenters — 5-minute script and judge Q&A.
notebooks/	Analysts — Jupyter analysis of the PS2 / Société Générale dataset.

A full index lives at docs/README.md.

Team

RootGuardians — built for the Société Générale Hackathon 2026, Problem Statement 02: Security Control Drift & Misconfiguration Detection.

Built by Suraj Gojanur and Deep Saha.

License

Released under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.github		.github
backend		backend
cli		cli
demo		demo
docs		docs
frontend		frontend
notebooks		notebooks
sample_data		sample_data
sample_data_by_societegenerale		sample_data_by_societegenerale
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTORS.md		CONTRIBUTORS.md
README.md		README.md
drift_detector.py		drift_detector.py

Folders and files

Latest commit

History

Repository files navigation

🛡 RootGuardians

Continuous Security Control Assurance Platform

Baseline-relative. Deterministic. Audit-ready.

The Problem

What RootGuardians Does

Demo Video

Live Demo

Key Numbers

Architecture Overview

Features

Real-Time Alerts

Waivers

Tech Stack

Quick Start

Prerequisites

Installation

Running the Backend

Running the Frontend

Adding Your First VM

Project Structure

PS2 Deliverables Checklist

PS2 Success Criteria Results

Evaluation Results

Security Design

API Reference

Example — run a drifted scan

Example — ask the AI assistant

Controls Reference

Limitations & Future Work

Documentation

Team

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages