Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
.wrangler
node_modules
.env
.env
.dev.vars
314 changes: 314 additions & 0 deletions PROJECT_REPORT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,314 @@
# AEGIS — Project Progress Report
**Autonomous Support Operations Agent | Construct × Techfluence 2026**
*Report last updated: 2026-04-21*

---

## Executive Summary

AEGIS is a **fully autonomous support operations agent** built on Cloudflare Workers + the Construct SDK. A user pastes one raw Slack message, clicks one button, and AEGIS runs a 9-stage AI pipeline end-to-end — classifying, deduplicating, ticketing, generating RAG-grounded replies, escalating, warp-speed debugging, and self-updating the knowledge base — with zero manual intervention.

The system has been developed across **5 major phases**, evolving from a broken placeholder prototype to a production-grade dashboard with a real-time dual-chat AI system powered by **Google Gemini 2.5 Flash**.

---

## Project Timeline & Changes

### Phase 0 — Initial State (Broken)
**Problem:** App had a hardcoded placeholder backend URL — all tools returned internal errors.

**Root Cause:** Every tool (`classify_message`, `check_duplicate`, etc.) made `fetch()` calls to a non-existent backend domain.

---

### Phase 1 — Self-Contained Worker
**Goal:** Make all tools work with zero external dependencies.

- Removed the `BACKEND` placeholder entirely
- Implemented all 9 tool handlers directly inside the Cloudflare Worker
- Added in-memory `tickets` store and `knowledgeBase` array
- Implemented `classifyText()` — keyword-based BUG/QUERY/FEATURE/BILLING + P0–P3
- Implemented `cosineSimilarity()` — bag-of-words cosine for duplicate detection
- Implemented `ragReply()` — KB-first, domain template fallback

---

### Phase 2 — AI-Powered Classification
**Goal:** Replace keyword `if/else` classification with real LLM intelligence.

- Added AI `fetch()` call (Cloudflare Worker compatible)
- Graceful fallback to keyword classifier if key absent or API fails
- Added `source` field to response: `"ai"` or `"fallback"`
- Category set expanded: `FEEDBACK` → `FEATURE` + added `BILLING`
- Created `.dev.vars` for local secret injection

---

### Phase 3 — Production Architecture Refactor
**Goal:** Transform hackathon code into clean, production-ready architecture.

**8 upgrade areas:**
- Consistent `ToolResult` format with `ok()` / `err()` helpers
- AI response validation (enum guard → fallback on invalid)
- Storage abstraction layer with `TODO(D1)` / `TODO(KV)` stubs
- Section comments and code organization (9 labeled sections)
- Improved RAG logic with `retrieveRelevantKnowledge()` + `generateReply()`
- Error handling with `try/catch` on all tools
- In-memory rate limiter (20 req/min per user_id)
- Manifest network permission notes

---

### Phase 4 — Full UI Rebuild & Autonomous 9-Stage Pipeline
**Goal:** Replace the basic SDK panel UI with a premium, fully autonomous dashboard.

#### 4.1 — Zero-Manual-Entry Dashboard
The entire `ui/index.html` was rebuilt from scratch. The user provides one input (a raw Slack message) and clicks one button. AEGIS runs all 9 stages autonomously:

| Stage | What happens |
|-------|-------------|
| **01 Ingest** | AI normalizes text, detects spam, extracts intent |
| **02 Dedupe** | AI computes semantic similarity vs existing tickets |
| **03 Classify** | AI assigns `BUG/QUERY/FEATURE/BILLING` + `P0–P3` priority |
| **04 Ticket** | AI generates descriptive 5–8 word title + SLA assignment |
| **05 RAG Reply** | AI writes a grounded Slack reply using the current KB |
| **06 Escalate** | AI decides escalation level and target channel |
| **07 Warp Debug** | AI infers probable cause + 3 recommended debug steps |
| **08 KB Update** | Auto-triggered on resolution — writes new KB entry |
| **09 Insights** | Trend aggregation, resolution rate, KB coverage |

#### 4.2 — Premium Dashboard Design
- **Dark theme** (`#0a0a0f`) with accent `#4F6EF7` and Inter/JetBrains Mono typography
- **Animated pipeline rail** — 9 nodes transition idle → pulsing → green sequentially
- **Active ticket card** with live SLA countdown timer, priority badge, and category badge
- **AEGIS Bot reply bubble** — Slack-style, with grounded KB source tags
- **Warp-Speed Debug panel** — collapsible, fires automatically on P0/L3 escalation
- **Intelligence dashboard** — KB article list, insights stats, live state JSON

#### 4.3 — Dual-Chat System
Two simultaneous AI-powered chat panels, both tied to the active ticket:

**User Chat (left panel — blue accent)**
- Audience: Non-technical end user following up on their issue
- AEGIS responds warmly, avoids jargon, surfaces KB resolutions
- Auto-detects sentiment (`positive / neutral / frustrated / angry`)
- Sentiment badge updates live on the ticket card
- `escalate_flag` → automatically pushes warning into Developer Chat
- `resolved_flag` → triggers Stage 8 (KB) + Stage 9 (Insights) automatically
- Pre-seeded with demo conversation for TKT-0003

**Developer Chat (right panel — green accent)**
- Audience: On-call engineer who has been escalated a ticket
- AEGIS is technical: SQL queries, wrangler commands, ticket pattern analysis
- Auto-populated on escalation with full Warp Debug context, probable cause, and recommended steps
- Markdown rendering for code blocks and bold text
- Supports slash commands:

| Command | Action |
|---------|--------|
| `/snapshot` | Displays the Warp Debug context snapshot |
| `/similar` | Lists tickets with >0.75 similarity |
| `/kb` | Lists all KB articles for the ticket category |
| `/resolve <cause>` | Marks ticket resolved + writes root cause to KB |
| `/escalate` | Manually triggers L3 escalation |

**Cross-Chat Events:**

| Trigger | User Chat | Developer Chat |
|---------|-----------|----------------|
| Escalation to L2/L3 | "Your ticket has been assigned to our team." | Full auto-populated context message |
| Developer marks resolved | "Great news! Issue resolved." | "Ticket closed. KB updated." |
| User confirms resolved | "Closing your ticket now! 🎉" | "User confirmed. Auto-closed by AEGIS." |
| User frustration detected | — | "⚠ User frustration — escalated by AEGIS." |

#### 4.4 — Shared Chat State
```javascript
aegisState.chats = {
user: { ticketId, history, escalateCount, isTyping },
dev: { ticketId, history, warpActive, isTyping }
};
```

---

### Phase 5 — Full AI Integration & Gemini Migration

#### 5.1 — Real AI Calls (Gemini 2.5 Flash)
All pipeline stages and both chat panels now make **live API calls** to Google Gemini 2.5 Flash (`gemini-2.5-flash-preview-04-17`). The mock `ClaudeAI` controller has been completely removed.

**`callGemini()` helper (in `ui/index.html`):**
- Passes API key from the in-page banner input
- Sends `system_instruction` + `contents` in Gemini format
- Uses `responseMimeType: 'application/json'` for structured output
- Strips markdown code fences before JSON parsing
- Full error surfacing to the pipeline UI on failure

**API call per stage** — each stage has its own focused system prompt returning strict JSON:
```
Stage 1: {"cleaned_text", "is_spam", "has_urgent_keywords", "extracted_intent"}
Stage 2: {"is_duplicate", "original_ticket_id", "similarity_score"}
Stage 3: {"category", "priority", "confidence", "needs_human_review", "reason"}
Stage 4: {"ticket_id", "title", "status", "sla_hours", "created_at"}
Stage 5: {"reply_text", "kb_sources_used", "grounded"}
Stage 6: {"should_escalate", "escalation_level", "target_channel", "activate_warp_debug"}
Stage 7: {"conversation_summary", "probable_cause", "recommended_first_steps"}
```

**Chat AI system prompts:**
- User Chat: ticket context + KB resolution + sentiment rules → JSON with `reply`, `resolved_flag`, `escalate_flag`, `sentiment`
- Developer Chat: full ticket + all tickets + KB + capabilities → JSON with `reply`, `resolved_flag`, `kb_update`, `kb_resolution_text`

#### 5.2 — Migration from Anthropic → Gemini

| Component | Before | After |
|-----------|--------|-------|
| **UI model** | `claude-sonnet-4-20250514` (Anthropic) | `gemini-2.5-flash-preview-04-17` |
| **UI API endpoint** | `https://api.anthropic.com/v1/messages` | `https://generativelanguage.googleapis.com/v1beta/models/...` |
| **UI request format** | `{model, max_tokens, system, messages}` | `{system_instruction, contents, generationConfig}` |
| **UI response path** | `data.content[0].text` | `data.candidates[0].content.parts[0].text` |
| **UI auth header** | `x-api-key` + `anthropic-version` | API key in URL query param |
| **Server model** | `gpt-4o-mini` (OpenAI) | `gemini-2.5-flash-preview-04-17` |
| **Server endpoint** | `https://api.openai.com/v1/chat/completions` | `https://generativelanguage.googleapis.com/v1beta/models/...` |
| **Server auth** | `Authorization: Bearer` | API key in URL query param |
| **Server response path** | `data.choices[0].message.content` | `data.candidates[0].content.parts[0].text` |
| **Env var** | `OPENAI_API_KEY` | `GEMINI_API_KEY` |
| **manifest.json network** | *(not set)* | `generativelanguage.googleapis.com` added |

---

## Current Architecture

```
aegis-app/
├── server.ts ~682 lines — complete Worker logic
│ ├── § TYPES
│ ├── § RATE LIMITER
│ ├── § STORAGE LAYER (D1/KV-ready)
│ ├── § UTILITIES
│ ├── § CLASSIFICATION ENGINE (Gemini + keyword fallback)
│ ├── § RAG ENGINE
│ └── § TOOLS (9 tools)
├── wrangler.toml Worker config + static asset binding
├── manifest.json Construct app metadata + Gemini network permission
├── .dev.vars Local secrets — GEMINI_API_KEY (gitignored)
├── .gitignore
├── package.json
└── ui/
├── index.html ~1300 lines — full autonomous dashboard
│ ├── API Key Banner (Gemini key input)
│ ├── Input Zone (Slack message ingest)
│ ├── Pipeline Rail (9 animated stages)
│ ├── Work Grid (ticket card + RAG reply + warp debug)
│ ├── Dual Chat System (User Chat + Developer Chat)
│ └── Intelligence Dashboard (KB + insights + state JSON)
└── icon.svg
```

---

## Tool Inventory

| # | Tool | AI? | Storage |
|---|------|-----|---------|
| 1 | `read_support_message` | — | None |
| 2 | `classify_message` | ✅ Gemini 2.5 Flash | None |
| 3 | `check_duplicate` | — | Read tickets |
| 4 | `create_ticket` | — | Write ticket |
| 5 | `thread_reply` | — | Read KB |
| 6 | `escalate_unresolved` | — | Update ticket |
| 7 | `generate_context_snapshot` | — | Read tickets + KB |
| 8 | `update_knowledge_base` | — | Update ticket + Write KB |
| 9 | `generate_insights` | — | Read all tickets |

---

## Classification System

### AI Path (when `GEMINI_API_KEY` is set in `.dev.vars`)
```
POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-04-17:generateContent
responseMimeType: application/json
temperature: 0
Validation: category ∈ enum AND priority ∈ enum → null if invalid → triggers fallback
```

### Fallback Path (keyword classifier)
| Signal | Category | Priority |
|--------|----------|---------|
| billing, payment, invoice | BILLING | — |
| error, bug, crash, fail | BUG | outage, down → P0 |
| suggestion, feature | FEATURE | payment fail, login fail → P1 |
| how, what, guide | QUERY | slow, latency → P2 |

---

## SLA Mapping

| Priority | Meaning | SLA |
|----------|---------|-----|
| P0 | Critical outage | 1 hour |
| P1 | High impact | 4 hours |
| P2 | Medium | 24 hours |
| P3 | Low / informational | 72 hours |

---

## Environment & Secrets

| Variable | Where | Purpose |
|----------|-------|---------|
| `GEMINI_API_KEY` | `.dev.vars` | Server-side AI classification (Worker) |
| `GEMINI_API_KEY` | Wrangler secret | Production AI classification |
| Gemini API key | UI banner input | All 9 pipeline stages + both chat panels |

```powershell
# Add to .dev.vars for local dev (server-side)
GEMINI_API_KEY=AIzaSy_your_key_here

# Set secret for production deployment
npx wrangler secret put GEMINI_API_KEY

# Deploy
npx wrangler deploy
```

**UI key:** Paste your `AIza...` key into the banner input at the top of the running dashboard. It is never sent to the Worker — it goes directly to Google's API from the browser.

---

## Code Metrics

| File | Lines | Notes |
|------|-------|-------|
| `server.ts` | ~682 | 9 registered tools, Gemini classification |
| `ui/index.html` | ~1300 | Full autonomous dashboard, dual chat, Gemini AI |
| `wrangler.toml` | 10 | |
| `manifest.json` | 18 | Includes `generativelanguage.googleapis.com` |
| `package.json` | 20 | No AI SDK deps — uses direct `fetch()` |

---

## Known Limitations & Production Roadmap

| Limitation | Current | Production Fix |
|-----------|---------|----------------|
| Ticket store | In-memory (resets on restart) | Cloudflare D1 |
| Knowledge base | In-memory array | Cloudflare KV |
| RAG similarity | Bag-of-words cosine | Cloudflare Vectorize |
| Rate limiter | In-memory (per isolate) | Cloudflare Durable Objects |
| Slack posting | Simulated | Slack Web API |
| Auth / multi-tenant | None | API key middleware |
| Chat history | In-memory only | D1 or KV persistence |

---

## Immediate Next Steps

1. **Set `GEMINI_API_KEY`** in `.dev.vars` and test server-side AI classification
2. **Migrate tickets to Cloudflare D1** — swap `_tickets` using the pre-built storage functions
3. **Migrate KB to Cloudflare KV** — swap `_kb` using `upsertKnowledge` / `searchKnowledge`
4. **Add Vectorize embeddings** to replace bag-of-words cosine similarity in dedupe + KB search
5. **Wire Slack Web API** in `thread_reply` for real message posting
6. **Add Durable Objects** rate limiter for production-grade enforcement
7. **Route UI AI calls through the Worker** instead of calling Gemini directly from the browser (security improvement for production)
8. **Update `manifest.json`** author and owners fields before submission
69 changes: 69 additions & 0 deletions REPORT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
# AEGIS Autonomous Support Agent
**Project Status Report**
**Date:** April 22, 2026

## 1. Executive Summary
The AEGIS platform has been successfully modernized into a fully autonomous, AI-driven support operations system. Originally envisioned as a standard triage tool, AEGIS is now a proactive intelligence layer that intercepts, analyzes, classifies, and resolves support requests entirely independently using Google's Gemini Models, all natively deployed on Cloudflare Workers and powered by a D1 database.

## 2. Architecture & Tech Stack
The project was designed for edge deployment, ensuring zero-latency triage and high resilience.

* **Compute:** Cloudflare Workers (`server.ts`)
* **Database:** Cloudflare D1 (`aegis-tickets`)
* **Intelligence:**
* Primary: Google Gemini Models (`gemini-2.5-flash` for dual-chat and fast classification)
* Fallback: In-memory keyword-matching fallback system guaranteeing uptime during API failure.
* **Frontend:** Vanilla JS & CSS (`ui/index.html`), hosted directly as a Worker Asset, functioning as the command-center dashboard.
* **Integrations:** Slack (Threaded Replies, Escalation Alerts) and Notion (Knowledge Base Sync, Ticket Mirroring).

## 3. Core Features Implemented

### 🤖 Autonomous Pipeline
1. **Ingest:** Receives support requests via dashboard / Slack events.
2. **Dedupe:** Applies cosine-similarity matching to identify and merge duplicate support queries.
3. **Classify:** Leverages AI to route the ticket (BUG, QUERY, FEATURE, BILLING) and assign a critical SLA priority (P0 to P3).
4. **RAG-Reply:** Uses historical resolution data from the Knowledge Base to instantly ground a contextual response to the user.
5. **Auto-Escalation:** Identifies P0 and P1 events and automatically pages the `#oncall-alerts` Slack channels.

### 💬 Dual-Persona Chat System
The web dashboard features a highly sophisticated dual-environment chat:
* **User Persona:** A warm, friendly support agent that shields the user from technical jargon and handles baseline empathy and updates.
* **Developer Persona (Warp Debug):** A specialized, highly technical agent that receives a "Snapshot" of active metrics upon escalation. It acts as an L2/L3 pairing partner, querying error logs and suggesting immediate diagnostic queries (e.g., checking Stripe webhooks, checking Redis locks).

### 📈 Dynamic Analytics
The hardcoded UI mockups have been fully replaced. The dashboard now queries the live `aegis-tickets` D1 database to produce AI-driven **Insights**, mapping exact resolution rates, ticket volume, and trend anomalies directly to the UI.

### ⏱️ Stale Ticket Chron-Job
A deployed Cron Trigger on the Cloudflare Worker routinely scans the database for tickets remaining in `open` state for more than 24 hours, automatically dispatching escalation notifications.

## 4. Workflows

```mermaid
sequenceDiagram
participant User
participant AEGIS
participant D1_DB
participant Engineer

User->>AEGIS: "System is down returning 500s"
AEGIS->>AEGIS: Dedupe & Classify (BUG, P0)
AEGIS->>D1_DB: Create Ticket
AEGIS->>User: RAG Reply (Friendly)

Note over AEGIS,Engineer: Auto-Escalation
AEGIS->>Engineer: Trigger Warp-Debug Chat
Engineer->>AEGIS: "Follow my diagnostic steps"
AEGIS->>Engineer: Acknowledged, awaiting instructions.

Engineer->>AEGIS: (Resolves Issue)
AEGIS->>D1_DB: Mark Resolved & Update KB
AEGIS->>User: Notify customer of resolution
```

## 5. Current State & Next Steps
The system is entirely decoupled from local mock data and is running correctly against the `aegis-app.shubhamvelip4.workers.dev` live worker for both API endpoints and visual rendering.

**Future Expansion Opportunities:**
1. **Webhook Subscriptions:** Fully wiring up Stripe / Datadog webhooks so AEGIS can ingest trace spans directly into the Warp Debug channel before a human reports the error.
2. **Authentication:** Integrating Cloudflare Access or a lightweight JWT system so the dashboard is secured behind engineering logins.
3. **KB Expanded Context:** Storing the generated KB summaries into a proper Vector Database (Cloudflare Vectorize) to improve the RAG pipeline's long-term intelligence as ticket volume grows.
Loading