A minimalist AI agent framework with Markdown-based skills and human-in-the-loop support. CaramelBot routes natural language messages to LLM-powered skill agents via a FastAPI server, using Telegram as the primary interface and Playwright (via MCP) for browser automation.
- Natural language routing: Send a message and CaramelBot automatically picks the right skill to execute
- Markdown-based skills: Define agent behaviors as simple `.md` files with YAML frontmatter
- Human-in-the-loop: Agents can pause and ask the user for input (credentials, confirmations, decisions) via Telegram
- Browser automation: Skills can control a browser through Playwright via MCP (Model Context Protocol)
- Multi-LLM support: Uses litellm to work with OpenAI, Anthropic, Google, and other providers
- REST API + Telegram: Interact via HTTP endpoints or a Telegram bot
- Python 3.11+
- uv — Python package manager
- Node.js / npx — Required for Playwright MCP server (spawned automatically)
- An API key for at least one LLM provider (Anthropic, OpenAI, or Google)
git clone <repo-url>
cd caramelbot
uv sync

Create a `.env` file in the project root:
# LLM provider (at least one API key required)
ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=...
# LLM model (optional, defaults to anthropic/claude-sonnet-4-20250514)
# Uses litellm model format: provider/model-name
# DEFAULT_MODEL=anthropic/claude-sonnet-4-20250514
# Telegram bot (optional, needed for Telegram integration)
# TELEGRAM_BOT_TOKEN=123456:ABC-DEF...
# TELEGRAM_CHAT_ID=123456789
# Database (optional, defaults to sqlite:///caramelbot.db)
# DATABASE_URL=sqlite:///caramelbot.db

uv run caramelbot

The server starts on http://localhost:8000 with auto-reload enabled.
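To illustrate how the optional variables and their defaults fit together, here is a minimal sketch of a settings loader. The `Settings` class and `load_settings` function are hypothetical names, not CaramelBot's actual code, and the sketch assumes the `.env` file has already been loaded into the process environment by some other means.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the comments in the .env example.
DEFAULT_MODEL = "anthropic/claude-sonnet-4-20250514"
DEFAULT_DATABASE_URL = "sqlite:///caramelbot.db"
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY")


@dataclass(frozen=True)
class Settings:  # hypothetical container, for illustration only
    model: str
    database_url: str
    telegram_bot_token: Optional[str]
    telegram_chat_id: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping of environment variables."""
    # At least one provider API key is required.
    if not any(env.get(key) for key in PROVIDER_KEYS):
        raise ValueError("Set at least one of: " + ", ".join(PROVIDER_KEYS))
    return Settings(
        model=env.get("DEFAULT_MODEL", DEFAULT_MODEL),
        database_url=env.get("DATABASE_URL", DEFAULT_DATABASE_URL),
        # Telegram values are optional and stay None when unset.
        telegram_bot_token=env.get("TELEGRAM_BOT_TOKEN"),
        telegram_chat_id=env.get("TELEGRAM_CHAT_ID"),
    )
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dictionary instead of real environment variables.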
Alternatively:
uv run python -m app.main

| Method | Endpoint | Description |
|---|---|---|
| POST | `/chat` | Send a message with automatic skill routing |
| POST | `/webhook/telegram` | Telegram webhook for bot integration |
| GET | `/conversations/{id}` | Retrieve a conversation with its messages |
{
  "message": "Issue an invoice for customer X",
  "conversation_id": null
}

| Method | Endpoint | Description |
|---|---|---|
| GET | `/skills` | List all available skills |
| POST | `/tasks/run` | Start a skill task directly |
| POST | `/tasks/resume` | Resume a paused task with human input |
| GET | `/tasks` | List tasks (optionally filter by status) |
| GET | `/tasks/{id}` | Get a specific task |
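As a rough sketch of how a client might call `POST /chat`, the snippet below builds the request with the standard library only. The body fields (`message`, `conversation_id`) come from the example request above; the helper names, the type of `conversation_id`, and the assumption that the response is JSON are illustrative and not confirmed by the project.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # default server address


def build_chat_request(
    message: str, conversation_id: Optional[Any] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request."""
    # conversation_id mirrors the example body; its type is not documented.
    body = json.dumps({"message": message, "conversation_id": conversation_id})
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> Any:
    """Send the request and decode the response, assumed to be JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("Issue an invoice for customer X"))` would submit the message for routing.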
Skills are Markdown files in the skills/ directory. Each file has YAML frontmatter defining metadata and a Markdown body with agent instructions.
---
name: gerador_nota_fiscal
description: Accesses the city hall portal to issue an NFSe.
tools: [mcp_playwright]
---
# Instructions
1. Navigate to the issuance portal.
2. If login is requested, use the 'ask_human' tool.
3. Fill in the customer's data and issue the invoice.
4. Return the full path of the resulting PDF.

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Unique identifier for the skill (becomes the tool function name) |
| `description` | Yes | What the skill does (shown to the LLM for routing) |
| `tools` | No | List of tool sets to enable. Currently supports `mcp_playwright` |
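The sketch below shows one way a skill file could be split into its frontmatter fields and instruction body, and then described to the LLM as a tool. It is a simplified illustration: the `Skill` class and both functions are hypothetical, the hand-rolled parser only handles flat `key: value` lines and inline `[a, b]` lists (a real implementation would more likely use a YAML library), and the single `request` parameter in the tool schema is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skill:  # hypothetical container, for illustration only
    name: str
    description: str
    tools: List[str] = field(default_factory=list)
    instructions: str = ""


def parse_skill(text: str) -> Skill:
    """Split a skill file into frontmatter fields and Markdown body."""
    # Expected layout: '---', frontmatter lines, '---', body.
    _, frontmatter, body = text.split("---", 2)
    meta: Dict[str, str] = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    for required in ("name", "description"):
        if not meta.get(required):
            raise ValueError(f"missing required field: {required}")
    # Only the inline list form `[a, b]` is handled in this sketch.
    raw_tools = meta.get("tools", "").strip("[]")
    tools = [item.strip() for item in raw_tools.split(",") if item.strip()]
    return Skill(meta["name"], meta["description"], tools, body.strip())


def skill_to_tool(skill: Skill) -> dict:
    """Describe a skill as an OpenAI-style function tool definition."""
    return {
        "type": "function",
        "function": {
            "name": skill.name,  # the skill name becomes the function name
            "description": skill.description,  # used by the LLM for routing
            # Assumed: one free-text argument carrying the user's request.
            "parameters": {
                "type": "object",
                "properties": {"request": {"type": "string"}},
                "required": ["request"],
            },
        },
    }
```

Keeping parsing and tool conversion separate means the same `Skill` object can feed both the router (via its description) and the agent loop (via its instructions).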
- `ask_human`: Always available. Pauses the agent and sends a question to the user via Telegram
- Playwright browser tools: Enabled when `tools: [mcp_playwright]` is set. Provides full browser control (navigate, click, fill, screenshot, etc.)
flowchart TD
TG["Telegram"] -->|webhook| FA["FastAPI Server"]
REST["REST /chat"] -->|POST| FA
FA --> ROUTER["Chat Router"]
SKILLS["Skills (.md)"] --> ROUTER
ROUTER <-->|read/write| DB["SQLite DB"]
ROUTER --> LLM["LLM (litellm)"]
LLM --> DEC{Skill?}
DEC -- No --> RESP["Chat Response"]
RESP -.->|send| TG_OUT["Telegram"]
DEC -- Yes --> AGENT["Agent Loop (20 iters)"]
AGENT -.->|result| TG_OUT
AGENT --> PW["Playwright MCP"]
AGENT --> AH["ask_human"]
AH --> AWAIT["AWAITING_INPUT"]
AWAIT -.->|resume| AGENT
style TG fill:#a5d8ff,stroke:#4a9eed
style REST fill:#a5d8ff,stroke:#4a9eed
style TG_OUT fill:#a5d8ff,stroke:#4a9eed
style FA fill:#d0bfff,stroke:#8b5cf6
style ROUTER fill:#d0bfff,stroke:#8b5cf6
style AGENT fill:#d0bfff,stroke:#8b5cf6
style SKILLS fill:#ffd8a8,stroke:#f59e0b
style DB fill:#c3fae8,stroke:#22c55e
style LLM fill:#fff3bf,stroke:#f59e0b
style DEC fill:#fff3bf,stroke:#f59e0b
style RESP fill:#b2f2bb,stroke:#22c55e
style PW fill:#b2f2bb,stroke:#22c55e
style AH fill:#ffc9c9,stroke:#ef4444
style AWAIT fill:#ffc9c9,stroke:#ef4444
- User sends a message via Telegram or the `/chat` API
- The chat router loads all skills as LLM tool definitions and calls the LLM
- The LLM either responds conversationally or invokes a skill
- If a skill is invoked, a background task runs the skill's agent loop with up to 20 iterations
- The agent uses its tools (browser, ask_human) to complete the task
- If `ask_human` is called, the task pauses (`AWAITING_INPUT`) and the user is notified via Telegram. The task resumes when the user responds via `POST /tasks/resume`
- Results are sent back to the user via Telegram
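The steps above can be sketched as a small loop with a pause and resume path. This is an illustration under stated assumptions, not CaramelBot's implementation: the `Task` class, the shape of the LLM step, and every status name other than `AWAITING_INPUT` are hypothetical, and resetting the iteration budget on resume is a choice made here for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MAX_ITERATIONS = 20  # iteration cap described in the steps above


@dataclass
class Task:  # hypothetical task record
    messages: List[dict] = field(default_factory=list)
    status: str = "RUNNING"  # assumed status name
    question: Optional[str] = None
    result: Optional[str] = None


# Assumed, simplified LLM step: returns {"content": ...} for a final answer
# or {"tool": name, "args": {...}} to request a tool call.
LLMStep = Callable[[List[dict]], dict]


def run_agent(task: Task, llm: LLMStep, tools: Dict[str, Callable[..., str]]) -> Task:
    """Run the loop until completion, a human question, or the cap."""
    for _ in range(MAX_ITERATIONS):
        step = llm(task.messages)
        if "tool" not in step:
            task.status, task.result = "COMPLETED", step["content"]
            return task
        if step["tool"] == "ask_human":
            # Pause here; the caller notifies the user and waits for a reply.
            task.status = "AWAITING_INPUT"
            task.question = step["args"]["question"]
            return task
        output = tools[step["tool"]](**step["args"])
        task.messages.append({"role": "tool", "name": step["tool"], "content": output})
    task.status = "FAILED"  # assumed outcome when the cap is reached
    return task


def resume(task: Task, answer: str, llm: LLMStep, tools: Dict[str, Callable[..., str]]) -> Task:
    """Feed the human's answer back as the ask_human result and continue."""
    task.messages.append({"role": "tool", "name": "ask_human", "content": answer})
    task.status, task.question = "RUNNING", None
    return run_agent(task, llm, tools)
```

Because the paused state lives entirely in the `Task` object, it could be persisted between the pause and the later resume call.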
- Create a bot with @BotFather and get the token
- Set `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` in your `.env`
- Configure a webhook pointing to `https://your-domain/webhook/telegram`
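For the webhook step, the Telegram Bot API exposes a `setWebhook` method that takes the target address in a `url` parameter. The helper below is a hypothetical sketch that only builds the call URL; opening it (in a browser or with any HTTP client) is what registers the webhook.

```python
import urllib.parse


def set_webhook_url(bot_token: str, public_base_url: str) -> str:
    """Build the Telegram Bot API URL that registers the webhook."""
    # CaramelBot receives updates on /webhook/telegram.
    webhook = public_base_url.rstrip("/") + "/webhook/telegram"
    query = urllib.parse.urlencode({"url": webhook})
    return f"https://api.telegram.org/bot{bot_token}/setWebhook?{query}"
```

Telegram requires the webhook target to be reachable over HTTPS, so `public_base_url` should be the public HTTPS address of the server.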
- FastAPI + uvicorn — HTTP server
- litellm — Multi-provider LLM abstraction
- SQLModel + SQLite — Persistence (conversations, tasks, messages)
- Playwright via MCP — Browser automation (stdio transport)
- httpx — Telegram Bot API client
