Build agents in Python. Ship them on a Rust runtime.
Skarta is a production-grade runtime for multi-agent systems. You write the agents. Skarta handles orchestration, validation, scheduling, streaming, sessions, budgets, telemetry, and security.
pip install skartaThat single install ships a Python SDK and a pre-built runtime binary for macOS (Apple Silicon) and Linux (x86_64, aarch64). No Rust toolchain. No separate server.
Platform support
- Windows is not natively supported in this release. The runtime uses Unix domain sockets for local IPC, which require a Unix-family host. Windows users can run Skarta today via WSL2. Native Windows wheels are on the roadmap.
- macOS Intel (x86_64) is not built by CI. Apple Silicon Macs only for the macOS wheel today. Intel-Mac users can run the manual release script on an x86_64 macOS host or wait for native support.
- Sandboxed code execution is Linux / Docker only for now. The capability sandbox that isolates untrusted or model-generated code in a subprocess (the
Sandboxallowlists) cannot yet launch an interpreter such as Python under the native macOS backend. Run such code on a Linux host (native sandbox) or via thesarthiai/skartaDocker image. Everything else, including runtime-enforced tool permissions and the network / filesystem / env allowlists, works on macOS as documented; only subprocess sandboxing of untrusted code is affected.
Client / server architecture (under testing)
A client / server shape is in the works (the
skarta-clientPyPI wheel and thesarthiai/skartaDocker image), where your Python code connects to a remote Skarta runtime over the network. That path is currently under testing and not officially released for public use. The documentation here covers the supportedpip install skartainstall only. Stick with it for anything you care about until the client / server path lands properly.
A working two-agent typed pipeline that calls a real LLM. One terminal, one Python file, one env var.
1. Install:
pip install skarta
export OPENAI_API_KEY=sk-...2. Save this as pipeline.py:
import asyncio
from pydantic import BaseModel
from skarta import LLM, Agent, Pipeline
class Topic(BaseModel):
subject: str
class Brief(BaseModel):
summary: str
class Headline(BaseModel):
text: str
llm = LLM(model="gpt-4o-mini")
summarize = Agent(
llm=llm,
instructions="Write one sentence on the given subject.",
input=Topic, output=Brief,
)
headline = Agent(
llm=llm,
instructions="Rewrite the brief as a punchy six-word headline.",
input=Brief, output=Headline,
)
async def main() -> None:
pipeline = Pipeline(summarize, headline)
result = await pipeline.execute(Topic(subject="production-grade agent runtimes"))
print(result.text)
if __name__ == "__main__":
asyncio.run(main())3. Run:
python pipeline.pyYou will see a real LLM-generated headline.
What just happened: Skarta started, picked up your OpenAI key, saw that summarize produces a Brief and headline consumes a Brief, worked out the order itself, called OpenAI for the first agent, checked the reply matched the shape you asked for, fed it into the second agent, called OpenAI again, checked that reply too, and gave you back a finished Headline object. You wrote no flow chart, no retry loop, no queue, no glue.
Prefer Anthropic? Swap the LLM declaration:
llm = LLM(model="claude-haiku-4-5") # picks up ANTHROPIC_API_KEYOpenRouter, Groq, Ollama, or any OpenAI-compatible endpoint: pass base_url and provider="openai_compat" to LLM(...). For richer config (multiple providers, pricing overrides, persistent storage), drop a framework.toml next to your script. See Configuring an LLM endpoint for the full setup.
Want streaming, sessions, tools, or multi-agent collaboration? Three more lines:
from skarta import Tool, Memory, Team, Schedule
@Tool
async def search(args): ... # tool the agent can call
agent = Agent(llm=llm, instructions="...", tools=[search])
agent = Agent(llm=llm, instructions="...", memory=Memory()) # cross-call memory
team = Team(members=[a, b, c], shape="brainstorm") # 3-agent dialogue
@Schedule("0 9 * * *") # cron-trigger an agent
async def daily_brief(): await agent.execute("...")The full surface is in Getting started and per-topic pages in docs/.
Eleven runnable files, each one a complete example. Each step adds one capability. Stop where Skarta does enough for what you're building. Every example assumes:
pip install skarta
export OPENAI_API_KEY=sk-...Single-agent, single LLM call.
A customer types a question, your agent answers. The shortest possible Skarta program.
import asyncio
from skarta import LLM, Agent
agent = Agent(
llm=LLM(model="gpt-4o-mini"),
instructions="Answer in one short paragraph.",
)
async def main():
print(await agent.execute("Why are Rust runtimes a good fit for agents?"))
asyncio.run(main())Skarta started, called OpenAI, returned the answer, and shut down on its own. Nine lines, no server to run.
Structured outputs with typed I/O.
The model returns a Python object with named fields, not a string of text you have to parse and pray over.
import asyncio
from pydantic import BaseModel
from skarta import LLM, Agent
class Question(BaseModel):
text: str
class Answer(BaseModel):
reply: str
confidence: float
agent = Agent(
llm=LLM(model="gpt-4o-mini"),
instructions="Reply briefly. Include a confidence score from 0 to 1.",
input=Question, output=Answer,
)
async def main():
out = await agent.execute(Question(text="Is the sky blue?"))
print(out.reply, out.confidence)
asyncio.run(main())If the model returns junk or skips a field, Skarta raises a clear error before the bad value reaches your code. You can act on out.confidence and out.reply knowing they are exactly the types you asked for.
Tool calling (also called function calling), with the tool loop on autopilot.
Hand the agent a Python function that hits your API, queries your database, or returns a price from your pricing service. The agent decides when to call it.
import asyncio
from pydantic import BaseModel
from skarta import LLM, Agent, Tool
class WeatherArgs(BaseModel):
city: str
@Tool
async def get_weather(args: WeatherArgs) -> dict:
return {"city": args.city, "temp_c": 23, "condition": "sunny"}
agent = Agent(
llm=LLM(model="gpt-4o-mini"),
instructions="If asked about weather, call get_weather.",
tools=[get_weather],
)
async def main():
print(await agent.execute("What's the weather in Tokyo right now?"))
asyncio.run(main())Skarta did the back-and-forth: the model asked to call get_weather, Skarta ran your function, fed the result back to the model, and the model wrote its final answer. None of that wiring was yours to write.
Conversational memory with pluggable session storage.
The agent picks up where the conversation left off. Same code works for a five-minute chat or one that spans a week.
import asyncio
from skarta import LLM, Agent, Memory
agent = Agent(
llm=LLM(model="gpt-4o-mini"),
instructions="You are a helpful note-taker.",
memory=Memory(),
)
async def main():
await agent.execute("My favourite colour is teal.", session="user-42")
print(await agent.execute("What is my favourite colour?", session="user-42"))
asyncio.run(main())Skarta loaded the earlier turns for session user-42, gave them to the model as context, and saved the new exchange back. Conversations live in a file by default. Switch to Memory(backend="db") and they ride along through restarts and across machines.
Conversation branching and named session checkpoints.
Conversations aren't write-once. Save the chat at any turn, test a tangent, and roll back to the saved turn if the new direction was worse.
import asyncio
from skarta import LLM, Agent, Memory
mem = Memory()
agent = Agent(
llm=LLM(model="gpt-4o-mini"),
instructions="You are a trip-planning assistant.",
memory=mem,
)
async def main():
await agent.execute("Let's plan a 3-day trip to Kyoto.", session="trip")
await mem.checkpoint("trip", name="after-outline")
# Try a tangent on the same thread.
await agent.execute("Actually, swap day 2 for a day-trip to Osaka.", session="trip")
# Decide we preferred the original; rewind to the pinned point.
await mem.restore("trip", name="after-outline")
print(await agent.execute("OK keep Kyoto only. What's day 1?", session="trip"))
asyncio.run(main())Skarta saved the conversation at the moment you named, let the agent try the Osaka tangent, then dropped that tangent and reset to the Kyoto plan. Use mem.fork(...) instead of restore to keep both versions alive and switch between them later.
Agent pipeline with auto-DAG: Skarta derives the order from your data shapes.
A research question comes in. One agent plans the bullet points, the next writes a draft, the third compresses it to a one-page brief. You write the three agents; Skarta works out the order from your data and runs them back to back.
import asyncio
from pydantic import BaseModel
from skarta import LLM, Agent, Pipeline
class Question(BaseModel):
text: str
class Outline(BaseModel):
points: list[str]
class Draft(BaseModel):
body: str
class OnePager(BaseModel):
summary: str
llm = LLM(model="gpt-4o-mini")
planner = Agent(llm=llm, instructions="List three bullet points to research.",
input=Question, output=Outline)
writer = Agent(llm=llm, instructions="Write a short answer from these points.",
input=Outline, output=Draft)
condenser = Agent(llm=llm, instructions="Compress the draft to about 80 words.",
input=Draft, output=OnePager)
async def main():
out = await Pipeline(planner, writer, condenser).execute(
Question(text="How does a Rust agent runtime differ from a Python library?")
)
print(out.summary)
asyncio.run(main())No flow charts, no edge lists. Skarta saw that planner produces an Outline, writer expects an Outline and produces a Draft, condenser expects a Draft, worked out the order, and checked every handoff so a malformed reply from one agent can't slip into the next.
Auto-orchestrated DAG with conditional branching, retries, parallel fan-out, and racing.
A support ticket arrives. The triage agent decides severity; low-priority tickets get a quick acknowledgement, the urgent ones get a detailed apology and remediation plan. Only the chosen handler runs.
import asyncio
from pydantic import BaseModel
from skarta import LLM, Agent, Process
class Ticket(BaseModel):
text: str
llm = LLM(model="gpt-4o-mini")
triage = Agent(
llm=llm,
instructions="Classify severity. Reply with exactly one word: low or high.",
input=Ticket, output=str,
)
short_reply = Agent(llm=llm, instructions="Write a one-line acknowledgement.")
long_reply = Agent(llm=llm, instructions="Write a detailed apology and remediation plan.")
p = Process.branch(
triage,
routes={"low": short_reply, "high": long_reply},
)
async def main():
print(await p.execute(Ticket(text="My laptop is on fire.")))
asyncio.run(main())Skarta ran the triage agent, sent the ticket to the matching handler, and recorded why the other handler was skipped. Want the agent to try again on flaky calls? Add retry=Retry(max_attempts=3). Want to process a list of tickets at once? Use Process.for_each. Want two agents to race? Process.race. All one line each.
Multi-agent collaboration: agent teams and rooms with structured turn-taking.
Sometimes one agent isn't enough. A product manager, an engineer, and a customer-support agent argue over whether to ship a feature. They keep talking until they agree or the turn cap hits.
import asyncio
from skarta import LLM, Agent, Team
llm = LLM(model="gpt-4o-mini")
product = Agent(llm=llm, instructions="You are a product manager.")
eng = Agent(llm=llm, instructions="You are an engineer focused on feasibility.")
support = Agent(llm=llm, instructions="You are customer support; raise user pain.")
team = Team(
members=[product, eng, support],
shape="brainstorm",
terminate_on={"max_turns": 6},
)
async def main():
result = await team.execute(
"Should we ship feature X next quarter? Give a one-paragraph rationale."
)
print(result.final)
asyncio.run(main())Pick shape="decision" and the agents take a structured vote. Pick shape="status-sync" and they go round-robin like a daily stand-up. Promote the team to a Room when you need named guests, silent observers, or one Skarta serving many customers in isolation.
Scheduled agents and webhook-triggered agents, with in-runtime HTTP ingress and replay dedupe.
Your morning brief at 9 a.m. every day. Your refund agent the moment Stripe POSTs a charge.dispute. Your incident agent the moment PagerDuty fires. Each is a decorator.
import asyncio
from skarta import LLM, Agent, App, Schedule, Webhook
llm = LLM(model="gpt-4o-mini")
brief = Agent(llm=llm, instructions="Write a 3-bullet morning brief.")
@Schedule("0 9 * * *", tz="UTC") # every day at 09:00 UTC
async def daily_brief():
print(await brief.execute("Today's top AI stories?"))
@Webhook("github_issue_opened") # POST /webhooks/github_issue_opened
async def on_issue(payload):
print(await brief.execute(f"Summarise this issue: {payload}"))
async def main():
async with App() as _:
await asyncio.Event().wait() # keep the runtime alive
asyncio.run(main())Skarta read the cron line, opened the HTTP port, dropped any duplicate POSTs Stripe or GitHub retried, and ran your handler. No Flask, no FastAPI, no separate web server, no system cron. Same install.
Event-driven agents on a typed outbound bus (pub/sub).
Your refund agent finished. Your audit log, your Slack notifier, and your CRM all need to know. Don't wire each one to each other; publish a named event and let anyone interested subscribe.
import asyncio
from pydantic import BaseModel
from skarta import LLM, Agent, App, Event, Tool, current_context
class RefundArgs(BaseModel):
order_id: str
amount_usd: float
@Tool
async def issue_refund(args: RefundArgs) -> dict:
ctx = current_context()
await ctx.publish(
"billing.refund_issued",
{"order_id": args.order_id, "amount_usd": args.amount_usd},
idempotency_key=f"refund-{args.order_id}",
)
return {"status": "refunded", "order_id": args.order_id}
refund_agent = Agent(
llm=LLM(model="gpt-4o-mini"),
instructions="Process refund requests with the issue_refund tool.",
tools=[issue_refund],
)
# Another part of the system (could be a different process, even a different language) reacts:
@Event("billing.refund_issued")
async def audit(envelope):
print(f"audit: refund issued for order {envelope['payload']['order_id']}")
async def main():
async with App() as _:
await refund_agent.execute("Please refund order #1234 for $19.99.")
asyncio.run(main())Skarta checked the event name, dropped any duplicate the same refund-1234 key had already fired, and delivered the message to every listener. Metrics on how often each event fires land in your Prometheus dashboard automatically. The audit listener doesn't have to live in this file; it can run in a different process, a different language, or a different deploy and still receive the same event.
Guardrails, hard budgets, sandboxing, HITL (human-in-the-loop).
Hard spend caps, human approval on risky tools, an allowlist of hosts the agent may call. The lines you'd otherwise write yourself, or skip and regret. Skarta hands them to you as constructor arguments.
import asyncio
from pydantic import BaseModel
from skarta import LLM, Agent, App, Approval, Budget, Permissions, Tool
class RefundArgs(BaseModel):
order_id: str
amount_usd: float
@Tool(requires_approval=True)
async def issue_refund(args: RefundArgs) -> dict:
return {"status": "refunded", "order_id": args.order_id}
agent = Agent(
llm=LLM(model="gpt-4o-mini"),
instructions="Help the user. Use issue_refund when justified.",
tools=[issue_refund],
budget=Budget(max_cost_usd=0.50),
)
async def main():
async with App(
approval=Approval.slack("https://hooks.slack.com/services/..."),
permissions=Permissions(
network=["api.openai.com"],
env_vars=["OPENAI_API_KEY"],
),
) as _:
print(await agent.execute("Please refund order #1234 for $19.99."))
asyncio.run(main())Skarta cut off model spend at $0.50, paused the refund tool call and posted it to Slack for a human to approve, and refused any outbound HTTP except to api.openai.com, even if the agent was told to call elsewhere. Point SKARTA_DATABASE_URL at Postgres and conversations, spend caps, and the audit log live in your database.
- Levels 1 to 4 docs/agents.md
- Level 5 Sessions in docs/agents.md, docs/conversations.md
- Levels 6 and 7 docs/processes.md
- Level 8 docs/teams.md, docs/rooms.md
- Level 9 docs/triggers.md, docs/schedules.md, docs/webhooks.md
- Level 10 docs/outbound-events.md
- Level 11 docs/hitl.md, docs/sandbox.md, docs/configuration.md
Everything below ships in the same pip install skarta. Nothing is paid, nothing is an add-on, nothing needs a separate server install.
Six ways to put agents to work: single agent through multi-agent orchestration
- Single agent with tool calling and structured outputs. One agent answering a customer, drafting an email, calling your APIs, looking up an order.
Agent - Agent pipeline with auto-DAG. A line of agents that hands the work along in order: read the email, classify the intent, draft the reply, polish the tone. You write the four agents; Skarta derives the DAG (the order of work) from the shape of the data and runs them back to back.
Pipeline - Auto-orchestrated DAG. A workflow that isn't a straight line. A support ticket gets classified, then refunds go to one agent and complaints to another (conditional branching). A research task splits into ten sub-questions, runs them at the same time (parallel fan-out), then a writer merges the answers. A draft gets rewritten in a loop until the quality score crosses a bar. Two models compete on the same question and you keep whichever answered first (race + cancel losers).
Process - Multi-agent collaboration. Several agents talking among themselves. A product manager, an engineer, and a support rep argue over whether to ship a feature. Pick
brainstormfor free exploration,decisionfor a structured vote,status-syncfor a stand-up round. The conversation stops on its own when they agree or the turn cap hits.Team - Multi-agent rooms. A named multi-agent conversation with a guest list. Mix a single agent, a whole team, and a multi-step workflow in the same room. Add a chair to keep order, observers who watch but don't speak, and agents that join in the middle.
Room - LLM-driven agent orchestration (planner-as-tool). Let an LLM design the workflow itself. You hand it the goal and a menu of agents and tools; it picks who runs in what order; Skarta executes the plan. Useful when the right workflow depends on the request. You can keep separate planners for support, sales, and research running side by side.
Orchestrator
Triggers and event-driven integrations
- Scheduled agents (cron). An agent that runs every morning at 9 a.m., every fifteen minutes, or on any cron expression. Skarta keeps the clock.
Schedule - Webhook-triggered agents with in-runtime HTTP ingress. An agent that wakes up when Stripe charges a card, GitHub opens an issue, or Slack receives a message. Skarta hosts the URL, drops duplicate POSTs (replay dedupe), and answers either right away (
202 Accepted) or after the agent finishes. New webhook URLs go live via one API call, no redeploy (dynamic webhooks).Webhook - Outbound events (pub/sub). When one agent finishes and another part of your system needs to know. Your refund agent emits
billing.refund_issued; an audit logger written in Rust and a Slack notifier written in Python both pick it up; replays are dropped so the audit row is written once.Event
Memory, skills, and HITL (human-in-the-loop)
- Conversational memory with branching and checkpoints. Conversations that survive restarts. Save a chat at the moment the customer said "yes", try a different reply on the side, snap back to the saved moment if you don't like the new one. Keep sessions in a file on disk, in SQLite, in Postgres, or in any pluggable session store you write.
Memory - Agent Skills with progressive disclosure. Drop a folder of instructions next to your agent and it picks them up only when the work calls for them. A refund-policy folder, a brand-voice folder, a regulatory-checklist folder. Same format Claude, Cursor and others read; the open Agent Skills standard.
Skill - HITL approval gates. About to refund $5,000, sign a contract, or run a destructive database query? Skarta pauses the agent (
requires_approval=True), pings a human in Slack or email, and only continues on a thumbs-up. Hand the approval call to your own webhook if you have one.Approval
Production guardrails the runtime enforces for you
- Hard budget caps (call / agent / workflow). Cap a single agent's spend at $0.50, cap a full workflow at $50, cap a customer at $5,000 a day. Skarta refuses the next model call the moment a cap is hit. A runaway prompt can't burn your OpenAI bill.
Budget - Typed retry policies with backoff. Anthropic returned a 429? OpenAI timed out? Skarta tries again with the wait pattern you choose, gives up cleanly so callers always see the same outcome. Pick which error codes count as retryable.
Retry - Tool permissions and per-tool sandboxing, runtime-enforced. An agent that should only call
api.stripe.comcan't reach anywhere else, even if a prompt injection tells it to. Declare the file paths, hostnames, and environment variables an agent may touch; the runtime blocks the rest.PermissionsandSandbox - Idempotency for replay-safe steps. Stripe replays the same webhook three times after a glitch; your charge agent runs once. Mark the step
@idempotent, point Skarta at a database-backed idempotency store, and duplicate work is dropped even if Skarta restarted in between.@idempotent - Lifecycle hooks for observability and policy. Run code before every model call (redact PII from prompts, log to Datadog) or after every tool call (audit, block on a policy). Six hook points, decorator-style.
@before_model,@after_tool, and friends
What you won't find in any Python-library agent framework
- Rust runtime, single binary, one install. One
pip install skartaships a Rust runtime under your Python code. No second framework, no "make it production-ready" project. - Auto-DAG from data flow. No flow charts to wire, no
depends_onfields, no edge declarations. Tell Skarta which agent takes what kind of data and produces what kind, and it derives the DAG itself. Change a data shape and the order updates with no code change. - Cycle detection at submit. A circular workflow gets caught before Skarta makes a single model call. You see exactly which step is looping back to which.
- LLM-driven orchestration (planner-as-tool), multiple orchestrators co-existing. Let GPT or Claude design the workflow. Hand the LLM your goal and the list of agents available; it picks who runs in what order; Skarta runs the plan. Keep several different planners (support, sales, research) under one runtime, each with its own goals and tools.
- Race + cancel losers in one binding. Send the same hard question to a fast cheap model and a slow smart one. Take whichever finished first. Skarta cancels the loser so you don't pay for the answer you threw away.
- Schema validation at five gates, schema-evolution warnings on re-register. Every handoff between agents is type-checked. A model that hallucinates a missing field gets caught before its reply ever reaches the next agent. Re-deploy a worker with a new input shape and Skarta warns the moment an old caller is about to break, before the next workflow runs.
- In-runtime HTTP ingress for webhooks, with replay dedupe and dynamic URLs. The webhook URLs that Stripe and GitHub call are hosted inside Skarta itself. No Flask, no FastAPI, no nginx in front. Duplicate POSTs are dropped for you; new URLs go live without a redeploy.
- Hot worker reload. Ship a bug fix to one agent and the runtime swaps the worker in place; in-flight workflows keep going.
- Observability in the same wheel. Prometheus metrics. OpenTelemetry traces to Datadog / Honeycomb / Grafana.
/healthand/readyzfor Kubernetes liveness probes. All in the samepip install.
If a workflow can be described in plain language and split into steps, you can ship it on Skarta. The primitives above (Agent, Pipeline, Process, Team, Room, Orchestrator, Schedule, Webhook, Event, Memory, Skill, Approval) are the LEGO bricks. Below is a sample of what teams are building with them, organised by where the agent earns its keep. It's illustrative, not exhaustive. The ceiling is your imagination.
Customer-facing agents
- Agentic customer-support automation with HITL. A triage agent reads each ticket, refunds go to one handler and complaints to another, a draft reply waits for a human's thumbs-up in Slack, and the agent never spends more than a few cents per ticket.
- Sales and lead qualification. An agent reads inbound leads from your CRM, scores them against BANT or MEDDIC, drafts a personalised first-touch email, and routes high-value leads to a human SDR with spend capped per lead.
- Customer-success copilots with persistent sessions. Conversational memory per customer, full history surviving restarts. The copilot picks up exactly where the conversation ended a week or a month ago.
- Streaming chat copilots in your product. Tokens stream to your UI as the model thinks. Conversations survive page refreshes and server restarts.
Internal-team copilots
- HR and policy Q&A. Drop your HR handbook in as an Agent Skill (progressive disclosure) and the agent answers employee questions citing the exact policy, with no fine-tuning required.
- Onboarding walkthroughs. A copilot that walks new hires through their first-week tasks, calling your provisioning APIs as
Tools, pausing for a human's thumbs-up before granting elevated access (Approval). - Internal knowledge bases. Per-team budget caps so one team's experiments can't burn the shared OpenAI bill.
Engineering and DevOps
- Code-review agents. GitHub fires the moment a PR opens; the agent reviews the diff, comments inline, flags risk for a human reviewer. Replay-safe so a redelivered GitHub webhook doesn't post duplicate comments (
@idempotent). - Incident-response agents. PagerDuty fires; an event-driven agent pulls related metrics from Datadog, summarises the incident, drafts a status-page update, and pings on-call in Slack.
- Release-note generators. A nightly cron agent scans merged PRs, drafts release notes grouped by feature, posts them for review.
- Migration and refactor agents. A multi-agent team (planner + executor + verifier) proposes database migrations, runs them in a sandbox, rolls back on any test failure.
Data, analytics, and reporting
- Document extraction at scale. PDF in, structured JSON out, validated against your Pydantic schema. Parallel fan-out lets you process 500 invoices at once.
- Cross-source reporting. A
Processpulls from your data warehouse, your CRM, and your billing system in parallel; a writer agent fuses the answers into a one-page weekly brief on aSchedule. - Anomaly explanation. A metrics-watching agent wakes up on a Datadog alert, pulls related logs and traces, writes a one-paragraph hypothesis for the on-call engineer.
- Deep-research agents with parallel fan-out and quality loops. One agent breaks a question into sub-questions, ten agents research them in parallel, a writer fuses the findings, a critic iterates the draft until the quality score crosses a bar.
Finance and operations
- Invoice processing. An OCR tool extracts line items, an agent matches them to purchase orders, anything over $5,000 goes to a human in Slack (
Approval), and the final approval emits aninvoice.approvedevent your ERP listens for. - Reconciliation. A daily agent walks ledger entries against bank statements, flags mismatches, drafts journal corrections for human approval.
- Fraud-flagging. Every transaction triggers an event-driven agent that scores it against historic patterns and either lets it through or pauses it for review.
- Compliance and audit. Agents check documents against regulations loaded as
Skills; every model call lands in the audit log automatically.
Content, marketing, and growth
- Editorial pipelines. Brief → outline → draft → SEO check → image-prompt generation → publish. Each step is an
Agent; Skarta derives the auto-DAG from your data shapes. - Pricing intelligence. A scheduled agent scrapes competitor pricing, compares to yours, writes a daily Slack summary with recommendations.
- Personalised outreach. Per-customer drafts that pull from a
Memoryof every prior conversation and aSkillfolder of your brand-voice rules. - SEO and content audits. Multi-agent teams that crawl, score, and propose changes across thousands of pages with budget caps so the AWS bill doesn't run away.
Platform and infrastructure
- Multi-language agent extensions. Your Python agents talk to a Rust agent owned by your platform team. Both deploy independently. Neither restart takes the other down.
- Self-improving agent loops. A draft agent and a critic agent in a
Process.loop, iterating until the quality score crosses a bar. Autonomous quality control without manual review. - Event-driven workflows across your stack. Agents that wake up when Stripe charges a card, GitHub opens a PR, or Slack receives a message. New webhook URLs created at runtime, no redeploy.
Skarta gives you the engine: scheduling, validation, budgets, sessions, observability. What you put on top is up to you.
Skarta isn't a library you import and patch. Your code runs in its own process and talks to Skarta over a typed connection. That means a teammate can write a new agent in Rust, plug it in tomorrow, and your Python agents call it as if it were local. No fork. No upstream PR. No co-ordinated deploy.
The Python SDK has two layers, both supported, both documented.
The everyday surface (what 95% of users write). One import per idea:
| Building blocks | What you reach for them to do |
|---|---|
LLM, Agent, Tool, Step |
A single agent with tool calling and structured outputs (Agent), or a deterministic non-LLM step (Step) like a database lookup. |
Pipeline, Process |
Chain agents in a fixed order with auto-DAG (Pipeline), or build an auto-orchestrated workflow with conditional branching, parallel fan-out, race, retries, and loops (Process). |
Team, Room |
Multi-agent collaboration: free-form discussion (Team), or a named multi-agent conversation with guests, observers, and a chair (Room). |
Orchestrator |
LLM-driven orchestration (planner-as-tool). Hand the LLM your goal and the agents available; it picks who runs in what order. |
Memory, Skill |
Conversational memory with branching and named checkpoints (Memory). Agent Skills with progressive disclosure (Skill). |
Schedule, Webhook, Event |
Scheduled agents (cron), webhook-triggered agents (in-runtime HTTP ingress), and event-driven agents (outbound pub/sub bus). |
Approval |
HITL approval gates on any tool: Slack, email, or a custom delivery webhook. |
Budget, Retry |
Hard budget caps at call, agent, and workflow level. Typed retry policies with backoff. |
Permissions, Sandbox |
Runtime-enforced tool permissions and per-tool sandboxing. Filesystem, network, and env-var allowlists. |
App |
Long-lived service lifecycle. Wrap everything in async with App() as app: .... |
@idempotent + App.idempotency_store(...) |
Idempotent steps for replay-safe webhook handlers. Durable dedupe across restarts when backed by a database store. |
before_model, after_tool, ... |
Lifecycle hooks for observability and policy (log to Datadog, redact PII, block suspicious calls). |
Going deeper. When the everyday surface doesn't cover a case, drop one layer. Eight places you can plug into the runtime from any supported language:
- Tools Functions you write in Python or Rust that an LLM can call: look up an order, hit your API, write to your database, send a Slack message.
- Workers Custom agents that need more control than the
Agentwrapper gives. - Orchestrators Your own LLM planner when the built-in one doesn't fit your domain. Run several different planners side by side under one Skarta.
- Interceptors Run code at six points in an agent's life: before and after each model call, before and after each tool call, on context changes, on compaction. Log, redact, or refuse the operation.
- Context providers Inject extra messages or system text into the model's context. Think per-customer personas pulled from a database, or session-specific guardrails.
- Skills Drop a folder of instructions next to your agent. The agent picks it up only when the work calls for it. Same format Claude, Cursor and others read, the open Agent Skills standard.
- Webhooks Skarta hosts the URLs Stripe, GitHub, or your backend POSTs to. It drops duplicate POSTs, sends the request to your handler, and replies right away or after the work finishes, your call. New customer URLs go live without a redeploy.
- Session storage Swap the default file or database backend for Redis, S3, or any other store you can write a thin adapter for.
If an agent crashes, only that agent restarts. Ship a fix to one agent while the rest of the service keeps running. Skarta itself stays the same Rust binary; everything above is yours to swap, add, or replace. See docs/concepts.md for the full surface.
Most agent frameworks are Python libraries. They are great for the first demo and painful the day a real customer uses them. Every workflow lives inside one Python process, a single crash takes the whole thing down, a single retry forgets where it was, observability is something you bolt on, spend tracking is whatever the SDK happens to print.
Skarta is a different shape. The runtime is one Rust binary that owns the hard parts: scheduling, validation, budgets, sessions, persistence, telemetry, and access control. Your agent code lives in its own process, in any supported language, and talks to the runtime over a typed wire protocol.
- Crash isolation per extension. An agent crashing doesn't take the runtime down or lose your other in-flight workflows.
- Hot reload. Ship a fix to one agent without restarting the rest of the service.
- High-throughput parts written once, in Rust. Scheduling, validation, and the wire protocol aren't re-implemented per language.
Most agent frameworks are Python libraries. You bring the type system, the orchestration, the permissions, and the cost caps. Skarta ships them all.
| Skarta | LangGraph | CrewAI | AutoGen | Pydantic AI | OpenAI Agents | |
|---|---|---|---|---|---|---|
| Foundation | ||||||
| Rust runtime, real parallelism | ● | ○ | ○ | ○ | ○ | ○ |
| OpenTelemetry (OTLP) traces + Prometheus metrics built in | ● | ○ | ○ | ○ | ○ | ○ |
| Structured I/O | ||||||
| Schema-typed worker I/O (schemars / Pydantic) | ● | ◐ | ◐ | ◐ | ● | ◐ |
| Validated at 5 gates (submit, register, dispatch, output, complete) | ● | ○ | ○ | ○ | ◐ | ○ |
| Schema-evolution warnings on re-register | ● | ○ | ○ | ○ | ○ | ○ |
| Typed protocol errors | ● | ◐ | ○ | ◐ | ◐ | ◐ |
| Orchestration | ||||||
| Auto DAG from code, no explicit edges | ● | ○ | ○ | ○ | ○ | ○ |
| Auto dependency resolution from data flow | ● | ○ | ◐ | ○ | ○ | ○ |
| Cycle detection at submit | ● | ○ | ○ | ○ | ○ | ○ |
| LLM emits the DAG (planner-as-tool) | ● | ○ | ◐ | ◐ | ○ | ○ |
| Multiple orchestrators co-exist | ● | ◐ | ○ | ◐ | ○ | ○ |
| Multi-team: workers as global pool, teams as views | ● | ○ | ○ | ○ | ○ | ○ |
| Reliability primitives | ||||||
| Race + cancel losers (one binding) | ● | ○ | ○ | ○ | ○ | ○ |
| Loops with previous-iteration binding | ● | ◐ | ○ | ○ | ○ | ○ |
| Conditions, retries, timeouts, on_failure, fan-out | ● | ◐ | ◐ | ○ | ◐ | ○ |
| Validation policy ladder (permissive override) | ● | ○ | ○ | ○ | ○ | ○ |
| Skipped + cancelled nodes surfaced with reason | ● | ○ | ○ | ○ | ○ | ○ |
| Webhooks (HTTP ingress) | ||||||
| In-runtime HTTP ingress for inbound webhooks (no FastAPI in front) | ● | ○ | ○ | ○ | ○ | ○ |
| Replay dedupe + sync/async response modes built in | ● | ○ | ○ | ○ | ○ | ○ |
| Dynamic webhooks: URLs created at runtime via admin RPC, persisted | ● | ○ | ○ | ○ | ○ | ○ |
| Cost and permissions | ||||||
| Hard cost caps (call / agent / workflow) | ● | ○ | ○ | ○ | ○ | ○ |
| Tool permissions (path / network / env, runtime-enforced) | ● | ○ | ○ | ○ | ○ | ◐ |
| State, hooks, skills, tooling | ||||||
| Session branching + named checkpoints | ● | ◐ | ○ | ○ | ○ | ○ |
| Lifecycle hooks (model call, context mutation, compaction) | ● | ◐ | ○ | ◐ | ◐ | ○ |
| On-demand skills (progressive disclosure) | ● | ○ | ○ | ○ | ○ | ○ |
| Dependency viz (Mermaid / DOT / JSON / ASCII) | ● | ◐ | ○ | ○ | ○ | ○ |
| Model registry RPC | ● | ○ | ○ | ○ | ○ | ○ |
Legend:
- ● in the box
- ◐ via developer code or paid tier
- ○ not available
Frameworks evolve. Open an issue if any cell drifts.
| Capability | What you get |
|---|---|
| Persistence | Start in-memory on day one, perfect for development. Flip one env var and every conversation, budget, and audit row gets saved to SQLite or Postgres. Your code doesn't change. |
| Observability | /health and /readyz for Kubernetes liveness probes. /metrics in Prometheus format. Structured JSON logs. OpenTelemetry (OTLP) traces to Datadog, Honeycomb, or Grafana Tempo. |
| Permissions | Every agent declares the file paths, hostnames, and environment variables it may touch. The runtime blocks everything else. A prompt-injected agent still can't reach a system you didn't authorise. |
| Budgets | Tokens and dollar cost tracked per worker, per workflow, per call. Soft ceilings warn; hard ceilings stop the next model call before it goes out the door. |
| Hot reload | Ship a bug fix to one agent and Skarta swaps the worker in place; in-flight workflows keep running. Schema-evolution warnings fire the moment a new version's input shape changes, before old callers silently break. |
| Webhooks | In-runtime HTTP ingress hosts the URLs Stripe, GitHub, and your backend POST to. Replay dedupe by a header you name. Sync (await the DAG) or async (202 Accepted) response modes. New URLs go live without a redeploy. |
| Sync and async | Every RPC ships on both skarta.Client (sync) and skarta.AsyncClient (async/await). Pick whichever fits your codebase. |
pip install skarta # runtime + Python SDK in one wheelCustom OpenAI-compatible endpoints (Ollama, vLLM, Azure OpenAI, etc.) work out of the box via LLM(model=..., api_key=..., base_url=..., provider="openai_compat").
Two reading lanes. Pick Default if you are building something with Skarta. Pick Advanced if the wrapper does not give you the control you need or if you are reading SDK internals. docs/README.md is the full index.
Default surface (start here):
- Quickstart Three-line hello agent, ten-line typed pipeline, twenty-five-line multi-agent team.
- Getting started First real workflow, walked end-to-end: LLM calls, streaming, sessions.
- Agents Full
Agentreference. - Pipelines and Processes
Pipeline/Process, special-shape constructors, branch hooks. - Teams and Rooms Free-form and declarative multi-agent collaboration.
- Triggers
Schedule,Webhook,Eventdecorators with one runnable example each. - Idempotency
Process.add(idempotent=...)shorthand andApp.idempotency_store(...)configurator. - HITL approval Slack / email / custom approval gates for any tool.
- Concepts The core ideas plus the Default vs Advanced framing.
- Configuration The full env-var +
framework.tomlreference. - Architecture Runtime + extensions model, the wire protocol.
Advanced / lower-level control:
- Python SDK reference Comprehensive reference. The high-level wrapper table is at the top; the lower-level
@worker/@tool/Client/Extension/bindings_dsl/FrameworkErrorreference is fenced under the trailing Advanced section. - Rust SDK reference For native Rust extensions to Skarta.
- Every Default page above also carries a trailing
## Advanced (lower-level reference)section that documents the lower-level surface for that topic.
Designed, developed, and maintained by Chirotpal
