---
sidebar_position: 1
title: Overview
description: The Conversational AI Python SDK — install, concepts, and examples.
---
The Conversational AI Python SDK lets you build voice-powered AI agents that combine speech recognition, large language models, text-to-speech, and optional digital avatars — all routed through a global real-time network.
The Python SDK provides two client classes:
- AgentClient: synchronous client backed by httpx.Client. Use this in scripts, Flask apps, or anywhere synchronous code is natural.
- AsyncAgentClient: asynchronous client backed by httpx.AsyncClient. Use this in asyncio applications, FastAPI, or any async framework.
Both clients expose the same capabilities. Choose whichever fits your application's concurrency model.
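To make the choice of concurrency model concrete, the sketch below contrasts the two calling styles. It deliberately does not use the SDK: `FakeSyncClient`, `FakeAsyncClient`, and `start_agent` are hypothetical stand-ins rather than real SDK names, and they only illustrate how a blocking call differs from an awaited one.

```python
import asyncio


class FakeSyncClient:
    """Hypothetical stand-in for a synchronous client (not the real SDK)."""

    def start_agent(self, name: str) -> str:
        # A real synchronous client would block here on an HTTP request.
        return f"started:{name}"


class FakeAsyncClient:
    """Hypothetical stand-in for an asynchronous client (not the real SDK)."""

    async def start_agent(self, name: str) -> str:
        # A real asynchronous client would await network I/O here.
        await asyncio.sleep(0)
        return f"started:{name}"


def run_sync(names: list[str]) -> list[str]:
    client = FakeSyncClient()
    # Calls run one after another; each blocks until it returns.
    return [client.start_agent(n) for n in names]


async def run_async(names: list[str]) -> list[str]:
    client = FakeAsyncClient()
    # Calls are scheduled together and awaited concurrently.
    return await asyncio.gather(*(client.start_agent(n) for n in names))


sync_results = run_sync(["a", "b"])
async_results = asyncio.run(run_async(["a", "b"]))
```

Both styles produce the same results here. The difference is that the asynchronous version can overlap waiting time across calls, which matters once real network I/O is involved.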
In the cascading flow, speech-to-text transcribes audio, a text LLM generates a response, and text-to-speech renders it as audio. This is the most flexible flow: you can mix and match vendors for each stage.
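The three-stage flow described above can be sketched as a chain of interchangeable functions. This is an illustration of the idea only, not the SDK's implementation: `CascadingPipeline` and the `stub_*` functions are hypothetical, and the stubs stand in for real vendor calls by passing text around as bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadingPipeline:
    """Hypothetical three-stage pipeline; not part of the SDK."""

    stt: Callable[[bytes], str]   # audio in -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio out

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stub stages: "audio" is just UTF-8 text so the example runs offline.
def stub_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_llm(text: str) -> str:
    return f"You said: {text}"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadingPipeline(stt=stub_stt, llm=stub_llm, tts=stub_tts)
audio_out = pipeline.respond(b"hello")

# Swapping one stage leaves the other two untouched.
shouting = CascadingPipeline(stt=stub_stt, llm=str.upper, tts=stub_tts)
```

Because each stage is only a function with a fixed input and output type, replacing one vendor does not affect the others. This is the property that makes the cascading flow flexible.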
In the MLLM flow, a multimodal model handles audio input and output end to end, with no separate STT/TTS step. (Not yet available in the domestic API.)
+--------------------------------------------------+
| Developer API |
| Agent · AgentSession · Vendors · Token | <- agentkit/ (hand-written)
+--------------------------------------------------+
| AgentClient / AsyncAgentClient + Pool | <- pool_client.py (hand-written)
+--------------------------------------------------+
| Fern-generated Client Core |
| AgentsClient · TelephonyClient · TypeSystem | <- generated, for advanced use
+--------------------------------------------------+
The agentkit layer (agent.agentkit) is the primary developer-facing API. It provides the Agent builder, AgentSession lifecycle, typed vendor classes, and token helpers. You rarely need to use the Fern-generated layer directly, but it is accessible via session.raw when needed.
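The layering can be pictured as a thin wrapper that delegates to a lower-level client while keeping that client reachable. The sketch below is a toy model under that assumption. `GeneratedAgentsClient`, `SketchSession`, and the response fields are hypothetical; only the idea of a `raw` attribute on the session comes from the description above, and the real classes may look quite different.

```python
class GeneratedAgentsClient:
    """Hypothetical stand-in for the generated low-level client."""

    def start(self, payload: dict) -> dict:
        # A real generated client would issue an HTTP request here.
        return {"agent_id": "demo-1", "status": "RUNNING", "echo": payload}


class SketchSession:
    """Hypothetical high-level session that wraps the low-level client."""

    def __init__(self, low_level: GeneratedAgentsClient, name: str) -> None:
        self.raw = low_level          # escape hatch to the lower layer
        self._name = name
        self.agent_id = None

    def start(self) -> str:
        # The high-level call builds the payload so the caller need not.
        response = self.raw.start({"name": self._name})
        self.agent_id = response["agent_id"]
        return self.agent_id


session = SketchSession(GeneratedAgentsClient(), "greeter")
agent_id = session.start()

# Advanced use: bypass the wrapper and call the lower layer directly.
direct = session.raw.start({"name": "custom"})
```

The wrapper covers the common path, and the `raw` attribute remains available for cases the wrapper does not anticipate.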
| Section | What you will learn |
|---|---|
| Installation | Install the SDK and prerequisites |
| Authentication | Configure client credentials |
| Quick Start | Build and run your first agent |
| Architecture | Understand the SDK layers and client types |
| Agent | Configure agents with the fluent builder |
| AgentSession | Manage the agent lifecycle |
| Vendors | Browse all LLM, TTS, STT, and Avatar providers |
| Cascading Flow | Build an ASR → LLM → TTS pipeline |
| MLLM Flow | Multimodal flow (reserved for future use) |
| Avatars | Add a digital avatar with Sensetime |
| Regional Routing | Route requests to the nearest region |
| Error Handling | Handle API errors with ApiError |
| Pagination | Iterate over paginated list endpoints |
| Advanced | Raw response, retries, timeouts, custom httpx client |
| Low-Level API | Direct client.agents.start() without the builder |
| Client Reference | Full AgentClient/AsyncAgentClient API |
| Agent Reference | Full Agent builder API |
| Session Reference | Full AgentSession/AsyncAgentSession API |
| Vendor Reference | Constructor options for all vendor classes |