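Agora's Conversational AI Engine supports custom large language models (LLMs). You can use the code in this project as a reference for implementing your own custom LLM service.
This document describes the Python sample code for implementing a custom LLM.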
- Python 3.10+
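Create and activate a Python virtual environment:
python3 -m venv venv
source venv/bin/activate
Install the dependencies:
pip install -r requirements.txt
Run the sample code:
python3 custom_llm.py
Once the server is running, you will see the following output:
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Use the following command to test the server: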
curl -X POST http://localhost:8000/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_LLM_API_KEY" \
-d '{"messages": [{"role": "user", "content": "Hello, how are you?"}], "stream": true, "model": "gpt-4o-mini"}'
When testing the server, we recommend using a tunneling tool such as ngrok to expose your local server to the internet.
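Because the test request sets `"stream": true`, the server answers with Server-Sent Events rather than a single JSON body. The following is a minimal sketch of how a client might extract the text from such a stream. It assumes the chunks follow the OpenAI streaming shape (`choices[].delta.content`); the helper name `iter_sse_content` and the sample lines are hypothetical and are not part of this project.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines until [DONE] is seen."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                yield content


# Hypothetical sample lines, standing in for the body of a streamed HTTP response.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": ", world"}}]}',
    "",
    "data: [DONE]",
]
reply = "".join(iter_sse_content(sample))
print(reply)
```

In practice the lines would come from an HTTP client that iterates over the response body line by line.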
flowchart LR
Client-->|POST Request|Server
subgraph Server[Custom LLM Server]
Basic["chat/completions"]
RAG["rag/chat/completions"]
Audio["audio/chat/completions"]
end
Server-->|SSE Response|Client
Server-->|API call|OpenAI[OpenAI API]
OpenAI-->|Stream Response|Server
subgraph Knowledge
KB[Knowledge Base]
end
RAG-.->|Retrieval|KB
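For more details on the three API endpoints and their request flows, see the request flow diagrams section.
To integrate with Agora's Conversational AI Engine, your custom LLM service must expose an interface that is compatible with the OpenAI Chat Completions API.
The /chat/completions API endpoint implements basic chat completion.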
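As a rough illustration of what an OpenAI-compatible streaming reply looks like on the wire, the sketch below wraps text pieces in chat completion chunks and terminates the stream with `[DONE]`. It is an assumption-laden sketch, not the project's implementation: the function name `to_sse_events` is hypothetical, and the upstream model call is replaced by a plain list of strings.

```python
import json
import time
import uuid
from typing import Iterable, Iterator


def to_sse_events(pieces: Iterable[str], model: str) -> Iterator[str]:
    """Wrap text pieces as OpenAI-style chat.completion.chunk SSE events."""
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"  # one id shared by all chunks
    created = int(time.time())
    for piece in pieces:
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [
                {"index": 0, "delta": {"content": piece}, "finish_reason": None}
            ],
        }
        # Each SSE event is a "data:" line followed by a blank line.
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"  # end-of-stream sentinel


# Stand-in for chunks streamed back from the upstream model.
events = list(to_sse_events(["Hi", " there"], model="gpt-4o-mini"))
print("".join(events))
```

A web framework would typically hand a generator like this to its streaming response type with the `text/event-stream` media type.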
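To improve the accuracy and relevance of agent responses, you can use Retrieval-Augmented Generation (RAG). With RAG, your custom LLM service retrieves information from a specific knowledge base and passes the retrieved results to the LLM as context for generating the answer.
The /rag/chat/completions API endpoint demonstrates a simple RAG implementation backed by an in-memory knowledge store.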
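The retrieval and message refactoring steps can be sketched as follows. This is a minimal sketch under stated assumptions: the knowledge entries are placeholder data, retrieval is a naive keyword match, and the names `KNOWLEDGE_BASE`, `retrieve`, and `refactor_messages` are hypothetical rather than taken from the project.

```python
from typing import Dict, List

# Placeholder in-memory knowledge store (keyword -> snippet), for illustration only.
KNOWLEDGE_BASE: Dict[str, str] = {
    "refund": "Refunds are processed within 5 business days.",
    "shipping": "Standard shipping takes 3 to 7 days.",
}


def retrieve(query: str, kb: Dict[str, str]) -> List[str]:
    """Return every snippet whose keyword appears in the query (naive match)."""
    lowered = query.lower()
    return [text for keyword, text in kb.items() if keyword in lowered]


def refactor_messages(messages: List[dict], context: List[str]) -> List[dict]:
    """Prepend the retrieved context as a system message; leave input untouched."""
    if not context:
        return list(messages)
    system = {
        "role": "system",
        "content": "Answer using this context:\n" + "\n".join(context),
    }
    return [system, *messages]


messages = [{"role": "user", "content": "How long does shipping take?"}]
# Use the most recent user message as the retrieval query.
last_user = next(m["content"] for m in reversed(messages) if m["role"] == "user")
augmented = refactor_messages(messages, retrieve(last_user, KNOWLEDGE_BASE))
print(augmented)
```

The augmented message list is what would then be sent to the model in place of the original messages. A production service would usually swap the keyword match for embedding-based search.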
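Multimodal LLMs can process and generate text, image, and audio content.
The /audio/chat/completions API endpoint simulates an audio response made up of a text transcript and audio data chunks.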
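One way such a response could be assembled is sketched below: the transcript is sent first, followed by base64-encoded audio chunks, then `[DONE]`. The payload field names (`delta.audio.id`, `transcript`, `data`), the chunk size, and the function name `audio_sse_events` are all assumptions made for illustration, so check the project source and the Conversational AI Engine documentation for the exact format.

```python
import base64
import json
import uuid
from typing import Iterator


def audio_sse_events(transcript: str, audio: bytes, chunk_size: int = 4096) -> Iterator[str]:
    """Yield a transcript event, then base64 audio chunk events, then [DONE]."""
    audio_id = uuid.uuid4().hex  # shared id ties the transcript to its audio chunks

    def event(audio_fields: dict) -> str:
        # Assumed payload shape: audio fields nested under choices[0].delta.audio
        payload = {"choices": [{"index": 0, "delta": {"audio": audio_fields}}]}
        return f"data: {json.dumps(payload)}\n\n"

    yield event({"id": audio_id, "transcript": transcript})
    for start in range(0, len(audio), chunk_size):
        piece = audio[start:start + chunk_size]
        encoded = base64.b64encode(piece).decode("ascii")
        yield event({"id": audio_id, "data": encoded})
    yield "data: [DONE]\n\n"


# Ten zero bytes stand in for real audio data.
events = list(audio_sse_events("Hello", bytes(10), chunk_size=4))
print(len(events))
```

This sketch omits the small delay between chunks; an async server would typically sleep briefly after each audio event to pace delivery.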
sequenceDiagram
participant Client
participant Server as Custom LLM Server
participant OpenAI
Client->>Server: POST /chat/completions
Note over Client,Server: With messages, model, stream params
Server->>OpenAI: Create chat.completions stream
loop For each chunk
OpenAI->>Server: Streaming chunk
Server->>Client: SSE data: chunk
end
Server->>Client: SSE data: [DONE]
sequenceDiagram
participant Client
participant Server as Custom LLM Server
participant KB as Knowledge Base
participant OpenAI
Client->>Server: POST /rag/chat/completions
Note over Client,Server: With messages, model params
Server->>Client: SSE data: "Waiting message"
Server->>KB: Perform RAG retrieval
KB->>Server: Return relevant context
Server->>Server: Refactor messages with context
Server->>OpenAI: Create chat.completions stream with context
loop For each chunk
OpenAI->>Server: Streaming chunk
Server->>Client: SSE data: chunk
end
Server->>Client: SSE data: [DONE]
sequenceDiagram
participant Client
participant Server as Custom LLM Server
participant FS as File System
Client->>Server: POST /audio/chat/completions
Note over Client,Server: With messages, model params
alt Files exist
Server->>FS: Read text file
FS->>Server: Return text content
Server->>FS: Read audio file
FS->>Server: Return audio data
Server->>Client: SSE data: transcript
loop For each audio chunk
Server->>Client: SSE data: audio chunk
Note over Server,Client: With small delay between chunks
end
else Files not found
Server->>Server: Generate simulated response
Server->>Client: SSE data: simulated transcript
loop For simulated chunks
Server->>Client: SSE data: random audio data
Note over Server,Client: With small delay between chunks
end
end
Server->>Client: SSE data: [DONE]
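- 📖 See our Conversational AI Engine documentation for more details
- 🧩 Visit the Agora SDK samples for more tutorials and sample code
- 👥 Explore high-quality repositories maintained by the developer community in the Agora Developer Community
If you run into any problems during integration or have suggestions for improvement:
- 🤖 Contact Agora Support for help from the intelligent assistant or from technical support staff
This project is licensed under the MIT License.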