From 7e36dfa17423ec1691d633598eee415fb766ec6a Mon Sep 17 00:00:00 2001
From: chenyuguo
Date: Wed, 25 Mar 2026 16:54:40 +0800
Subject: [PATCH 1/2] feat: convoai integration improve

---
 .gitignore                                    |   1 +
 skills/voice-ai-integration/SKILL.md          |  44 +-
 skills/voice-ai-integration/intake/README.md  | 144 ++---
 skills/voice-ai-integration/intake/convoai.md | 302 -----------
 .../references/conversational-ai/README.md    | 185 ++-----
 .../advanced-feature-routing.md               | 123 +++++
 .../conversational-ai/common-errors.md        |   2 +-
 .../conversational-ai/convoai-restapi.md      |  49 --
 .../conversational-ai/generation-rules.md     |  39 ++
 .../conversational-ai/quickstart-intake.md    | 500 ++++++++++++++++++
 .../conversational-ai/request-modes.md        | 109 ++++
 .../conversational-ai/sample-repos.md         |  22 +-
 12 files changed, 909 insertions(+), 611 deletions(-)
 delete mode 100644 skills/voice-ai-integration/intake/convoai.md
 create mode 100644 skills/voice-ai-integration/references/conversational-ai/advanced-feature-routing.md
 delete mode 100644 skills/voice-ai-integration/references/conversational-ai/convoai-restapi.md
 create mode 100644 skills/voice-ai-integration/references/conversational-ai/generation-rules.md
 create mode 100644 skills/voice-ai-integration/references/conversational-ai/quickstart-intake.md
 create mode 100644 skills/voice-ai-integration/references/conversational-ai/request-modes.md

diff --git a/.gitignore b/.gitignore
index 39cf32a..8039fc6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,7 @@
 # Editor
 .vscode/
 .idea/
+.kiro/
 
 # Local testing
 *.local.md
diff --git a/skills/voice-ai-integration/SKILL.md b/skills/voice-ai-integration/SKILL.md
index f87c487..606513a 100644
--- a/skills/voice-ai-integration/SKILL.md
+++ b/skills/voice-ai-integration/SKILL.md
@@ -27,38 +27,30 @@ This file is the documentation index — all doc lookups depend on it.
 Do NOT proceed to Step 1 until this file exists or the download has been attempted.
 If download fails, proceed with local reference docs and fallback URLs.
 
-### Step 1: Collect kickoff information
+### Step 1: Analyze the user's need and choose the product module
 
-Use [intake](intake/README.md) to collect kickoff information.
+Use [intake](intake/README.md) only for lightweight needs analysis and product routing.
 Ask only for details the user has not already provided.
 
-Collect only the details needed to remove implementation blockers:
-- User's use case / target solution
-- Main Shengwang product
-- Platform or client stack
-- Backend language if relevant
-- Any key technical details already known that affect routing or implementation
+Collect only the details needed to determine:
+- the user's use case / target solution
+- the primary Shengwang product
+- any supporting Shengwang products
+- one remaining routing blocker, if the product is still unclear
 
 Use a conversational flow:
 - Infer obvious context from the user's request when it is safe to do so
-- Ask only for missing details that block routing or implementation
-- Stop asking as soon as there is enough information to continue
+- Ask only for missing details that change product routing
+- Do not ask product-specific configuration questions in the root router
+- Stop asking as soon as the correct product module is clear
 
-ConvoAI has a special intake mode:
-- If ConvoAI is clearly the primary product, switch to the consolidated ConvoAI intake in [intake/convoai.md](intake/convoai.md)
-- Ask for all unresolved kickoff fields plus unresolved ConvoAI provider/config fields in one message
-- Show numbered choices for each unresolved field and ask for a one-line numeric reply
-- Do not repeat fields the user already answered
-- For ConvoAI implementation, prefer the official sample repo, `agent-server-sdk` on the server side, and `agora-agent-client-toolkit` on the client side when the target stack supports it, over building directly from the REST spec
-- Treat raw REST as a fallback only for unsupported operations, debugging, or when the user explicitly asks for a REST-first implementation
+ConvoAI has a dedicated product module:
+- If ConvoAI is clearly the primary product, route to [references/conversational-ai/README.md](references/conversational-ai/README.md)
+- The ConvoAI module handles its own internal routing through `request-modes.md` and the appropriate sub-flow
+- Do not duplicate ConvoAI-specific quickstart, advanced-feature, or debugging logic in the root router
 
-For ConvoAI, the user must still explicitly answer or confirm any unresolved `Other` follow-up value before implementation.
-
-For unresolved ConvoAI fields with defaults, keep them visible and treat omission as an explicit default confirmation. This includes `Platform = Web` and `Backend = Python`.
-If the first consolidated reply is incomplete, ask only a narrow follow-up for the unresolved mandatory blocker.
-
-If the user already gave enough information, do not repeat questions.
-Produce a lightweight kickoff recap, then continue automatically unless a required detail is still missing.
+If the product mapping is already clear, do not ask extra questions.
+Produce a lightweight routing recap, then continue automatically unless one routing blocker is still missing.
 
 ### Step 2: Start with local references
 
@@ -72,7 +64,7 @@ If the available information is sufficient, begin implementation using the exist
 | Credentials, AppID, REST auth | [general](references/general/credentials-and-auth.md) |
 | Download SDK, sample project, Token Builder, GitHub repo | Route to the relevant product module |
 | Generate Token, token server, AccessToken2, RTC/RTM auth | [token-server](references/token-server/README.md) |
-| ConvoAI operation (with details already known) | [conversational-ai](references/conversational-ai/README.md) for SDK/sample-first guidance; use REST docs only as fallback reference |
+| ConvoAI voice agent work | [conversational-ai](references/conversational-ai/README.md) for module entry, internal routing, and SDK/sample-first guidance |
 | RTC SDK integration | [rtc](references/rtc/README.md) |
 | RTM messaging / signaling | [rtm](references/rtm/README.md) |
 | Cloud Recording | [cloud-recording](references/cloud-recording/README.md) |
@@ -95,7 +87,7 @@ Once Step 3 provides enough information, proceed with implementation.
 
 ## Download Rules
 
-- Use `git clone --depth 1 ` — GitHub URLs must be repo root only (no branch/subdirectory paths)
+- Use `git clone --depth 1 ` with an HTTPS repo URL by default — GitHub/Gitee URLs must be repo root only (no branch/subdirectory paths)
 - On any download failure: report the error, provide the URL for manual download, never silently skip
 
 ## Links
diff --git a/skills/voice-ai-integration/intake/README.md b/skills/voice-ai-integration/intake/README.md
index 80c2eb1..113425b 100644
--- a/skills/voice-ai-integration/intake/README.md
+++ b/skills/voice-ai-integration/intake/README.md
@@ -1,46 +1,43 @@
-# Shengwang Intake — Kickoff Information Collection
+# Shengwang Intake — Needs Analysis & Product Routing
 
-First entry point for requests that still need a small amount of information
-before implementation can begin.
+First entry point for requests that still need lightweight product routing before
+implementation begins.
 
 > **Note:** Step 0 doc-index setup is defined in [SKILL.md](../SKILL.md).
-> If you are here, Step 0 has already been handled and the root router needs
-> a lightweight kickoff summary before moving into implementation research.
+> If you are here, Step 0 has already been handled and the root router now needs
+> only enough information to decide which Shengwang product module should take over.
 
 ---
 
 ## Goal
 
-Collect only the minimum missing information needed to proceed.
-Do not run a broad discovery interview. Do not ask the user to confirm a full
-solution design before continuing.
+Use this intake to do **needs analysis and product routing**, not product-specific
+solution design.
 
-Ask only for unanswered details that materially affect routing or implementation:
-- Use case / target solution
-- Main Shengwang product
-- Platform or client stack
-- Backend language if relevant
-- Any key details already known that affect the next step
+The top-level intake should answer only these questions:
+- What is the user trying to build?
+- Which Shengwang product is primary?
+- Which supporting products are likely needed?
+- Is there one remaining blocker that is still required to choose the right module?
 
-Once those details are gathered, produce a short kickoff summary and continue
-to Step 2 automatically unless a required field is still missing.
+Do **not** use this layer to collect provider choices, auth strategy, SDK details,
+project structure, vendor configuration, or other product-internal implementation
+choices unless one of those details is the only remaining routing blocker.
 
-When ConvoAI is clearly the primary product, replace turn-by-turn kickoff with
-the consolidated ConvoAI intake in [convoai.md](convoai.md). In that mode, the
-assistant should gather unresolved kickoff fields and unresolved ConvoAI provider
-choices in one message, then convert the reply into the structured spec.
+If the primary product is already obvious from the user's request, do not ask extra
+questions here — route directly to the product module.
 
 ## Interaction Style
 
-The intake should stay concise and targeted.
+The intake should stay concise and routing-focused.
 
 - Prefer natural wording over an interview script
-- Ask only for missing information
-- For non-ConvoAI flows, ask in priority order and stop early once there is enough information
-- For ConvoAI-primary flows, send one consolidated checklist covering unresolved fields, including kickoff fields and optional-default provider fields
-- Do not ask "nice to have" questions during kickoff
+- Ask only for missing information that changes product routing
+- Ask at most one routing blocker at a time when the product is still unclear
+- Do not ask product-specific configuration questions at the top level
+- Do not propose project structures, implementation plans, or framework choices here
 - If a detail is obvious from the user's message, infer it instead of asking again
-- After each answer, decide whether to continue or route onward
+- As soon as the product mapping is clear, route onward
 
 ## Product Routing Aid
 
@@ -52,6 +49,8 @@ Use this only to map the user's use case to the likely product set.
 | RTM | Real-time messaging / signaling | "聊天", "消息", "chat", "signaling", "notification" |
 | ConvoAI | AI voice agent (ASR→LLM→TTS over RTC) | "AI语音", "voice bot", "对话式AI", "AI agent" |
 | Cloud Recording | Record RTC sessions server-side | "录制", "recording", "存档" |
+| Token generation | Generate RTC / RTM tokens | "token", "鉴权", "token server" |
+| Credentials / Auth | Console credentials, REST auth, service activation | "App ID", "Customer Key", "REST auth", "开通服务" |
 
 ### Common combinations
 
@@ -67,56 +66,33 @@ Use this only to map the user's use case to the likely product set.
 
 ## Intake Flow
 
-### Step 1: Ask only for missing kickoff details
+### Step 1: Determine product routing
 
 Start from the user's existing message. Do not repeat information they already gave.
 
-Use the shortest set of prompts needed to fill the gaps.
-
 Priority order:
-- Use case
-- Main product, if unclear
-- Platform / client stack, if relevant
-- Implementation mode, when a matching ConvoAI sample repo exists and the user explicitly wants to opt out of the default sample-aligned path
-- Backend language, if relevant
-- One additional blocker only if it materially affects implementation
-
-ConvoAI exception:
-- If ConvoAI is clearly the primary product, do not stretch kickoff across multiple turns
-- Route immediately to [convoai.md](convoai.md) and ask for all unresolved kickoff and ConvoAI provider fields in one checklist-style message
-- Include kickoff fields only if still missing, such as use case, platform, backend language, or implementation mode
-- Mention that ConvoAI prefers the official sample path, `agent-server-sdk` on the server side, and `agora-agent-client-toolkit` on the client side when possible
+- Use case / target solution
+- Primary product, if still unclear
+- Supporting product, if the use case clearly requires one
+- One routing blocker only if it is still needed to choose the right module
 
-Short prompt examples:
+ConvoAI handoff:
+- If ConvoAI is clearly the primary product, route directly to [conversational-ai](../references/conversational-ai/README.md)
+- Do not expand ConvoAI-specific quickstart, provider, auth, or sample questions here
+- Let the ConvoAI module handle `request-modes.md` and its internal sub-flows
 
+Short prompt examples:
 - Use case:
   - ZH: "你想做什么场景?"
   - EN: "What are you trying to build?"
-- Main product:
+- Primary product:
   - ZH: "你主要想用 RTC、RTM、ConvoAI,还是录制?如果不确定我可以帮你判断。"
   - EN: "Are you mainly using RTC, RTM, ConvoAI, or recording? If you're not sure, I can infer it."
-- Platform / client stack:
-  - ZH: "目标平台是什么,比如 Web、iOS、Android?"
-  - EN: "What platform are you targeting, such as Web, iOS, or Android?"
-- Implementation mode, when a matching ConvoAI sample repo exists:
-  - ZH: "默认会按官方 quickstart / sample 结构走;如果你想改成最小化自定义实现,再告诉我。"
-  - EN: "I’ll default to the official quickstart/sample structure unless you specifically want a minimal custom implementation."
-- Backend language, when relevant:
-  - ZH: "服务端准备用什么语言?"
-  - EN: "What backend language are you using?"
-
-Ask follow-up only when a missing detail affects routing or implementation.
+- Supporting product:
+  - ZH: "除了主链路外,还需要聊天、录制,或者 token 服务吗?"
+  - EN: "Besides the main flow, do you also need chat, recording, or a token service?"
 
-### Step 2: Determine product mapping
-
-From the user's answers, determine:
-- Primary product
-- Supporting products, if required
-- Any remaining gaps that block implementation
-
-Use the routing aid above to infer combinations.
-
-### Step 3: Produce kickoff summary
+### Step 2: Produce routing summary
 
 Present a short progress recap in the user's language:
 
@@ -127,9 +103,7 @@ Present a short progress recap in the user's language:
 场景: [use case]
 主要产品: [primary product]
 配套产品: [supporting products / 无]
-平台: [platform / client stack]
-服务端语言: [backend language / 不涉及]
-下一步: [go to implementation research / ask one blocker]
+下一步: [route to the product module / ask one routing blocker]
 ─────────────────────────────
 ```
 
@@ -140,35 +114,27 @@ What I have so far
 Use case: [use case]
 Primary: [primary product]
 Supporting: [supporting products / none]
-Platform: [platform / client stack]
-Backend: [backend language / not needed]
-Next: [go to implementation research / ask one blocker]
+Next: [route to the product module / ask one routing blocker]
 ─────────────────────────────
 ```
 
 Do not stop for a separate confirmation step.
-- If no required detail is missing -> continue automatically to Step 2 in the root workflow.
-- If a required detail is still missing -> ask only for that blocker, then continue.
-
-For ConvoAI-primary flows, the kickoff summary may be merged into the ConvoAI
-spec output if that is clearer than producing two separate recaps.
-
-### Step 4: Route onward
+- If the product mapping is clear -> continue automatically to the product module
+- If one routing blocker is still missing -> ask only for that blocker, then continue
 
-For each identified product, route to its detail collection:
+### Step 3: Route onward
 
-| Product | Detail intake | Product module |
-|---------|--------------|---------------|
-| ConvoAI | [intake/convoai.md](convoai.md) | [conversational-ai](../references/conversational-ai/README.md) |
-| RTC SDK | — | [rtc](../references/rtc/README.md) |
-| RTM | — | [rtm](../references/rtm/README.md) |
-| Cloud Recording | — | [cloud-recording](../references/cloud-recording/README.md) |
-| Credentials / Auth | — | [general](../references/general/credentials-and-auth.md) |
-| Token generation | — | [token-server](../references/token-server/README.md) |
+For each identified product, route to its product module:
 
-> Products without a detail intake (marked "—") go directly to the product module.
-> The module itself should only collect product-specific missing details.
+| Product | Product module |
+|---------|---------------|
+| ConvoAI | [conversational-ai](../references/conversational-ai/README.md) |
+| RTC SDK | [rtc](../references/rtc/README.md) |
+| RTM | [rtm](../references/rtm/README.md) |
+| Cloud Recording | [cloud-recording](../references/cloud-recording/README.md) |
+| Credentials / Auth | [general](../references/general/credentials-and-auth.md) |
+| Token generation | [token-server](../references/token-server/README.md) |
 
-When multiple products are needed, run the primary product's intake first,
+When multiple products are needed, route to the primary product first,
 then address supporting products in order.
 
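The updated Download Rules bullet in SKILL.md asks for `git clone --depth 1` with an HTTPS URL that points at a GitHub or Gitee repo root, never at a branch or subdirectory path. A pre-clone check could look like the following Python sketch. It assumes a repo-root URL has the shape `https://<host>/<owner>/<repo>`, optionally ending in `.git`. The helper names `is_repo_root_url` and `clone_command` are hypothetical and are not part of the skill.

```python
from typing import List
from urllib.parse import urlparse

# Hosts named in the download rule (assumption: only these two are allowed).
ALLOWED_HOSTS = {"github.com", "gitee.com"}


def is_repo_root_url(url: str) -> bool:
    """True if url is an HTTPS URL of the form https://<host>/<owner>/<repo>[.git]."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc.lower() not in ALLOWED_HOSTS:
        return False
    if parsed.query or parsed.fragment:
        return False
    # Ignore a trailing slash and an optional ".git" suffix.
    path = parsed.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    segments = [s for s in path.split("/") if s]
    # Exactly owner + repo; /tree/<branch>/<dir> style paths are rejected.
    return len(segments) == 2


def clone_command(url: str) -> List[str]:
    """Build a shallow-clone argv, refusing URLs that break the repo-root rule."""
    if not is_repo_root_url(url):
        raise ValueError(f"not an HTTPS repo-root URL: {url}")
    return ["git", "clone", "--depth", "1", url]
```

To run the clone, the returned list could be passed to `subprocess.run(cmd, check=True)`. Building an argument list rather than a shell string avoids quoting problems. A rejected URL raises before anything runs, which matches the rule to report the problem rather than silently skip it.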
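The reworked intake Step 3 replaces the per-product detail intakes with one product-to-module table. It also routes the primary product first and then the supporting products in order. The following Python sketch shows that lookup under two assumptions. First, the function name `routing_plan` is hypothetical. Second, raising on an unknown product stands in for "ask one routing blocker". The paths are copied from the table and are relative to `intake/`.

```python
from typing import Dict, List

# Product -> module path, copied from the new intake Step 3 table
# (paths are relative to skills/voice-ai-integration/intake/).
PRODUCT_MODULES: Dict[str, str] = {
    "ConvoAI": "../references/conversational-ai/README.md",
    "RTC SDK": "../references/rtc/README.md",
    "RTM": "../references/rtm/README.md",
    "Cloud Recording": "../references/cloud-recording/README.md",
    "Credentials / Auth": "../references/general/credentials-and-auth.md",
    "Token generation": "../references/token-server/README.md",
}


def routing_plan(primary: str, supporting: List[str]) -> List[str]:
    """Return module paths in visit order: primary first, then supporting products."""
    plan: List[str] = []
    seen = set()
    for product in [primary, *supporting]:
        if product in seen:
            continue  # a product is routed at most once
        seen.add(product)
        if product not in PRODUCT_MODULES:
            # Unknown product: treat it as the single remaining routing blocker.
            raise KeyError(f"no product module for {product!r}")
        plan.append(PRODUCT_MODULES[product])
    return plan
```

For example, a ConvoAI app that also needs a token server would visit the conversational-ai module first and the token-server module second.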
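The new `references/conversational-ai/README.md` in this patch first sends every ConvoAI request through `request-modes.md`. It then hands the request to one of two sub-flows. The following Python sketch shows that dispatch. The mode strings are the ones listed in the README's "Start Here" section. The README excerpt does not say where `quickstart` or `integration` work goes once a working baseline already exists, so the sketch returns `None` in that case to mean "defer to request-modes.md". That choice is an assumption, and `convoai_entry_doc` is a hypothetical name.

```python
from typing import Optional

QUICKSTART_FLOW = "quickstart-intake.md"
ADVANCED_FLOW = "advanced-feature-routing.md"
QUICKSTART_MODES = {"quickstart", "integration"}
ADVANCED_MODES = {"advanced-feature", "debugging", "ops-hardening"}


def convoai_entry_doc(mode: str, has_working_baseline: bool) -> Optional[str]:
    """Map a request mode (as classified by request-modes.md) to a ConvoAI sub-flow."""
    if mode in QUICKSTART_MODES:
        # Only requests without a proven working baseline start the quickstart flow;
        # the README excerpt names no target for the other case, so defer (None).
        return None if has_working_baseline else QUICKSTART_FLOW
    if mode in ADVANCED_MODES:
        return ADVANCED_FLOW
    raise ValueError(f"unknown request mode: {mode!r}")
```

Keeping the mode sets as data makes it easy to see that this sketch covers exactly the five modes the README names. Any other mode is reported instead of guessed.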
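The `intake/convoai.md` file deleted in the next diff defined a compact reply format with these rules:

- The prompt numbers only the fields that are currently visible.
- The user answers on one line with codes such as `1A 4B 6A`.
- An omitted field with a default takes that default.
- Any of the following triggers a narrow follow-up: choosing `Other`, giving an invalid letter, or omitting a field that has no default.

The new conversational-ai README says provider choices now live in `quickstart-intake.md`, which is not shown here. The following Python parser is therefore only a sketch of the rules as written in the deleted file, not of the new file. `VisibleField`, `ParseResult`, and `parse_reply` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# One reply code: a field number followed by an option letter, e.g. "4B".
CODE_RE = re.compile(r"(\d+)([A-Za-z])")


@dataclass
class VisibleField:
    name: str
    options: Dict[str, str]             # option letter -> stored value
    default: Optional[str] = None       # None: no default, an answer is required
    other_letter: Optional[str] = None  # letter meaning "Other, specify"


@dataclass
class ParseResult:
    values: Dict[str, str] = field(default_factory=dict)
    defaults_applied: List[str] = field(default_factory=list)
    follow_up: List[str] = field(default_factory=list)  # fields needing a narrow follow-up
    unparsed: List[str] = field(default_factory=list)   # tokens matching no visible field


def parse_reply(reply: str, fields: List[VisibleField]) -> ParseResult:
    """Resolve a sparse reply such as '1A 4B 6A' against the prompt's visible numbering."""
    result = ParseResult()
    chosen: Dict[int, str] = {}
    for token in reply.split():
        m = CODE_RE.fullmatch(token)
        if m is None or not 1 <= int(m.group(1)) <= len(fields):
            result.unparsed.append(token)
            continue
        chosen[int(m.group(1))] = m.group(2).upper()

    for number, f in enumerate(fields, start=1):
        letter = chosen.get(number)
        if letter is None:
            if f.default is not None:
                result.values[f.name] = f.default  # omitted field with a default
                result.defaults_applied.append(f.name)
            else:
                result.follow_up.append(f.name)    # omitted field without a default
        elif letter == f.other_letter or letter not in f.options:
            result.follow_up.append(f.name)        # "Other" or an invalid letter
        else:
            result.values[f.name] = f.options[letter]
    return result
```

The parser takes the visible field list as input rather than a fixed ID table. This mirrors the deleted file's rule to renumber for each prompt and never reserve global IDs, so field `1` is simply whichever field was shown first.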
diff --git a/skills/voice-ai-integration/intake/convoai.md b/skills/voice-ai-integration/intake/convoai.md
deleted file mode 100644
index b28aef0..0000000
--- a/skills/voice-ai-integration/intake/convoai.md
+++ /dev/null
@@ -1,302 +0,0 @@
-# ConvoAI Detail Collection
-
-Reached from [intake](README.md) after ConvoAI is identified as the primary product.
-This file is for ConvoAI-specific follow-up only.
-
-## Language Detection
-
-Detect the user's language from their most recent message:
-- If the user writes in **Chinese** → use the **ZH** prompts below
-- If the user writes in **English** (or any other language) → use the **EN** prompts below
-
-Maintain the detected language consistently throughout the entire intake flow.
-
-## Prerequisites
-
-Before starting, the user should have:
-- Completed the main kickoff intake
-- A clear use case description
-- Platform / client-stack context already collected if relevant
-- Backend language already collected if relevant
-
-## Questions
-
-Use a friendly but explicit follow-up flow:
-- Ask for all unresolved required fields plus unresolved optional-default fields in one consolidated message
-- Keep the message short enough to scan, but complete enough to finish intake in one reply
-- Skip anything the user already answered
-- Show the available options and recommended default for each unresolved field shown in the prompt
-- If the user leaves a blocker unresolved, ask only a narrow repair follow-up for that field
-
-Defaults policy:
-- Platform recommended default: `Web`
-- Backend recommended default: `Python` (skip this field entirely for native platforms: iOS, Android, Flutter, Windows, macOS)
-- ASR vendor recommended default: `fengming`
-- ASR language recommended default: `en-US` for clearly English scenarios, otherwise `zh-CN`
-- LLM recommended default: `deepseek`
-- TTS recommended default: `bytedance`
-
-Blocking rule:
-- Any selected `Other` value must be clarified in a narrow follow-up
-- Platform and Backend are optional when shown with defaults
-- **LLM, TTS, ASR vendor, and ASR language are MANDATORY confirmation fields** — they MUST be shown to the user and the agent MUST wait for the user's explicit reply before proceeding to implementation, even if defaults exist. The user may choose the default, but the agent cannot assume it on their behalf.
-
-Confirmation gate:
-- The consolidated intake message MUST always be sent to the user when any of the mandatory confirmation fields (LLM, TTS, ASR vendor, ASR language) have not been explicitly answered by the user.
-- Do NOT skip the intake message. Do NOT silently apply defaults for these fields.
-- For defaultable fields that are NOT mandatory confirmation fields (Platform, Backend), omission counts as explicit confirmation to use the default.
-- For mandatory confirmation fields, omission in the user's reply to the intake message counts as explicit confirmation to use the default — but the intake message itself must have been shown first.
-
-Ask the full unresolved-fields checklist first. Skip any question the user already answered during main intake
-or in the user's initial request.
-Doc index status is already determined by the main intake — do not re-check here.
-
-## Consolidated Intake Message
-
-When ConvoAI is the clear primary product, combine the unresolved kickoff fields and
-the unresolved ConvoAI-specific questions into one message.
-
-Message requirements:
-- Use the user's language consistently
-- Start with a one-line recap that ConvoAI prefers the official sample path, `agent-server-sdk` on the server side, and `agora-agent-client-toolkit` on the client side when possible
-- Ask only about unresolved fields, including optional-default fields that are still unresolved
-- Under each unresolved field, show the supported options inline to reduce prompt height
-- Number only the currently visible unresolved fields, starting from `1`
-- Mark fields with defaults as optional
-- Ask the user to reply once with numeric codes such as `1A 4B 6A`
-- Do not mix this with a `key=value` quick-reply example in the same prompt
-
-If the user already provided enough detail for some fields, do not restate those
-questions. Keep the option list only for the unresolved fields.
-
-Numbering rules:
-- Renumber based only on the fields shown in the current prompt
-- Do not use stable global IDs across turns
-- If a field is already known, omit it and do not reserve its number
-- Platform and backend should also be shown whenever they are unresolved, even though they are optional
-- LLM, TTS, ASR, and ASR language should still be shown whenever they are unresolved, even though they are optional
-- If a visible field has a default, its number may be omitted from the reply
-
-Parsing rules:
-- Parse numeric answers against the current prompt's visible numbering
-- Accept sparse one-line replies such as `1A 4B 6A`
-- If a visible optional field is omitted, apply its default automatically
-- If a visible mandatory field is omitted, ask only for that field
-- If a selected option is `Other`, ask a narrow follow-up only for that field
-- If a code is invalid or incomplete, ask only for the unresolved item
-
-Suggested shape:
-
-**ZH:**
-```text
-我还缺这几项信息,确认完我就可以继续:
-1. [field 1](可选,留空=默认)
-   A. ...  B. ...  C. 用默认(...)
-2. [field 2]
-   A. ...  B. ...  C. 其他,直接写代码
-
-补充说明:
-- ConvoAI 默认优先走官方 sample;服务端优先用 `agent-server-sdk`
-- 客户端优先用 `agora-agent-client-toolkit`,如果目标栈不适配再直接用 RTC SDK 入会
-- Native 平台(iOS / Android / Flutter / Windows / macOS)走多平台 sample repo,客户端直接调 ConvoAI REST API,不需要 `agent-server-sdk` 和 `agora-agent-client-toolkit`,也不需要配套服务端
-- 可选题如果不写,就自动用默认值
-- 你回一行就行,例如:2B 4A;没写出来的可选题会自动用默认
-- 如果你的目标不是 Web,而是 iOS / Android / Electron,也一起按编号回复
-```
-
-**EN:**
-```text
-I still need these details before I continue:
-1. [field 1] (optional, blank=default)
-   A. ...  B. ...  C. Use default (...)
-2. [field 2]
-   A. ...  B. ...  C. Other, specify the code
-
-Notes:
-- ConvoAI should usually follow the official sample path, use `agent-server-sdk` on the server side, and use `agora-agent-client-toolkit` on the client side when possible instead of building from the REST spec from scratch
-- If the client toolkit is not a fit for the target stack, the client should still join with the RTC SDK directly
-- Native platforms (iOS / Android / Flutter / Windows / macOS) use the multi-platform sample repo, call the ConvoAI REST API directly from the client, and do not need `agent-server-sdk`, `agora-agent-client-toolkit`, or a separate server
-- If you omit an optional question, I will apply its default automatically
-- Reply in one line, for example: `2B 4A`; omitted optional numbers will use defaults
-- If your target is not Web, but iOS / Android / Electron, include that choice by number as well
-```
-
-### Q2 — LLM
-
-Include this question only if the LLM provider has not already been confirmed.
-
-**ZH:**
-> "LLM(可选,留空=默认 DeepSeek)"
-> 选项(内联展示):
-> A. 阿里云(aliyun)  B. 字节跳动(bytedance)  C. 深度求索(deepseek)  D. 腾讯(tencent)  E. 用默认的就行(deepseek)
-
-**EN:**
-> "LLM (optional, blank=default DeepSeek)"
-> Options (inline):
-> A. Alibaba Cloud (aliyun)  B. ByteDance (bytedance)  C. DeepSeek (deepseek)  D. Tencent (tencent)  E. Use the default (deepseek)
-
-**Default:** deepseek
-
-### Q3 — TTS
-
-Include this question only if the TTS provider has not already been confirmed.
-
-**ZH:**
-> "TTS(可选,留空=默认 bytedance)"
-> 选项(内联展示):
-> A. 字节跳动 / 火山引擎(bytedance)  B. 微软(microsoft)  C. MiniMax(minimax)  D. 阿里 CosyVoice(cosyvoice)  E. 腾讯(tencent)  F. 阶跃星辰(stepfun)  G. 用默认的就行(bytedance)
-
-**EN:**
-> "TTS (optional, blank=default bytedance)"
-> Options (inline):
-> A. ByteDance / Volcengine (bytedance)  B. Microsoft (microsoft)  C. MiniMax (minimax)  D. Alibaba CosyVoice (cosyvoice)  E. Tencent (tencent)  F. StepFun (stepfun)  G. Use the default (bytedance)
-
-**Default:** bytedance (Volcengine TTS)
-
-### Q4 — ASR Vendor
-
-Include this question only if the ASR provider has not already been confirmed.
-
-**ZH:**
-> "ASR(可选,留空=默认 fengming)"
-> 选项(内联展示):
-> A. 声网凤鸣(fengming)  B. 腾讯(tencent)  C. 微软(microsoft)  D. 科大讯飞(xfyun)  E. 科大讯飞大模型(xfyun_bigmodel)  F. 科大讯飞方言(xfyun_dialect)  G. 用默认的就行(fengming)
-
-**EN:**
-> "ASR (optional, blank=default fengming)"
-> Options (inline):
-> A. Shengwang Fengming (fengming)  B. Tencent (tencent)  C. Microsoft (microsoft)  D. iFlytek (xfyun)  E. iFlytek BigModel (xfyun_bigmodel)  F. iFlytek Dialect (xfyun_dialect)  G. Use the default (fengming)
-
-**Default:** fengming
-
-### Q5 — ASR Language
-
-Include this question only if the ASR language has not already been confirmed.
-
-Choose the recommended default from the use case:
-- English use case -> `en-US`
-- Chinese or unspecified use case -> `zh-CN`
-
-If the question is shown and the user omits it, apply the recommended default automatically.
-
-**ZH:**
-> "ASR 语言(可选,留空=默认 [zh-CN / en-US])"
-> 选项(内联展示):
-> A. 中文(zh-CN,支持中英混合)  B. 英文(en-US)  C. 其他,直接写代码  D. 用默认的就行
-
-**EN:**
-> "ASR language (optional, blank=default [zh-CN / en-US])"
-> Options (inline):
-> A. Chinese (zh-CN, supports Chinese-English mix)  B. English (en-US)  C. Other, specify the code  D. Use the default
-
-**Default:** `en-US` for clearly English scenarios, otherwise `zh-CN`
-
-Prompt rendering rule:
-- In the actual user-facing prompt, render each visible question as two lines only:
-  - line 1: question number + field name
-  - line 2: all options inline, separated by two spaces
-- Example:
-  - `2. LLM(可选,留空=默认)`
-  - `   A. aliyun  B. bytedance  C. deepseek  D. tencent  E. 用默认(deepseek)`
-- Keep the detailed reference blocks below in vertical form; only the emitted prompt should be compact
-
-### Platform Question
-
-Include this question whenever platform is still missing.
-
-**ZH:**
-> "目标平台是什么?(可选,留空=默认 Web)"
-> 选项(内联展示):
-> A. Web  B. iOS  C. Android  D. Electron  E. 其他,直接写平台  F. 用默认的就行(Web)
-
-**EN:**
-> "What is the target platform? (optional, blank=default Web)"
-> Options (inline):
-> A. Web  B. iOS  C. Android  D. Electron  E. Other, specify the platform  F. Use the default (Web)
-
-**Default:** Web
-
-### Backend Question
-
-Include this question whenever backend language is still missing.
-Skip this question entirely if the user's confirmed platform is a native platform (iOS, Android, Flutter, Windows, macOS) — native ConvoAI apps are self-contained and call the REST API directly, no separate server needed. Record backend as "不涉及" / "not needed" in the spec.
-
-**ZH:**
-> "服务端准备用什么语言?(可选,留空=默认 Python)"
-> 选项(内联展示):
-> A. Python  B. Go  C. Java  D. Node.js  E. 其他,直接写语言  F. 用默认的就行(Python)
-
-**EN:**
-> "What backend language are you using? (optional, blank=default Python)"
-> Options (inline):
-> A. Python  B. Go  C. Java  D. Node.js  E. Other, specify the language  F. Use the default (Python)
-
-**Default:** Python
-
----
-
-## Output: Structured Spec
-
-After the user replies, normalize the answers immediately into this spec. Do not
-ask for a separate confirmation turn if every blocking field is resolved.
-
-**ZH:**
-```
-ConvoAI 需求规格
-─────────────────────────────
-场景: [use case]
-主要产品: [ConvoAI]
-配套产品: [RTC SDK / RTC SDK + RTM / RTC SDK + Cloud Recording / 无]
-平台: [Web (default applied) / iOS / Android / Electron / other platform]
-实现方式: [sample-aligned / minimal-custom / 未指定]
-服务端语言: [Python (default applied) / Go / Java / Node.js / other backend / 不涉及]
-ASR: [fengming (default applied) / tencent / microsoft / xfyun / xfyun_bigmodel / xfyun_dialect]
-ASR 语言: [zh-CN (default applied) / en-US (default applied) / ja-JP / ko-KR / ...]
-LLM: [aliyun / bytedance / deepseek (default applied) / tencent]
-TTS: [bytedance (default applied) / minimax / tencent / microsoft / cosyvoice / stepfun]
-─────────────────────────────
-```
-
-**EN:**
-```
-ConvoAI Spec
-─────────────────────────────
-Use case: [use case]
-Primary: [ConvoAI]
-Supporting: [RTC SDK / RTC SDK + RTM / RTC SDK + Cloud Recording / none]
-Platform: [Web (default applied) / iOS / Android / Electron / other platform]
-Implementation: [sample-aligned / minimal-custom / unspecified]
-Backend: [Python (default applied) / Go / Java / Node.js / other backend / not needed]
-ASR: [fengming (default applied) / tencent / microsoft / xfyun / xfyun_bigmodel / xfyun_dialect]
-ASR Language: [zh-CN (default applied) / en-US (default applied) / ja-JP / ko-KR / ...]
-LLM: [aliyun / bytedance / deepseek (default applied) / tencent]
-TTS: [bytedance (default applied) / minimax / tencent / microsoft / cosyvoice / stepfun]
-─────────────────────────────
-```
-
-## Defaults
-
-| Field | Default | Notes (ZH) | Notes (EN) |
-|-------|---------|------------|------------|
-| Supporting product | `RTC SDK` | ConvoAI 默认需要 RTC SDK 作为客户端配套,除非用户已明确是纯服务端讨论 | ConvoAI normally needs RTC SDK as the client-side companion unless the user is discussing a server-only topic |
-| Platform | `Web` | 推荐默认值;如果用户省略该可选题,则按 `default applied` 记录 | Recommended default; if the user skips this optional question, record it as `default applied` |
-| Backend | `Python` | 推荐默认值;如果用户省略该可选题,则按 `default applied` 记录 | Recommended default; if the user skips this optional question, record it as `default applied` |
-| ASR vendor | `fengming` | 推荐默认值;如果用户省略该可选题,则按 `default applied` 记录 | Recommended default; if the user skips this optional question, record it as `default applied` |
-| ASR language | `zh-CN` / `en-US` | 推荐默认值;英文场景优先 `en-US`,其他场景优先 `zh-CN`;省略时按默认记录 | Recommended default; prefer `en-US` for clearly English use cases, otherwise `zh-CN`; apply it when omitted |
-| LLM vendor | `deepseek` | 推荐默认值;如果用户省略该可选题,则按 `default applied` 记录 | Recommended default; if the user skips this optional question, record it as `default applied` |
-| TTS vendor | `bytedance` | 推荐默认值;如果用户省略该可选题,则按 `default applied` 记录 | Recommended default; if the user skips this optional question, record it as `default applied` |
-
-> ASR/TTS/LLM valid values come from the /join API docs — see [convoai-restapi/start-agent.md](../references/conversational-ai/convoai-restapi/start-agent.md) for the /join schema and vendor params. Do not invent values.
-
-## Route After Collection
-
-Pass the structured spec to [conversational-ai](../references/conversational-ai/README.md).
-The product module will inspect the matching sample repo first, prefer `agent-server-sdk` on the server and `agora-agent-client-toolkit` on the client when possible, then fetch only the missing docs and generate code.
-
-Key routing hints:
-- If a matching sample repo exists → inspect `sample-repos.md` first and keep `sample-aligned` as the default implementation mode
-- If the sample repo or target stack supports `agent-server-sdk` and `agora-agent-client-toolkit` → keep those as the default server/client libraries
-- For native platforms (iOS, Android, Flutter, Windows, macOS) → route to the multi-platform native client sample repo, no server needed, client calls ConvoAI REST API directly
-- If the sample repo does not answer a required API or vendor detail → fetch the missing REST docs for the confirmed backend language
-- If the user explicitly asks for raw REST or the capability is unsupported by the sample/SDK path → use the REST quick start and endpoint docs directly
-- If fetch fails → use Generation Rules + fallback URL
diff --git a/skills/voice-ai-integration/references/conversational-ai/README.md b/skills/voice-ai-integration/references/conversational-ai/README.md
index c78b660..7375295 100644
--- a/skills/voice-ai-integration/references/conversational-ai/README.md
+++ b/skills/voice-ai-integration/references/conversational-ai/README.md
@@ -13,148 +13,63 @@ User Device ◄── audio ── RTC Channel ◄── ConvoAI Agent
 - Client should prefer `agora-agent-client-toolkit` when it fits the target stack; otherwise use the RTC SDK directly to join the channel
 - `POST /join` makes the agent join the same RTC channel
 
-## Default Integration Path
+## Start Here
+
+Always start ConvoAI work by classifying the request mode with
+[request-modes.md](request-modes.md).
+ +- `quickstart` and `integration` without a proven working baseline → start with + [quickstart-intake.md](quickstart-intake.md), which now contains the full quickstart flow: + product intro → technical path → credential checkpoint → provider choices +- `advanced-feature`, `debugging`, and `ops-hardening` → use + [advanced-feature-routing.md](advanced-feature-routing.md) + +## Flow Map + +```text +request-modes.md + ├─ quickstart / integration → quickstart-intake.md + │ ├─ product intro + │ ├─ technical path + │ ├─ project readiness + │ ├─ provider confirmation + │ └─ sample-repos.md → code generation + └─ advanced / debugging / ops → advanced-feature-routing.md + ├─ common-errors.md + └─ convoai-restapi/index.mdx or endpoint docs +``` + +## Architecture Defaults Use this order unless the user explicitly asks for something else: -1. Follow the matching official ConvoAI sample repo and preserve its structure -2. On the server side, prefer `agent-server-sdk` -3. On the client side, prefer `agora-agent-client-toolkit` when the target stack supports it; otherwise fall back to the RTC SDK directly -4. Use fetched Shengwang docs to fill in missing product details -5. Use raw REST directly only for unsupported operations, debugging, or explicit REST-first requests +1. If a matching official ConvoAI sample repo exists, offer the sample-aligned path first and inspect that repo after the user accepts the default technical path or explicitly asks for sample-aligned implementation +2. Preserve the sample repo structure and keep `sample-aligned` as the default path unless the user explicitly asks for `minimal-custom` +3. On the server side, prefer `agent-server-sdk` +4. On the client side, prefer `agora-agent-client-toolkit` when the target stack supports it; otherwise fall back to the RTC SDK directly +5. Use fetched Shengwang docs to fill in missing product details after the sample path has been inspected +6. 
Use raw REST directly only for unsupported operations, debugging, or explicit REST-first requests Do not treat the REST quick start or endpoint index as the default architecture for a new ConvoAI integration when a matching sample or official SDK path already exists. -## Auth +## Auth Snapshot + +- ConvoAI quickstart assumes a Shengwang project with `App ID`, `App Certificate`, and ConvoAI service activation already in place +- Quickstart uses RTC Token as the fixed auth path +- The `token` field in `/join` is for the RTC channel, not for REST auth +- Detailed credential rules → [../general/credentials-and-auth.md](../general/credentials-and-auth.md) +- Token generation → [../token-server/README.md](../token-server/README.md) + +## Entry Navigation -ConvoAI REST API 支持两种鉴权方式,推荐使用 RTC Token: - -1. **RTC Token(推荐)**:使用声网对话式 AI 引擎项目的 RTC Token - - 传参示例:`Authorization: agora token="007abcxxxxxxx123"` - - 测试环境:从[声网控制台](https://console.shengwang.cn/)生成临时 Token(有效期 24 小时) - - 生产环境:部署 [token-server](../token-server/README.md) 生成 Token - - 优势:只需要 `APP_ID` + `APP_CERTIFICATE`,与客户端 token 共用同一套凭证,无需额外配置 Customer Key/Secret - -2. 
**Basic Auth(备选)**:使用 `SHENGWANG_CUSTOMER_KEY` + `SHENGWANG_CUSTOMER_SECRET` 生成 Base64 编码 - - 传参示例:`Authorization: Basic NDI1OTQ3N2I4MzYy...YwZjA=` - - 参考[实现 HTTP 安全认证](https://doc.shengwang.cn/doc/convoai/restful/user-guides/http-basic-auth) - - 仅在无法使用 RTC Token 时使用(例如纯服务端场景、无 App Certificate 等) - -鉴权方式选择规则: -- 默认使用 RTC Token 方式,减少环境变量数量,与客户端 token 生成逻辑统一 -- 仅在用户明确要求 Basic Auth、或项目不具备 App Certificate 时才使用 Basic Auth -- 使用 `agent-server-sdk` 时,SDK 内部自动处理 token 鉴权,无需手动构造 Authorization header - -其他注意事项: -- ConvoAI requires separate activation in [Shengwang Console](https://console.shengwang.cn/) — 403 without it -- The `token` field in `/join` body is for the RTC channel, NOT for REST auth: - - App Certificate not enabled → `""` - - App Certificate enabled → generate via [token-server](../token-server/README.md) -- Do not ask about App Certificate during ConvoAI intake by default; confirm token handling later only if implementation is blocked or the user explicitly asks -- Credentials → [general/credentials-and-auth.md](../general/credentials-and-auth.md) - -## Sample Repos - -For reference projects and starter layouts, use [sample-repos.md](sample-repos.md). - -When a matching ConvoAI sample repo exists for the requested stack, it is the default implementation reference. 
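The Auth section removed in this hunk describes two `Authorization` header formats for raw ConvoAI REST calls. Below is a minimal Python sketch of both. It assumes the header strings quoted there are still current. The helper names are hypothetical. The base URL is the one listed in the removed `convoai-restapi.md` further down this patch. According to the same section, `agent-server-sdk` handles token auth on its own, so this only matters when calling REST directly.

```python
import base64

# Base URL as quoted in the removed convoai-restapi.md; verify against current docs.
BASE_URL = "https://api.agora.io/cn/api/conversational-ai-agent/v2/projects/{app_id}"


def rtc_token_auth_header(rtc_token: str) -> dict[str, str]:
    """Preferred path: reuse the project's RTC token (needs App ID + App Certificate)."""
    return {"Authorization": f'agora token="{rtc_token}"'}


def basic_auth_header(customer_key: str, customer_secret: str) -> dict[str, str]:
    """Fallback path: Base64 of "key:secret" (SHENGWANG_CUSTOMER_KEY / _SECRET)."""
    raw = f"{customer_key}:{customer_secret}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def endpoint_url(app_id: str, path: str) -> str:
    """Join the per-project base URL with an endpoint path such as "/join"."""
    return BASE_URL.format(app_id=app_id) + path
```

Either header authenticates the REST call itself. The `token` field inside the `/join` body is a separate RTC channel token.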
- -Required workflow: -- Pick the relevant entry from `sample-repos.md` -- Clone the repo on demand with `git clone --depth 1 ` -- Inspect the current stack, folder map, key files, env template files, and API surface -- Inspect the sample repo's actual env template files before coding, such as `.env.example`, `.env.local.example`, and similar sample-provided files -- Prefer `agent-server-sdk` on the server side and `agora-agent-client-toolkit` on the client side when the sample repo or target stack supports them, instead of building a direct REST integration from scratch -- Keep the implementation aligned with the sample repo's architecture, env var names discovered from those template files, dependency choices, and API shape -- Use Shengwang doc fetching only for missing API or product details that the sample repo does not cover -- Keep raw REST calls narrowly scoped to capabilities that are not covered by the chosen SDK or sample architecture - -Multi-platform repo handling: -- If the matched sample repo is a multi-platform monorepo (indicated by "Multi-platform" in Default Stack): - 1. Clone the repo to a temporary path (e.g. `/tmp/convoai-native-full`) - 2. Read the repo's `AGENTS.md` to discover the directory layout and per-platform entrypoints - 3. Based on the user's confirmed platform, locate the corresponding subdirectory - 4. Copy only that subdirectory to a clean temporary path (e.g. `/tmp/convoai-native-ios`), then delete the full clone - 5. Inspect the extracted subdirectory's complete demo code (architecture, env templates, API calls, dependencies) — same as the Web workflow - 6. If the user's workspace already contains a project with build/project files for the target platform, write business code directly into the user's existing project. Otherwise, generate the project scaffolding as described in the repo's `AGENTS.md` - 7. 
Referencing the complete demo code, write business code in the user's project — keep project/build files untouched, only write/modify the business code files indicated by the repo's `AGENTS.md` - 8. Apply the user's confirmed provider choices with a minimal diff, same as the Web workflow -- The repo's `AGENTS.md` is the source of truth for which directory maps to which platform and which files are business code vs project files -- If the repo's `AGENTS.md` does not list the requested platform, fall back to Shengwang doc fetching and `minimal-custom` mode - -Implementation modes: -- `sample-aligned` is the default mode whenever a matching sample repo exists -- `minimal-custom` may only be used if the user explicitly asks for a minimal demo or says not to follow the sample repo - -Alignment rules: -- Preserve the sample repo's env var names from the inspected env template files unless the user explicitly asks to rename or normalize them -- Preserve the sample repo's folder structure and backend/frontend boundaries unless the user explicitly asks for a redesign -- Preserve the sample repo's dependency choices and API shape by default; only swap what is necessary for the user's confirmed provider choices -- Prefer `agent-server-sdk` for server integration and `agora-agent-client-toolkit` for client integration when they cover the required behavior -- Use direct REST only for unsupported capability gaps, debugging, or when the user explicitly asks for raw REST -- Do not invent env names from memory or from this skill's static docs when the sample repo provides template files - -Diff budget rule: -- Make only the minimum necessary changes for the user's confirmed provider choices -- Optional modules may be removed if they are not needed -- Do not redesign env naming, folder structure, and API shape all at once unless the user explicitly asks for a custom implementation - -Before editing code, state: -- which sample repo is being followed -- whether `agent-server-sdk` and 
`agora-agent-client-toolkit` are being followed, or why a different SDK path or direct REST is required -- which env template files were inspected -- what exact differences will be introduced - -REST docs are still the low-level reference for request/response schemas and unsupported operations, but they are not the default starting point when the sample repo or official libraries already cover the needed flow. - -When the user asks how to integrate ConvoAI in general, recommend the sample path plus `agent-server-sdk` on the server and `agora-agent-client-toolkit` on the client when possible. Only propose a from-scratch REST build when the user explicitly asks for it or when the required capability is not covered by the sample and official libraries. - -Keep repo URLs in `sample-repos.md` only so future URL changes stay centralized. - -## Quick Start Docs - -Fetch docs using the doc fetching script (see [doc-fetching.md](../doc-fetching.md)) only after checking the sample repo and official SDK path first: - -| Language | Command | -|----------|---------| -| Python / JS / curl | `bash skills/voice-ai-integration/scripts/fetch-doc-content.sh "docs://default/convoai/restful/get-started/quick-start"` | -| Go | `bash skills/voice-ai-integration/scripts/fetch-doc-content.sh "docs://default/convoai/restful/get-started/quick-start-go"` | -| Java | `bash skills/voice-ai-integration/scripts/fetch-doc-content.sh "docs://default/convoai/restful/get-started/quick-start-java"` | - -API endpoint index → [convoai-restapi.md](convoai-restapi.md) - -## Generation Rules - -Stable constraints that do NOT change with API updates. Always apply when generating code. 
- -### Field Types (common pitfalls) -- `agent_rtc_uid`: STRING `"0"`, not int `0` -- `remote_rtc_uids`: array `["*"]`, not `"*"` -- `name`: unique per project — use `agent_{uuid[:8]}` -- `agent_rtc_uid` must not collide with any human participant's UID - -### Create Agent (`POST /join`) -- `token`: `""` if no App Certificate; otherwise RTC token -- `agent_rtc_uid`: `"0"` for auto-assign -- `remote_rtc_uids`: `["*"]` unless user specifies UIDs - -### Update Agent (`POST /update`) -- `llm.params` is FULLY REPLACED — always send complete object -- Only `token` and `llm` are updatable; everything else is immutable - -### Terminology -- `agentId` in URL paths = `agent_id` in JSON bodies -- `/join` returns `agent_id` (snake_case); use it as path param - -### Error Handling -- 409: extract existing `agent_id` or generate new name, retry -- 503/504: exponential backoff, max 3 retries -- Always parse `detail` + `reason` from error responses -- Full diagnosis → [common-errors.md](common-errors.md) - -## Demo Projects - -See [sample-repos.md](sample-repos.md) for the maintained ConvoAI sample registry. 
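The Generation Rules removed above, and re-added in `generation-rules.md`, are easy to get wrong in hand-written code. Below is a minimal Python sketch of them, under these assumptions:

- Nesting the channel fields under `properties` reflects the public `/join` schema as I understand it and should be checked against `start-agent.md`.
- LLM, TTS, and ASR config are left out.
- `send` is a hypothetical callable that returns `(status, payload)`, which keeps the retry loop independent of any HTTP library.
- The random jitter and the 0.5 s base delay are arbitrary additions on top of plain exponential backoff.

```python
import random
import time
import uuid
from typing import Callable, Iterable, Optional

RETRYABLE_STATUSES = {503, 504}
MAX_RETRIES = 3  # "max 3 retries": at most 4 attempts in total


def new_agent_name() -> str:
    # `name` must be unique per project: "agent_" + first 8 hex chars of a UUID4
    return f"agent_{uuid.uuid4().hex[:8]}"


def build_join_body(channel: str, token: str = "",
                    human_uids: Optional[Iterable] = None,
                    agent_uid="0") -> dict:
    """Minimal /join body; LLM, TTS, and ASR config are intentionally omitted."""
    agent_uid = str(agent_uid)  # STRING "0", never int 0
    # Array ["*"] (all users) unless specific UIDs are given, never the bare string "*"
    humans = [str(u) for u in (human_uids or [])] or ["*"]
    if agent_uid != "0" and agent_uid in humans:
        raise ValueError("agent_rtc_uid collides with a human participant's UID")
    return {
        "name": new_agent_name(),
        "properties": {  # assumed nesting; check start-agent.md
            "channel": channel,
            "token": token,  # "" only if App Certificate is disabled, else an RTC token
            "agent_rtc_uid": agent_uid,
            "remote_rtc_uids": humans,
        },
    }


def full_llm_config(current_llm: dict, param_changes: dict) -> dict:
    # /update REPLACES llm.params wholesale, so merge locally and send the full object
    return {**current_llm, "params": {**current_llm.get("params", {}), **param_changes}}


def parse_error(payload: dict) -> tuple[str, str]:
    # Non-200 responses carry `detail` (description) and `reason` (error code)
    return payload.get("detail", ""), payload.get("reason", "")


def call_with_backoff(send: Callable[[], tuple[int, dict]],
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> tuple[int, dict]:
    """Retry 503/504 with exponential backoff plus jitter, up to MAX_RETRIES times."""
    status, payload = send()
    for attempt in range(MAX_RETRIES):
        if status not in RETRYABLE_STATUSES:
            break
        sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
        status, payload = send()
    return status, payload
```

A `409` is left to the caller. Per the rules, it should either reuse the existing `agent_id` or call `new_agent_name()` and retry.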
+- Request mode routing → [request-modes.md](request-modes.md) +- ConvoAI question-driven quickstart flow → [quickstart-intake.md](quickstart-intake.md) +- Existing-project features / debugging / ops → [advanced-feature-routing.md](advanced-feature-routing.md) +- Sample repos and sample workflow → [sample-repos.md](sample-repos.md) +- Stable generation constraints → [generation-rules.md](generation-rules.md) +- REST endpoint index → [convoai-restapi/index.mdx](convoai-restapi/index.mdx) +- Common diagnosis → [common-errors.md](common-errors.md) +- Doc fetching guide → [../doc-fetching.md](../doc-fetching.md) ## Docs Fallback diff --git a/skills/voice-ai-integration/references/conversational-ai/advanced-feature-routing.md b/skills/voice-ai-integration/references/conversational-ai/advanced-feature-routing.md new file mode 100644 index 0000000..8c9c93c --- /dev/null +++ b/skills/voice-ai-integration/references/conversational-ai/advanced-feature-routing.md @@ -0,0 +1,123 @@ +# ConvoAI Advanced Feature Routing + +Use this file for ConvoAI requests that already have a working baseline or that are clearly about +debugging or production work rather than first-run onboarding. + +This file exists so users with working code do **not** get forced back through the full +quickstart-style intake. + +## Use This File For + +- `advanced-feature` +- `debugging` +- `ops-hardening` + +Use [request-modes.md](request-modes.md) first if the mode is still unclear. + +## Goal + +Keep the questions tightly scoped to the requested capability or problem. + +Do not re-run the full quickstart flow unless the user turns out to be blocked on a foundational +prerequisite that was never actually satisfied. + +## Step 1: Confirm Baseline + +Confirm the smallest useful baseline in one sentence: +- What is already working? +- Which platform or repo is this in? +- What exactly needs to change? + +Examples: +- "Current status: Web ConvoAI sample is already running; next task is adding MCP tools." 
+- "Current status: Existing project can create and join agents; next task is debugging TTS vendor auth." + +## Step 2: Route by Request Type + +### A. Advanced feature implementation + +### Feature Map + +| Feature | Primary local doc | Fetch if needed | +|---------|-------------------|-----------------| +| MCP / tools | `convoai-restapi/start-agent.md` (`mcp_servers`) | ConvoAI MCP user guide | +| history | `convoai-restapi/get-history.md` | - | +| interrupt | `convoai-restapi/agent-interrupt.md` | - | +| speak | `convoai-restapi/agent-speak.md` | - | +| update LLM | `convoai-restapi/agent-update.md` | - | +| template variables | `convoai-restapi/start-agent.md` (`template_variables`) | - | +| status query | `convoai-restapi/query-agent-status.md` | - | +| stop agent | `convoai-restapi/stop-agent.md` | - | + +Examples: +- MCP / tools +- history / interrupt / update / status APIs +- template variables / prompt work +- recording linkage +- multi-agent or orchestration behavior + +Routing: +- Start from [README.md](README.md) for current architecture rules +- Use [sample-repos.md](sample-repos.md) if the feature must stay aligned with the official sample structure +- Use the relevant REST endpoint docs only for the exact unsupported or low-level operation + +### B. Debugging + +Examples: +- `403` +- vendor auth or param failures +- Agent `FAILED` +- token or channel mismatch +- join / update / leave behavior issues + +Routing: +- Start from [common-errors.md](common-errors.md) +- Fetch endpoint-specific docs only for the failing operation when needed +- If the issue is provider-specific, run a **partial** provider check using the supported-provider and baseline rules in [quickstart-intake.md](quickstart-intake.md) + +### C. 
Ops / hardening + +Examples: +- auth strategy +- token generation path +- retries / backoff +- quota and concurrency handling +- monitoring and operational policy + +Routing: +- Shared credentials / auth → [../general/credentials-and-auth.md](../general/credentials-and-auth.md) +- Token generation → [../token-server/README.md](../token-server/README.md) +- ConvoAI architecture constraints → [README.md](README.md) + +## Step 3: Ask Only Targeted Questions + +Ask only the minimum questions that unblock the specific request. + +Examples: +- For MCP / tools: current server stack, current sample repo, desired tools, and whether the baseline is already running +- For provider switch: which stage is changing, current working provider, target provider, and whether required secrets already exist +- For 403: which auth path is being used, whether ConvoAI is enabled, and whether App ID matches the credentials project + +Do not ask the user to reconfirm unrelated ASR / LLM / TTS stages when only one stage is changing. + +## Partial Preflight Rule + +Run only the narrowest possible validation for the touched scope: +- auth issue → auth only +- TTS issue → TTS only +- LLM change → LLM only +- sample alignment question → sample repo only + +Escalate to the full quickstart preflight only if the conversation reveals there is no real working +baseline after all. 
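The Partial Preflight Rule above is essentially a lookup from touched scope to checks, with one escalation path. Below is a minimal sketch. The scope keys and check names are hypothetical labels. The `asr` entry is added by analogy with the other provider stages. The contents of the full quickstart preflight are assumed, because the rule does not enumerate them.

```python
# Narrowest validation per touched scope, per the Partial Preflight Rule.
PARTIAL_CHECKS = {
    "auth": ("auth",),
    "tts": ("tts",),
    "llm": ("llm",),
    "asr": ("asr",),  # by analogy with the other provider stages
    "sample": ("sample_repo",),
}

# Hypothetical contents of the full quickstart preflight.
FULL_PREFLIGHT = ("auth", "asr", "llm", "tts", "sample_repo")


def preflight_checks(touched_scope: str, has_working_baseline: bool) -> tuple[str, ...]:
    """Return only the checks needed for a targeted request."""
    if not has_working_baseline:
        # No real working baseline after all: escalate to the full quickstart preflight.
        return FULL_PREFLIGHT
    try:
        return PARTIAL_CHECKS[touched_scope]
    except KeyError:
        raise ValueError(f"unknown scope: {touched_scope!r}") from None
```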
+ +## Exit Rule + +After the targeted questions, produce a short recap: + +```text +ConvoAI mode: [advanced-feature / debugging / ops-hardening] +Working baseline: [one sentence] +Focus area: [feature or issue] +Next reference: [which local file or doc path will be used] +``` diff --git a/skills/voice-ai-integration/references/conversational-ai/common-errors.md b/skills/voice-ai-integration/references/conversational-ai/common-errors.md index 5c30429..ad78080 100644 --- a/skills/voice-ai-integration/references/conversational-ai/common-errors.md +++ b/skills/voice-ai-integration/references/conversational-ai/common-errors.md @@ -114,7 +114,7 @@ Agent is RUNNING but not responding to user speech: ## Additional Error Lookup For errors not covered here, fetch the relevant endpoint doc URL from -[convoai-restapi.md](convoai-restapi.md) for response schemas, or search +[convoai-restapi/index.mdx](convoai-restapi/index.mdx) for response schemas, or search `references/docs.txt` for broader doc lookup: ``` diff --git a/skills/voice-ai-integration/references/conversational-ai/convoai-restapi.md b/skills/voice-ai-integration/references/conversational-ai/convoai-restapi.md deleted file mode 100644 index 89dfeb8..0000000 --- a/skills/voice-ai-integration/references/conversational-ai/convoai-restapi.md +++ /dev/null @@ -1,49 +0,0 @@ -# ConvoAI REST API Reference - -Endpoint index with local documentation. For the full API overview, see [convoai-restapi/index.mdx](convoai-restapi/index.mdx). 
- -## Base URL - -``` -https://api.agora.io/cn/api/conversational-ai-agent/v2/projects/{SHENGWANG_APP_ID} -``` - -## Authentication - -支持两种鉴权方式(任选其一): -- **RTC Token**:`Authorization: agora token="{RTC_TOKEN}"` -- **Basic Auth**:`Authorization: Basic base64("{SHENGWANG_CUSTOMER_KEY}:{SHENGWANG_CUSTOMER_SECRET}")` - -详见 [README.md](README.md#auth) 和 [general/credentials-and-auth.md](../general/credentials-and-auth.md)。 - -## Endpoints - -| Method | Path | Local Doc | -|--------|------|-----------| -| POST | `/join` | [start-agent.md](convoai-restapi/start-agent.md) | -| POST | `/agents/{agentId}/leave` | [stop-agent.md](convoai-restapi/stop-agent.md) | -| POST | `/agents/{agentId}/update` | [agent-update.md](convoai-restapi/agent-update.md) | -| GET | `/agents/{agentId}` | [query-agent-status.md](convoai-restapi/query-agent-status.md) | -| GET | `/agents` | [get-agent-list.md](convoai-restapi/get-agent-list.md) | -| POST | `/agents/{agentId}/speak` | [agent-speak.md](convoai-restapi/agent-speak.md) | -| POST | `/agents/{agentId}/interrupt` | [agent-interrupt.md](convoai-restapi/agent-interrupt.md) | -| GET | `/agents/{agentId}/history` | [get-history.md](convoai-restapi/get-history.md) | - -All endpoints index: [convoai-restapi/index.mdx](convoai-restapi/index.mdx) - -## Error Response Format - -All non-200 responses: -```json -{ - "detail": "error description", - "reason": "ErrorCode" -} -``` - -Error diagnosis → [common-errors.md](common-errors.md) - -## Docs Fallback - -If fetch fails, use README.md Generation Rules + ask the user to verify against: -https://doc.shengwang.cn/doc/convoai/restful/get-started/quick-start diff --git a/skills/voice-ai-integration/references/conversational-ai/generation-rules.md b/skills/voice-ai-integration/references/conversational-ai/generation-rules.md new file mode 100644 index 0000000..8af0444 --- /dev/null +++ b/skills/voice-ai-integration/references/conversational-ai/generation-rules.md @@ -0,0 +1,39 @@ +# ConvoAI Generation 
Rules + +Stable constraints that do NOT change with API updates. Always apply when generating code. + +## Field Types + +- `agent_rtc_uid`: STRING `"0"`, not int `0` +- `remote_rtc_uids`: array `["*"]`, not `"*"` +- `name`: unique per project — use `agent_{uuid[:8]}` +- `agent_rtc_uid` must not collide with any human participant's UID + +## Create Agent (`POST /join`) + +- `token`: `""` if no App Certificate; otherwise RTC token +- `agent_rtc_uid`: `"0"` for auto-assign +- `remote_rtc_uids`: `["*"]` unless the user specifies UIDs + +## Update Agent (`POST /update`) + +- `llm.params` is FULLY REPLACED — always send the complete object +- Only `token` and `llm` are updatable; everything else is immutable + +## Terminology + +- `agentId` in URL paths = `agent_id` in JSON bodies +- `/join` returns `agent_id` (snake_case); use it as the path parameter + +## Error Handling + +- `409`: extract the existing `agent_id` or generate a new name and retry +- `503/504`: exponential backoff, max 3 retries +- Always parse `detail` and `reason` from error responses +- Full diagnosis → [common-errors.md](common-errors.md) + +## Related References + +- ConvoAI module entry → [README.md](README.md) +- REST endpoint index → [convoai-restapi/index.mdx](convoai-restapi/index.mdx) +- Common diagnosis → [common-errors.md](common-errors.md) diff --git a/skills/voice-ai-integration/references/conversational-ai/quickstart-intake.md b/skills/voice-ai-integration/references/conversational-ai/quickstart-intake.md new file mode 100644 index 0000000..9c66dc3 --- /dev/null +++ b/skills/voice-ai-integration/references/conversational-ai/quickstart-intake.md @@ -0,0 +1,500 @@ +# ConvoAI Quickstart Flow + +Use this file as the ConvoAI-internal quickstart intake after ConvoAI has already been selected as the primary product. +This is the first user-visible onboarding step for ConvoAI quickstarts and unproven integrations. 
+This file keeps the question-driven intake flow inside the ConvoAI module to avoid cross-directory back-and-forth. + +## Scope + +Use this file only for: +- `quickstart` +- `integration` when the user does **not** yet have a proven working ConvoAI baseline + +Do **not** use this file as the default intake for every ConvoAI request. + +Before using this file, classify the request mode with +[request-modes.md](request-modes.md). + +If the request is `advanced-feature`, `debugging`, or `ops-hardening`, route to +[advanced-feature-routing.md](advanced-feature-routing.md) +instead of running the full quickstart-style intake. + +For `quickstart`, or `integration` without a proven working baseline, use the project-readiness and provider-guardrail sections in this file before any provider choices are confirmed. + +## Language Detection + +Detect the user's language from their most recent message: +- If the user writes in **Chinese** → use the **ZH** prompts below +- If the user writes in **English** (or any other language) → use the **EN** prompts below + +Maintain the detected language consistently throughout the entire intake flow. + +## User-Visible Sequence + +For ConvoAI quickstarts and unproven integrations, follow this exact user-visible order: +1. Product intro in plain language +2. Technical-path confirmation +3. Project-readiness checkpoint (`App ID`, `App Certificate`, ConvoAI activation, RTC Token path) +4. Provider confirmation +5. Detailed provider checklist only if customization is still needed + +Do not skip ahead. Do not surface a later stage before the earlier stage is resolved. 
+ +## Project Readiness Rules + +For the ConvoAI quickstart path used by this skill: +- `App ID`, `App Certificate`, and ConvoAI activation are fixed prerequisites +- RTC Token is the fixed auth path +- If any of those items are unclear, resolve them with the compact credential prompt before provider confirmation + +## Questions + +Use a friendly but explicit follow-up flow: +- Ask for a single decision group per turn. Do not combine technical-path, credential, and provider decisions in the same visible message. +- Use a consolidated provider checklist only when detailed provider selection is actually needed, and only after the technical path and credential status are already clear. +- Keep the message short enough to scan, but complete enough to finish the current decision in one reply +- Skip anything the user already answered +- Default to `sample-aligned` when the request clearly matches the official sample path +- Treat the default technical path and the default provider baseline as two separate decisions +- Do not propose a bespoke frontend/backend project structure before preflight and intake are complete +- Do not ask React / Vue / HTML or other custom stack questions when the user only asked for a generic Web quickstart and has not opted out of the official sample path +- If the user explicitly names a native platform such as Android / iOS / Flutter / Windows / macOS, infer that platform immediately instead of re-asking it +- For native-platform quickstart requests, skip backend, token-server, and server-stack questions unless the user explicitly says they want a custom architecture or an existing-project integration +- Do not ask about development experience or whether a ConvoAI Agent has already been created during quickstart intake; those are not routing blockers +- Do not announce concrete framework names such as Next.js / FastAPI until the user has accepted the default technical path or explicitly asked for sample details +- Before asking about 
providers, give a one- or two-sentence plain-language intro to what ConvoAI does in the chosen app path +- Do not lead with terms like `三段式`, `ASR`, `LLM`, `TTS`, or `provider` before that intro has been given +- Explain `App ID` and `App Certificate` in plain language before asking whether the user has them +- If the user explicitly names a provider, validate it against the supported provider list in this file immediately +- If the named provider is unsupported, resolve that blocker before asking any further credential or provider questions +- Show the available options and recommended default only for the fields that are still truly unresolved +- If the user leaves a blocker unresolved, ask only a narrow repair follow-up for that field + +Defaults policy: +- Platform recommended default: `Web` +- Backend recommended default: `Python` (skip this field entirely for native platforms: iOS, Android, Flutter, Windows, macOS) +- ASR vendor recommended default: `fengming` +- ASR language recommended default: `en-US` for clearly English scenarios, otherwise `zh-CN` +- LLM recommended default: `aliyun` +- TTS recommended default: `bytedance` + +## Supported Provider Guardrails + +- LLM: `aliyun`, `bytedance`, `deepseek`, `tencent` +- TTS: `bytedance`, `microsoft`, `minimax`, `cosyvoice`, `tencent`, `stepfun` +- ASR: `fengming`, `tencent`, `microsoft`, `xfyun`, `xfyun_bigmodel`, `xfyun_dialect` +- Current first-success baseline: `aliyun` + `bytedance` + `fengming` +- If a named provider is outside the supported list for its stage (for example `openai` or `azure openai` for LLM), treat it as unsupported and resolve that blocker before any further setup questions + +Blocking rule: +- Any selected `Other` value must be clarified in a narrow follow-up +- Platform and Backend are optional when shown with defaults +- **LLM, TTS, ASR vendor, and ASR language still require explicit confirmation**, but that confirmation can come from either: + - a single compact confirmation of the 
default provider baseline, or + - a detailed provider-by-provider selection + +Confirmation gate: +- If any mandatory provider fields (LLM, TTS, ASR vendor, ASR language) are unresolved, do NOT proceed silently. +- If the technical path is still unresolved and the official sample path is a good fit, ask only the compact technical-path confirmation prompt first. +- After the user answers that, if they have already named an unsupported provider, ask only the compact unsupported-provider prompt next. +- After the user answers that, if project readiness (`App ID`, `App Certificate`, ConvoAI activation) is still unclear, ask only the compact credential prompt next. +- After the user answers that, if provider fields are still unresolved, ask only the compact default-provider confirmation prompt, and only when the default baseline keys are available or explicitly confirmed by the user. +- If the user explicitly confirms the default provider baseline, treat that as explicit confirmation for all mandatory provider fields. +- Only expand into the full provider checklist if the user asks to customize, rejects the default provider baseline, has missing default-provider keys, or has already supplied partial non-default provider choices. +- For defaultable fields that are NOT mandatory confirmation fields (Platform, Backend), omission counts as explicit confirmation to use the default. +- For the compact default-provider prompt, the user must explicitly choose the default provider baseline or ask to customize; silence does not count. + +If the technical path is unresolved and the official sample path fits, ask the compact technical-path prompt first and stop there. +If the user has already named an unsupported provider after that, ask the compact unsupported-provider prompt next and stop there. +If project readiness (`App ID`, `App Certificate`, ConvoAI activation) is unresolved after that, ask the compact credential prompt next and stop there. 
+If provider fields are unresolved after that, ask the compact default-provider prompt next and stop there. +Ask the full unresolved-fields checklist only after the user asks to customize, or when some provider fields are already non-default and still unresolved. +Do not show two separate option blocks such as `A/B` and `C/D` in the same turn. +Skip any question the user already answered during main intake or in the user's initial request. +Doc index status is already determined by the main intake — do not re-check here. + +## Compact Technical-Path Prompt + +When the request clearly matches the official quickstart path and the user has not asked for a custom stack, +prefer a compact technical-path confirmation prompt first. + +Prompt rules: +- Keep it to one short intro line, one choice line, and at most three short notes +- Confirm only the technical path here; do not mix credential or provider choices into the same message +- If the user accepts the default technical path, only then move on to sample inspection and provider confirmation +- If the user asks to customize the stack, expand only the still-unresolved architecture questions +- For native-platform quickstart requests, use the native sample wording and do not mention backend or token-server by default + +Suggested shape: + +**ZH:** +```text +我建议先按官方 Web quickstart 路径继续,这样最快: +A. 用默认技术路径(官方 Web sample) +B. 我想自定义技术栈 + +说明: +- 如果你选 A,我会先检查官方 sample 路径,再确认 provider 配置 +- 如果你选 B,我再问前后端栈 +``` + +**EN:** +```text +I suggest starting with the official Web quickstart path to keep setup short: +A. Use the default technical path (official Web sample) +B. I want a custom stack + +Notes: +- If you choose A, I will inspect the official sample path and then confirm the provider setup +- If you choose B, I will ask the frontend/backend stack questions +``` + +Native-platform variant: + +**ZH:** +```text +我建议直接按官方 Android / Native sample 路径继续,这样最快: +A. 用默认原生路径(官方 native sample) +B. 
我有现有工程 / 我想自定义集成 + +说明: +- 如果你选 A,我会先检查官方 sample 路径,再确认 provider 配置 +- 如果你选 B,我再问和现有工程相关的问题 +``` + +**EN:** +```text +I suggest going straight with the official Android / native sample path to keep setup short: +A. Use the default native path (official native sample) +B. I have an existing project / I want a custom integration + +Notes: +- If you choose A, I will inspect the official sample path and then confirm the provider setup +- If you choose B, I will ask the existing-project questions +``` + +## Compact Unsupported-Provider Prompt + +Use this prompt when the user has already named a provider that is outside the supported local enum list for the requested stage. + +Prompt rules: +- Keep it to one short intro line, one choice line, and at most three short notes +- Name the unsupported provider explicitly +- Tell the user that this quickstart path does not support it +- Do not pile on credential, sample, or other setup questions in the same turn + +Suggested shape: + +**ZH:** +```text +你刚才提到的 provider 当前不在这个 quickstart 的支持列表里: +- 当前不支持:OpenAI +- 我可以改成支持的 LLM,或者先把支持列表给你 + +A. 直接改用支持的默认 LLM +B. 先给我看支持的 LLM 列表 +``` + +**EN:** +```text +The provider you named is not in the supported list for this quickstart path: +- Currently unsupported here: OpenAI +- I can switch to a supported default LLM, or show you the supported LLM list first + +A. Switch to the supported default LLM +B. Show me the supported LLM list first +``` + +## Compact Credential Prompt + +Use this prompt when the user has not yet been told what `App ID`, `App Certificate`, and ConvoAI activation mean, or when that project-readiness status is still unclear. 
+ +Prompt rules: +- Keep it to one short intro line, one choice line, and at most three short notes +- Explain `App ID` in plain language before asking about it +- Do not combine this credential prompt with a technical-path prompt or a provider prompt in the same turn +- Explain that this quickstart requires `App ID`, `App Certificate`, and ConvoAI service activation +- Mention that the quickstart auth path is fixed to RTC Token +- Do not ask provider questions until this project-readiness status is clear enough for preflight + +Suggested shape: + +**ZH:** +```text +继续前先确认三个前置条件: +- App ID:你的声网项目标识,接入一定会用到 +- App Certificate:这个 quickstart 会用 RTC Token,因此这里也需要 +- ConvoAI 开通:项目需要先开通 ConvoAI 服务 + +A. 这三个我都已经准备好了 +B. 我还没有 / 不清楚,先告诉我去哪看 +``` + +**EN:** +```text +Before we continue, I need to confirm three prerequisites: +- App ID: your Shengwang project identifier; the app will need this +- App Certificate: this quickstart uses RTC Token, so it is also required here +- ConvoAI activation: the project must have ConvoAI enabled + +A. I already have all three ready +B. I do not have them / I am not sure — tell me where to find them first +``` + +## Compact Default-Provider Prompt + +When the user has not requested custom providers and the default provider baseline is still acceptable, +prefer a compact provider confirmation prompt before expanding into the full checklist. 
+ +Prompt rules: +- Keep it to one short intro line, one choice line, and at most three short notes +- Do not restate an implementation plan, project tree, or framework recommendation here +- Do not combine this provider prompt with a technical-path prompt or a credential prompt in the same turn +- Only use this prompt when the required default-provider keys are already available or the user explicitly confirms they have them +- If the required default-provider keys are missing or unknown, skip this prompt and expand the provider options / readiness check instead +- If the user confirms the default provider baseline, continue without expanding the full provider checklist +- Only if the user asks to customize should the agent render the detailed provider options + +Suggested shape: + +**ZH:** +```text +如果你已经有默认 provider 所需的 key,我可以先按默认三段式继续: +A. 我有默认 provider key,按默认三段式继续 +B. 我没有 / 不确定,我要自定义 provider + +说明: +- 如果你选 A,我就按默认三段式继续 +- 如果你选 B,我再展开 LLM / TTS / ASR 选项 +``` + +**EN:** +```text +If you already have the keys needed for the default providers, I can continue with the default provider baseline: +A. I have the default provider keys — continue with the default provider baseline +B. I do not have them / I am not sure — I want to customize providers + +Notes: +- If you choose A, I will continue with the default provider baseline +- If you choose B, I will expand the LLM / TTS / ASR options +``` + +## Detailed Provider Checklist + +Use this only after the user has already chosen a technical path and the credential status is clear. +Do not use this as the first onboarding prompt. +When the user has asked to customize providers, combine only the still-unresolved provider-specific questions into one message. 
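The compact prompts above impose an ordering: the technical path comes first, project readiness comes second, providers come last, and each turn shows only one prompt. That ordering can be sketched as a small decision function. This is a minimal illustration and is not part of the skill. `IntakeState`, its fields, and `next_prompt` are hypothetical names. Showing the unsupported-provider prompt right after the credential check is an assumed ordering.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IntakeState:
    """Hypothetical snapshot of what the intake has resolved so far."""
    technical_path_chosen: bool = False
    credentials_clear: bool = False                   # App ID / Certificate / activation status known
    unsupported_provider: Optional[str] = None        # provider named outside the supported enums
    unresolved_provider_fields: Tuple[str, ...] = ()
    default_keys_ready: bool = False                  # keys for the default providers are available
    default_baseline_accepted: Optional[bool] = None  # None = compact prompt not answered yet


def next_prompt(state: IntakeState) -> Optional[str]:
    """Return the single prompt to show this turn, or None when no prompt is needed."""
    if not state.technical_path_chosen:
        return "technical-path"
    if not state.credentials_clear:
        return "credential"
    if state.unsupported_provider is not None:
        return "unsupported-provider"
    if not state.unresolved_provider_fields:
        return None
    # Missing/unknown default keys, or an explicit "customize" answer, expands the checklist.
    if not state.default_keys_ready or state.default_baseline_accepted is False:
        return "detailed-checklist"
    if state.default_baseline_accepted is None:
        return "default-provider"
    return None  # default baseline accepted: fill the remaining fields with defaults
```

Because the function returns exactly one prompt name, it cannot combine a technical-path, credential, or provider question in the same turn.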
+ +Message requirements: +- Use the user's language consistently +- Start with at most one short recap line +- Ask only about provider fields that are still unresolved +- Under each unresolved field, show the supported options inline to reduce prompt height +- Number only the currently visible unresolved fields, starting from `1` +- Mark fields with defaults as optional +- Ask the user to reply once with numeric codes such as `1A 4B 6A` +- Do not mix this with a `key=value` quick-reply example in the same prompt + +If the user already provided enough detail for some fields, do not restate those +questions. Keep the option list only for the unresolved fields. + +Numbering rules: +- Renumber based only on the fields shown in the current prompt +- Do not use stable global IDs across turns +- If a field is already known, omit it and do not reserve its number +- Platform and backend should also be shown whenever they are unresolved, even though they are optional +- LLM, TTS, ASR, and ASR language should still be shown whenever they are unresolved, even though they are optional +- If a visible field has a default, its number may be omitted from the reply + +Parsing rules: +- Parse numeric answers against the current prompt's visible numbering +- Accept sparse one-line replies such as `1A 4B 6A` +- If a visible optional field is omitted, apply its default automatically +- If a visible mandatory field is omitted, ask only for that field +- If a selected option is `Other`, ask a narrow follow-up only for that field +- If a code is invalid or incomplete, ask only for the unresolved item + +Suggested shape: + +**ZH:** +```text +我还缺这几项信息,确认完我就可以继续: +1. [field 1](可选,留空=默认) + A. ... B. ... C. 用默认(...) +2. [field 2] + A. ... B. ... C.
其他,直接写代码 + +补充说明: +- ConvoAI 默认优先走官方 sample;服务端优先用 `agent-server-sdk` +- 客户端优先用 `agora-agent-client-toolkit`,如果目标栈不适配再直接用 RTC SDK 入会 +- Native 平台(iOS / Android / Flutter / Windows / macOS)走多平台 sample repo,客户端直接调 ConvoAI REST API,不需要 `agent-server-sdk` 和 `agora-agent-client-toolkit`,也不需要配套服务端 +- 可选题如果不写,就自动用默认值 +- 你回一行就行,例如:2B 4A;没写出来的可选题会自动用默认 +- 如果你的目标不是 Web,而是 iOS / Android / Electron,也一起按编号回复 +``` + +**EN:** +```text +I still need these details before I continue: +1. [field 1] (optional, blank=default) + A. ... B. ... C. Use default (...) +2. [field 2] + A. ... B. ... C. Other, specify the code + +Notes: +- ConvoAI should usually follow the official sample path, use `agent-server-sdk` on the server side, and use `agora-agent-client-toolkit` on the client side when possible instead of building from the REST spec from scratch +- If the client toolkit is not a fit for the target stack, the client should still join with the RTC SDK directly +- Native platforms (iOS / Android / Flutter / Windows / macOS) use the multi-platform sample repo, call the ConvoAI REST API directly from the client, and do not need `agent-server-sdk`, `agora-agent-client-toolkit`, or a separate server +- If you omit an optional question, I will apply its default automatically +- Reply in one line, for example: `2B 4A`; omitted optional numbers will use defaults +- If your target is iOS, Android, or Electron rather than Web, include that choice by number as well +``` + +### Q2 — LLM + +Include this question only if the LLM provider has not already been confirmed. + +**ZH:** +> "LLM(可选,留空=默认 aliyun)" +> 选项(内联展示): +> A. 阿里云(aliyun) B. 字节跳动(bytedance) C. 深度求索(deepseek) D. 腾讯(tencent) E. 用默认的就行(aliyun) + +**EN:** +> "LLM (optional, blank=default aliyun)" +> Options (inline): +> A. Alibaba Cloud (aliyun) B. ByteDance (bytedance) C. DeepSeek (deepseek) D. Tencent (tencent) E.
Use the default (aliyun) + +**Default:** aliyun + +### Q3 — TTS + +Include this question only if the TTS provider has not already been confirmed. + +**ZH:** +> "TTS(可选,留空=默认 bytedance)" +> 选项(内联展示): +> A. 字节跳动 / 火山引擎(bytedance) B. 微软(microsoft) C. MiniMax(minimax) D. 阿里 CosyVoice(cosyvoice) E. 腾讯(tencent) F. 阶跃星辰(stepfun) G. 用默认的就行(bytedance) + +**EN:** +> "TTS (optional, blank=default bytedance)" +> Options (inline): +> A. ByteDance / Volcengine (bytedance) B. Microsoft (microsoft) C. MiniMax (minimax) D. Alibaba CosyVoice (cosyvoice) E. Tencent (tencent) F. StepFun (stepfun) G. Use the default (bytedance) + +**Default:** bytedance (Volcengine TTS) + +### Q4 — ASR Vendor + +Include this question only if the ASR provider has not already been confirmed. + +**ZH:** +> "ASR(可选,留空=默认 fengming)" +> 选项(内联展示): +> A. 声网凤鸣(fengming) B. 腾讯(tencent) C. 微软(microsoft) D. 科大讯飞(xfyun) E. 科大讯飞大模型(xfyun_bigmodel) F. 科大讯飞方言(xfyun_dialect) G. 用默认的就行(fengming) + +**EN:** +> "ASR (optional, blank=default fengming)" +> Options (inline): +> A. Shengwang Fengming (fengming) B. Tencent (tencent) C. Microsoft (microsoft) D. iFlytek (xfyun) E. iFlytek BigModel (xfyun_bigmodel) F. iFlytek Dialect (xfyun_dialect) G. Use the default (fengming) + +**Default:** fengming + +### Q5 — ASR Language + +Include this question only if the ASR language has not already been confirmed. + +Choose the recommended default from the use case: +- English use case -> `en-US` +- Chinese or unspecified use case -> `zh-CN` + +If the question is shown and the user omits it, apply the recommended default automatically. + +**ZH:** +> "ASR 语言(可选,留空=默认 [zh-CN / en-US])" +> 选项(内联展示): +> A. 中文(zh-CN,支持中英混合) B. 英文(en-US) C. 其他,直接写代码 D. 用默认的就行 + +**EN:** +> "ASR language (optional, blank=default [zh-CN / en-US])" +> Options (inline): +> A. Chinese (zh-CN, supports Chinese-English mix) B. English (en-US) C. Other, specify the code D. 
Use the default + +**Default:** `en-US` for clearly English scenarios, otherwise `zh-CN` + +Prompt rendering rule: +- In the actual user-facing prompt, render each visible question as two lines only: + - line 1: question number + field name + - line 2: all options inline, separated by two spaces +- Example: + - `2. LLM(可选,留空=默认)` + - ` A. aliyun B. bytedance C. deepseek D. tencent E. 用默认(aliyun)` +- Keep the detailed reference blocks below in vertical form; only the emitted prompt should be compact + +### Platform Question + +Include this question whenever platform is still missing. + +**ZH:** +> "目标平台是什么?(可选,留空=默认 Web)" +> 选项(内联展示): +> A. Web B. iOS C. Android D. Electron E. 其他,直接写平台 F. 用默认的就行(Web) + +**EN:** +> "What is the target platform? (optional, blank=default Web)" +> Options (inline): +> A. Web B. iOS C. Android D. Electron E. Other, specify the platform F. Use the default (Web) + +**Default:** Web + +### Backend Question + +Include this question whenever backend language is still missing. +Skip this question entirely if the user's confirmed platform is a native platform (iOS, Android, Flutter, Windows, macOS) — native ConvoAI apps are self-contained and call the REST API directly, no separate server needed. Record backend as "不涉及" / "not needed" in the spec. + +**ZH:** +> "服务端准备用什么语言?(可选,留空=默认 Python)" +> 选项(内联展示): +> A. Python B. Go C. Java D. Node.js E. 其他,直接写语言 F. 用默认的就行(Python) + +**EN:** +> "What backend language are you using? (optional, blank=default Python)" +> Options (inline): +> A. Python B. Go C. Java D. Node.js E. Other, specify the language F. Use the default (Python) + +**Default:** Python + +--- + +## Output: Structured Spec + +After the user replies, normalize the answers immediately into a compact internal spec. +Do not ask for a separate confirmation turn if every blocking field is resolved. 
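Before the spec can be filled in, the one-line numeric reply must be mapped back onto the current prompt's visible numbering, following the parsing rules in the Detailed Provider Checklist. Below is a minimal Python sketch. It assumes each visible field is described by a hypothetical `VisibleField` record whose `options` list follows the prompt's letter order. A "use the default" option simply repeats the default value, and `None` stands for an `Other` option that needs a narrow follow-up.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# One token per visible field: a field number followed by one option letter, e.g. "4B".
TOKEN = re.compile(r"^(\d+)([A-Za-z])$")


@dataclass
class VisibleField:
    name: str                     # spec key, e.g. "llm" or "asr_language"
    options: List[Optional[str]]  # value per letter A, B, C...; None marks an "Other" option
    default: Optional[str]        # None means the field is mandatory


def parse_reply(
    reply: str, fields: List[VisibleField]
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Return (resolved values, fields needing a narrow follow-up, unusable tokens)."""
    resolved: Dict[str, str] = {}
    follow_up: List[str] = []
    bad_tokens: List[str] = []
    for token in reply.split():
        match = TOKEN.match(token)
        # Numbers refer only to the fields visible in the current prompt (1-based).
        if not match or not 1 <= int(match.group(1)) <= len(fields):
            bad_tokens.append(token)
            continue
        field = fields[int(match.group(1)) - 1]
        index = ord(match.group(2).upper()) - ord("A")
        value = field.options[index] if index < len(field.options) else None
        if value is None:
            follow_up.append(field.name)  # "Other" or an unknown letter: ask only for this field
        else:
            resolved[field.name] = value
    for field in fields:
        if field.name in resolved or field.name in follow_up:
            continue
        if field.default is not None:
            resolved[field.name] = field.default  # omitted optional field -> its default
        else:
            follow_up.append(field.name)          # omitted mandatory field -> ask for it
    return resolved, follow_up, bad_tokens
```

For example, if LLM is the first visible field, it would be `VisibleField("llm", ["aliyun", "bytedance", "deepseek", "tencent", "aliyun"], "aliyun")`. A reply containing `1C` resolves it to `deepseek`, and leaving out `1` falls back to `aliyun`. For ASR language, the default passed in should already be the recommendation based on the use case (`en-US` or `zh-CN`).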
+ +```yaml +use_case: [text] +primary: ConvoAI +supporting: [RTC SDK | RTC SDK + RTM | RTC SDK + Cloud Recording | none] +platform: [Web | iOS | Android | Electron | other] +implementation: [sample-aligned | minimal-custom | unspecified] +backend: [Python | Go | Java | Node.js | other | not needed] +project_readiness: + app_id: [ready | missing | unknown] + app_certificate: [ready | missing | unknown] + convoai_activation: [ready | missing | unknown] + rtc_token_path: [ready | missing | unknown] +providers: + asr: [fengming | tencent | microsoft | xfyun | xfyun_bigmodel | xfyun_dialect] + asr_language: [zh-CN | en-US | other] + llm: [aliyun | bytedance | deepseek | tencent] + tts: [bytedance | minimax | tencent | microsoft | cosyvoice | stepfun] +``` + +Apply the defaults declared earlier in this file when the user has accepted them, either explicitly or by leaving a visible optional field blank. +Do not invent provider values beyond the supported lists in this file. + +## Route After Collection + +After the structured spec is ready: +- Follow the architecture rules in [README.md](README.md) +- Use [sample-repos.md](sample-repos.md) for sample inspection and clone workflow +- Use [generation-rules.md](generation-rules.md) for stable generation constraints +- Use `convoai-restapi/index.mdx` or endpoint docs only for missing low-level API details diff --git a/skills/voice-ai-integration/references/conversational-ai/request-modes.md b/skills/voice-ai-integration/references/conversational-ai/request-modes.md new file mode 100644 index 0000000..0fb91aa --- /dev/null +++ b/skills/voice-ai-integration/references/conversational-ai/request-modes.md @@ -0,0 +1,109 @@ +# ConvoAI Request Modes + +Use this file before choosing a ConvoAI workflow. + +ConvoAI is only one product inside the Shengwang skill set. Do **not** change the top-level +product routing for RTC, RTM, Cloud Recording, or token work. 
This file only decides which +ConvoAI sub-flow to use after ConvoAI has already been selected as the primary product.
+ +## Goal + +Separate requests that need a full quickstart-style onboarding flow from requests that already +have a working baseline and only need targeted implementation, debugging, or production work. + +The key distinction is **working baseline**: +- A working baseline means the user already has ConvoAI code that can run, or explicitly says + they already have a working project / demo / codebase. +- If the user only has RTC code, or only has a sample repo checked out but has **not** proven a + working ConvoAI path yet, treat that as **not** having a working baseline. + +## Modes + +| Mode | Use when | Default next step | +|------|----------|-------------------| +| `quickstart` | The user is starting from scratch, wants a minimal demo, wants the official sample path, or has not yet run ConvoAI successfully | [quickstart-intake.md](quickstart-intake.md) | +| `integration` | The user already has an app or repo, but the ConvoAI path is not yet fully connected or verified | [quickstart-intake.md](quickstart-intake.md), then targeted implementation if a working baseline is still missing | +| `advanced-feature` | The user explicitly says the existing ConvoAI code is already running and only wants incremental capability work | [advanced-feature-routing.md](advanced-feature-routing.md) | +| `debugging` | The user provides errors, logs, broken behavior, or asks why an existing flow does not work | [advanced-feature-routing.md](advanced-feature-routing.md) | +| `ops-hardening` | The user asks about production auth, scaling, retries, quota, observability, or cost | [advanced-feature-routing.md](advanced-feature-routing.md) | + +## Detection Rules + +### Route to `quickstart` + +Choose `quickstart` when the user says things like: +- "从零开始" +- "帮我跑一个最小 demo" +- "第一次接 ConvoAI" +- "按官方 sample 来" +- "想先跑通" + +Also choose `quickstart` when the user has not yet confirmed any working ConvoAI baseline. 
+ +### Route to `integration` + +Choose `integration` when the user already has an app, workspace, or product context, but the +ConvoAI path is still not proven end-to-end. + +Examples: +- RTC app exists, now adding ConvoAI +- Existing web / mobile project wants ConvoAI business logic inserted +- The user wants sample-aligned integration into an existing codebase +- The user wants to swap one provider path but has not yet proven the overall ConvoAI flow works + +### Route to `advanced-feature` + +Choose `advanced-feature` only when the user has already confirmed a working ConvoAI baseline and +the ask is incremental. + +Examples: +- Add MCP / tools +- Add history or interrupt APIs +- Add template variables or prompt customization +- Add recording, multi-agent behavior, or other capability extensions +- Switch a provider for a known working flow + +### Route to `debugging` + +Choose `debugging` when the user leads with a failure signal: +- Error codes like `400`, `403`, `409`, `422`, `503` +- Agent `FAILED` +- Vendor auth / parameter issues +- Token, channel, or join behavior problems +- "why is this not working" with existing code / logs + +### Route to `ops-hardening` + +Choose `ops-hardening` when the request is about production readiness rather than first-run +success. + +Examples: +- Auth strategy +- Quota management +- Retry policy +- Monitoring / alerts +- Cost optimization + +## Transition Rules + +- `quickstart` and `integration` should start with the ConvoAI quickstart flow in `quickstart-intake.md`. +- `advanced-feature`, `debugging`, and `ops-hardening` should **skip** the full quickstart intake. +- `integration` should still use the full quickstart intake if the user has not yet proven that a + ConvoAI baseline works in their environment. +- `advanced-feature` and `debugging` may still trigger a **partial** preflight for the exact part + being changed, such as auth, token handling, or a single provider. 
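The detection and transition rules above reduce to a precedence order over a few judgments the agent makes while reading the request. This is a hedged sketch. The boolean signal names, `classify_mode`, and `recap` are hypothetical. This file does not say whether a failure signal or a production-readiness question should outrank the baseline check, so that precedence is an assumption.

```python
NEXT_STEP = {
    "quickstart": "quickstart-intake.md",
    "integration": "quickstart-intake.md",
    "advanced-feature": "advanced-feature-routing.md",
    "debugging": "advanced-feature-routing.md",
    "ops-hardening": "advanced-feature-routing.md",
}


def classify_mode(
    *,
    failure_signal: bool = False,
    production_readiness: bool = False,
    existing_app: bool = False,
    working_baseline: bool = False,
) -> str:
    """Pick a ConvoAI mode from signals the agent has already judged from the request."""
    if failure_signal:        # error codes, FAILED agents, "why is this not working"
        return "debugging"
    if production_readiness:  # auth strategy, quota, retries, monitoring, cost
        return "ops-hardening"
    if working_baseline:      # only a proven, running ConvoAI path counts
        return "advanced-feature"
    if existing_app:          # app or repo exists, but the ConvoAI path is not yet proven
        return "integration"
    return "quickstart"


def recap(mode: str, why: str) -> str:
    """Render the three-line recap required before continuing."""
    return f"ConvoAI mode: {mode}\nWhy: {why}\nNext step: {NEXT_STEP[mode]}"
```

`recap` renders the summary described under Required Output. The `Why:` sentence still has to come from the agent's own reading of the request.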
+ +## Required Output + +Before continuing, summarize the classification in one short recap: + +```text +ConvoAI mode: [quickstart / integration / advanced-feature / debugging / ops-hardening] +Why: [one sentence] +Next step: [which reference file will be used] +``` + +## Safety Rule + +Do not force users with an existing working ConvoAI project back through the full quickstart path. +Do not skip the quickstart path for users who are still blocked on foundational prerequisites. diff --git a/skills/voice-ai-integration/references/conversational-ai/sample-repos.md b/skills/voice-ai-integration/references/conversational-ai/sample-repos.md index 1d95104..31a09d2 100644 --- a/skills/voice-ai-integration/references/conversational-ai/sample-repos.md +++ b/skills/voice-ai-integration/references/conversational-ai/sample-repos.md @@ -3,33 +3,37 @@ Use this registry when the user needs a sample app, a reference project structure, or a known repository for ConvoAI integration work. -Default rule: +## Default Rule - `sample-aligned` is the default implementation mode when a listed repo matches the user's stack or requested structure. - `minimal-custom` may only be used if the user explicitly asks for a minimal demo or says not to follow the sample repo. - Fall back to Shengwang doc fetching only when the repo does not cover the needed API detail or is not useful for the user's question. -Alignment rules: +## Alignment Rules - Preserve sample env var names from the cloned repo's env template files unless the user explicitly asks to rename or normalize them. - Preserve the sample repo's folder structure, dependency choices, and API shape by default. -- If the sample repo already uses official SDKs or agent libraries for the needed flow, keep that path instead of replacing it with handwritten REST calls. +- Preserve the key libraries and dependency pattern already present in the chosen sample repo unless the user explicitly asks for a different architecture. 
- Apply a tight diff budget: change only what is required for the user's confirmed provider choices and requested functionality. - Before editing code, state which sample repo is being followed, which env template files were inspected, and list the exact planned differences. -Maintenance rules: +## Maintenance Rules - Keep repo URLs here only. Other ConvoAI docs should link to this file instead of repeating URLs. - Store repo root URLs only. +- Prefer HTTPS URLs by default. - Keep descriptions short and stable. Store only the structural fields that must be preserved during implementation. -Usage workflow: +## Usage Workflow 1. Pick the row that matches the user's platform or implementation goal. -2. Clone the repo on demand with `git clone --depth 1 `. +2. If a listed sample matches the quickstart or integration request, first let the user accept the default technical path (or explicitly ask for sample-aligned implementation), then clone that repo on demand with `git clone --depth 1 `, and prefer the HTTPS URL by default. 3. Inspect the repo to confirm its current stack, folder map, entrypoints, env template files, and API surface. 4. Use the cloned repo's actual env template files as the source of truth for env naming. -5. Prefer the official SDKs, agent libraries, and dependency patterns already present in the sample repo over handwritten REST calls. +5. Preserve the dependency and project pattern already present in the sample repo rather than inventing a fresh starter structure. 6. If the repo does not answer the question, fetch Shengwang docs for the missing API or product details. 7. Keep the implementation structurally close to the sample unless the user explicitly requests `minimal-custom`. +8. If the matching sample repo cannot be inspected in the current environment, stop and report that blocker instead of silently inventing a fresh starter structure. 
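Steps 2 and 8 of the workflow above call for a shallow clone over HTTPS, repo-root URLs only, and a stop on failure. These can be sketched as a command builder that targets a throwaway directory rather than the user's workspace, as the Safety & Consent rules in SKILL.md prefer. This is an illustrative Python sketch. `plan_sample_clone` is a hypothetical helper, and the owner/repo path check assumes root URLs in the same shape as the ones in the registry.

```python
import tempfile
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def plan_sample_clone(repo_url: str) -> List[str]:
    """Build a shallow-clone command that targets a fresh temporary inspection directory."""
    parsed = urlparse(repo_url)
    if parsed.scheme != "https":
        raise ValueError(f"expected an HTTPS repo URL, got {repo_url!r}")
    # Repo root only: /<owner>/<repo>[.git], with no branch or subdirectory segments.
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) != 2:
        raise ValueError(f"expected a repo root URL, got {repo_url!r}")
    repo_name = segments[1][:-4] if segments[1].endswith(".git") else segments[1]
    # mkdtemp creates an empty scratch directory; git creates the repo folder inside it.
    dest = Path(tempfile.mkdtemp(prefix="convoai-sample-")) / repo_name
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]
```

Running `subprocess.run(cmd, check=True)` performs the clone. If it raises `subprocess.CalledProcessError`, that is the kind of blocker step 8 says to report rather than work around.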
+ +## Registry | Sample | Repo URL | Default Stack | Backend Entrypoint | Frontend Entrypoint | Use When | |--------|----------|---------------|--------------------|---------------------|----------| -| ConvoAI web quickstart | git@gitee.com:agoraio-community/conversational-ai-quickstart.git | Monorepo with Bun scripts, `web` on Next.js 16 + React 19 + TypeScript, and `server` on FastAPI/Python | `server/src/server.py` | `web/app/page.tsx` | The user wants a ConvoAI web app structure reference, starter layout, or frontend/backend shape that stays close to the official quickstart | -| ConvoAI native client apps | git@gitee.com:agoraio-community/conversational-ai-quickstart-native.git | Multi-platform monorepo: iOS, Android, Flutter, Windows, macOS. Each platform lives in its own subdirectory. Repo contains an `AGENTS.md` describing the directory layout per platform. Self-contained — each platform app calls ConvoAI REST API directly, no separate server needed. | N/A (no server) | See repo `AGENTS.md` for per-platform entrypoints | The user wants a ConvoAI native client (non-Web): Android, iOS, Flutter, Windows, or macOS. Clone this repo, read its `AGENTS.md` to locate the target platform directory, then inspect and align only that subdirectory. | +| ConvoAI web quickstart | https://gitee.com/agoraio-community/conversational-ai-quickstart.git | Monorepo with Bun scripts, `web` on Next.js 16 + React 19 + TypeScript, and `server` on FastAPI/Python | `server/src/server.py` | `web/app/page.tsx` | The user wants a ConvoAI web app structure reference, starter layout, or frontend/backend shape that stays close to the official quickstart | +| ConvoAI native client apps | https://gitee.com/agoraio-community/conversational-ai-quickstart-native.git | Multi-platform monorepo: iOS, Android, Flutter, Windows, macOS. Each platform lives in its own subdirectory. Repo contains an `AGENTS.md` describing the directory layout per platform. 
Self-contained — each platform app calls ConvoAI REST API directly, no separate server needed. | N/A (no server) | See repo `AGENTS.md` for per-platform entrypoints | The user wants a ConvoAI native client (non-Web): Android, iOS, Flutter, Windows, or macOS. Clone this repo, read its `AGENTS.md` to locate the target platform directory, then inspect and align only that subdirectory. | From 593f4fb070da53a25f56c59bed6957f67335061a Mon Sep 17 00:00:00 2001 From: chenyuguo Date: Wed, 25 Mar 2026 17:22:07 +0800 Subject: [PATCH 2/2] feat: convoai integration improve --- skills/voice-ai-integration/SKILL.md | 39 +++++++++++++++++++ .../references/conversational-ai/README.md | 8 ++++ .../conversational-ai/quickstart-intake.md | 12 ++++++ .../conversational-ai/sample-repos.md | 3 +- 4 files changed, 61 insertions(+), 1 deletion(-) diff --git a/skills/voice-ai-integration/SKILL.md b/skills/voice-ai-integration/SKILL.md index 606513a..29c8fde 100644 --- a/skills/voice-ai-integration/SKILL.md +++ b/skills/voice-ai-integration/SKILL.md @@ -9,6 +9,22 @@ license: MIT metadata: author: shengwang version: "1.0.0" + runtime: + required_binaries: + - bash + - curl + - git + network_hosts: + - doc.shengwang.cn + - doc-mcp.shengwang.cn + - gitee.com + required_env: + - SHENGWANG_APP_ID + - SHENGWANG_APP_CERTIFICATE + conditional_env: + - RTC_TOKEN + - provider-specific API keys + - provider-specific service identifiers --- # Shengwang Integration @@ -85,6 +101,29 @@ Research order: Once Step 3 provides enough information, proceed with implementation. 
+## Runtime Requirements + +This skill expects the following runtime basics: +- `bash` and `curl` for local doc-fetch helper scripts +- `git` for sample-repo inspection when the sample-aligned path is chosen +- Network access to `doc.shengwang.cn`, `doc-mcp.shengwang.cn`, and `gitee.com` when using doc fetch or sample inspection + +Core quickstart prerequisites: +- `SHENGWANG_APP_ID` +- `SHENGWANG_APP_CERTIFICATE` +- ConvoAI service activation in Shengwang Console + +Some flows also require conditional credentials such as provider API keys or service identifiers. +Those should always come from environment variables or user-provided secure input, never from hardcoded values. + +## Safety & Consent Rules + +- Do not clone external repos into the user's main workspace by default. Prefer a temporary path for inspection first. +- Do not modify an existing user project until the user has explicitly asked for code generation or integration work. +- Do not write secrets into project files unless the user explicitly asks for that behavior. Prefer env vars and example placeholders. +- Before performing network fetches or external repo inspection, state what will be downloaded or cloned. +- If a required runtime dependency or credential is missing, stop and explain the blocker instead of improvising around it. 
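The runtime and credential checks listed above can run before any doc fetch or sample inspection. Below is a minimal sketch, assuming Python 3 on the agent host; `preflight_blockers` is a hypothetical helper. This approach cannot verify ConvoAI activation in the Console or the conditional provider keys locally, so those still need the user's confirmation.

```python
import os
import shutil
from typing import List, Mapping, Optional

REQUIRED_BINARIES = ("bash", "curl", "git")
REQUIRED_ENV = ("SHENGWANG_APP_ID", "SHENGWANG_APP_CERTIFICATE")


def preflight_blockers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable blockers; an empty list means the local checks passed."""
    env = os.environ if env is None else env
    blockers = [
        f"missing binary: {name}" for name in REQUIRED_BINARIES if shutil.which(name) is None
    ]
    # Report variable names only, never values, so secrets stay out of logs and chat.
    blockers += [f"missing env var: {key}" for key in REQUIRED_ENV if not env.get(key)]
    return blockers
```

A non-empty result maps directly onto the last safety rule: stop and explain each blocker instead of improvising around it.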
+ ## Download Rules - Use `git clone --depth 1 ` with an HTTPS repo URL by default — GitHub/Gitee URLs must be repo root only (no branch/subdirectory paths) diff --git a/skills/voice-ai-integration/references/conversational-ai/README.md b/skills/voice-ai-integration/references/conversational-ai/README.md index 7375295..67ceb2d 100644 --- a/skills/voice-ai-integration/references/conversational-ai/README.md +++ b/skills/voice-ai-integration/references/conversational-ai/README.md @@ -71,6 +71,14 @@ Do not treat the REST quick start or endpoint index as the default architecture - Common diagnosis → [common-errors.md](common-errors.md) - Doc fetching guide → [../doc-fetching.md](../doc-fetching.md) +## Backend Doc Mapping + +When the user wants a server language that the demo does not cover, use these official quickstart docs: +- `Go` → `docs://default/convoai/restful/get-started/quick-start-go` +- `Java` → `docs://default/convoai/restful/get-started/quick-start-java` + +Treat those cases as a hybrid path: sample repo for overall structure when useful, official language quickstart for backend details. 
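The backend mapping can be written as a small lookup that also covers two cases the table leaves implicit. This is a hedged sketch, and `backend_reference` is a hypothetical helper. Treating the Python server as covered by the web quickstart sample follows the registry in `sample-repos.md`. Falling back to a doc fetch for unmapped languages such as Node.js is an assumption.

```python
BACKEND_QUICKSTART_DOCS = {
    "Go": "docs://default/convoai/restful/get-started/quick-start-go",
    "Java": "docs://default/convoai/restful/get-started/quick-start-java",
}
NATIVE_PLATFORMS = {"iOS", "Android", "Flutter", "Windows", "macOS"}


def backend_reference(platform: str, backend: str) -> str:
    """Say where backend implementation details should come from."""
    if platform in NATIVE_PLATFORMS:
        return "not needed"                         # native samples call the REST API directly
    if backend == "Python":
        return "sample repo: server/src/server.py"  # the web quickstart server is FastAPI/Python
    # Hybrid path for mapped languages; anything unmapped falls back to a doc fetch.
    return BACKEND_QUICKSTART_DOCS.get(backend, "doc fetch")
```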
+ ## Docs Fallback If fetch fails: https://doc.shengwang.cn/doc/convoai/restful/get-started/quick-start diff --git a/skills/voice-ai-integration/references/conversational-ai/quickstart-intake.md b/skills/voice-ai-integration/references/conversational-ai/quickstart-intake.md index 9c66dc3..6f3baff 100644 --- a/skills/voice-ai-integration/references/conversational-ai/quickstart-intake.md +++ b/skills/voice-ai-integration/references/conversational-ai/quickstart-intake.md @@ -497,4 +497,16 @@ After the structured spec is ready: - Follow the architecture rules in [README.md](README.md) - Use [sample-repos.md](sample-repos.md) for sample inspection and clone workflow - Use [generation-rules.md](generation-rules.md) for stable generation constraints +- Use the backend-language mapping below when the demo does not cover the chosen server language - Use `convoai-restapi/index.mdx` or endpoint docs only for missing low-level API details + +### Backend Language → Official Quickstart + +| Backend | Primary official doc | +|---------|----------------------| +| `Go` | `docs://default/convoai/restful/get-started/quick-start-go` | +| `Java` | `docs://default/convoai/restful/get-started/quick-start-java` | + +When the chosen backend is not covered by the sample repo, treat the flow as: +- frontend / architecture reference from the sample repo when useful +- backend implementation details from the mapped official quickstart doc above diff --git a/skills/voice-ai-integration/references/conversational-ai/sample-repos.md b/skills/voice-ai-integration/references/conversational-ai/sample-repos.md index 31a09d2..be268f5 100644 --- a/skills/voice-ai-integration/references/conversational-ai/sample-repos.md +++ b/skills/voice-ai-integration/references/conversational-ai/sample-repos.md @@ -23,13 +23,14 @@ repository for ConvoAI integration work. ## Usage Workflow 1. Pick the row that matches the user's platform or implementation goal. -2. 
If a listed sample matches the quickstart or integration request, first let the user accept the default technical path (or explicitly ask for sample-aligned implementation), then clone that repo on demand with `git clone --depth 1 `, and prefer the HTTPS URL by default. +2. If a listed sample matches the quickstart or integration request, first let the user accept the default technical path (or explicitly ask for sample-aligned implementation), then clone that repo on demand into a temporary inspection path with `git clone --depth 1 `, and prefer the HTTPS URL by default. 3. Inspect the repo to confirm its current stack, folder map, entrypoints, env template files, and API surface. 4. Use the cloned repo's actual env template files as the source of truth for env naming. 5. Preserve the dependency and project pattern already present in the sample repo rather than inventing a fresh starter structure. 6. If the repo does not answer the question, fetch Shengwang docs for the missing API or product details. 7. Keep the implementation structurally close to the sample unless the user explicitly requests `minimal-custom`. 8. If the matching sample repo cannot be inspected in the current environment, stop and report that blocker instead of silently inventing a fresh starter structure. +9. Only copy or adapt code into the user's actual project after the user has explicitly asked for implementation in that workspace. ## Registry