feat(liaison): use Tencent Cloud ASR for voice sessions #54
Open
kaileliu wants to merge 1 commit into
Contributor
Pull request overview
This PR updates the robonix-liaison voice-session pipeline to use Tencent Cloud realtime ASR over WebSocket (including URL signing + env-driven configuration) in place of the previous robonix/system/speech/asr_stream gRPC streaming provider, while keeping mic discovery and the downstream voiceprint → Pilot → optional TTS flow intact.
Changes:
- Replace the voice-session ASR path with a Tencent Cloud WebSocket client that sends PCM in fixed-size chunks and parses recognition results into `ASR_FINAL`.
- Add Tencent ASR signing/config via environment variables and basic unit coverage for URL generation / result aggregation.
- Update crate docs and add a minimal `voice_client` example for smoke-testing a running stack.
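
The "result aggregation" step mentioned above can be pictured with a small sketch. It is not the code from `voice.rs`: the `SliceUpdate` and `Transcript` types and their fields are hypothetical, and it assumes the service reports each sentence under a numeric index, first as partial text and later as a stable result.

```rust
use std::collections::BTreeMap;

/// One recognition update. All field names are hypothetical and only
/// illustrate the kind of data a realtime ASR service tends to report.
#[derive(Debug, Clone)]
pub struct SliceUpdate {
    /// Position of the sentence within the audio stream.
    pub index: u32,
    /// Latest text recognised for that sentence.
    pub text: String,
    /// True once the service marks the sentence as stable.
    pub is_final: bool,
}

/// Keeps the most recent text per sentence index, ordered by index.
#[derive(Debug, Default)]
pub struct Transcript {
    slices: BTreeMap<u32, (String, bool)>,
}

impl Transcript {
    /// Records an update. A later update replaces an earlier one, except
    /// that a partial result never overwrites a sentence already final.
    pub fn apply(&mut self, update: SliceUpdate) {
        let already_final = matches!(self.slices.get(&update.index), Some((_, true)));
        if already_final && !update.is_final {
            return;
        }
        self.slices.insert(update.index, (update.text, update.is_final));
    }

    /// Concatenates the sentences in index order; this is the string a
    /// caller could publish as the final transcript.
    pub fn final_text(&self) -> String {
        self.slices
            .values()
            .map(|(text, _)| text.as_str())
            .collect::<Vec<_>>()
            .concat()
    }
}
```

Keying by index means out-of-order or repeated updates still produce one entry per sentence, which is the property a unit test for aggregation would most likely want to pin down.
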
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| rust/crates/robonix-liaison/src/voice.rs | Implements Tencent Cloud ASR WebSocket client, signing, env config, and integrates it into the voice-session pipeline. |
| rust/crates/robonix-liaison/README.md | Documents the new Tencent ASR environment variables and updated voice flow. |
| rust/crates/robonix-liaison/examples/voice_client.rs | Adds a minimal non-interactive client example to validate ASR_FINAL + SESSION_DONE. |
| rust/crates/robonix-liaison/Cargo.toml | Adds dependencies required for Tencent ASR signing and WebSocket transport; registers the new example target. |
| rust/Cargo.lock | Locks new transitive dependencies (TLS/WebSocket/signing stack). |
Comment on lines +533 to 546

    let final_deadline = tokio::time::Instant::now() + Duration::from_secs(10);
    while tokio::time::Instant::now() < final_deadline {
        match tokio::time::timeout(Duration::from_millis(500), ws.next()).await {
            Ok(Some(Ok(msg))) => {
                handle_tencent_asr_message(msg, &mut results, &mut send_finished)?;
                if send_finished {
                    break;
                }
            }
            Ok(Some(Err(e))) => anyhow::bail!("receive Tencent ASR message: {e}"),
            Ok(None) => break,
            Err(_) if !results.is_empty() => break,
            Err(_) => {}
        }
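
The loop above combines an overall deadline with a short per-poll timeout, and gives up early on a quiet period once at least one result has arrived. The sketch below restates that control flow with the standard library only: an `mpsc` channel stands in for the WebSocket and `ServerEvent` is a hypothetical message type, so this is an illustration rather than the PR's code. One deliberate difference is that the poll timeout is clamped to the time remaining, so the last wait cannot run past the deadline.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a decoded server message.
pub enum ServerEvent {
    /// A recognition result.
    Text(String),
    /// The server signalled that recognition is complete.
    Finished,
}

/// Collects results until the server finishes, the channel closes, the
/// overall deadline passes, or a poll times out after results exist.
pub fn collect_until_done(
    rx: &Receiver<ServerEvent>,
    overall: Duration,
    poll: Duration,
) -> Vec<String> {
    let deadline = Instant::now() + overall;
    let mut results = Vec::new();
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            break; // overall deadline reached
        }
        match rx.recv_timeout(poll.min(remaining)) {
            Ok(ServerEvent::Text(text)) => results.push(text),
            Ok(ServerEvent::Finished) => break,
            Err(RecvTimeoutError::Disconnected) => break,
            // Quiet period with results in hand: treat the stream as done.
            Err(RecvTimeoutError::Timeout) if !results.is_empty() => break,
            // Quiet period with nothing yet: keep waiting.
            Err(RecvTimeoutError::Timeout) => {}
        }
    }
    results
}
```

With this shape, a single quiet poll after the first result ends the wait, so a long pause between two server messages would drop whatever arrives afterwards. That trade-off applies to the original loop as well.
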
Comment on lines +522 to +528

    for chunk in audio_pcm.chunks(TENCENT_ASR_CHUNK_BYTES) {
        ws.send(Message::Binary(chunk.to_vec().into()))
            .await
            .map_err(|e| anyhow::anyhow!("send Tencent ASR audio chunk: {e}"))?;
        drain_tencent_asr_messages(&mut ws, &mut results).await?;
        tokio::time::sleep(Duration::from_millis(TENCENT_ASR_CHUNK_INTERVAL_MS)).await;
    }
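
The values of `TENCENT_ASR_CHUNK_BYTES` and `TENCENT_ASR_CHUNK_INTERVAL_MS` are not visible in this excerpt. As a rough sketch of how such constants relate, the helpers below compute a chunk size from an audio format and estimate how long the paced send loop sleeps in total. The 16 kHz, 16-bit, mono, 40 ms figures used in the usage note are assumptions for illustration, and both function names are hypothetical.

```rust
use std::time::Duration;

/// Bytes of raw PCM covering `chunk_ms` milliseconds of audio.
pub const fn pcm_chunk_bytes(
    sample_rate_hz: u64,
    bytes_per_sample: u64,
    channels: u64,
    chunk_ms: u64,
) -> usize {
    (sample_rate_hz * bytes_per_sample * channels * chunk_ms / 1000) as usize
}

/// Returns the number of chunks `slice::chunks` would yield for
/// `total_bytes`, and the total time spent sleeping if the sender pauses
/// for `interval` after every chunk.
pub fn pacing_plan(total_bytes: usize, chunk_bytes: usize, interval: Duration) -> (usize, Duration) {
    assert!(chunk_bytes > 0, "chunk size must be non-zero");
    // Ceiling division: a trailing partial chunk still counts as one send.
    let chunks = (total_bytes + chunk_bytes - 1) / chunk_bytes;
    let sleeps = chunks.min(u32::MAX as usize) as u32;
    (chunks, interval.saturating_mul(sleeps))
}
```

Under those assumed figures, `pcm_chunk_bytes(16_000, 2, 1, 40)` is 1280 bytes, and one second of audio (32,000 bytes) becomes 25 chunks with one second of accumulated sleep. Because the loop also awaits the send and the drain on every iteration, pacing each chunk at its own duration makes the upload take slightly longer than the recording itself.
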
Comment on lines +699 to +706

    let app_id = std::env::var("ROBONIX_LIAISON_TENCENT_ASR_APP_ID")
        .or_else(|_| std::env::var("TENCENT_ASR_APP_ID"))
        .or_else(|_| std::env::var("TENCENTCLOUD_APP_ID"))
        .map_err(|_| {
            anyhow::anyhow!(
                "missing Tencent ASR app id; set ROBONIX_LIAISON_TENCENT_ASR_APP_ID or TENCENT_ASR_APP_ID"
            )
        })?;
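
The snippet above consults three variable names, while its error message lists two. One way to keep the lookup and the message in step is to drive both from a single list, as in the sketch below. The `first_set` helper and the `APP_ID_VARS` constant are hypothetical and not part of the PR, and the lookup is passed in as a closure so the logic can be exercised without touching the process environment.

```rust
/// Candidate variable names, in priority order (taken from the snippet above).
pub const APP_ID_VARS: [&str; 3] = [
    "ROBONIX_LIAISON_TENCENT_ASR_APP_ID",
    "TENCENT_ASR_APP_ID",
    "TENCENTCLOUD_APP_ID",
];

/// Returns the value of the first name for which `lookup` yields something.
/// The error message lists every candidate name that was tried.
pub fn first_set<F>(names: &[&str], what: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    names
        .iter()
        .copied()
        .find_map(|name| lookup(name))
        .ok_or_else(|| format!("missing {}; set one of {}", what, names.join(", ")))
}

/// Convenience wrapper that reads from the real process environment.
pub fn app_id_from_env() -> Result<String, String> {
    first_set(&APP_ID_VARS, "Tencent ASR app id", |name| std::env::var(name).ok())
}
```

The same helper could serve the other credentials by swapping in a different name list, which avoids repeating the `or_else` chain for each one.
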
Member
Thanks for adding cloud ASR support 👍. My review notes:

see

    system:
      atlas:
        listen: 127.0.0.1:50051
        log: info
      scene:
        log: info
      executor:
        listen: 127.0.0.1:50061
        log: info
      pilot:
        listen: 127.0.0.1:50071
        log: debug
        vlm:
          upstream: ${VLM_BASE_URL}
          api_key: ${VLM_API_KEY}
          model: ${VLM_MODEL}
          api_format: openai
      liaison:
        listen: 127.0.0.1:50081
        log: info
        # maybe add some new config fields here? the backends, tencent cloud apis...
        # you might need to update rbnx tool to pass the config to your robonix-liaison,
        # you can reference the above 'vlm' fields and related logic in rbnx to
        # implement the similar config/arg passing to your robonix-liaison program
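
The `vlm` fields in the config above use `${NAME}` placeholders. How `rbnx` resolves them is not shown in this thread, so the following is only a sketch of one way such placeholders could be expanded before values are handed to a program like `robonix-liaison`. The function name is hypothetical, and the lookup is a closure so the same code works against the real environment or a test map.

```rust
/// Replaces every `${NAME}` in `input` with the value `lookup` returns
/// for `NAME`. Fails on an unterminated placeholder or an unknown name.
pub fn expand_placeholders<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy the literal text that precedes the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find('}')
            .ok_or_else(|| format!("unterminated placeholder in {:?}", input))?;
        let name = &after[..end];
        let value = lookup(name).ok_or_else(|| format!("variable {} is not set", name))?;
        out.push_str(&value);
        rest = &after[end + 1..];
    }
    // Whatever remains contains no further placeholders.
    out.push_str(rest);
    Ok(out)
}
```

Calling it as `expand_placeholders(raw, |name| std::env::var(name).ok())` would resolve against the process environment. Failing loudly on a missing variable, rather than substituting an empty string, keeps a misconfigured credential from surfacing later as an opaque authentication error.
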
This PR switches Liaison’s voice-session ASR path from the previous gRPC ASR stream provider to Tencent Cloud realtime ASR over WebSocket, while keeping the rest of the voice flow unchanged: mic capture is still discovered through Atlas, and voiceprint, Pilot, optional TTS, and speaker playback continue to use the existing pipeline. It adds Tencent Cloud ASR signing and configuration through environment variables, sends captured PCM audio to Tencent Cloud, parses the returned recognition results into `ASR_FINAL`, and updates the Liaison documentation accordingly. The change was verified with `cargo test -p robonix-liaison`, a live Tencent Cloud ASR API test using environment-provided credentials, and an end-to-end Liaison voice-session test that successfully produced an `ASR_FINAL` result followed by `SESSION_DONE`.
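
The signing code itself is not quoted in this thread. Signed-URL schemes of this kind commonly sort the query parameters, join them into one canonical string, and compute a keyed hash over it. The sketch below covers only the deterministic first half, using the standard library. The parameter names in the usage note are illustrative assumptions, and the keyed-hash and encoding steps are left out because they need a crypto crate.

```rust
use std::collections::BTreeMap;

/// Joins parameters as `key=value` pairs separated by `&`. A `BTreeMap`
/// iterates in key order, so the output is sorted and deterministic.
pub fn canonical_query(params: &BTreeMap<&str, String>) -> String {
    params
        .iter()
        .map(|(key, value)| format!("{}={}", key, value))
        .collect::<Vec<_>>()
        .join("&")
}

/// Builds the text that a keyed hash would be computed over: the host and
/// path, a `?`, then the canonical query. The hash step is not shown.
pub fn string_to_sign(host_and_path: &str, params: &BTreeMap<&str, String>) -> String {
    format!("{}?{}", host_and_path, canonical_query(params))
}
```

For example, inserting `voice_id`, `engine_model_type`, and `timestamp` in any order always yields `engine_model_type=...&timestamp=...&voice_id=...`. That determinism is what makes URL generation straightforward to cover with a unit test that pins the timestamp and nonce.
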