feat(liaison): use Tencent Cloud ASR for voice sessions#54

Open
kaileliu wants to merge 1 commit into syswonder:dev from kaileliu:feat/tencent-cloud-asr-liaison

Conversation

kaileliu (Collaborator) commented Jun 2, 2026

This PR switches Liaison’s voice-session ASR path from the previous gRPC ASR stream provider to Tencent Cloud realtime ASR over WebSocket. The rest of the voice flow is unchanged: mic capture is still discovered through Atlas, and voiceprint, Pilot, optional TTS, and speaker playback continue to use the existing pipeline. The PR adds Tencent Cloud ASR signing and configuration through environment variables, sends captured PCM audio to Tencent Cloud, parses the returned recognition results into ASR_FINAL, and updates the Liaison documentation accordingly. The change was verified with cargo test -p robonix-liaison, a live Tencent Cloud ASR API test using environment-provided credentials, and an end-to-end Liaison voice-session test that produced an ASR_FINAL result followed by SESSION_DONE.

enkerewpo requested review from HustWolfzzb and Copilot on June 4, 2026 09:17
enkerewpo added the enhancement (New feature or request) label on Jun 4, 2026
enkerewpo added this to the robonix v1.0 milestone on Jun 4, 2026

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the robonix-liaison voice-session pipeline to use Tencent Cloud realtime ASR over WebSocket (including URL signing + env-driven configuration) in place of the previous robonix/system/speech/asr_stream gRPC streaming provider, while keeping mic discovery and the downstream voiceprint → Pilot → optional TTS flow intact.

Changes:

  • Replace the voice-session ASR path with a Tencent Cloud WebSocket client that sends PCM in fixed-size chunks and parses recognition results into ASR_FINAL.
  • Add Tencent ASR signing/config via environment variables and basic unit coverage for URL generation / result aggregation.
  • Update crate docs and add a minimal voice_client example for smoke-testing a running stack.
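To make the "parses recognition results into ASR_FINAL" step concrete, here is a minimal sketch of one way streamed results could be aggregated into a single final string. It is not the code from this PR: the `Segment` struct, its field names, and the rule "keep the last finalized text per sentence index" are assumptions chosen for illustration only.

```rust
use std::collections::BTreeMap;

/// Hypothetical, already-parsed recognition update (not the PR's type).
struct Segment {
    /// Sentence index assigned by the recognizer (assumed to increase over time).
    index: u32,
    /// Recognized text for this sentence so far.
    text: String,
    /// True once the recognizer marks this sentence as stable.
    is_final: bool,
}

/// Keep the last finalized text for each sentence index and join the
/// sentences in index order. Non-final (partial) updates are ignored.
fn aggregate(segments: &[Segment], separator: &str) -> String {
    let mut latest: BTreeMap<u32, &str> = BTreeMap::new();
    for seg in segments {
        if seg.is_final {
            // A later final update for the same index overwrites the earlier one.
            latest.insert(seg.index, seg.text.as_str());
        }
    }
    // BTreeMap iterates in ascending key order, so sentences stay ordered.
    latest.values().copied().collect::<Vec<_>>().join(separator)
}
```

Keying by sentence index means a late correction for an earlier sentence replaces the old text instead of being appended twice. The separator is a parameter because a space suits English, while an empty string may suit Chinese output.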

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| rust/crates/robonix-liaison/src/voice.rs | Implements Tencent Cloud ASR WebSocket client, signing, env config, and integrates it into the voice-session pipeline. |
| rust/crates/robonix-liaison/README.md | Documents the new Tencent ASR environment variables and updated voice flow. |
| rust/crates/robonix-liaison/examples/voice_client.rs | Adds a minimal non-interactive client example to validate ASR_FINAL + SESSION_DONE. |
| rust/crates/robonix-liaison/Cargo.toml | Adds dependencies required for Tencent ASR signing and WebSocket transport; registers the new example target. |
| rust/Cargo.lock | Locks new transitive dependencies (TLS/WebSocket/signing stack). |

Comment on lines +533 to 546

    let final_deadline = tokio::time::Instant::now() + Duration::from_secs(10);
    while tokio::time::Instant::now() < final_deadline {
        match tokio::time::timeout(Duration::from_millis(500), ws.next()).await {
            Ok(Some(Ok(msg))) => {
                handle_tencent_asr_message(msg, &mut results, &mut send_finished)?;
                if send_finished {
                    break;
                }
            }
            Ok(Some(Err(e))) => anyhow::bail!("receive Tencent ASR message: {e}"),
            Ok(None) => break,
            Err(_) if !results.is_empty() => break,
            Err(_) => {}
        }
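The control flow of the excerpt above (an overall deadline, a short per-poll timeout, and an early exit once a poll times out with at least one result collected) can be sketched without any async runtime. The version below swaps the WebSocket stream for a `std::sync::mpsc` channel so it is self-contained. `AsrEvent` and `collect_results` are hypothetical names, and treating a `Final` event as the point where the excerpt's `send_finished` flag becomes true is an assumption.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a parsed server message.
enum AsrEvent {
    Partial(String),
    Final(String),
}

/// Collect results until one of the following happens:
/// - a final event arrives,
/// - the sending side closes the channel,
/// - a poll times out after at least one result was collected,
/// - the overall deadline passes.
fn collect_results(rx: &Receiver<AsrEvent>, overall: Duration, poll: Duration) -> Vec<String> {
    let deadline = Instant::now() + overall;
    let mut results = Vec::new();
    while Instant::now() < deadline {
        match rx.recv_timeout(poll) {
            Ok(AsrEvent::Partial(text)) => results.push(text),
            Ok(AsrEvent::Final(text)) => {
                results.push(text);
                break;
            }
            // Peer went away: nothing more will arrive.
            Err(RecvTimeoutError::Disconnected) => break,
            // Quiet period after some output: treat what we have as complete.
            Err(RecvTimeoutError::Timeout) if !results.is_empty() => break,
            // Quiet period with no output yet: keep waiting until the deadline.
            Err(RecvTimeoutError::Timeout) => {}
        }
    }
    results
}
```

The deadline is only checked between polls, so the loop can overrun it by up to one poll interval. With the excerpt's values that is 500 ms on top of the 10 s deadline.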
Comment on lines +522 to +528

    for chunk in audio_pcm.chunks(TENCENT_ASR_CHUNK_BYTES) {
        ws.send(Message::Binary(chunk.to_vec().into()))
            .await
            .map_err(|e| anyhow::anyhow!("send Tencent ASR audio chunk: {e}"))?;
        drain_tencent_asr_messages(&mut ws, &mut results).await?;
        tokio::time::sleep(Duration::from_millis(TENCENT_ASR_CHUNK_INTERVAL_MS)).await;
    }
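The send loop above sleeps once per chunk, so upload time grows linearly with the length of the audio. The arithmetic can be sketched as follows. The values of `TENCENT_ASR_CHUNK_BYTES` and `TENCENT_ASR_CHUNK_INTERVAL_MS` are not shown in this excerpt, so the 16 kHz, 16-bit mono format and the 40 ms interval used here are assumptions, and both function names are hypothetical.

```rust
use std::time::Duration;

// Assumed audio format for this sketch: 16 kHz, 16-bit, mono PCM.
const SAMPLE_RATE_HZ: u64 = 16_000;
const BYTES_PER_SAMPLE: u64 = 2;

/// Number of bytes that hold `interval` worth of audio in the assumed format.
fn chunk_bytes_for(interval: Duration) -> usize {
    (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * interval.as_millis() as u64 / 1000) as usize
}

/// Lower bound on the time a paced send loop spends sleeping:
/// one `interval` per chunk, including a shorter trailing chunk.
fn paced_send_time(audio_len: usize, chunk_bytes: usize, interval: Duration) -> Duration {
    assert!(chunk_bytes > 0, "chunk size must be non-zero");
    // Ceiling division: a partial last chunk still costs one full interval.
    let chunks = (audio_len + chunk_bytes - 1) / chunk_bytes;
    interval * chunks as u32
}
```

Under these assumptions a 40 ms interval corresponds to 1280-byte chunks, so the loop sends at roughly real-time speed: a 5 s utterance (160,000 bytes) takes at least 5 s to upload, before any time spent draining messages.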
Comment on lines +699 to +706

    let app_id = std::env::var("ROBONIX_LIAISON_TENCENT_ASR_APP_ID")
        .or_else(|_| std::env::var("TENCENT_ASR_APP_ID"))
        .or_else(|_| std::env::var("TENCENTCLOUD_APP_ID"))
        .map_err(|_| {
            anyhow::anyhow!(
                "missing Tencent ASR app id; set ROBONIX_LIAISON_TENCENT_ASR_APP_ID or TENCENT_ASR_APP_ID"
            )
        })?;
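The excerpt above tries three variable names, but its error message names only two of them. One possible way to keep the lookup order and the message in sync is to drive both from a single list. This is a sketch only: `first_set` and `first_env` are hypothetical helpers, not functions in this PR, and the lookup is passed in as a closure so the logic can be exercised without touching the process environment.

```rust
use std::env;

/// Return the first value found for `names`, in order, using `lookup`.
/// On failure the error lists every name that was tried.
fn first_set(names: &[&str], lookup: impl Fn(&str) -> Option<String>) -> Result<String, String> {
    names
        .iter()
        .find_map(|&name| lookup(name))
        .ok_or_else(|| format!("missing value; set one of: {}", names.join(", ")))
}

/// Convenience wrapper that reads from the real process environment.
/// Variables that are unset or not valid Unicode are skipped.
fn first_env(names: &[&str]) -> Result<String, String> {
    first_set(names, |name| env::var(name).ok())
}
```

A call such as `first_env(&["ROBONIX_LIAISON_TENCENT_ASR_APP_ID", "TENCENT_ASR_APP_ID", "TENCENTCLOUD_APP_ID"])` would then report all three names when none is set.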

enkerewpo (Member) commented Jun 5, 2026

Thanks for adding cloud ASR support 👍. My review notes:

  1. Could you please test your functionality on the latest dev branch using the Webots demo with an actual audio_driver configuration, such as a laptop mic/speaker? In dev the liaison module has been moved to [root]/system, so you might need to move your new code there.
  2. I see a lot of xxx_tencent_xxx identifiers in voice.rs. Since the previous funasr path is working, could we split the "backends" into separate Rust files instead of merging them all into one file, e.g. backend/funasr.rs and backend/tencent_asr.rs? Please also add config support so users can choose their backend when deploying:

see examples/webots/robonix_manifest.yaml:

system:
  atlas:
    listen: 127.0.0.1:50051
    log: info
  scene:
    log: info
  executor:
    listen: 127.0.0.1:50061
    log: info
  pilot:
    listen: 127.0.0.1:50071
    log: debug
    vlm:
      upstream: ${VLM_BASE_URL}
      api_key: ${VLM_API_KEY}
      model: ${VLM_MODEL}
      api_format: openai
  liaison:
    listen: 127.0.0.1:50081
    log: info
    # maybe add some new config fields here? the backends, tencent cloud apis...
    # you might need to update rbnx tool to pass the config to your robonix-liaison,
    # you can reference the above 'vlm' fields and related logic in rbnx to
    # implement similar config/arg passing to your robonix-liaison program
  3. We recommend using program args for configs instead of a lot of hard-coded environment variable strings.
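The backend split and argument-driven selection suggested in these notes could look roughly like the sketch below. Everything in it is illustrative: the `AsrBackend` trait, the `--asr-backend` flag, the accepted names, and the choice of FunASR as the default when the flag is absent are assumptions, not existing robonix or rbnx interfaces. In a real layout the two structs would live in backend/funasr.rs and backend/tencent_asr.rs.

```rust
use std::str::FromStr;

/// Which ASR backend to use (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AsrBackendKind {
    FunAsr,
    TencentCloud,
}

impl FromStr for AsrBackendKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "funasr" => Ok(Self::FunAsr),
            "tencent" | "tencent_asr" => Ok(Self::TencentCloud),
            other => Err(format!("unknown ASR backend: {other}")),
        }
    }
}

/// Common interface each backend module would implement.
trait AsrBackend {
    fn name(&self) -> &'static str;
    fn transcribe(&self, audio_pcm: &[u8]) -> Result<String, String>;
}

struct FunAsrBackend;
struct TencentAsrBackend;

impl AsrBackend for FunAsrBackend {
    fn name(&self) -> &'static str {
        "funasr"
    }
    fn transcribe(&self, _audio_pcm: &[u8]) -> Result<String, String> {
        // Placeholder: the real gRPC streaming call would go here.
        Err("funasr backend not implemented in this sketch".to_string())
    }
}

impl AsrBackend for TencentAsrBackend {
    fn name(&self) -> &'static str {
        "tencent_asr"
    }
    fn transcribe(&self, _audio_pcm: &[u8]) -> Result<String, String> {
        // Placeholder: the real WebSocket session would go here.
        Err("tencent_asr backend not implemented in this sketch".to_string())
    }
}

fn make_backend(kind: AsrBackendKind) -> Box<dyn AsrBackend> {
    match kind {
        AsrBackendKind::FunAsr => Box::new(FunAsrBackend),
        AsrBackendKind::TencentCloud => Box::new(TencentAsrBackend),
    }
}

/// Parse a hypothetical `--asr-backend <name>` flag from program arguments.
/// Falls back to FunASR when the flag is absent (an assumed default).
fn backend_from_args(args: impl IntoIterator<Item = String>) -> Result<AsrBackendKind, String> {
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        if arg == "--asr-backend" {
            let value = args.next().ok_or("--asr-backend needs a value")?;
            return value.parse();
        }
    }
    Ok(AsrBackendKind::FunAsr)
}
```

With a shape like this, the voice-session code would call `make_backend(backend_from_args(std::env::args())?)` once at startup and hold only a `Box<dyn AsrBackend>`, so the xxx_tencent_xxx details stay inside their own module.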

enkerewpo added the module (Core module / feature development) label on Jun 9, 2026

3 participants