Problem statement
High-quality STT and TTS are hard to find for some languages. Apple provides APIs that let users take advantage of on-device audio and text processing, the "same" technology used by Siri and keyboard dictation.
Because this processing is on-device and high quality, it aligns with our goal of facilitating Assist interaction without relying on the cloud for STT and TTS. Using the recently introduced "Kiosk mode" for iOS, we could turn those kiosk devices into home servers that process audio and text for Assist.
I vibe coded a Mac app as a proof of concept and have been using its STT server ever since; it works excellently. It runs on a Mac mini; the iPad was not evaluated but should have similar performance.
https://github.com/bgoncal/Wyoming-Apple-STT-Server
Community signals
No response
Scope & Boundaries
In scope
- Add kiosk mode STT server feature
- Add kiosk mode TTS server feature
Not in scope
Foreseen solution
No response
Risks & open questions
No response
Appetite
No response
Execution issues
No response
Decision log