What problem does this solve?
Currently the AI coach accepts text input only. Most of the supported models and providers accept multimodal input (image, audio). Image input in chat would be a useful feature: the Google health app has it, and it is really useful for tracking calorie intake. Speech-to-text would also be worth adding so users can speak to the coach instead of typing.
Proposed solution
Most models natively support image input through the API. Minor UI changes are required: add a camera icon to the input bar, with options to take a photo or upload an image from the gallery.
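As a rough sketch of what the request side could look like, the snippet below packs a prompt and an image into a single user message. The payload shape (text and `image_url` content parts with a base64 data URL) is modeled on OpenAI-style chat APIs and is an assumption; other providers use different field names. `ContentPart`, `ChatMessage`, and `makeImageMessage` are hypothetical names for illustration, not types from the app's codebase.

```swift
import Foundation

// Hypothetical types for illustration only; field names follow an
// OpenAI-style chat payload, which is an assumption about the provider.
struct ContentPart: Encodable {
    struct ImageURL: Encodable {
        let url: String
    }

    let type: String
    let text: String?        // set for "text" parts, omitted from JSON when nil
    let image_url: ImageURL? // set for "image_url" parts, omitted when nil
}

struct ChatMessage: Encodable {
    let role: String
    let content: [ContentPart]
}

/// Builds a user message that carries both the prompt and the image.
/// The image is embedded as a base64 data URL.
func makeImageMessage(prompt: String,
                      imageData: Data,
                      mimeType: String = "image/jpeg") -> ChatMessage {
    let dataURL = "data:\(mimeType);base64,\(imageData.base64EncodedString())"
    return ChatMessage(role: "user", content: [
        ContentPart(type: "text", text: prompt, image_url: nil),
        ContentPart(type: "image_url",
                    text: nil,
                    image_url: ContentPart.ImageURL(url: dataURL))
    ])
}
```

The resulting `ChatMessage` can be serialized with `JSONEncoder` and placed in the request body's message list. Photos from the camera or gallery would likely need to be downscaled and compressed before encoding to keep the request size reasonable.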
For speech-to-text, there are a couple of frameworks that can be used from Swift. Voice-to-voice is known to give less accurate, lower-quality answers, so speech-to-text (S2T) is probably the better choice.
Area
AI Coach (tools, prompts, on-device LLM)
Alternatives considered
No response
Would you be willing to work on this?
None