[Feature]: Make AI coach multimodal #27

Description

@saksham2001

What problem does this solve?

Currently, the AI coach accepts text input only. Most of the supported models and providers accept multimodal input (image, audio). Image input for the chat would be a useful addition: the Google health app has it, and it is really useful for tracking calorie intake. Speech-to-text would also be worthwhile, so that users can speak to the coach instead of typing.

Proposed solution

Most models natively support image input through their APIs. The UI needs only minor changes: a camera icon in the input bar, with options to take a photo or upload an image from the gallery.
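
As a rough illustration of the API side, here is a minimal sketch of how a chat message could carry an optional image next to the text. Everything in it is an assumption for discussion, not the app's existing code: `ContentPart`, `CoachMessage`, `makeUserMessage`, and the JSON field names are hypothetical, and each provider's real request format would still need its own mapping.

```swift
import Foundation

// Hypothetical, provider-agnostic message shape. Real provider APIs use
// their own field names, so an adapter per provider is still needed.
enum ContentPart: Encodable, Equatable {
    case text(String)
    case image(base64: String, mimeType: String)

    private enum CodingKeys: String, CodingKey {
        case type, text, data, mimeType
    }

    // Encode each part as a small tagged JSON object.
    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .text(let value):
            try container.encode("text", forKey: .type)
            try container.encode(value, forKey: .text)
        case .image(let base64, let mimeType):
            try container.encode("image", forKey: .type)
            try container.encode(base64, forKey: .data)
            try container.encode(mimeType, forKey: .mimeType)
        }
    }
}

// Hypothetical chat message: a role plus an ordered list of parts.
struct CoachMessage: Encodable {
    let role: String
    let content: [ContentPart]
}

// Builds a user message; the image (if any) goes first, then the text.
// `imageData` is assumed to be already-compressed JPEG bytes by default.
func makeUserMessage(text: String,
                     imageData: Data?,
                     mimeType: String = "image/jpeg") -> CoachMessage {
    var parts: [ContentPart] = []
    if let imageData = imageData {
        parts.append(.image(base64: imageData.base64EncodedString(),
                            mimeType: mimeType))
    }
    if !text.isEmpty {
        parts.append(.text(text))
    }
    return CoachMessage(role: "user", content: parts)
}
```

With a shape like this, the existing text-only path stays unchanged (`imageData: nil` produces a single text part), and the camera or gallery picker only has to hand over the image bytes.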

For speech-to-text, there are a couple of frameworks that can be used in Swift. Voice-to-voice is known to give less accurate, lower-quality answers, so speech-to-text (S2T) is probably the better choice.

Area

AI Coach (tools, prompts, on-device LLM)

Alternatives considered

No response

Would you be willing to work on this?

None
