Support image input in the chat completion request #55

Open
Youho99 wants to merge 4 commits into lhenault:main from Youho99:main

Youho99 commented Jul 10, 2024

This pull request addresses issue #54.

It adds support for the OpenAI API request structure in which a message's content includes an image.

Tested with a single image.

Example from the OpenAI documentation:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-turbo",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
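
For readers who want to see what handling this request shape involves, here is a minimal sketch in Python. It is not the code from this PR: `split_content` is a hypothetical helper, the image URL is a placeholder, and the sketch only assumes the `text` and `image_url` part types shown in the request above. It separates the text from the image URLs in a message whose `content` is either a plain string or a list of parts.

```python
from typing import Any, Dict, List, Tuple, Union

# A message's content is either a plain string (text-only form)
# or a list of typed parts, as in the request above.
Content = Union[str, List[Dict[str, Any]]]


def split_content(content: Content) -> Tuple[str, List[str]]:
    """Return (joined_text, image_urls) for one message's content field.

    Hypothetical helper for illustration; not part of the library.
    """
    if isinstance(content, str):
        return content, []

    texts: List[str] = []
    image_urls: List[str] = []
    for part in content:
        kind = part.get("type")
        if kind == "text":
            texts.append(part["text"])
        elif kind == "image_url":
            image_urls.append(part["image_url"]["url"])
        else:
            # Fail loudly rather than silently dropping unknown parts.
            raise ValueError(f"unsupported content part type: {kind!r}")
    return "\n".join(texts), image_urls


message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        # Placeholder URL, assumed for the example.
        {"type": "image_url", "image_url": {"url": "https://example.com/boardwalk.jpg"}},
    ],
}

text, images = split_content(message["content"])
print(text)
print(images)
```

Accepting both the string and the list form keeps existing text-only clients working while allowing image parts.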

The code has not been formatted yet, so that still needs to be reviewed.

@lhenault

Owner

Thanks for your work, will happily review this once you think it's ready (and passing the pre-commit check). If you have a working example for VLM / image processing to share, that would be a nice addition to the existing ones.

Youho99 marked this pull request as ready for review July 15, 2024 14:35

Youho99 commented Jul 15, 2024

Author

Don't use version 1.65.0 of grpcio and grpcio-tools (that release has been withdrawn).

I don't know how to change this in the Poetry requirements.

Youho99 commented Jul 16, 2024

Author

I just updated the version constraints for grpcio and grpcio-tools in the toml file, and I regenerated the poetry.lock.

Since this is my first time doing this, I would appreciate extra attention on that part.
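
As a complement to the constraint change, a reviewer could check their own environment with a short script. This is only a sketch under assumptions: the helper names are hypothetical, and the version to avoid (1.65.0) is taken from the comments in this thread, not from the actual constraint written in the PR.

```python
from importlib import metadata
from typing import Optional

# Version this thread asks to avoid (taken from the comments, not from the PR diff).
WITHDRAWN = {"1.65.0"}


def is_withdrawn(version: str) -> bool:
    """Return True if the given version string is one to avoid."""
    return version.strip() in WITHDRAWN


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package in ("grpcio", "grpcio-tools"):
    found = installed_version(package)
    if found is not None and is_withdrawn(found):
        print(f"{package} {found} should be avoided; install a different version")
```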

Youho99 commented Jul 16, 2024

Author

I will provide an example of how to use this feature later (probably in a separate PR).

Youho99 commented Jul 16, 2024

Author

@lhenault I think you can review this PR (and change the version accordingly) :)

@lhenault

Owner

Hey @Youho99 !

I tried your changes the other day and encountered a few issues, but probably because of me. Thanks again for your PR and sorry for the delay, it's very much appreciated. 😌

Let me have another look soon (and if you have a working example for image inputs that might speed up things).

Youho99 commented Aug 28, 2024

Author

@lhenault

In the next few days I'll get back to it, and provide an example.

Let me know if you have any problems.

Youho99 commented Jan 9, 2025

Author

@lhenault Hello and happy new year!

After a few days (lol), I have finally produced an example for the image support.

It is not yet in the format of the examples already present in the library. We can do that work later.

Here is the project:
https://github.com/Youho99/phi-3_5-vision-onnx-simpleai

Youho99 commented Mar 23, 2025

Author

@lhenault any update?

@lhenault

Owner

Hey, sorry, I somehow missed this and the previous update. I'll have a look at it soon. Thanks a lot for the submission!

Youho99 commented Oct 4, 2025

Author

@lhenault
Can you review this PR?
