Image embeddings on GCP #19

Draft
tonytw1 wants to merge 41 commits into dev21 from spike-vertex-embedding
tonytw1 commented May 9, 2026

What does this change?

Implements the More Like This and semantic search features using GCP APIs rather than AWS Bedrock.

Uses Gemini Embedding 2 as the model.

Scales uploaded images using ImageOperations before submitting them to the prediction API.
Every image is scaled rather than relying on the original or optimised image.

Does not need a separate Lambda.
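
As an illustration of the scaling step described above, here is a minimal sketch of how target dimensions could be computed so that the aspect ratio is preserved. It is not the `ImageOperations` code from this branch. The function name `fit_within`, the `max_side` parameter, and the choice never to upscale are all assumptions made for the example.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) to fit inside a max_side x max_side box.

    Hypothetical helper: preserves aspect ratio and never upscales.
    The real limit used before calling the prediction API is not
    stated in this PR, so max_side is an assumed parameter.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")

    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; leave the dimensions unchanged.
        return width, height

    # One scale factor for both axes keeps the aspect ratio intact,
    # so subjects are neither stretched nor cropped.
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000x3000 upload with an assumed `max_side` of 1024 would be scaled to 1024x768 before its bytes are sent for embedding.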

How should a reviewer test this change?

How can success be measured?

Who should look at this?

Tested? Documented?

  • locally by committer
  • locally by Guardian reviewer
  • on the Guardian's TEST environment
  • relevant documentation added or amended (if needed)

tonytw1 force-pushed the spike-vertex-embedding branch 3 times, most recently from 76da1b8 to 340dc0c on May 11, 2026 19:52
tonytw1 changed the title from "Spike vertex embedding" to "Image embeddings on GCP" on May 11, 2026
tonytw1 force-pushed the spike-vertex-embedding branch 3 times, most recently from 1509bdc to ab636a3 on May 18, 2026 07:28
tonytw1 force-pushed the spike-vertex-embedding branch 12 times, most recently from b07d5ff to 4305b01 on May 25, 2026 17:55
tonytw1 added 10 commits May 25, 2026 18:58
… a source image and for input into a prediction end point.
…observed results are based solely on the pixels.
…ding source image to preserve the aspect ratio of the subjects and (maybe) avoid cropping out of subjects.
Bring in the GCP gen ai client library.

Embedding source is presented as an array of image bytes.
Not here; it can go after the normal image upload so that it doesn't impact latency.
…st after the Image message.

The update embeddings message needs to arrive after the Image create message.
tonytw1 added 29 commits May 25, 2026 18:58
…ile size so there is no penalty for trying to flex on max clarity of small tiles.
0.9 looks like a usable cutoff for visually similar.
…lar is handled with the knn special case.

Fixes no similar results because of:

```
"filter": {
  "bool": {
    "must": [
      {
        "match": {
          "similar": {
            "query": "f7bfe3925ac6562dbb7428e32b36c9f5e605a434",
            "operator": "AND"
          }
        }
      }
    ],
```
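
To make the failure mode concrete: the `similar` condition ended up as an ordinary `match` filter, which matches no documents. The sketch below shows one way the special case could look. It assumes an Elasticsearch top-level `knn` search and assumes the 0.9 cutoff is applied through the `similarity` option. The field name `embedding.vector`, the function `build_similar_search`, and the `k` / `num_candidates` defaults are hypothetical and are not taken from this branch.

```python
from typing import Any, Iterable, Sequence

SIMILAR_FIELD = "similar"              # condition name seen in the failing filter above
EMBEDDING_FIELD = "embedding.vector"   # hypothetical dense_vector field name


def build_similar_search(
    conditions: Iterable[tuple[str, str]],
    query_vector: Sequence[float],
    k: int = 10,
    num_candidates: int = 100,
    min_similarity: float = 0.9,
) -> dict[str, Any]:
    """Build a kNN search body, keeping `similar` out of the filter."""
    # Every condition except `similar` becomes a normal match filter.
    filters = [
        {"match": {field: {"query": value, "operator": "AND"}}}
        for field, value in conditions
        if field != SIMILAR_FIELD
    ]

    knn: dict[str, Any] = {
        "field": EMBEDDING_FIELD,
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": num_candidates,
        # Assumed placement of the 0.9 "visually similar" cutoff.
        "similarity": min_similarity,
    }
    if filters:
        knn["filter"] = {"bool": {"must": filters}}
    return {"knn": knn}
```

With this shape, the image id from the `similar` condition would only be used to look up the query vector, and would never be matched against an indexed field.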
tonytw1 force-pushed the spike-vertex-embedding branch from 4305b01 to d883393 on May 25, 2026 17:58