Method comparison: Image benchmark shows sample images #3309

Open
BenjaminBossan wants to merge 2 commits into huggingface:main from BenjaminBossan:method-comparison-image-gen-app-shows-images

Conversation

@BenjaminBossan

Member

The Gradio space for the PEFT benchmarks already contains logic to show the image generation results once the experiments have been run and pushed to the repo. However, it doesn't contain any logic to show generated sample images. Those are quite important, as the metrics are not sufficient to estimate the quality of the fine-tuned model.

This PR adds a gallery element below the dataframe that shows sample images. By default, it just shows the training images from the dataset to give viewers an impression of what they're dealing with. If a user clicks on a row in the dataframe, e.g. the row for LoRA, the five sample images for that PEFT method are shown (if they can be found in the HF bucket). There is a toggle for the user to switch between showing generated sample images and training images.
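The selection behaviour described above could be sketched roughly as follows. This is a minimal, framework-free illustration and not the code in this PR: `pick_gallery_images`, the `show_generated` flag, and the file names are hypothetical, and the wiring to the Gradio dataframe and gallery is left out. Falling back to the training images when no samples are found is also an assumption; the description above does not say what happens in that case.

```python
from typing import Optional


def pick_gallery_images(
    training_images: list[str],
    samples_by_method: dict[str, list[str]],
    selected_method: Optional[str],
    show_generated: bool,
) -> list[str]:
    """Return the images the gallery should display.

    Assumption: fall back to the training images when the toggle is off,
    when no row has been selected yet, or when no samples exist for the
    selected method.
    """
    if not show_generated or selected_method is None:
        return training_images
    samples = samples_by_method.get(selected_method, [])
    return samples if samples else training_images


# Hypothetical file names, for illustration only.
training = ["train_0.png", "train_1.png"]
samples = {"lora": [f"lora_sample_{i}.png" for i in range(5)]}

default_view = pick_gallery_images(training, samples, None, show_generated=True)
lora_view = pick_gallery_images(training, samples, "lora", show_generated=True)
missing_view = pick_gallery_images(training, samples, "other-method", show_generated=True)
toggled_off_view = pick_gallery_images(training, samples, "lora", show_generated=False)
```

Keeping this decision in a plain function would make it easy to test independently of the UI callbacks.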

Moreover, in this PR, I made a change to use json.dumps to put the configs into the dataframe instead of simply coercing them to strings. That makes it easier to parse the data.
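To illustrate why this matters, here is a small standalone sketch using only the standard library. The config dict is hypothetical and not taken from the benchmark; the point is only that `str()` produces a Python repr, while `json.dumps` produces a string that any JSON parser can read back.

```python
import json

# Hypothetical config, for illustration only.
config = {"peft_type": "LORA", "r": 8, "use_dora": False, "target_modules": None}

as_str = str(config)          # Python repr: single quotes, False, None
as_json = json.dumps(config)  # valid JSON: double quotes, false, null

# The JSON string round-trips through the standard parser.
round_tripped = json.loads(as_json)

# The str() output is not valid JSON, so json.loads rejects it.
try:
    json.loads(as_str)
    str_is_valid_json = True
except json.JSONDecodeError:
    str_is_valid_json = False
```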

I also noticed that the dataframe showed the combined metrics across all tasks, including metrics that are not shared between them. For example, the image gen task showed test accuracy, and MetaMath showed DINO similarity. The PR changes the logic to only show the metrics relevant to each task.
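One way to picture the filtering, as a sketch only: keep just the columns that actually have a value for the task being displayed. The task names, metric keys, and values below are placeholders, and `columns_for_task` is a hypothetical helper; the PR may decide which metrics are relevant in a different way.

```python
# Placeholder rows; task names, keys, and values are made up for illustration.
rows = [
    {"task": "image-gen", "method": "lora", "dino_similarity": 0.8},
    {"task": "metamath", "method": "lora", "test_accuracy": 0.5},
]


def columns_for_task(rows, task):
    """Return the columns that have a value in at least one row of this task."""
    columns = []
    for row in rows:
        if row["task"] != task:
            continue
        for key, value in row.items():
            if value is not None and key not in columns:
                columns.append(key)
    return columns


image_gen_columns = columns_for_task(rows, "image-gen")
metamath_columns = columns_for_task(rows, "metamath")
```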

Moreover, the image gen metrics were on the far right of the dataframe, making them hard to view. Now the order puts them further to the left.
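A reordering like this can be expressed as moving a list of priority columns to the front while leaving the remaining columns in their original order. The sketch below assumes hypothetical column names and a hypothetical `reorder_columns` helper; the actual column order chosen in the PR is not reproduced here.

```python
def reorder_columns(columns, priority):
    """Move the priority columns to the front, keeping the rest in order."""
    front = [c for c in priority if c in columns]
    rest = [c for c in columns if c not in front]
    return front + rest


# Hypothetical column names, for illustration only.
columns = ["experiment", "train_time", "memory", "dino_similarity"]
ordered = reorder_columns(columns, priority=["experiment", "dino_similarity"])
```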

Finally, I added a short task description above the dataframe so that viewers can understand at a glance what they're looking at. It includes a link to the checkpoints on HF buckets.

image

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan requested a review from githubnemo June 8, 2026 15:16