Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization
Our paper "Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization" is available at https://arxiv.org/abs/2412.18525.
Dataset and code will be available in two months.
Samples for Zero-shot Capabilities on Vision Tasks (Relatively Simple Samples)
Acknowledge the spatial structure and identify variations in light intensity, translating these into a gradient scale representing distances. Accentuate regions where light diminishes gradually, enhancing the perception of depth by dimming peripheral areas. Adjust the distribution of luminance to highlight the central vanishing point, converting detailed textures into smooth transitions of grayscale.
Output Image
Ground Truth
Input Image
Start by analyzing the spatial layout to identify key structural elements. Gradually obscure less relevant details in the periphery to focus primarily on central depth. Increase contrast between light and dark areas to enhance perception of distance. Transition the textures into smooth gradients to reflect variations in depth, with a focus on enhanced luminosity for regions that are further away.
Output Image
Ground Truth
Input Image
Convert each region’s color intensity to a grayscale value corresponding to its relative distance from the viewer, with nearer objects appearing lighter and those farther away darker. Gradually smooth transitions between these regions to reflect continuous depth variation. Remove textural details that do not affect perceived depth to create uniformity based on object proximity. Adjust overall brightness to highlight the spatial configuration without explicit texture representation.
Output Image
Ground Truth
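The last instruction above describes a target encoding in which grayscale brightness follows relative distance, with nearer regions lighter and farther regions darker. The sketch below illustrates only that encoding convention on a small grid of depth values; it is not the model's method. The function name `depth_to_grayscale` and the `near_is_light` flag are hypothetical. The flag exists because the other sample instructions describe brightness conventions differently.

```python
def depth_to_grayscale(depth, near_is_light=True):
    """Map a 2-D grid of depth values to 8-bit grayscale (0-255).

    Hypothetical helper: min-max normalizes the depths, then optionally
    inverts them so that nearer pixels come out lighter.
    """
    flat = [d for row in depth for d in row]
    d_min, d_max = min(flat), max(flat)
    span = d_max - d_min
    if span == 0:
        # Every pixel is at the same distance; pick one uniform value.
        fill = 255 if near_is_light else 0
        return [[fill for _ in row] for row in depth]

    gray = []
    for row in depth:
        out_row = []
        for d in row:
            t = (d - d_min) / span  # 0.0 = nearest, 1.0 = farthest
            if near_is_light:
                t = 1.0 - t         # invert so near pixels are bright
            out_row.append(round(255 * t))
        gray.append(out_row)
    return gray


# Toy 2x2 depth grid (arbitrary units, assumed for illustration).
print(depth_to_grayscale([[1.0, 2.0], [3.0, 5.0]]))
```

Passing `near_is_light=False` flips the convention so that distant regions are brighter.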
Instruction-level Zero-shot Samples (Surface Normal Estimation)
Input Image
Unseen Explanatory Instruction
Output Image
Ground Truth
Input Image
Translate the visible structures into a range of bright colors reflecting orientation angles, enhancing variations across surfaces.
Output Image
Ground Truth
Input Image
Convert visual elements into a spectrum of colors that represent the directionality of surfaces, capturing the angles and orientations vividly.
Output Image
Ground Truth
Input Image
Translate the scene into a colorful array to indicate surface orientations and angles.
Capture the outline and prominent edges of the cylindrical object and its surroundings, simplify everything by removing textures and detailed surfaces, and emphasize only the contours and distinct features while rendering a higher contrast between light and dark regions with sharp shifts in tones.
Output Image
Ground Truth
Input Image
The vibrant scene with multiple colors and details could be simplified into a monochrome representation. First, focus on defining the high-contrast areas between light and dark in a much starker, black-and-white way. Then, it's important to emphasize contours and significant edges, such as the lines around the face, the dress’ folds, and the furniture's details, while downplaying softer gradients. Removing extraneous colors and textures leaves behind only the essential structural features that provide a more abstract, but recognizable silhouette and objects.
Output Image
Ground Truth
Input Image
Begin by eliminating most of the intricate details and colors, transforming the vibrant elements into simplified outlines. Keep only the borders and defined structures, ensuring that the environment and figure take on an abstract form. Remove all texture, reducing the entire composition to minimal contrasting edges that define the shapes more than the details.
Output Image
Ground Truth
Instruction-level Zero-shot Samples (Dehazing)
Input Image
Unseen Explanatory Instruction
Output Image
Ground Truth
Input Image
Gradually reduce atmospheric interference, allowing clearer visibility of buildings and sharpening the outlines. Enhance clarity and brightness to bring out the details within the cityscape, providing a crisper view.
Output Image
Ground Truth
Input Image
Increasing the clarity by reducing haze, enhancing contrast, and deepening colors to give a sharper and more vibrant appearance to the scene.
Output Image
Ground Truth
Input Image
To achieve clarity and vibrancy, adjust the brightness and reduce the foggy effect. Enhance the sharpness of the trees and structures, allowing their details to stand out against the clear blue sky.
Output Image
Ground Truth
Instruction-level Zero-shot Samples (Deraining)
Input Image
Unseen Explanatory Instruction
Output Image
Ground Truth
Input Image
Imagine a scenario where rainfall suddenly stops and the water settles, clearing up the scene to enhance visibility and eliminate rain streaks.
Output Image
Ground Truth
Input Image
Remove the raindrops and streaks, focusing on enhancing clarity and brightness to achieve a crisp and rain-free appearance in the environment.
Output Image
Ground Truth
Input Image
Imagine the rainfall gradually lessening until the sky clears completely, leaving only the vibrant greenery and the birds in focus.
Apply a pink color overlay to bicycles, completely matching their shapes.
Output Image
Ground Truth
Input Image
Apply a solid grey color tint to fully cover one banana instance.
Paint over each stove with a powderblue color.
Output Image
Ground Truth
Input Image
Spectral_r is the reversed version of Spectral, transitioning through red, yellow, green, and blue. Based on the previously defined colors, help me complete the segmentation task below. Color all instances of bucket, toilet using Spectral_r colors, following their contours precisely.
Output Image
Ground Truth
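The segmentation instructions above name a target color (pink, grey, powderblue) and ask for it to cover each instance exactly. As a rough illustration of what such a ground-truth image encodes, the sketch below paints boolean instance masks with a solid color. The RGB triples are the standard CSS values for those color names; the function `paint_instances`, the mask layout, and the assumption that the samples use CSS values are hypothetical and not taken from the paper.

```python
# CSS named colors mentioned in the sample instructions, as (R, G, B).
NAMED_COLORS = {
    "pink": (255, 192, 203),
    "grey": (128, 128, 128),
    "powderblue": (176, 224, 230),
}


def paint_instances(image, masks, color_name):
    """Return a copy of `image` with every masked pixel set to a solid color.

    image: 2-D grid (list of rows) of (R, G, B) tuples.
    masks: list of 2-D boolean grids, one per object instance.
    """
    color = NAMED_COLORS[color_name]
    painted = [list(row) for row in image]  # copy rows; input stays untouched
    for mask in masks:
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if inside:
                    painted[y][x] = color
    return painted


# Toy 2x2 black image with one single-pixel "stove" instance.
image = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
stove_mask = [[True, False], [False, False]]
print(paint_instances(image, [stove_mask], "powderblue"))
```

Painting "one instance" versus "each instance" then amounts to passing one mask or all of them.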
Samples for Zero-shot Capabilities on Vision Tasks (Relatively Hard Samples)
Explanatory Instruction: "Fill in all the empty outlines with rich colors that reflect vibrant tones, while redefining the shapes with smooth textures. Add layers of depth to the flat contours by enhancing brightness gradients in the sky, shadowing in the mountains, and intricate shades among the flowers. Reintroduce the sensation of open space and dimension by contrasting sharp objects with muted backgrounds and crisp details in the foreground."
Resolution: 448×448.
Instruction-level Zero-shot Samples (Deraining)
Explanatory Instruction: "Slowly remove the rain falling from the sky in the image, still maintain the state of night, and the girl on the bridge is also still holding the umbrella, but readjust the light in the distance."
Limitations: The model struggles to preserve smaller objects and environmental details.
Resolution: 448×448.
Explanatory Instruction: "Increase the overall brightness to reveal details in dark areas while preserving highlights. Adjust the contrast to enhance the brightness differences between regions, making the structures and textures more distinct. Optimize color saturation to make previously dull colors more vibrant, such as the blue on the floor becoming more prominent. Apply denoising to reduce noise commonly found in low-light images, improving the overall quality. Ensure the final image appears natural while retaining the authentic style of the scene."
Limitations: Controlling the intensity of lighting enhancement through language instructions is challenging, often resulting in significant deviations in the output.
Resolution: 448×448.
Instruction-level Zero-shot Samples (Desnowing)
Explanatory Instruction: "Remove the falling snow from the sky in the image, keep the other objects and snow in the image, still keep it dark, but pay attention to the adjustment of light behind the tree."
Limitations: The second generated image struggles to retain nighttime details, while the third and fourth images exhibit poor performance in removing snow from the sky. Additionally, attempting to remove snow from the ground simultaneously can result in significant distortions.
Resolution: 448×448.
Explanatory Instruction: "The image shows noticeable multiple visual overlaps of trees and buildings. I would like to remove visual overlaps and restore a clear, sharp image without blurring. Do not alter the main content and pay attention to adjusting the light."
Limitations: The success rate of guiding the model's task-level zero-shot capability through language instructions is relatively low.
Resolution: 448×448.
Instruction-level Zero-shot Samples (Dehazing)
Explanatory Instruction: "Retain the distant clouds in the image while removing as much fog as possible. Attempt to restore the faintly visible sun in the distance, but ensure there is no strong sunlight. Focus on recovering the mountains and the nearby trees as much as possible."
Limitations: The model may distort certain objects.
Resolution: 448×448.