Halluminate/sample_sft

Sample SFT Datasets

Website: halluminate.ai

This repository showcases sample Supervised Fine-Tuning (SFT) data for UI/action grounding tasks and ecommerce/payment flows.

The events_flights trajectories were recorded using Halluminate's flight search simulator.

Repository layout

  • sample_sft/events_flights/
    • JSONL transcripts of flight-search interactions (clicks, typing, selections) with aligned screenshots and concise rationales.
    • Side-by-side .md renderings that include images and stepwise annotations.
    • screenshot/ directory containing referenced PNGs.
  • sample_sft/payment_samples/
    • JSONL transcripts of ecommerce checkout flows (search, add-to-cart, address, payment) with aligned screenshots and rationales.
    • Side-by-side .md renderings and a screenshot/ directory.

Data format (JSONL)

Each line is a single step in an interaction sequence:

  • timestamp — wall-clock string for ordering (e.g., 2025-08-10_00:04:14).
  • action — atomic UI action (e.g., click (x, y), type text: <text>, scroll...).
  • screenshot — relative path to the raw screenshot PNG.
  • marked_screenshot — path to an annotated screenshot showing the referenced UI element/region.
  • element — short label of the target UI element when available.
  • rect — bounding box of the target region: { left, top, right, bottom }.
  • action_description — natural language description of the intended action.
  • action_description_checked — QA tag for description quality (e.g., Good).
  • thought — concise rationale describing why this step is taken.

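Fields may be null when a step does not target a specific element (e.g., free typing, scrolling). In the examples below, the typing steps have null element, rect, and action_description_checked, and their marked_screenshot points to the unannotated screenshot.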
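The field list above can be mirrored in a small typed record. The following is a minimal sketch, not an official loader: the `Step` and `Rect` class names and the `parse_step` helper are hypothetical, and treating `timestamp`, `action`, and `screenshot` as always present is an assumption based only on the sample lines.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rect:
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class Step:
    # Assumed to be present on every line (true for the samples shown here).
    timestamp: str
    action: str
    screenshot: str
    # Everything else is treated as optional, since fields may be null.
    marked_screenshot: Optional[str] = None
    element: Optional[str] = None
    rect: Optional[Rect] = None
    action_description: Optional[str] = None
    action_description_checked: Optional[str] = None
    thought: Optional[str] = None


def parse_step(line: str) -> Step:
    """Parse one JSONL line into a Step, ignoring unknown keys."""
    raw = json.loads(line)
    rect = raw.get("rect")
    return Step(
        timestamp=raw["timestamp"],
        action=raw["action"],
        screenshot=raw["screenshot"],
        marked_screenshot=raw.get("marked_screenshot"),
        element=raw.get("element"),
        rect=Rect(rect["left"], rect["top"], rect["right"], rect["bottom"]) if rect else None,
        action_description=raw.get("action_description"),
        action_description_checked=raw.get("action_description_checked"),
        thought=raw.get("thought"),
    )
```

Reading keys with `raw.get(...)` keeps the parser tolerant of both null values and missing keys.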

Examples

Example: events_flights (first lines only)

{"timestamp": "2025-08-10_00:04:14", "action": "click (846, 680)", "screenshot": "screenshot/20250810_000414_1.png", "element": "Where to?", "rect": {"left": 753, "top": 657, "right": 1031, "bottom": 713}, "marked_screenshot": "screenshot/20250810_000414_1_marked.png", "action_description": "click <\\the \"Where to?\" input box in the center section of the window>", "action_description_checked": "Good", "thought": "To begin searching for flights ..."}
{"timestamp": "2025-08-10_00:04:15", "action": "type text: jfk", "screenshot": "screenshot/20250810_000415_2.png", "element": null, "rect": null, "marked_screenshot": "screenshot/20250810_000415_2.png", "action_description": "type text: jfk", "action_description_checked": null, "thought": "To find the cheapest direct one-way flight ..."}

Example: payment_samples (first lines only)

{"timestamp": "2025-08-16_21:05:28", "action": "click (611, 99)", "screenshot": "screenshot/20250816_210528_1.png", "element": "Search Amazon", "rect": {"left": 415, "top": 85, "right": 1437, "bottom": 123}, "marked_screenshot": "screenshot/20250816_210528_1_marked.png", "action_description": "click <\\the search bar at the top-center of the Amazon page>", "action_description_checked": "Good", "thought": "Let me click into the search bar ..."}
{"timestamp": "2025-08-16_21:05:30", "action": "type text: Airpod pro 2", "screenshot": "screenshot/20250816_210530_3.png", "element": null, "rect": null, "marked_screenshot": "screenshot/20250816_210530_3.png", "action_description": "type text: Airpod pro 2", "action_description_checked": null, "thought": "I want to search specifically for the AirPods Pro 2 ..."}
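The `action` strings in these examples follow two visible patterns, `click (x, y)` and `type text: <text>`. The sketch below parses those two patterns and checks that a click falls inside its `rect`. The helper names are hypothetical, other action types (such as scroll) are not handled because their exact format is not shown here, and treating the rect edges as inclusive is an assumption.

```python
import re
from typing import Optional, Tuple

# Matches the click format seen in the samples, e.g. "click (846, 680)".
CLICK_RE = re.compile(r"^click \((\d+),\s*(\d+)\)$")
TYPE_PREFIX = "type text: "


def parse_click(action: str) -> Optional[Tuple[int, int]]:
    """Return (x, y) for a click action, or None for any other action."""
    match = CLICK_RE.match(action.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_typed_text(action: str) -> Optional[str]:
    """Return the typed text for a typing action, or None otherwise."""
    if action.startswith(TYPE_PREFIX):
        return action[len(TYPE_PREFIX):]
    return None


def click_in_rect(point: Tuple[int, int], rect: dict) -> bool:
    """Check that a click point lies within the rect (edges assumed inclusive)."""
    x, y = point
    return rect["left"] <= x <= rect["right"] and rect["top"] <= y <= rect["bottom"]
```

Applied to the first events_flights line:

```python
point = parse_click("click (846, 680)")
rect = {"left": 753, "top": 657, "right": 1031, "bottom": 713}
print(point, click_in_rect(point, rect))  # (846, 680) True
```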

Markdown companions

For many sequences there is a companion .md file that:

  • Restates the task description and difficulty level.
  • Shows each step with the corresponding annotated screenshot.
  • Mirrors the JSONL content for readability.

Intended uses

  • Schema exploration for building parsers and loaders.
  • Prototyping SFT data ingestion and validation.
  • Training small models on UI-grounded action prediction and rationale generation.
  • Demos for screenshot-conditioned agent pipelines.
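As one illustration of the training use case, a step could be turned into a prompt/completion pair. This is only a sketch under stated assumptions: the `to_sft_example` function, the `image`/`prompt`/`completion` keys, and the `Thought:`/`Action:` layout are all hypothetical choices and are not a format defined by this repository.

```python
def to_sft_example(step: dict) -> dict:
    """Build a hypothetical SFT example from one parsed JSONL step.

    The model input is the raw screenshot plus the natural-language
    action description; the target is the rationale followed by the action.
    """
    description = step.get("action_description") or step["action"]
    target_lines = []
    if step.get("thought"):
        target_lines.append(f"Thought: {step['thought']}")
    target_lines.append(f"Action: {step['action']}")
    return {
        "image": step["screenshot"],
        "prompt": f"Instruction: {description}",
        "completion": "\n".join(target_lines),
    }
```

Whether the rationale belongs in the completion or in the prompt depends on what the model is meant to learn; this layout is just one option.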

Getting started

  • Parse JSONL line-by-line. Ignore unknown fields so that parsers keep working if new fields are added later.
  • Use screenshot/ relative paths to resolve images.
  • Treat coordinates as pixel units relative to the screenshot resolution.
  • Do not assume every step has a rect or element.
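The points above can be sketched as a small line-by-line loader. The `iter_steps` name and the added `*_path` keys are hypothetical, and resolving image paths against the directory that contains the JSONL file is an assumption based on the repository layout described earlier.

```python
import json
from pathlib import Path
from typing import Iterator, Union


def iter_steps(jsonl_path: Union[str, Path]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, with resolved image paths.

    Unknown fields are passed through untouched, and null fields
    (e.g. rect, element) are left as None.
    """
    jsonl_path = Path(jsonl_path)
    base_dir = jsonl_path.parent  # assumed root for relative screenshot paths
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            step = json.loads(line)
            for key in ("screenshot", "marked_screenshot"):
                if step.get(key):
                    step[key + "_path"] = base_dir / step[key]
            yield step
```

Typical use, with a placeholder file name:

```python
for step in iter_steps("trajectory.jsonl"):
    if step.get("rect") is not None:
        print(step["element"], step["rect"])
```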

Notes

  • This is a sample set, not exhaustive coverage. Some tasks are abbreviated.
  • File names include timestamps to help with ordering and reproducibility.
  • If you publish results using this sample, please attribute this repository.
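For ordering, the `timestamp` strings in the samples parse with the format `%Y-%m-%d_%H:%M:%S`, and the screenshot file names in those samples begin with the same instant written as `YYYYMMDD_HHMMSS`. The sketch below relies on that naming pattern, which is inferred from the example lines only and is not a documented guarantee. The function names are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import List

TIMESTAMP_FORMAT = "%Y-%m-%d_%H:%M:%S"


def parse_timestamp(ts: str) -> datetime:
    """Parse a timestamp such as '2025-08-10_00:04:14'."""
    return datetime.strptime(ts, TIMESTAMP_FORMAT)


def sort_steps(steps: List[dict]) -> List[dict]:
    """Order steps by timestamp; sorted() is stable, so ties keep file order."""
    return sorted(steps, key=lambda step: parse_timestamp(step["timestamp"]))


def filename_matches_timestamp(ts: str, screenshot: str) -> bool:
    """Check that a screenshot file name starts with the step's timestamp."""
    prefix = parse_timestamp(ts).strftime("%Y%m%d_%H%M%S")
    return PurePosixPath(screenshot).name.startswith(prefix + "_")
```

Timestamps have one-second resolution, so several steps can share a value; keeping the original line order for ties avoids reshuffling them.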

About

Sample SFT trajectories produced by action collectors
