diff --git a/tutorials/Jupyter-Notebook/intro-to-jupyter.md b/tutorials/Jupyter-Notebook/intro-to-jupyter.md index db62ca7..3ab51e6 100644 --- a/tutorials/Jupyter-Notebook/intro-to-jupyter.md +++ b/tutorials/Jupyter-Notebook/intro-to-jupyter.md @@ -1,13 +1,13 @@ -## Section 6: Launching the Jupyter Notebook Environment +## Launching the Jupyter Notebook Environment Jupyter is an open source project that provides a webapp interface for writing code and documents. Throughout this tutorial, we will be using a Jupyter Notebook environment for making Tapis User Requests. -### Step 6.1: Starting up your Jupyter Notebook Environment +### Starting up your Jupyter Notebook Environment For this tutorial, we will use [TACC's Public JupyterHub](https://public.jupyter.tacc.cloud) You may login with your TACC accounts. -### Step 6.2: Navigating to the $WORK File System +### Navigating to the $WORK File System On successful login, ensure that you have access to a folder, `work`, within the Jupyter file system. diff --git a/tutorials/Tapis_FineTune/01-finetune-app.md b/tutorials/Tapis_FineTune/01-finetune-app.md index d6db402..98f575b 100644 --- a/tutorials/Tapis_FineTune/01-finetune-app.md +++ b/tutorials/Tapis_FineTune/01-finetune-app.md @@ -1,91 +1,86 @@ # Ultralytics Fine-Tuning App -This application allows users to fine-tune Ultralytics YOLO models using Singularity containers in a batch processing environment. It is designed to run on High-Performance Computing (HPC) systems via Tapis, leveraging GPU acceleration for training tasks. +This application allows users to fine-tune Ultralytics YOLO 26 models using Singularity containers in a batch processing environment. It is designed to run on High-Performance Computing (HPC) systems via Tapis, leveraging GPU acceleration for training tasks. > **Note:** This app is already registered for the tutorial and is available to run via the Tapis UI. 
--- -## App Definition - -The following JSON represents the application definition used to register the fine-tuning service in Tapis: - -```json -{ - "id": "ultralytics-fine-tune", - "version": "0.1", - "description": "An app to fine-tune ultralytics Yolo using Singularity in batch mode.", - "jobType": "BATCH", - "runtime": "SINGULARITY", - "containerImage": "/work/projects/aci/cic/apps/ultralytics-fine-tune/Ultralytics_FT_Tapis_app.sif", - "jobAttributes": { - "execSystemExecDir": "${JobWorkingDir}/jobs/${JobUUID}", - "execSystemInputDir": "${JobWorkingDir}/jobs/${JobUUID}/data", - "execSystemOutputDir": "${JobWorkingDir}/jobs/${JobUUID}/output", - "parameterSet": { - "containerArgs": [ - { - "name": "nvidia", - "inputMode": "FIXED", - "arg": "--nv", - "notes": {} - } - ], - "envVariables": [ - { - "key": "EPOCHS", - "value": "3", - "description": "Number of epochs for the fine-tune job", - "inputMode": "REQUIRED", - "notes": {} - } - ] - }, - "memoryMB": 1, - "nodeCount": 1, - "coresPerNode": 1, - "maxMinutes": 10 - } -} -``` -# Understanding Ultralytics App Parameters - -The `ultralytics-fine-tune` application uses a specific set of parameters to manage how the Singularity container interacts with the HPC hardware and how the training process is executed. +## Locating the App and Configure Job Submission ---- +Go to the **App** tab and find the app with name `yolo-finetuning-arm64`. -## 1. Container Arguments -Container arguments define how the Tapis runtime (Singularity) is initialized on the execution system. + -| Parameter | Type | Value | Description | -| :--- | :--- | :--- | :--- | -| **nvidia** | `FIXED` | `--nv` | This is the most critical argument. It tells Singularity to bind the host's NVIDIA drivers inside the container, enabling GPU acceleration for YOLO training. | +Click on the **Submit Job** button to and then click on the **USE GUIDED JOB LAUNCHER** button. + +Now we are in the job configuration interface. 
Click **Continue** on the job summary page. ---- + -## 2. Environment Variables -Environment variables are passed into the training script inside the container to control the YOLO model's behavior. +In the **Execution Options** page, select the following: -| Variable | Mode | Default | Description | -| :--- | :--- | :--- | :--- | -| **EPOCHS** | `REQUIRED` | `3` | Defines the number of full passes through the training dataset. For this tutorial, it is set low (3) to ensure quick completion, but can be increased for real-world accuracy. | + 1. Execution System - `vista-test-nairr` + 2. Job Type - `Batch` + 3. Batch Logical Queue - `gh` ---- +Click **Continue** -## 3. Resource Attributes -These parameters define the hardware footprint requested from the Slurm scheduler on the execution system. + -* **Node Count (`1`):** The number of physical machines requested. Fine-tuning for this tutorial is optimized for a single node. -* **Cores Per Node (`1`):** The number of CPU cores allocated. Since the primary work is done by the GPU (via the `--nv` flag), CPU requirements are kept minimal. -* **Memory (`1 MB`):** The RAM allocation. *Note: In many Tapis configurations, 1 implies a minimum default or is managed by the specific queue policy.* -* **Max Minutes (`10`):** The "Wallclock" time. If the job exceeds 10 minutes, the scheduler will terminate it to prevent hanging processes from wasting allocation credits. +Click **Continue** + +Click **Continue** + ---- +Click **Continue** + + + +There are 4 environment variables important for the fine-tuning job. + + 1. EPOCHS - number of learning rounds. 10 or 20 is a good number. + 2. YOLO_26_MODEL - the yolo model name. Here we use `yolo26n` for the best trade-off between quality and speed. + 3. 
TWO_STAGE_FINE_TUNE - If true, we use a two-stage fine-tuning process: the first stage freezes the backbone and trains only the neck and head, allowing the detection layers to adapt to the new classes without disrupting pretrained features. The second stage unfreezes all layers and trains the full model with a lower learning rate to refine the backbone for the target domain. + 4. The freeze parameter accepts an integer. For example, freeze=10 freezes the first 10 layers (0 through 9, which correspond to the backbone in YOLO26). This speeds up training and reduces overfitting when the dataset is small relative to the model capacity. + +Just keep all these settings as is, and click **Continue**. + + +Expand **TACC Resource Allocation** and **Reservation Name** + 1. For **TACC Resource Allocation**, put a space and then `TRA24006` after `-A` + 2. For **Reservation Name**, put a space and then your *reservation code* after `--reservation` + +Note that the reservation code for **Sunday** sessions is `Tapis+Tutorial-Sun` and the reservation code for **Monday** sessions is `Tapis+Tutorial-Mon`. + +Click **Continue** + + +Click **Continue** + + + +## Submit the job + +Click **Submit Job** to submit your job. + + +The job takes roughly 5-10 minutes to run, but time spent waiting in the queue can make it longer. + +Once the job has finished, open the `tapisjob.out` file to view the output. At the end of the output, you should see a message indicating that the fine-tuned models are now saved to FlexServ's private model pool (`$SCRATCH/flexserv/models`). + + +## Up Next -> **Note:** These parameters are pre-configured for the tutorial. When using the Tapis UI, you will primarily interact with the **EPOCHS** variable. 
\ No newline at end of file +In our prompt engineering section, we will use a coding LLM in FlexServ to generate Python code that calls the YOLO inference API in FlexServ to run object detection with both the base `yolo26n` model and the `yolo26n-fine-tuned` model. We can then compare the accuracy of the two models. \ No newline at end of file diff --git a/tutorials/Tapis_FineTune/02-finetune-job.md b/tutorials/Tapis_FineTune/02-finetune-job.md index edb6ee8..d08cc37 100644 --- a/tutorials/Tapis_FineTune/02-finetune-job.md +++ b/tutorials/Tapis_FineTune/02-finetune-job.md @@ -1,80 +1,15 @@ -# Submitting an Ultralytics Fine-Tuning Job + \ No newline at end of file diff --git a/tutorials/Tapis_FlexServ/01c-code-gen-flexserv.md b/tutorials/Tapis_FlexServ/01c-code-gen-flexserv.md index 68137b6..f70afd1 100644 --- a/tutorials/Tapis_FlexServ/01c-code-gen-flexserv.md +++ b/tutorials/Tapis_FlexServ/01c-code-gen-flexserv.md @@ -1,4 +1,4 @@ -## Section 7: Prompt Engineering and Generating Image Detection Code +## Prompt Engineering and Generating Image Detection Code [Lecture Slides](https://docs.google.com/presentation/d/1BVLnUbyiWjsaS33zMshW3TXqtfvv6zGklaCNBeX7Go0/edit?slide=id.g3cdba15a02d_6_191#slide=id.g3cdba15a02d_6_191) @@ -11,19 +11,25 @@ To test the capabilities of the FlexServ inference server, we can provide a comp ### Exploring the FlexServ UI -### Step 7.1: Refresh Model Pool +### Refresh Model Pool -- Refresh the Model pool so you can see public and private models available for you to run. - +- Refresh the Model pool so you can see public and private models available for you to run. +- Drag the following models from the public pool to the private pool. + - Qwen/Qwen2.5-Coder-14B-Instruct + - Qwen/Qwen2.5-Coder-32B-Instruct +- Right-click one of the above models and click **Load** in the menu. +- Wait until the progress bar completes. If the load fails, try again. 
-### Step 7.2: Update the Responses API and Parameters + -- Copy and paste the following prompt into the FlexServ UI in the `Responses API`, `Input(Markdown)` section, shown in the image below. +### Update the Responses API and Parameters +- Below is the prompt for our code generation. Before pasting it into the chat box of FlexServ, make sure you update the following FACTS in the `FACTS TO KNOW` section: + - BASEURL of FLEXSERV inference engine: (your FlexServ URL here) + - Bearer Auth token for FLEXSERV inference engine: (your FlexServ Token here)
-
TASK DESCRIPTION:
This is an IMAGE-LEVEL BINARY CLASSIFICATION task implemented using an object detection model.
The goal is to determine whether an image contains an animal or not.
@@ -34,16 +40,74 @@ Each directory contains two subdirectories:
images/ → contains image files (.jpg, .jpeg, .png)
labels/ → contains YOLO format .txt files
-GROUND-TRUTH LOGIC: An image is considered an animal if a corresponding .txt file exists and is not empty in the labels/ folder.
+GROUND-TRUTH LOGIC:
+An image is considered an animal if a corresponding .txt file exists and is not empty in the labels/ folder.
+A non-empty file is a file whose size is larger than 0; an empty file has a size of 0.
MODEL REQUIREMENTS:
Use ONLY a pretrained Ultralytics YOLO detection model (e.g., yolov8n.pt).
-Load the model using the Ultralytics YOLO API.
+Call our RESTful API for yolo inference.
Assume YOLO detects animals using class ID animal at index 0.
+YOLO INFERENCE APIs:
+Sample CURL Request:
+```
+curl -sS -X POST '${BASEURL}/v1/yolo/infer' \
+ -H 'Authorization: Bearer ${FLEXSERV_TOKEN}' \
+ -H 'Content-Type: application/json' \
+ -d '{"model":"${FLEXSERV_MODEL_ID}","task":"detect","source":{"type":"upload","media_type":"image","content_base64":"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL","filename":"NOR3__2019-07-19__11-40-00-1-_JPG.rf.b85ee30f99a803b09f8c5a7da7f9a508.jpg"},"params":{"conf":0.25,"iou":0.7,"imgsz":640,"max_det":300,"show_labels":true,"show_conf":true},"response":{"include":["predictions","timing"],"box_format":"xyxy","classification_topk":5,"return_original_shape":true}}'
+```
+
+RESPONSE:
+```
+{
+ "object": "yolo.inference",
+ "task": "detect",
+ "model": "/app/models/private/yolo--yolo26l/model.pt",
+ "media_type": "image",
+ "predictions": [
+ {
+ "frame_index": 0,
+ "path": "image0.jpg",
+ "original_shape": {
+ "height": 640,
+ "width": 640
+ },
+ "detections": [
+ {
+ "class_id": 61,
+ "class_name": "toilet",
+ "confidence": 0.480474591255188,
+ "bbox": [
+ 0,
+ 29.76239013671875,
+ 637.2445068359375,
+ 629.2056274414062
+ ],
+ "bbox_format": "xyxy",
+ "track_id": null
+ }
+ ]
+ }
+ ],
+ "timing": {
+ "inference_ms": 21.03
+ },
+ "annotated_media": null,
+ "annotated_media_mime_type": null,
+ "annotated_media_filename": null,
+ "warnings": []
+}
+```
+For each image, there is one object in the 'predictions' array. If anything is detected, its 'detections' array will contain
+a list of detected objects; if nothing is detected, there won't be a 'detections' array.
+If any detected object has class_id=0, an animal is detected.
+
+
DETECTION LOGIC (IMPORTANT):
Run object detection on each image.
-If the model produces AT LEAST ONE detection of an animal class with confidence >= 0.5:
+
+If the model produces AT LEAST ONE detection of an animal class with confidence >= 0.5 and IoU >= 0.7:
→ The image-level prediction is animal.
EVALUATION METRICS:
@@ -59,63 +123,72 @@ Print for each image: filename, ground-truth status, and prediction.
At the end, print a summary report including total images, counts for each metric, and overall detection accuracy.
CODING REQUIREMENTS:
-Store the main path in DATASET_ROOT.
+Store the main path in a global variable DATASET_ROOT.
+Set global variables for BASEURL and the Bearer Auth Token.
+Set global variables for BASE_YOLO_MODEL and FINE_TUNED_YOLO_MODEL, and also a MODEL_TO_USE for easy model switching.
+Set global variables for the confidence threshold and the IoU threshold.
+Make sure we disable SSL/TLS verification and also disable the related warning.
+Make sure we pass image_name into the yolo inference request.
Use pathlib or os for robust file path matching.
Read only .jpg files.
+For inference of each image file, print the number of the image versus total number of images, the time spent for each inference request versus the total time spent for the entire inference step (in ms), the ground truth and detection result.
Include clear comments explaining each step.
+Output the accuracy in percentage format.
+Don't use any mock or dummy functions. Make sure every line functions.
+It is okay to catch a general Exception instead of every specific exception type.
+
+DEFENSIVE PROGRAMMING:
+In case of any unexpected conditions, make sure the following:
+1. Make sure we don't do SSL/TLS verification when sending request.
+2. Make sure we avoid zero division
+
+FACTS TO KNOW:
+BASEURL of FLEXSERV inference engine: https://vista.tacc.utexas.edu:60324
+Bearer Auth token for FLEXSERV inference engine: 128374981723089470189234709182734
+FLEXSERV model ID format: FLEX:{PUB|PRI}:author/model[@revision], we only use private model pool, and omit the revision in model ID.
+DATASET_ROOT address: /home/jovyan/ai-tutorial-2026/datasets/AnimalEcology.v4i.yolov11
+BASE_YOLO_MODEL for the request: FLEX:PRI:yolo/yolo26n
+FINE_TUNED_YOLO_MODEL for the request: FLEX:PRI:yolo/yolo26n-fine-tuned
After the code, briefly explain how the program works in plain English.
+ + +Run the code in Jupyter, and you should see an evaluation result similar to the one below + + -image 1/1 /home/jovyan/work/vista/ai-tutorial-2026/datasets/AnimalEcology.v4i.yolov11/test/images/KPC2__2019-09-19__15-47-42-1-_JPG.rf.608031a2809f0f6714f175d3e5eb7f06.jpg: 640x640 1 animal, 96.6ms -Filename: KPC2__2019-09-19__15-47-42-1-_JPG.rf.608031a2809f0f6714f175d3e5eb7f06.jpg, Ground Truth: no_animal, Prediction: animal - -Speed: 2.5ms preprocess, 96.6ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 640) -image 1/1 /home/jovyan/work/vista/ai-tutorial-2026/datasets/AnimalEcology.v4i.yolov11/test/images/NOR3__2019-07-19__11-40-00-1-_JPG.rf.b85ee30f99a803b09f8c5a7da7f9a508.jpg: 640x640 (no detections), 104.2ms -Speed: 1.9ms preprocess, 104.2ms inference, 0.7ms postprocess per image at shape (1, 3, 640, 640) -Filename: NOR3__2019-07-19__11-40-00-1-_JPG.rf.b85ee30f99a803b09f8c5a7da7f9a508.jpg, Ground Truth: animal, Prediction: no_animal -.... -... -Evaluation Metrics: -Total images processed: 100 -Total animal images (based on label files): 71 -True Positives: 47 -True Negatives: 6 -False Positives: 23 -False Negatives: 24 -Overall detection accuracy: 0.53 --
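
For orientation, the bookkeeping that the prompt in this diff asks the LLM to generate can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code the model will actually produce: the helper names (`has_animal_label`, `build_payload`, `is_animal_prediction`, `summarize`) are hypothetical, the payload mirrors only a subset of the fields in the sample curl request (whether the server accepts a trimmed payload is an assumption), and the HTTP call itself is left out so the block runs offline.

```python
import base64
from pathlib import Path

CONF_THRESHOLD = 0.5   # confidence threshold named in the prompt
IOU_THRESHOLD = 0.7    # IoU value used in the sample request
ANIMAL_CLASS_ID = 0    # the prompt assumes class ID 0 means "animal"


def has_animal_label(labels_dir: Path, image_name: str) -> bool:
    """Ground truth: a matching .txt label file exists and its size is > 0."""
    label_file = labels_dir / (Path(image_name).stem + ".txt")
    return label_file.is_file() and label_file.stat().st_size > 0


def build_payload(image_bytes: bytes, filename: str, model_id: str) -> dict:
    """Build a request body shaped like the sample curl request (hypothetical subset)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model_id,
        "task": "detect",
        "source": {
            "type": "upload",
            "media_type": "image",
            "content_base64": "data:image/jpeg;base64," + encoded,
            "filename": filename,
        },
        "params": {"conf": CONF_THRESHOLD, "iou": IOU_THRESHOLD, "imgsz": 640},
        "response": {"include": ["predictions", "timing"], "box_format": "xyxy"},
    }


def is_animal_prediction(response: dict, conf_threshold: float = CONF_THRESHOLD) -> bool:
    """True if any detection has the animal class ID and enough confidence."""
    for prediction in response.get("predictions", []):
        # 'detections' may be missing when nothing was detected.
        for det in prediction.get("detections") or []:
            if (det.get("class_id") == ANIMAL_CLASS_ID
                    and det.get("confidence", 0.0) >= conf_threshold):
                return True
    return False


def summarize(pairs):
    """Count TP/TN/FP/FN from (ground_truth, prediction) pairs and compute accuracy."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in pairs:
        if truth and pred:
            counts["TP"] += 1
        elif not truth and not pred:
            counts["TN"] += 1
        elif pred:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    total = sum(counts.values())
    # Guard against division by zero when no images were processed.
    accuracy = (counts["TP"] + counts["TN"]) / total if total else 0.0
    return counts, accuracy


if __name__ == "__main__":
    # Counts taken from the removed sample output in this diff.
    pairs = ([(True, True)] * 47 + [(False, False)] * 6
             + [(False, True)] * 23 + [(True, False)] * 24)
    counts, accuracy = summarize(pairs)
    print(counts, f"accuracy: {accuracy:.2%}")
```

Keeping the response parsing and the metric counting separate from the network call makes it easy to check the generated code against a saved response before pointing it at a live FlexServ endpoint.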