Edge AI Vision Workshop

Workshop Overview

This workshop puts AI-assisted coding at the center of embedded engineering. Participants use Claude Code as a co-pilot to extend a running Edge AI object-detection pipeline on real NXP hardware — no deep ML expertise required.

The board runs a Python application that:

Captures frames from a USB webcam via OpenCV
Runs TFLite inference on the eIQ® Neutron NPU (2 TOPS), optionally using pipelined execution across CPU and NPU
Overlays bounding boxes and inference timing on each frame and shows statistics in a dashboard
Streams the annotated video over WiFi to a browser via Flask MJPEG
Exposes a REST API for live configuration
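Flask MJPEG streaming usually means an HTTP response of type `multipart/x-mixed-replace` whose body is a generator yielding one JPEG per part. Below is a minimal sketch of such a generator. It assumes the frames are already JPEG-encoded (for example with OpenCV's `cv2.imencode(".jpg", frame)`) and uses a hypothetical boundary name `frame`. It illustrates the technique and is not the workshop's own code.

```python
from typing import Iterable, Iterator

BOUNDARY = "frame"  # hypothetical boundary name; must match the Content-Type header


def mjpeg_stream(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each JPEG-encoded frame as one part of a multipart/x-mixed-replace body."""
    for jpeg in jpeg_frames:
        header = (
            f"--{BOUNDARY}\r\n"
            "Content-Type: image/jpeg\r\n"
            f"Content-Length: {len(jpeg)}\r\n\r\n"
        ).encode("ascii")
        yield header + jpeg + b"\r\n"
```

In Flask, a route would typically return `Response(mjpeg_stream(frames), mimetype="multipart/x-mixed-replace; boundary=frame")`. The browser then replaces the displayed image each time a new part arrives.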

Participants connect their laptops to the board via WiFi, open the live stream in a browser, and then vibe-code enhancements by describing what they want to Claude Code in natural language.

Learning Objectives

By the end of this workshop, participants will be able to:

Connect to and work on an embedded Linux board remotely via SSH and VS Code
Understand the structure of a prototype real-time Edge AI inference pipeline
Use AI-assisted coding (Claude Code) to extend an embedded application without starting from scratch
Swap and compare TFLite models running on a hardware NPU
Customize visual overlays, detection logic, and REST API behavior
Stream and visualize live AI inference results from an embedded device

Prerequisites

Participant prerequisites

Laptop with WiFi (for internet access and Claude Code)
Basic Python familiarity (can read and modify Python code)
Claude Code installed and authenticated (make install-claude-code API_KEY=...., or manually: npm install -g @anthropic-ai/claude-code)

Board prerequisites (pre-configured by organizers)

FRDM-IMX95 booted into Linux
USB webcam connected (Logitech C922 or equivalent)
tflite_runtime pre-installed in the Yocto BSP

Board WiFi connectivity

Connect to the board over its serial console (e.g., with PuTTY or another terminal emulator) and run the following commands to join the local WiFi network. Set SSID and SSID_PASSWORD to your network's name and password first:

wpa_passphrase "${SSID}" "${SSID_PASSWORD}" > /tmp/wpa.conf
wpa_supplicant -B -i mlan0 -c /tmp/wpa.conf
udhcpc -i mlan0
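The `psk=` line that `wpa_passphrase` writes holds the 256-bit WPA2 pre-shared key. It is derived from the passphrase with PBKDF2-HMAC-SHA1, using the SSID as salt and 4096 iterations. The minimal sketch below reproduces that derivation in Python, for example to sanity-check `/tmp/wpa.conf`. `wpa2_psk` and `network_block` are illustrative names. The real tool also writes a commented `#psk="..."` line containing the plaintext passphrase.

```python
import hashlib


def wpa2_psk(ssid: str, passphrase: str) -> str:
    """Derive the WPA2-PSK hex string (PBKDF2-HMAC-SHA1, SSID as salt, 4096 rounds, 32 bytes)."""
    if not 8 <= len(passphrase) <= 63:
        raise ValueError("WPA2 passphrases must be 8..63 characters")
    key = hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, dklen=32)
    return key.hex()


def network_block(ssid: str, passphrase: str) -> str:
    """Build a wpa_supplicant network block similar to wpa_passphrase's output."""
    return (
        "network={\n"
        f'\tssid="{ssid}"\n'
        f"\tpsk={wpa2_psk(ssid, passphrase)}\n"
        "}\n"
    )
```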

Check the board's IP address (the inet entry under mlan0). You will use it as BOARD_IP in the commands below:

ip addr
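`ip addr` lists every interface. The small sketch below extracts only the IPv4 address of `mlan0` from the one-line output of `ip -o -4 addr show dev mlan0`, which gives the value to pass as `BOARD_IP`. The helper names are hypothetical, and `board_ip` must run on the board itself.

```python
import re
import subprocess
from typing import Optional

# Matches "inet 192.168.1.42/24" but not "inet6 ..."
INET_RE = re.compile(r"\binet (\d{1,3}(?:\.\d{1,3}){3})/\d+")


def parse_ipv4(ip_output: str) -> Optional[str]:
    """Return the first IPv4 address found in `ip -o -4 addr` output, or None."""
    match = INET_RE.search(ip_output)
    return match.group(1) if match else None


def board_ip(interface: str = "mlan0") -> Optional[str]:
    """Query iproute2 for the interface's IPv4 address."""
    out = subprocess.run(
        ["ip", "-o", "-4", "addr", "show", "dev", interface],
        capture_output=True, text=True, check=False,
    ).stdout
    return parse_ipv4(out)
```

On the board, `print(board_ip())` prints the address, or `None` if `mlan0` has no lease yet.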

Host environment (pre-workshop)

Models are exported from Ultralytics with full int8 quantization and compiled for the Neutron NPU using the NXP eIQ Toolkit. Download the eIQ Toolkit archive manually from here and copy it to the top-level folder of the repository. Then run the following in a Linux or WSL terminal on your laptop:

# 1. Install the NXP eIQ Toolkit (contains neutron-converter)
make install-eiq ARCHIVE=./EIQ-NEUTRON-SDK-3.1.2-LIN.zip

# 2. Install laptop Python dependencies
make install-deps

Board Python environment setup

If the Python virtual environment (venv) has not yet been created on the board, run:

make board-deps BOARD_IP=${BOARD_IP}

Board inference environment setup

To prepare the inference environment, run:

make board-deploy-app BOARD_IP=${BOARD_IP}

Models Execution

Compiling

To compile the model and deploy it to the board, run the following command:

# Run the full model pipeline and copy the model to the board over SSH (the board must be reachable)
make model BOARD_IP=${BOARD_IP}

The pipeline:

Exports and quantizes yolov8s.pt → yolov8s_full_integer_quant.tflite (fully int8 quantized: input and output tensors are int8, not float32)
Compiles → yolov8s_neutron.tflite using neutron-converter --target imx95
SCPs the compiled model + COCO labels to /opt/models/ on the board

Why full integer quantization? The Neutron NPU requires int8/uint8 tensors end-to-end.
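In practice, "int8 end-to-end" means the application must quantize the pre-processed pixels before feeding the NPU. It must also dequantize the raw int8 outputs before decoding boxes. Both steps use TFLite's affine scheme `real = scale * (q - zero_point)`. With `tflite_runtime`, each tensor's `(scale, zero_point)` pair appears in the `quantization` field returned by `interpreter.get_input_details()` and `get_output_details()`. The pure-Python sketch below shows both steps. The example scale and zero point are assumptions, not values read from the workshop model.

```python
import math
from typing import List, Sequence

INT8_MIN, INT8_MAX = -128, 127


def _round_half_away(v: float) -> int:
    """Round to the nearest integer, ties away from zero."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize(values: Sequence[float], scale: float, zero_point: int) -> List[int]:
    """Map real values to int8: q = clamp(round(x / scale) + zero_point, -128, 127)."""
    return [
        max(INT8_MIN, min(INT8_MAX, _round_half_away(x / scale) + zero_point))
        for x in values
    ]


def dequantize(values: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 values back to reals: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in values]


# Assumed example parameters: pixels normalized to [0, 1], scale 1/255, zero point -128.
SCALE, ZERO_POINT = 1 / 255, -128
```

With these assumed parameters, `0.0` maps to `-128` and `1.0` maps to `127`. Out-of-range values are clamped rather than wrapped.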

Inference

To start the application run:

make board-start BOARD_IP=${BOARD_IP}

This command prints an IP address. Paste it into a browser on your laptop, and you should see the annotated video from your camera.

Optimizations

Switching YOLOv8 Model Variants

The pipeline supports other YOLOv8 variants out of the box. Larger variants detect more accurately but run slower on the NPU.

| Variant | Key | Parameters |
|---------|-----|------------|
| YOLOv8n | `n` | 3.2 M |
| YOLOv8s | `s` | 11.2 M |
| YOLOv8m | `m` | 25.9 M |

How to switch

Step 1 — Edit board/config.json on your laptop and change the variant field:

{
  "model": {
    "variant": "s",
    "variants": {
      "n": "yolov8n",
      "s": "yolov8s",
      "m": "yolov8m"
    },
    "models_dir": "/opt/models"
  }
}

Set "variant" to "n", "s", or "m".
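This section does not say exactly how the application turns `variant` into a model file. The sketch below is one plausible reading. It assumes the compiled model is named `<variants[variant]>_neutron.tflite` inside `models_dir`, which matches the `yolov8s_neutron.tflite` name produced by `make model`. `resolve_model_path` and `load_config` are hypothetical helpers.

```python
import json
from pathlib import PurePosixPath


def resolve_model_path(config: dict) -> str:
    """Resolve the compiled Neutron model path for the configured YOLOv8 variant."""
    model_cfg = config["model"]
    variant = model_cfg["variant"]
    variants = model_cfg["variants"]
    if variant not in variants:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(variants)}")
    # Assumed naming scheme: <base name>_neutron.tflite, e.g. yolov8s_neutron.tflite
    return str(PurePosixPath(model_cfg["models_dir"]) / f"{variants[variant]}_neutron.tflite")


def load_config(path: str) -> dict:
    """Read board/config.json; json.load is strict, so trailing commas are rejected."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Because the file is parsed as strict JSON, a trailing comma after the last `variants` entry makes `json.load` raise `json.JSONDecodeError`.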

Step 2 — Compile and deploy the new model (laptop/WSL terminal).
make model reads the variant automatically from config.json:

make model BOARD_IP=${BOARD_IP}

This will:

Download the selected model weights from Ultralytics
Export to fully int8 quantized TFLite
Compile for the Neutron NPU with neutron-converter
Deploy the compiled model and the updated config.json to the board

Step 3 — Restart the app on the board to load the new model:

make board-start BOARD_IP=${BOARD_IP}

The browser sidebar will show the active variant in the model badge (top-right corner), and the per-stage pipeline latencies will update to reflect the new model's timing.

Improving Performance with the Split Pipeline

By default, make model deploys the model as a single TFLite file, and the inference loop runs each frame through all stages sequentially: CPU pre-processing → NPU → CPU post-processing.

make model-split-pipeline splits the compiled model into separate sub-models and runs each in its own thread, overlapping NPU inference with CPU work:

 Thread A (CPU pre)  ──►  Thread B (NPU)  ──►  Thread C (CPU post)
       frame N+1               frame N               frame N-1

While the NPU is crunching frame N, the CPU is already pre-processing frame N+1 and post-processing frame N-1 — keeping all three stages busy in parallel.
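The overlap can be modeled with three threads connected by bounded queues. This is a minimal illustration of the pattern, not the workshop's implementation. `pre`, `npu`, and `post` are placeholder callables standing in for the three `tflite_runtime` interpreters, and `run_pipeline` is a hypothetical name.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # unique sentinel that tells each stage to shut down


def _stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to every item from inbox and forward the result downstream."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(frames: Iterable[Any], pre: Callable, npu: Callable, post: Callable,
                 depth: int = 2) -> List[Any]:
    """Run pre -> npu -> post in separate threads so consecutive frames overlap."""
    q_in, q_pre, q_npu, q_out = (queue.Queue(maxsize=depth) for _ in range(4))

    def feed() -> None:
        # A separate feeder avoids deadlock when the bounded queues fill up.
        for frame in frames:
            q_in.put(frame)
        q_in.put(_STOP)

    workers = [
        threading.Thread(target=feed, daemon=True),
        threading.Thread(target=_stage, args=(pre, q_in, q_pre), daemon=True),    # Thread A (CPU pre)
        threading.Thread(target=_stage, args=(npu, q_pre, q_npu), daemon=True),   # Thread B (NPU)
        threading.Thread(target=_stage, args=(post, q_npu, q_out), daemon=True),  # Thread C (CPU post)
    ]
    for w in workers:
        w.start()

    results = []
    while (item := q_out.get()) is not _STOP:
        results.append(item)
    for w in workers:
        w.join()
    return results
```

Each stage is a single FIFO consumer, so output order matches input order. Threads help in Python here because TFLite's `Interpreter.invoke()` is documented to release the GIL while it runs. A production version would also need to forward exceptions from a stage: in this sketch, an exception inside a stage thread would stall the pipeline.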

How to enable the split pipeline

Run the following after make model:

make model-split-pipeline BOARD_IP=${BOARD_IP}

This will:

Analyze the compiled model to find the NPU/CPU boundary
Split it into pre.tflite (CPU) / npu.tflite (NPU) / post.tflite (CPU)
Generate a pipeline.json manifest describing the three stages
Deploy all sub-models and the updated config.json to the board

Restart the app to activate the new pipeline:

make board-start BOARD_IP=${BOARD_IP}

The Pipeline Stages card in the browser sidebar will show one row per stage (three for the pre/NPU/post split), each with its own average latency, confirming that pipelined execution is active.
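Per-stage averages like those in the Pipeline Stages card can be computed with a small rolling window per stage. The sketch below uses `time.perf_counter`. The `StageTimer` class and the default window of 30 samples are hypothetical choices, not the dashboard's actual code. A lock is included because each pipeline thread would record its own stage.

```python
import threading
import time
from collections import defaultdict, deque
from contextlib import contextmanager
from typing import Deque, Dict, Iterator


class StageTimer:
    """Track a rolling average latency (in milliseconds) per pipeline stage."""

    def __init__(self, window: int = 30) -> None:
        self._lock = threading.Lock()
        self._samples: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=window))

    def record(self, stage: str, elapsed_ms: float) -> None:
        with self._lock:
            self._samples[stage].append(elapsed_ms)

    @contextmanager
    def measure(self, stage: str) -> Iterator[None]:
        """Time the body of a with-block and record it under the given stage."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(stage, (time.perf_counter() - start) * 1000.0)

    def averages(self) -> Dict[str, float]:
        """Return the mean of the most recent samples for every stage."""
        with self._lock:
            return {name: sum(s) / len(s) for name, s in self._samples.items() if s}
```

A stage would wrap its work in `with timer.measure("npu"): ...`, and the web handler would read `timer.averages()` when it refreshes the sidebar.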

Note: the split pipeline requires tflite-extractor from the NXP eIQ Toolkit to be available on PATH alongside neutron-converter.
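To confirm both tools are reachable before running make model-split-pipeline, you can look them up on `PATH`. Below is a minimal sketch using the standard library's `shutil.which`. `missing_tools` is a hypothetical helper.

```python
import shutil
from typing import Iterable, List

REQUIRED_TOOLS = ("neutron-converter", "tflite-extractor")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

On the laptop, `print(missing_tools() or "all tools found")` lists whichever of the two tools is missing.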
