This workshop puts AI-assisted coding at the center of embedded engineering. Participants use Claude Code as a co-pilot to extend a running Edge AI object-detection pipeline on real NXP hardware — no deep ML expertise required.
The board runs a Python application that:
- Captures frames from a USB webcam via OpenCV
- Runs TFLite inference on the eIQ® Neutron NPU (2 TOPS) using a pipelined execution
- Overlays bounding boxes, inference timing, and shows statistics in a dashboard
- Streams the annotated video over WiFi to a browser via Flask MJPEG
- Exposes a REST API for live configuration
Participants connect their laptops to the board via WiFi, open the live stream in a browser, and then vibe-code enhancements by describing what they want to Claude Code in natural language.
By the end of this workshop, participants will be able to:
- Connect to and work on an embedded Linux board remotely via SSH and VS Code
- Understand the structure of a prototyped real-time Edge AI inference pipeline
- Use AI-assisted coding (Claude Code) to extend an embedded application without starting from scratch
- Swap and compare TFLite models running on a hardware NPU
- Customize visual overlays, detection logic, and REST API behavior
- Stream and visualize live AI inference results from an embedded device
- Laptop with Wi-Fi (for internet + Claude Code)
- Basic Python familiarity (can read and modify Python code)
- Claude Code installed and authenticated (
make install-claude-code API_KEY=...., or manually:npm install -g @anthropic-ai/claude-code)
- FRDM-IMX95 booted into Linux
- USB webcam connected (Logitech C922 or equivalent)
tflite_runtimepre-installed in the Yocto BSP
Connect to the board through serial interface (via Putty or any other means) and execute the following commands to connect to the local network:
wpa_passphrase "${SSID}" "${SSID_PASSWORD}" > /tmp/wpa.conf
wpa_supplicant -B -i mlan0 -c /tmp/wpa.conf
udhcpc -i mlan0
Check the IP address of the board via:
ip addr
Models are exported from Ultralytics with full int8 quantization and compiled for the Neutron NPU using the NXP eIQ Toolkit. Download the eIQ Toolkit manually from here and copy the archive to the top folder of the repository. Run on a Linux laptop or WSL terminal:
# 1. Install the NXP eIQ Toolkit (contains neutron-converter)
make install-eiq ARCHIVE=./EIQ-NEUTRON-SDK-3.1.2-LIN.zip
# 2. Install laptop Python dependencies
make install-depsIf the venv is not yet created on the board:
make board-deps BOARD_IP=${BOARD_IP}To prepare the inference environment run:
make board-deploy-app BOARD_IP=${BOARD_IP}
To compile the model and deploy it to the board run the following command:
# Run the full model pipeline copying the model via SSH (board must be reachable)
make model BOARD_IP=${BOARD_IP}The pipeline:
- Exports and quantize
yolov8s.pt→yolov8s_full_integer_quant.tflite(fully int8 quantized — input and output tensors are int8, not float32) - Compiles →
yolov8s_neutron.tfliteusingneutron-converter --target imx95 - SCPs the compiled model + COCO labels to
/opt/models/on the board
Why full integer quantization? The Neutron NPU requires int8/uint8 tensors end-to-end.
To start the application run:
make board-start BOARD_IP=${BOARD_IP}
This command will return an IP address, copy paste it to open it into a browser. You should now see the annotated video from your camera.
The pipeline supports other YOLOv8 variants out of the box. Larger variants detect more accurately but run slower on the NPU.
| Variant | Key | Parameters |
|---|---|---|
| YOLOv8n | n |
3.2 M |
| YOLOv8s | s |
11.2 M |
| YOLOv8m | m |
25.9 M |
Step 1 — Edit board/config.json on your laptop and change the variant field:
{
"model": {
"variant": "s",
"variants": {
"n": "yolov8n",
"s": "yolov8s",
"m": "yolov8m",
},
"models_dir": "/opt/models"
}
}Set "variant" to "n", "s" or "m"
Step 2 — Compile and deploy the new model (laptop/WSL terminal).
make model reads the variant automatically from config.json:
make model BOARD_IP=${BOARD_IP}This will:
- Download the selected model weights from Ultralytics
- Export to fully int8 quantized TFLite
- Compile for the Neutron NPU with
neutron-converter - Deploy the compiled model and the updated
config.jsonto the board
Step 3 — Restart the app on the board to load the new model:
make board-start BOARD_IP=${BOARD_IP}The sidebar in the browser will show the active variant in the model badge (top-right corner) and the per-stage pipeline latencies will update to reflect the new model's timing.
By default make model deploys the model as a single TFLite file. The inference
loop runs all layers sequentially: CPU pre-processing → NPU → CPU post-processing.
make model-split-pipeline splits the compiled model into separate sub-models
and runs each in its own thread, overlapping NPU inference with CPU work:
Thread A (CPU pre) ──► Thread B (NPU) ──► Thread C (CPU post)
frame N+1 frame N frame N-1
While the NPU is crunching frame N, the CPU is already pre-processing frame N+1 and post-processing frame N-1 — keeping all three stages busy in parallel.
Run the following after make model:
make model-split-pipeline BOARD_IP=${BOARD_IP}This will:
- Analyze the compiled model to find the NPU/CPU boundary
- Split it into
pre.tflite(CPU) /npu.tflite(NPU) /post.tflite(CPU) - Generate a
pipeline.jsonmanifest describing the three stages - Deploy all sub-models and the updated
config.jsonto the board
Restart the app to activate the new pipeline:
make board-start BOARD_IP=${BOARD_IP}The Pipeline Stages card in the browser sidebar will show N rows, one per stage, with their individual average latencies, confirming pipelined execution is active.
Note: the split pipeline requires
tflite-extractorfrom the NXP eIQ Toolkit to be available onPATHalongsideneutron-converter.