p2o-lab/VisionForge

VisionForge

A Python desktop application that combines industrial computer vision, robot arm control, and real-time data logging into a unified operator interface. Built for the DAAD Project 1 research environment.


What Is VisionForge?

VisionForge is a multi-purpose vision-robotics control station. An operator connects it to a Baumer VAX-50C industrial camera and a UFACTORY xArm5 collaborative robot arm, then uses three specialized computer vision pipelines to perform automated measurement and tracking tasks on physical equipment.

The application streams live camera frames via ZeroMQ, runs deep-learning models for detection and OCR, allows direct robot arm control via gamepad, logs all measurements to an InfluxDB time-series database, and supports third-party plugins through a built-in Plugin SDK.


Hardware Requirements

Component   Model                                  Connection
Camera      Baumer VAX-50C (BayerRG8, 30 FPS)      ZMQ over TCP (ports 5555/5556)
Robot arm   UFACTORY xArm5                         TCP via xArm SDK
Gamepad     PS4 / Xbox-compatible USB controller   USB HID (via Pygame)
Database    InfluxDB 2.x                           localhost:8086

Software Stack

  • GUI: PySide6 (Qt 6)
  • Computer vision: OpenCV 4
  • Deep learning: PyTorch + Ultralytics (YOLO)
  • Text recognition: PARSeq (custom strhub implementation included)
  • Multi-object tracking: Supervision / ByteTrack
  • Camera streaming: ZeroMQ (PUB-SUB)
  • Robotics: xArm Python SDK
  • Gamepad input: Pygame
  • Time-series storage: InfluxDB client
  • Gesture recognition: MediaPipe (≥ 0.10.14)
  • Industrial standard: MTPPy (VDI/VDE/NAMUR 2658 — OPC UA PEA)

Getting Started

1. Install dependencies

pip install pyside6 opencv-python numpy torch torchvision pyzmq pygame \
            supervision ultralytics pillow influxdb-client mediapipe

Also install the xArm Python SDK from UFACTORY.

2. Start the camera server

The Baumer camera must be running a ZMQ publisher on ports 5555 (stream) and 5556 (control) before launching VisionForge.

3. Run

python main.py

On startup:

  1. Splash screen loads YOLO + PARSeq weights and scans the plugins/ folder
  2. Connection dialog prompts for camera IP, arm IP, and OPC UA port (default 48050)
  3. Main window opens — five built-in tabs plus a sidebar entry for each loaded plugin
  4. OPC UA server starts — connect UaExpert or a POL to opc.tcp://<host>:48050/

Project Structure

VisionForge/
├── main.py                          # Entry point
├── app/
│   ├── main_window.py               # Tabbed shell + plugin + inspection integration
│   ├── splash.py                    # Startup screen (loads models + scans plugins)
│   ├── connection_dialog.py         # IP + OPC UA port configuration
│   ├── styles/theme.py              # Dark theme stylesheet
│   ├── core/
│   │   ├── camera_client.py         # ZMQ camera subscriber (QThread)
│   │   ├── arm_controller.py        # xArm5 SDK wrapper (QThread)
│   │   ├── gamepad_controller.py    # Gamepad input thread
│   │   └── media_manager.py         # Video/image recording
│   ├── inspection/                  # MTP / VDI VDE NAMUR 2658 integration
│   │   ├── inspection_service.py    # InspectionService base class (MTPPy + Qt)
│   │   ├── inspection_module.py     # InspectionModule — owns OPC UA server
│   │   ├── inspection_tab.py        # Inspection sub-tab widget
│   │   └── service_aware_panel.py   # Wraps TaskPanel in QTabWidget + InspectionTab
│   ├── widgets/                     # Reusable UI components
│   ├── plugin_sdk/                  # Plugin SDK
│   │   ├── base.py                  # TaskPanel + TaskProcessor ABCs
│   │   ├── manifest.py              # PluginManifest + load_manifest()
│   │   ├── loader.py                # PluginLoader singleton
│   │   ├── utils.py                 # save_frame(), resolve_path()
│   │   └── dev_runner.py            # Standalone plugin test runner
│   └── tasks/
│       ├── seven_segment/           # YOLO + ByteTrack + PARSeq pipeline
│       │   └── service.py           # MTP inspection service for this task
│       ├── valve_orientation/       # AprilTag pose estimation pipeline
│       └── module_tracker/          # ArUco 3D localization pipeline
├── plugins/
│   ├── arm_spin/                    # Arm Y-axis oscillation tutorial plugin
│   ├── hand_pointer/                # Gesture-controlled arm movement
│   └── example_difference/          # Frame-difference motion detector (+ service.py)
├── docs/
│   ├── plugin_sdk.md                # Plugin development guide
│   └── mtp_inspection.md            # MTP inspection service guide
└── media/
    ├── images/                      # Captured screenshots
    └── videos/                      # Recorded clips

Model weights (excluded from repo) go here:

app/tasks/seven_segment/weights/yolo/best.pt
app/tasks/seven_segment/weights/parseq/parseq_7seg.ckpt

The Three Vision Pipelines

Seven-Segment Display Reader

Reads numeric values off industrial seven-segment displays.

  • YOLO-OBB locates each display; ByteTrack assigns persistent IDs across frames
  • PARSeq OCR reads the digit string from each cropped display
  • Logs to InfluxDB bucket 7Segments_Measurments at a user-configurable rate (1–10 Hz)
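
A configurable logging rate like the one above could be enforced with a small time-based throttle. The sketch below is only an illustration: `RateLimiter` and `ready()` are hypothetical names rather than VisionForge's API, and the clamp simply mirrors the 1–10 Hz range stated above.

```python
import time


class RateLimiter:
    """Allow an action at most `rate_hz` times per second (hypothetical helper)."""

    def __init__(self, rate_hz, clock=time.monotonic):
        # Clamp to the 1-10 Hz range described above.
        self.rate_hz = min(10.0, max(1.0, float(rate_hz)))
        self._clock = clock
        self._next_due = None

    def ready(self):
        """Return True if enough time has passed since the last accepted call."""
        now = self._clock()
        if self._next_due is None or now >= self._next_due:
            self._next_due = now + 1.0 / self.rate_hz
            return True
        return False
```

A caller would check `ready()` once per processed frame and write a measurement only when it returns True, so the logging rate stays independent of the camera frame rate.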

Valve Orientation Detection

Classifies industrial valves as open or closed using AprilTag markers.

  • Attach a small AprilTag sticker to each valve handle
  • Camera estimates the marker's 3D rotation (solvePnP)
  • Operator calibrates each valve once (fully-open + fully-closed reference poses)
  • System classifies state automatically using a 30-frame voting window
  • Supports up to 32 valves simultaneously (marker IDs 0–31)

Configuration saved to: app/tasks/valve_orientation/valve_config.json
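
The voting window can be pictured as a fixed-length buffer of per-frame decisions. The sketch below assumes a simple majority rule, which the description above does not spell out; `StateVoter` is a hypothetical name used only for illustration.

```python
from collections import Counter, deque


class StateVoter:
    """Majority vote over the last `window` per-frame classifications."""

    def __init__(self, window=30):
        self._votes = deque(maxlen=window)  # oldest votes drop off automatically

    def update(self, frame_state):
        """Add one per-frame label ("open" or "closed") and return the current majority."""
        self._votes.append(frame_state)
        return Counter(self._votes).most_common(1)[0][0]


voter = StateVoter(window=30)
for label in ["open"] * 20 + ["closed"] * 5:
    state = voter.update(label)
print(state)  # open (20 of the 25 votes so far)
```

Smoothing over a window like this trades a short delay for stability: a single misdetected frame cannot flip the reported valve state.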

Module Tracker (3D Localization)

Builds a live overhead map of large lab modules using 250 mm ArUco markers.

  • Detects ArUco markers (6×6 dictionary), computes 3D pose relative to camera
  • Projects into a lab coordinate frame, renders a top-down map
  • Logs positions to InfluxDB bucket Modules_Tracking

Configuration saved to: app/tasks/module_tracker/module_tracker_config.json
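
Projecting a marker into the lab frame amounts to one rigid transform followed by dropping the height axis. The sketch below assumes the camera's pose in the lab frame is already known as a rotation matrix and a translation; the function names, the example pose, and the choice of lab Z as "up" are all illustrative assumptions.

```python
def camera_to_lab(p_cam, R_lab_cam, t_lab_cam):
    """Map a 3D point from camera coordinates into lab coordinates: p_lab = R @ p_cam + t."""
    return tuple(
        sum(R_lab_cam[i][j] * p_cam[j] for j in range(3)) + t_lab_cam[i]
        for i in range(3)
    )


def top_down(p_lab):
    """Drop the height component to get a 2D map position (assumes lab Z points up)."""
    x, y, _z = p_lab
    return (x, y)


# Hypothetical setup: camera mounted 2 m above the lab origin, looking straight down.
# Rows express lab axes in terms of camera axes (x right, y down, z forward).
R = [[1.0, 0.0, 0.0],
     [0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0]]
t = (0.0, 0.0, 2.0)

marker_in_camera = (0.5, 0.25, 2.0)  # metres, e.g. the translation from pose estimation
print(top_down(camera_to_lab(marker_in_camera, R, t)))  # (0.5, -0.25)
```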


Arm Control (Gamepad)

Input        Action
Left stick   XY motion in Cartesian space
L1 / R1      Z down / up
L2 / R2      Pitch / Yaw
X button     Speed −5%
Y button     Speed +5%
START        Return to safe position

Speed range: 20–500 mm/s, adjustable via the UI slider.


Architecture

Threading model

All hardware I/O runs in dedicated QThread workers, and results reach the UI through Qt signals and slots, so the UI thread is never blocked. ML models are loaded once during the splash screen and cached globally.

Boot sequence

main.py
  └─ SplashScreen
       ├─ ModelLoader (QThread) — loads YOLO + PARSeq
       ├─ get_plugin_loader().scan() — discovers plugins/
       └─ ConnectionDialog (camera IP, arm IP, OPC UA port)
            └─ MainWindow
                 ├─ CameraClient.start()
                 ├─ ArmController.start()
                 ├─ Built-in panels: stack indices 0–4
                 ├─ Plugin panels: stack indices 5+
                 ├─ InspectionModule created (one per platform)
                 │    ├─ Each task/plugin with service.py → service registered
                 │    └─ OPC UA server started at opc.tcp://0.0.0.0:<port>/
                 └─ Sidebar guard wired (ABORT dialog if navigating away from active service)

Camera configuration

The camera server owns the physical acquisition (resolution, FPS, exposure, gain, white balance). VisionForge can adjust these at runtime via the Camera Configuration panel, which sends commands through CameraClient.send_params() over a ZMQ REQ socket back to the server.

Arm modes

  • Mode 0 (position mode): set_position() commands, used by valve scan and go-to-position functions
  • Mode 1 (servo mode): set_servo_cartesian() commands, used for continuous real-time control (gamepad, plugins)

MTP Inspection (VDI/VDE/NAMUR 2658)

VisionForge is a Process Equipment Assembly (PEA) compliant with VDI/VDE/NAMUR 2658. Each vision task and plugin can expose its live results as an OPC UA inspection service that an external Process Orchestration Layer (POL), or a test client such as UaExpert, can start, stop, and subscribe to.

How it works

  • One shared OPC UA server runs at the configurable port (default 48050).
  • Each task/plugin that includes a service.py gets one service registered in that server.
  • The platform enforces a one-active-service constraint — only one service can be in EXECUTE at a time (hardware constraint: one camera).
  • When a POL starts a service remotely, VisionForge auto-switches the panel into view and shows a POL control banner in the Inspection sub-tab.
  • Navigating away from a panel whose service is running triggers a confirmation dialog that sends ABORT before switching.
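
The one-active-service rule can be sketched as a small registry that refuses a start request while another service is running. This is a simplified illustration, not the platform's implementation: `ServiceRegistry` is a hypothetical name, and the state machine is collapsed to two states (IDLE and EXECUTE), whereas a real MTP service has more.

```python
class ServiceRegistry:
    """Allow at most one service in EXECUTE at a time (hypothetical sketch)."""

    def __init__(self):
        self._states = {}  # service name -> "IDLE" or "EXECUTE"

    def register(self, name):
        self._states[name] = "IDLE"

    def active(self):
        """Return the name of the running service, or None."""
        return next((n for n, s in self._states.items() if s == "EXECUTE"), None)

    def start(self, name):
        """Try to move `name` into EXECUTE; refuse if another service holds the camera."""
        running = self.active()
        if running is not None and running != name:
            return False
        self._states[name] = "EXECUTE"
        return True

    def abort(self, name):
        """Return `name` to IDLE, e.g. after the operator confirms navigating away."""
        self._states[name] = "IDLE"
```

In this sketch a second `start()` simply returns False; the sidebar guard described above corresponds to calling `abort()` on the active service before the panel switch is allowed.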

Adding an inspection service to a plugin

  1. Write service.py subclassing InspectionService (declare parameters + report values).
  2. Add "service_module": "service" and "service_class": "MyClass" to plugin.json.
  3. That's it — VisionForge wires everything at startup.

See docs/mtp_inspection.md for the full API, state machine reference, and UaExpert testing guide.


Plugin SDK

Plugins are folders dropped into plugins/. They are discovered and loaded automatically at startup. A broken plugin is skipped and logged — it never crashes the platform.
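
The "skip and log" behaviour is essentially a try/except around each plugin folder. The sketch below shows that pattern under stated assumptions: `scan_plugins` and its `load_one` callback are hypothetical names, not the SDK's PluginLoader.

```python
import logging
from pathlib import Path

log = logging.getLogger("plugin_scan")


def scan_plugins(plugins_dir, load_one):
    """Load every plugin folder; a failure in one never stops the others.

    `load_one` is a hypothetical callable that takes a folder Path and
    returns a loaded plugin object, raising on any error.
    """
    loaded, skipped = [], []
    for folder in sorted(Path(plugins_dir).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the plugin folders
        try:
            loaded.append(load_one(folder))
        except Exception:
            # Log the full traceback, then move on to the next plugin.
            log.exception("Skipping broken plugin %s", folder.name)
            skipped.append(folder.name)
    return loaded, skipped
```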

Plugin structure

plugins/my_plugin/
├── plugin.json      # manifest
├── panel.py         # UI widget  (subclasses TaskPanel)
└── processor.py     # CV worker  (subclasses TaskProcessor)

plugin.json

{
  "name":           "my_plugin",
  "label":          "My Plugin",
  "icon":           "",
  "version":        "1.0.0",
  "panel_module":   "panel",
  "panel_class":    "MyPanel",
  "service_module": "service",      
  "service_class":  "MyInspectionService"
}

service_module and service_class are optional. If present, the platform loads the service and registers it in the OPC UA server. See docs/mtp_inspection.md.
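
One way to model the optional service keys is a dataclass with defaults. This is a sketch, not the SDK's `load_manifest()`: `Manifest` and `parse_manifest` are hypothetical names, and which of the other keys are actually required is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Manifest:
    name: str
    label: str
    version: str
    panel_module: str
    panel_class: str
    icon: str = ""
    service_module: Optional[str] = None  # optional
    service_class: Optional[str] = None   # optional

    @property
    def has_service(self):
        # Treat the service as declared only when both keys are present.
        return bool(self.service_module and self.service_class)


def parse_manifest(text):
    """Parse plugin.json text; raises TypeError on missing or unknown keys."""
    return Manifest(**json.loads(text))
```

With this shape, a plugin without the two service keys parses cleanly and reports `has_service` as False, so the platform can decide whether to register anything in the OPC UA server.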

TaskProcessor

Runs in a background QThread. Camera frames arrive through a Queue(maxsize=1), so the processor always works on the most recent frame and never builds up a backlog.

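from PySide6.QtGui import QImage
from VisionForge.app.plugin_sdk import TaskProcessor

class MyProcessor(TaskProcessor):
    def process_frame(self, frame_rgb, meta):
        # frame_rgb: HxWx3 uint8 RGB numpy array (assumed C-contiguous)
        h, w, _ = frame_rgb.shape
        # wrap the pixels in a QImage; copy() detaches it from the numpy buffer
        qimage = QImage(frame_rgb.data, w, h, 3 * w, QImage.Format.Format_RGB888).copy()
        my_data = {"mean_brightness": float(frame_rgb.mean())}  # placeholder result
        # emit results; never touch Qt widgets here
        self.display_ready.emit(qimage)
        self.result_ready.emit(my_data)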
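
The single-slot queue mentioned above can be sketched with the standard library alone. The helper name `offer_latest` is hypothetical and the platform's own hand-off may differ; the point is only that a full queue has its stale frame replaced instead of blocking the producer.

```python
import queue


def offer_latest(q, frame):
    """Put `frame` into a single-slot queue, replacing any unprocessed frame."""
    try:
        q.put_nowait(frame)
    except queue.Full:
        try:
            q.get_nowait()   # discard the stale frame
        except queue.Empty:
            pass             # the consumer took it in the meantime
        try:
            q.put_nowait(frame)
        except queue.Full:
            pass             # another producer won the race; drop this frame


frames = queue.Queue(maxsize=1)
for i in range(5):           # producer runs faster than the consumer
    offer_latest(frames, f"frame-{i}")
print(frames.get_nowait())   # frame-4
```

Dropping frames this way keeps latency bounded: a slow process_frame costs frame rate, not freshness.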

TaskPanel

UI widget running on the Qt main thread. The base class automatically connects the camera feed to the processor when the panel is shown and disconnects it when hidden.

from VisionForge.app.plugin_sdk import TaskPanel
from processor import MyProcessor

class MyPanel(TaskPanel):
    def on_attach(self):
        # called once at load time: build UI and wire processor
        self._processor = MyProcessor()
        self._processor.display_ready.connect(self._on_display)
        # build the rest of your UI...

    def _on_display(self, qimage):
        # slot for display_ready; update your own widgets here
        pass  # e.g. show qimage in a QLabel created in on_attach

    def cleanup(self):
        super().cleanup()   # always call super

Guards: self._cam and self._arm may be None if hardware was not connected at startup. Always check before use.

Loading .pt model weights from a plugin

from VisionForge.app.plugin_sdk import resolve_path

WEIGHTS = resolve_path(__file__, "weights/my_model.pt")
# resolves relative to the plugin folder regardless of working directory

Load in __init__, never in process_frame.

Dev runner

Test a plugin against real hardware without launching the full platform:

python -m VisionForge.app.plugin_sdk.dev_runner plugins/my_plugin \
    --camera 192.168.1.100 --arm 192.168.1.200 --opc-port 48050

Opens up to four windows: the plugin panel (maximized), a hardware status monitor, the camera configuration panel, and — if the plugin declares a service — the Inspection window with full MTP state controls and a live OPC UA server for UaExpert testing. See docs/mtp_inspection.md.

SDK zip

python tools/build_sdk_zip.py
# → dist/visionforge-sdk-1.0.0.zip

The zip mirrors the VisionForge/ package structure so from VisionForge.app.plugin_sdk import TaskPanel resolves identically in the dev environment and in the deployed platform.


Available Plugins

arm_spin

Tutorial plugin. Displays a 90°-rotated camera feed and oscillates the arm ±60 mm in the Y axis at 0.3 Hz using servo mode. Motion runs in a plain threading.Thread at 20 Hz, independent of the camera frame rate.
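
The motion parameters above translate directly into a per-tick offset. The sketch below assumes a sinusoidal profile, which the description does not state (a triangle wave would fit it equally well); the constant and function names are illustrative.

```python
import math

AMPLITUDE_MM = 60.0  # from the description: ±60 mm
FREQ_HZ = 0.3        # oscillation frequency
RATE_HZ = 20.0       # command rate of the motion thread


def y_offset(t):
    """Y offset in mm at time t seconds, assuming a sinusoidal profile."""
    return AMPLITUDE_MM * math.sin(2.0 * math.pi * FREQ_HZ * t)


# Offsets for the first second of motion, one per 50 ms tick.
ticks = [y_offset(n / RATE_HZ) for n in range(int(RATE_HZ))]
```

Because the offset is a function of elapsed time, the trajectory stays the same even if the camera frame rate changes.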

hand_pointer

Gesture-controlled arm movement using the MediaPipe Tasks API.

Gesture                            Action
Index finger extended (pointing)   Arm moves in the screen direction of the finger
Fist                               Arm holds position
Anything else                      No movement

Screen X maps to arm Y, screen Y maps to arm Z. A QTimer at 20 Hz drives commands — movement is time-based, not frame-rate-dependent. The hand_landmarker.task model is downloaded automatically to the plugin folder on first run.
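
Time-based movement means the step length comes from speed and elapsed time, not from how many frames arrived. The sketch below illustrates the axis mapping described above; `arm_step`, the 100 mm/s speed, and the sign flip on the vertical axis are assumptions made for illustration.

```python
import math


def arm_step(finger_dx, finger_dy, speed_mm_s, dt_s):
    """Convert a finger direction on screen into an arm (dY, dZ) step in mm.

    The step length depends only on speed and elapsed time, so it is the same
    whether the camera delivers 10 or 60 frames per second.
    """
    norm = math.hypot(finger_dx, finger_dy)
    if norm == 0.0:
        return (0.0, 0.0)
    step = speed_mm_s * dt_s
    d_y = step * finger_dx / norm   # screen X -> arm Y
    d_z = -step * finger_dy / norm  # screen Y -> arm Z (assumed: image Y grows downward)
    return (d_y, d_z)


# One 20 Hz tick (0.05 s) at a hypothetical 100 mm/s, finger pointing up the screen.
step_y, step_z = arm_step(0.0, -1.0, 100.0, 0.05)
```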

example_difference

Reference plugin. Computes frame-to-frame difference, applies a threshold mask, and reports a motion percentage. Demonstrates the complete TaskPanel + TaskProcessor lifecycle.
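
The difference, threshold, and percentage steps can be shown in a few lines of plain Python. This is a sketch for clarity rather than the plugin's code: it works on raw 8-bit grayscale bytes, the threshold of 25 is a hypothetical default, and on NumPy frames the same steps are usually done with cv2.absdiff and cv2.threshold.

```python
def motion_percent(prev, curr, threshold=25):
    """Percentage of pixels whose absolute grayscale change exceeds `threshold`.

    `prev` and `curr` are equal-length bytes objects of 8-bit grayscale pixels.
    """
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    if not curr:
        return 0.0
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > threshold)
    return 100.0 * changed / len(curr)


prev = bytes([10, 10, 10, 10])
curr = bytes([10, 12, 200, 90])
print(motion_percent(prev, curr))  # 50.0
```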
