Skip to content

Bourn23/wow_bot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Embodied Agent for World of Warcraft

A comprehensive AI agent system for World of Warcraft automation that combines computer vision, reinforcement learning, and multimodal AI models to create an intelligent gaming assistant.

🎯 Project Overview

This project implements an embodied AI agent capable of playing World of Warcraft through visual perception, object tracking, and intelligent action selection. The system integrates multiple state-of-the-art AI models including SAM2 for object segmentation, MineCLIP for video-language understanding, and custom reinforcement learning agents.

🏗️ Architecture

Core Components

  1. Vision System

    • Real-time screen capture and processing
    • Object detection and segmentation using FastSAM and SAM2
  2. AI Models

    • MineCLIP: Video-language model for understanding game contexts
    • SAM2: Segment Anything Model 2 for object segmentation and tracking
    • CLIP: Contrastive language-image pre-training for image understanding
    • FastSAM: Fast Segment Anything Model for real-time segmentation
  3. Agent System

    • Based on OpenAI's [Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos] (https://github.com/openai/Video-Pre-Training)
    • Reinforcement learning agents with action prediction
    • Multimodal policy networks combining vision and language
    • Autoregressive action modeling
  4. Data Pipeline

    • Video and input recording system
    • Data preprocessing for model training
    • Frame-action synchronization

📁 Project Structure

├── vision/                          # Computer vision modules
│   ├── vision_agent.py             # Screen capture and window management
│   └── tracking_agent.py           # Object tracking implementation
├── env/                            # Environment interfaces
│   └── wow_env.py                  # World of Warcraft environment wrapper
├── MineCLIP/                       # MineCLIP model implementation
├── SAM2/                           # SAM2 model and training code
├── FastSAM/                        # FastSAM model implementation
├── data_loader.py                  # Data loading and preprocessing
├── bbox_selector.py               # Bounding box selection GUI
├── config.py                      # Configuration management
└── recordings/                     # Recorded gameplay data

🚀 Features

Computer Vision

  • Real-time Screen Capture: Efficient screen recording with window-specific targeting
  • Object Detection: YOLO-based detection for game entities
  • Object Segmentation: SAM2-powered precise object segmentation
  • Multi-object Tracking: Persistent tracking across video frames
  • Bounding Box Selection: Interactive GUI for target selection

AI Models

  • MineCLIP Integration: Video-language understanding for contextual gameplay
  • SAM2 Real-time Tracking: Live object segmentation and tracking
  • CLIP Analysis: Image-text similarity for decision making
  • Custom RL Agents: Trained agents for specific game tasks

Data Management

  • Input Recording: Synchronized recording of video, keyboard, and mouse inputs
  • Data Visualization: Tools for analyzing recorded gameplay data
  • Preprocessing Pipeline: Automated data preparation for model training
  • Autoregressive Data Loading: Temporal sequence processing for RL training

Game Integration

  • WoW Environment: OpenAI Gym-compatible environment wrapper
  • Action Mapping: Translation between AI decisions and game controls
  • Health Monitoring: OCR-based health and status tracking
  • Quest Management: Automated quest detection and completion

🛠️ Installation

Prerequisites

  • Python 3.9+
  • PyTorch with MPS/CUDA support
  • OpenCV
  • macOS (for ScreenCaptureKit integration)

System Dependencies

# Install tesseract for OCR (health bar reading)
brew install tesseract

# Install mmseg for DinoV2 (if using)
pip install mmseg

Setup

# Clone the repository
git clone <repository-url>
cd Embodied_Agent_WoW

# Install dependencies
pip install -r requirements.txt

# Download model checkpoints
cd SAM2
./download_checkpoints.sh

# Setup MineCLIP
cd MineCLIP
pip install -e .

Model Downloads

  • SAM2 Checkpoints: Download from the official SAM2 repository
  • MineCLIP Models: Available variants include attn and avg
  • FastSAM Weights: Automatically downloaded on first use

🎮 Usage

Basic Agent Operation

# Start the main agent
python 15.01.main_agent_w_tracking.py

# Available commands:
# v - Start vision agent
# t - Start tracking agent  
# s - Show tracking visualization
# a - Take automated actions
# q - Quit

MineCLIP-Powered Agent

# Run agent with MineCLIP integration
python 19.wow_agent_with_mineclip.py variant=avg ckpt.path=MineCLIP/CKPT/avg.pth

Data Recording

# Record gameplay data
python 00.10.01.save_video_click.py

# Visualize recorded data
python 23.data_loader_visualizer.py

Model Training

# Train action prediction model
python 21.ft_inverse_dynamics_model.py

# Test trained model
python 28.test_trained_model.py

📊 Models and Performance

Supported Models

Model Purpose Performance Notes
SAM2 Object Segmentation Real-time Multiple size variants
MineCLIP Video-Language Context-aware Pre-trained on gaming data
FastSAM Fast Segmentation 150+ FPS Lightweight alternative
CLIP Image-Text High accuracy Fine-tuned variants

Training Results

The project includes checkpoints for various model training epochs, with performance metrics tracked for:

  • Action prediction accuracy
  • Object tracking precision
  • Segmentation IoU scores
  • Policy reward optimization

🔧 Configuration

Model Configuration

# conf_mineclip.yaml
mineclip:
  variant: avg  # or attn
  resolution: [256, 160]
  device: mps  # or cuda/cpu

Agent Configuration

# config.py
AGENT_RESOLUTION = (224, 224)
MINECLIP_CONFIG = {
    'resolution': [256, 160],
    'fps': 60
}

📈 Components Detail

Vision Pipeline

  1. Screen Capture: Uses macOS ScreenCaptureKit for efficient capture
  2. Object Detection: YOLO-based detection with FastSAM segmentation
  3. Tracking: SAM2-powered persistent object tracking
  4. Action Recognition: MineCLIP-based understanding of game states

Data Processing

  1. Video Processing: Frame extraction and preprocessing
  2. Action Synchronization: Keyboard/mouse event alignment
  3. Sequence Generation: Temporal sequences for RL training
  4. Augmentation: Data augmentation for robust training

Model Integration

  1. Multi-modal Fusion: Combining vision and language models
  2. Real-time Inference: Optimized for live gameplay
  3. Transfer Learning: Pre-trained models adapted for gaming
  4. Ensemble Methods: Multiple models for robust predictions

🧪 Experimental Features

  • Autoregressive Action Modeling: Predicting action sequences
  • Multi-object Tracking: Tracking multiple game entities
  • Quest Automation: Automated quest completion
  • Combat Optimization: AI-driven combat strategies
  • Exploration Strategies: Intelligent map exploration

🤝 Contributing

Contributions are welcome! Areas for improvement:

  • Additional game environment support
  • Model optimization and efficiency improvements
  • New vision algorithms integration
  • Enhanced data collection tools

📄 License

This project is for research and educational purposes. Please ensure compliance with game terms of service.

🔗 Related Work

📞 Contact

For questions and collaboration opportunities, please open an issue or contact the development team.


This project represents cutting-edge research in embodied AI and multimodal learning for gaming applications.

About

Embodied AI agent for World of Warcraft combining computer vision (SAM2, FastSAM), video-language models (MineCLIP), and RL for intelligent game automation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors