This project investigates sequential interaction tasks in reinforcement learning (RL) for virtual reality (VR), using biomechanical user models.
The work is built on SIM2VR, a framework for integrating MuJoCo-based biomechanical simulations into Unity VR applications, and extends it to study ordered interaction tasks that better reflect real-world VR UI usage.
SIM2VR provides:
- Synchronized Unity–MuJoCo simulation
- Image-based visual observations from a virtual HMD
- Low-latency control of biomechanical user models
- Infrastructure for training and evaluating RL-based simulated users
SIM2VR serves as the simulation and integration backbone of this work. All communication, rendering, and biomechanical control mechanisms follow the SIM2VR and User-in-the-Box design.
Using SIM2VR, we train a biomechanical motion agent that:
- Receives image-based observations rendered from the Unity VR environment
- Outputs muscle activation signals to control upper-limb motion
- Interacts with the same virtual environment as a real VR user
The biomechanical model and perception pipeline follow the SIM2VR / User-in-the-Box framework, ensuring physically plausible movement and realistic sensory input.
This project targets sequential button-press tasks, where the agent must:
- Press multiple UI elements
- Follow a fixed task order
- Maintain performance over extended interaction horizons
This setting more closely mirrors real VR interfaces, where users perform ordered, multi-step actions rather than independent reaches.
Sequential interaction introduces a fundamental learning challenge:
- Multiple buttons are visible simultaneously
- The reward is defined relative to the active target, but nothing in the image indicates which button is currently active
- Image observations provide no explicit notion of task order
This reward–observation mismatch leads to ambiguous credit assignment and unstable learning, despite correct low-level control.
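
The mismatch can be made concrete with a toy example. The sketch below is illustrative only: the button layout, the distance-based reward, and the `observe` stand-in for the rendered image are all assumptions, not the project's actual reward or observation code. It shows that the same observation yields different rewards depending on a target index the agent cannot see.

```python
import math

# Hypothetical button layout in metres; not taken from the project.
BUTTONS = {"A": (-0.10, 0.0, 0.40), "B": (0.10, 0.0, 0.40)}


def distance_reward(fingertip, active_button):
    """Assumed reward: negative Euclidean distance to the active button."""
    return -math.dist(fingertip, BUTTONS[active_button])


def observe(fingertip):
    """Stand-in for the rendered image: it contains the fingertip and every
    button, but no information about which button is currently active."""
    return (fingertip, tuple(sorted(BUTTONS.items())))


fingertip = (-0.10, 0.0, 0.38)  # hovering just in front of button A

# The observation is computed without reference to the active target.
obs_when_a_active = observe(fingertip)
obs_when_b_active = observe(fingertip)

reward_when_a_active = distance_reward(fingertip, "A")
reward_when_b_active = distance_reward(fingertip, "B")

print(obs_when_a_active == obs_when_b_active)        # observations match
print(reward_when_a_active, reward_when_b_active)    # rewards do not
```

From the agent's point of view, the reward for this state looks noisy: the same image is sometimes rewarded and sometimes penalized, which is one way to read the unstable learning described above.
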
Several approaches were explored within the image-based observation paradigm:
- Curriculum learning with increasing sequence length
- Visual differentiation of buttons
- Embedding additional task information directly into image observations
These approaches did not reliably resolve target ambiguity or learning instability.
To address this, we introduce a minimal, explicit target indicator:
- A small sphere marker is placed at the center of the currently active button
- The agent is trained to reach the marker using the same reward structure as before
- After a successful press, the marker moves to the next button in the sequence
This provides clear visual disambiguation of task order while preserving:
- Image-based perception
- Continuous muscle-level control
- Compatibility with the SIM2VR framework
Importantly, this solution does not modify the biomechanical model, action space, or reward formulation. It changes only how clearly the task is presented to the agent's visual input.
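
The marker logic described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the class name `SequentialTargetMarker`, its fields, and the button coordinates are hypothetical, and in the project the marker presumably lives in the Unity scene rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class SequentialTargetMarker:
    """Hypothetical helper: tracks the active button and the marker position."""

    button_centers: list  # button centres, listed in the required press order
    active_index: int = 0

    @property
    def done(self):
        """True once every button in the sequence has been pressed."""
        return self.active_index >= len(self.button_centers)

    @property
    def marker_position(self):
        """Centre of the active button, or None when the sequence is complete."""
        return None if self.done else self.button_centers[self.active_index]

    def register_press(self, pressed_index):
        """Advance the marker only when the currently active button is pressed."""
        if self.done or pressed_index != self.active_index:
            return False
        self.active_index += 1
        return True


# Three buttons in a row (coordinates are made up for the example).
marker = SequentialTargetMarker([(-0.1, 0.0, 0.4), (0.0, 0.0, 0.4), (0.1, 0.0, 0.4)])
first_position = marker.marker_position
wrong_press = marker.register_press(2)   # out of order: marker stays in place
right_press = marker.register_press(0)   # correct: marker moves to the next button
second_position = marker.marker_position
```

Because the marker is just another rendered object, the task order reaches the agent through the image itself, and the observation and action interfaces stay unchanged.
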
- This work exposed a limitation in biomechanical user models when handling multiple interaction targets under image-based perception.
@inproceedings{FischerIkkala24,
author = {Fischer, Florian and Ikkala, Aleksi and Klar, Markus and Fleig, Arthur and Bachinski, Miroslav and Murray-Smith, Roderick and H\"{a}m\"{a}l\"{a}inen, Perttu and Oulasvirta, Antti and M\"{u}ller, J\"{o}rg},
title = {SIM2VR: Towards Automated Biomechanical Testing in VR},
year = {2024},
publisher = {Association for Computing Machinery},
booktitle = {Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology},
doi = {10.1145/3654777.3676452}
}

