Hi Zeroth Bot community,
URML (urml.dev) is a small, Apache-2.0 language for describing robot intent: an intent becomes a typed primitive, is validated against the robot's declared capabilities and a safety envelope, and is then dispatched. Zeroth Bot is a low-cost, open, 3D-printed humanoid built for sim-to-real and RL, and URML may be relevant to it in two ways: as a validated-intent layer above the control stack, and as the place where an RL policy declares the envelope it was trained in.
Nothing here asks the project to adopt, host, or maintain anything. This is a request for comment.
The two candidate seams: (1) Validated intent: the humanoid's kinematic structure and stability limits map onto a URML whole_body declaration, and a command is validated against that envelope before dispatch. (2) Learned-policy envelope: a trained policy can carry its observation/action spaces and training-domain bounds as a URML LearnedPolicy declaration, so the validator refuses to dispatch it outside the domain it was trained in, and an out-of-distribution action is caught before it reaches a low-cost humanoid's joints.
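To make the second idea concrete, here is a minimal sketch in plain Python, not URML syntax. The names Box, TrainingEnvelope, check, and dispatch_if_valid are hypothetical and exist only for this illustration; the real LearnedPolicy declaration and validator may look quite different. It assumes the training domain can be approximated by per-dimension bounds on observations and actions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Box:
    """Per-dimension closed interval [low[i], high[i]] (hypothetical type)."""
    low: Sequence[float]
    high: Sequence[float]

    def violations(self, name: str, values: Sequence[float]) -> List[str]:
        # A shape mismatch is reported as a single violation.
        if len(values) != len(self.low):
            return [f"{name}: expected {len(self.low)} values, got {len(values)}"]
        # `not lo <= v <= hi` is also true for NaN, so NaN is rejected.
        return [
            f"{name}[{i}]={v} outside [{lo}, {hi}]"
            for i, (v, lo, hi) in enumerate(zip(values, self.low, self.high))
            if not lo <= v <= hi
        ]


@dataclass(frozen=True)
class TrainingEnvelope:
    """Bounds the policy saw in training (hypothetical, not URML's schema)."""
    observation: Box
    action: Box


def check(envelope: TrainingEnvelope,
          observation: Sequence[float],
          action: Sequence[float]) -> List[str]:
    """Return every bound violation; an empty list means in-envelope."""
    return (envelope.observation.violations("observation", observation)
            + envelope.action.violations("action", action))


def dispatch_if_valid(envelope: TrainingEnvelope,
                      observation: Sequence[float],
                      action: Sequence[float],
                      send: Callable[[Sequence[float]], None]) -> bool:
    """Call `send(action)` only when both vectors are inside the envelope."""
    if check(envelope, observation, action):
        return False  # refused: nothing reaches the actuators
    send(action)
    return True
```

In this sketch the check sits between the policy and the actuator interface, so a refusal leaves the joints untouched. What the robot should do after a refusal (hold position, fall back to a safe controller) is left open here.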
Two questions: (1) Does a URML whole_body manifest for the humanoid read right? (2) On the RL / sim-to-real side, is a declared training envelope on a deployed policy useful? And of the two, which is the cleaner first seam?
Full write-up: https://github.com/URML-MARS/URML/blob/main/docs/rfcs/0497-zeroth-bot-outreach.md
Thanks for Zeroth Bot; a low-cost RL humanoid is exactly where bounding a learned policy by its training envelope matters.
Ido Yahalomi (URML, greenvh@gmail.com)
AI-assisted prose, maintainer-reviewed before posting (see https://github.com/URML-MARS/URML/blob/main/VIBE.md). Human-only correspondence available on request.