A recorded run of a vision-language model driving the Unitree G1 in simulation. Each step shows the head-camera view, the model's reasoning, and the discrete action it chose. Use the slider inside the viewer to scrub through steps.