The 2024
and 2025
editions built a sim-to-real pipeline for tactile manipulation with visuo-tactile
sensors, covering tasks such as peg insertion, lock opening, and visuo-tactile fusion.
However, many real-world scenarios — transparent objects, liquids, granular materials and fragile solids —
remain extremely difficult to simulate and render with sufficient fidelity.
Real-world demonstration data offers a powerful complement to simulation,
enabling policy learning for these challenging domains.
The ManiSkill-ViTac Challenge 2026 addresses this gap.
Built on the ViTaMIn-B
bimanual visuo-tactile data collection platform, the challenge provides real-world demonstration
trajectories for contact-rich bimanual tasks. Participants train language-conditioned visuo-tactile
policies directly from these demonstrations and are evaluated on the same physical hardware.
By introducing language guidance, the challenge aims to advance research toward
TVLA (Tactile-Vision-Language-Action) models —
unifying tactile sensing, visual perception and language understanding within a single action policy.
Bridging the Reality Gap
Tasks involve transparent objects, liquids and material fragmentation — scenarios where the reality gap in simulation remains particularly wide, making real-world demonstration data essential.
ViTaMIn-B Bimanual Platform
All demonstration data is collected on ViTaMIn-B, a bimanual visuo-tactile teleoperation system that captures synchronized vision, tactile and proprioceptive streams at scale.
Language-Conditioned TVLA
Each task is paired with natural-language instructions. Policies must ground language into tactile-visual control, pushing the frontier of Tactile-Vision-Language-Action models.