One of the first commercial capture systems built to give humanoid robots the 3D signals they use at inference: depth, camera pose, and full-body tracking.
DUBAI, UNITED ARAB EMIRATES, July 3, 2026 /EINPresswire.com/ — Unidata, a data infrastructure company focused on embodied AI, has launched a multimodal egocentric capture system for humanoid-robot training datasets.
In one session it records synchronized stereo video, full-body skeletal motion, and continuous camera pose (the spatial and proprioceptive signals robots use at inference), plus per-frame depth maps from the stereo footage.
The Problem With Existing Egocentric Datasets
Egocentric datasets used in robotics pretraining are already large: Build AI’s Egocentric-1M (April 2026) reached one million hours of factory footage, the largest ever assembled. But whether captured on consumer GoPros and smartphones or purpose-built rigs, almost all of this footage shares the same limits: 2D RGB video, no depth, no camera-pose metadata.
Without intrinsic and extrinsic camera parameters, the recordings can’t be reliably adapted across perception stacks or reconstructed in 3D, which leaves them usable for pretraining and little else.
Unidata answers that gap by building depth estimation, spatial tracking, and pose logging into the capture hardware.
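To illustrate why those parameters matter, here is a minimal sketch of how a single pixel with a known depth can be lifted into 3D using pinhole intrinsics and a camera pose. It is not Unidata's pipeline: the function names, the pinhole model without lens distortion, and the camera-to-world pose convention are all assumptions made for illustration.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth in metres into camera coordinates.

    Assumes an ideal pinhole camera (no lens distortion) with focal
    lengths fx, fy and principal point cx, cy, all in pixels.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def camera_to_world(p_cam, rotation, translation):
    """Apply an assumed camera-to-world pose: p_world = R * p_cam + t.

    rotation is a 3x3 matrix given as nested lists; translation is the
    camera position in world coordinates.
    """
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical values, chosen only to show the call pattern.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point_cam = backproject(420, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
point_world = camera_to_world(point_cam, IDENTITY, (1.0, 0.0, 0.0))
```

Without the intrinsics the first step is impossible, and without the pose the second is, which is why footage lacking both cannot be placed in a shared 3D frame.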
System Architecture
The primary setup centers on the Pico 4 Ultra VR headset, which serves as both the stereo capture device and the full-body motion-tracking hub. Its stereo cameras mirror human binocular vision (two lenses at a fixed baseline), enabling disparity-based depth estimation and per-frame depth maps alongside RGB video. Intrinsic and extrinsic parameters are logged every frame, so the recordings can feed 3D reconstruction pipelines without post-processing.
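Disparity-based depth estimation rests on a standard relation for rectified stereo pairs: depth equals focal length times baseline divided by disparity. The sketch below shows that relation only; the function names are hypothetical, and the focal length and baseline in the example are illustrative numbers, not specifications of the Pico 4 Ultra.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Return depth in metres for one pixel of a rectified stereo pair.

    Uses Z = f * B / d. Returns None when the disparity is zero or
    negative, which means no valid stereo match at that pixel.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def depth_map(disparity_rows, focal_px, baseline_m):
    """Convert a 2D grid of disparities (list of rows) into depths."""
    return [
        [depth_from_disparity(d, focal_px, baseline_m) for d in row]
        for row in disparity_rows
    ]


# Assumed example values: 500 px focal length, 64 mm baseline.
example = depth_map([[16.0, 32.0], [0.0, 8.0]], focal_px=500.0, baseline_m=0.064)
```

With these assumed values, a 16-pixel disparity corresponds to roughly 2 m and an 8-pixel disparity to roughly 4 m. Because depth is inversely proportional to disparity, depth error grows quickly for distant objects.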
Data from trackers on the wrists, ankles, and waist is converted in real time into a full skeletal model of the actor, pairing egocentric video with timestamped pose for training locomotion and manipulation. Two wrist-mounted cameras add close-range views of hand-object interactions.
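Pairing video with timestamped pose comes down to matching each frame to the pose sample closest in time. The following is a minimal sketch of nearest-timestamp matching, assuming sorted timestamps in seconds on a shared clock; the function names and the `max_gap_s` tolerance are hypothetical and do not describe Unidata's synchronization method.

```python
from bisect import bisect_left


def nearest_pose_index(pose_timestamps, frame_ts):
    """Return the index of the pose timestamp closest to frame_ts.

    pose_timestamps must be sorted in ascending order and non-empty.
    """
    if not pose_timestamps:
        raise ValueError("pose_timestamps must not be empty")
    i = bisect_left(pose_timestamps, frame_ts)
    if i == 0:
        return 0
    if i == len(pose_timestamps):
        return len(pose_timestamps) - 1
    before, after = pose_timestamps[i - 1], pose_timestamps[i]
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if after - frame_ts < frame_ts - before else i - 1


def pair_frames_with_poses(frame_timestamps, pose_timestamps, max_gap_s):
    """Return (frame_index, pose_index) pairs within max_gap_s seconds.

    Frames with no pose sample inside the tolerance are dropped.
    """
    pairs = []
    for frame_index, ts in enumerate(frame_timestamps):
        pose_index = nearest_pose_index(pose_timestamps, ts)
        if abs(pose_timestamps[pose_index] - ts) <= max_gap_s:
            pairs.append((frame_index, pose_index))
    return pairs


# Hypothetical timestamps: poses at 100 Hz, three video frames.
poses = [0.00, 0.01, 0.02, 0.03]
frames = [0.001, 0.019, 0.5]
pairs = pair_frames_with_poses(frames, poses, max_gap_s=0.005)
```

In this example the third frame is dropped because no pose sample falls within 5 ms of it. Hardware-level synchronization would make such a tolerance check largely unnecessary.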
Hand Tracking and Tactile Data
The headset’s onboard module logs palm and finger coordinates via computer vision. Like all vision-based tracking, it degrades when objects occlude the hands, a limitation Unidata is addressing by testing instrumented gloves.
A deeper gap is tactile feedback. Visual data can’t capture force distribution across a gripper, surface texture, or an object’s mass asymmetry (a tool may look uniform yet weigh more on one side). Robotic manipulators sense this directly in deployment; human-demonstration datasets rarely capture it, so Unidata has made tactile instrumentation a priority for its next phase.
Portability and Scale
The Pico 4 Ultra setup runs fully autonomously in the field (no tether, no external power), which makes it practical across varied environments. An alternative configuration pairs a head-mounted ZED stereo camera with three wide-angle cameras (one on each wrist and one on the headset) for higher-fidelity depth, extra tracking, and frame-perfect hardware sync; the trade-off is mobility, since it needs a wall connection or an on-body battery. Unidata runs both but treats the headset as its main scaling path: low overhead, full autonomy, and depth, pose, and skeletal data with no post-processing.
About Unidata
Unidata is a UAE-based data collection and labeling company with more than nine years of experience. The company delivers end-to-end services (collection, annotation, and delivery) with SLA-backed timelines and dedicated project managers, and works with a global roster of enterprise clients.
Eugenia Trofimova
Unidata
e.trofimova@unidata.pro
Legal Disclaimer:
EIN Presswire provides this news content “as is” without warranty of any kind. We do not accept any responsibility or liability
for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this
article. If you have any complaints or copyright issues related to this article, kindly contact the author above.