Why Dataset Format Matters

Robot training data format is not a detail you can defer. The format you choose on day one determines three things that will affect your project for months:

1. Training Framework Compatibility

Each major training framework expects data in a specific format. ACT and Diffusion Policy read HDF5 natively. Octo and the Open X-Embodiment data mix scripts expect RLDS/TFRecord. The LeRobot training library reads LeRobot Parquet. If your data is in the wrong format, you are writing conversion scripts before you can train — and conversion scripts are where subtle data corruption bugs hide.

2. Storage Efficiency and Access Patterns

A 500-episode dataset with 3 cameras at 30 fps occupies 25-50 GB in raw HDF5, 15-30 GB in compressed HDF5, 3-8 GB in LeRobot (MP4 video), or 20-40 GB in RLDS/TFRecord. The storage difference matters for cloud hosting costs, download times, and training data loading speed. But storage efficiency trades off against data fidelity: LeRobot's MP4 compression is lossy, while HDF5 and RLDS preserve exact pixel values.

3. Community and Sharing

If you want to share your dataset publicly, LeRobot format gives you one-command upload to Hugging Face Hub with built-in web visualization. RLDS gives you compatibility with the Open X-Embodiment ecosystem (50+ datasets, 22 robot types). HDF5 gives you maximum flexibility but no standardized sharing platform.

Our recommendation: Use HDF5 as your source-of-truth collection and storage format. Convert to LeRobot for sharing and to RLDS for cross-embodiment training. This gives you the best of all three ecosystems without the downsides of any single format lock-in.

HDF5: The Gold Standard for Robot Data Storage

HDF5 (Hierarchical Data Format 5) stores data in a filesystem-like hierarchy of groups (directories) and datasets (arrays). It was originally developed for scientific computing and has become the de facto standard for robot demonstration data thanks to its flexibility, mature tooling, and efficient random access.

Episode Structure

The standard ACT/ALOHA HDF5 layout stores each episode in its own file, with observations, actions, and metadata attributes at the root:

episode_0.hdf5
    observations/
        images/
            cam_high          # uint8 [T x 480 x 640 x 3]   overhead camera
            cam_wrist_left    # uint8 [T x 480 x 640 x 3]   left wrist camera
            cam_wrist_right   # uint8 [T x 480 x 640 x 3]   right wrist camera
        qpos                  # float32 [T x 14]  joint positions (7 per arm)
        qvel                  # float32 [T x 14]  joint velocities
    action                    # float32 [T x 14]  leader arm positions (supervision signal)
    attrs:
        task = "pick_cube_bimanual"
        operator_id = "op_03"
        success = True
        num_timesteps = 450
        timestamp = "2026-04-10T14:32:00Z"
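
As a minimal sketch, the layout above can be written with h5py as follows (dummy data; T is shortened here, while a real ALOHA episode runs ~450 steps at 30 fps):

```python
import h5py
import numpy as np

T = 50  # timesteps; shortened for the sketch

with h5py.File("episode_0.hdf5", "w") as f:
    # h5py creates intermediate groups automatically from path-style names
    images = f.create_group("observations/images")
    # Dummy frames stand in for real camera captures
    for cam in ("cam_high", "cam_wrist_left", "cam_wrist_right"):
        images.create_dataset(cam, data=np.zeros((T, 480, 640, 3), dtype=np.uint8),
                              chunks=(1, 480, 640, 3))
    f.create_dataset("observations/qpos", data=np.zeros((T, 14), dtype=np.float32))
    f.create_dataset("observations/qvel", data=np.zeros((T, 14), dtype=np.float32))
    f.create_dataset("action", data=np.zeros((T, 14), dtype=np.float32))
    # Episode metadata as file-level attributes
    f.attrs["task"] = "pick_cube_bimanual"
    f.attrs["operator_id"] = "op_03"
    f.attrs["success"] = True
    f.attrs["num_timesteps"] = T
    f.attrs["timestamp"] = "2026-04-10T14:32:00Z"
```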

Reading HDF5 with Python

Reading episodes with h5py is straightforward. Here is a complete example that loads an episode's observations and actions:

import h5py
import numpy as np

# Open a single episode file
with h5py.File("episode_0.hdf5", "r") as f:
    # Read joint positions and actions
    qpos = f["/observations/qpos"][:]        # shape: [T, 14]
    action = f["/action"][:]                  # shape: [T, 14]

    # Read a specific camera frame (random access)
    frame_100 = f["/observations/images/cam_high"][100]  # shape: [480, 640, 3]

    # Read all frames for a camera
    all_frames = f["/observations/images/cam_high"][:]   # shape: [T, 480, 640, 3]

    # Read metadata
    task = f.attrs.get("task", "unknown")
    success = f.attrs.get("success", False)

    print(f"Task: {task}, Success: {success}")
    print(f"Episode length: {qpos.shape[0]} timesteps")
    print(f"Joint positions range: [{qpos.min():.3f}, {qpos.max():.3f}]")

HDF5 Best Practices

  • Chunking: Always chunk datasets along the time axis. Use chunk_size=1 for random access (debugging, visualization) or chunk_size=32 for sequential read efficiency (training). Never store unchunked image data — it loads as a single monolithic block.
  • Compression: Use LZF for image data (3-5x faster than GZIP at similar ratios for camera frames). Use GZIP level 4 for joint trajectories (higher ratio, speed not critical). Do not compress images at collection time — apply compression in the final archive after QA validation.
  • Metadata attributes: Store episode metadata as HDF5 group attributes: episode.attrs['success'], episode.attrs['task'], episode.attrs['operator_id'], episode.attrs['robot_serial']. Include a schema_version attribute on the file root to track format changes.
  • One file per episode vs. one file per dataset: For datasets under 1,000 episodes, one HDF5 file per episode is simpler for parallel processing and partial re-collection. For larger datasets, consider packing 50-100 episodes per file to reduce filesystem overhead.
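
The chunking and compression settings above look like this in h5py (a sketch; dataset names and sizes are illustrative):

```python
import h5py
import numpy as np

T = 64
frames = np.random.randint(0, 256, size=(T, 480, 640, 3), dtype=np.uint8)
qpos = np.random.randn(T, 14).astype(np.float32)

with h5py.File("archived_episode.hdf5", "w") as f:
    # Images: one frame per chunk for random access, LZF for encode speed
    f.create_dataset("observations/images/cam_high", data=frames,
                     chunks=(1, 480, 640, 3), compression="lzf")
    # Joint trajectory: 32-step chunks for sequential reads, GZIP level 4 for ratio
    f.create_dataset("observations/qpos", data=qpos,
                     chunks=(32, 14), compression="gzip", compression_opts=4)
    # Track format changes at the file root
    f.attrs["schema_version"] = "1.0.0"
```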

HDF5 Pros and Cons

Pros
  • Mature library support (h5py, HDFView, Julia, C++)
  • Efficient random access to any frame
  • Flexible schema — add custom sensor types freely
  • Lossless storage preserves exact pixel values
  • Native to ACT, ALOHA, and Diffusion Policy
  • Human-inspectable with HDFView GUI
Cons
  • No built-in versioning or provenance tracking
  • Not cloud-streamable (must download full file)
  • Large file sizes without video compression
  • Schema inconsistencies between labs
  • No standardized sharing platform
  • Concurrent writes require careful locking

RLDS: The Open X-Embodiment Standard

RLDS (Reinforcement Learning Datasets) is the format used by the Open X-Embodiment dataset — the largest collection of robot manipulation data, with over 1 million real-robot trajectories spanning 22 robot embodiments and 527 skills. It serializes data as TFRecord files processed via TensorFlow Datasets (TFDS).

RLDS Schema

Each RLDS dataset is defined by a TensorFlow DatasetBuilder that specifies the features schema. Episodes are represented as sequences of steps, where each step contains:

# Standard RLDS step structure
step = {
    "observation": {
        "image": tf.uint8,         # shape: [H, W, C]
        "state": tf.float32,       # shape: [D]  (joint positions + gripper)
        "wrist_image": tf.uint8,   # shape: [H, W, C]  (optional)
    },
    "action": tf.float32,          # shape: [D]
    "reward": tf.float32,          # scalar
    "discount": tf.float32,        # scalar (typically 1.0)
    "is_terminal": tf.bool,        # True on terminal state
    "is_first": tf.bool,           # True on first step
    "is_last": tf.bool,            # True on last step
    "language_instruction": tf.string,  # natural language task description
}

Loading RLDS Data

import tensorflow_datasets as tfds

# Load an Open X-Embodiment dataset (OXE datasets are hosted on a public
# GCS bucket, so point data_dir there rather than the local TFDS catalog)
dataset = tfds.load("berkeley_autolab_ur5", data_dir="gs://gresearch/robotics",
                    split="train")

# Iterate over episodes
for episode in dataset.take(5):
    steps = episode["steps"]
    for step in steps:
        image = step["observation"]["image"].numpy()    # [H, W, 3]
        state = step["observation"]["state"].numpy()     # [D]
        action = step["action"].numpy()                  # [D]
        instruction = step["language_instruction"].numpy().decode()
        print(f"Instruction: {instruction}")
        print(f"State shape: {state.shape}, Action shape: {action.shape}")
        break  # just first step

RLDS Pros and Cons

Pros
  • Standardized schema enables cross-dataset training
  • Efficient streaming via tf.data pipelines
  • Cloud-native (stream from GCS/S3 without download)
  • 50+ datasets available in compatible format
  • Native to Octo, RT-2, and OXE data mix
  • Built-in language instruction field
Cons
  • TensorFlow dependency (heavy for PyTorch teams)
  • Sequential access only (no efficient random frame)
  • Rigid schema — custom sensors need DatasetBuilder
  • Writing a DatasetBuilder takes 2-4 hours
  • Inspection requires TF tooling
  • Less intuitive than HDF5 for debugging

LeRobot: The Hugging Face Ecosystem

LeRobot, developed by Hugging Face, uses Parquet files for tabular data (joint positions, actions, metadata) and MP4 video files for camera observations. It is designed for the open-source research workflow: collect locally, push to Hugging Face Hub, train with the LeRobot library, share results with the community.

LeRobot Dataset Structure

A LeRobot dataset on Hugging Face Hub contains:

my_dataset/
    data/
        train-00000-of-00001.parquet   # tabular data (all episodes)
    videos/
        observation.images.cam_high/
            episode_000000.mp4          # overhead camera video
            episode_000001.mp4
        observation.images.cam_wrist/
            episode_000000.mp4          # wrist camera video
            episode_000001.mp4
    meta/
        info.json                       # dataset metadata, features schema
        episodes.jsonl                  # per-episode metadata
        stats.json                      # per-feature mean/std/min/max

The Parquet file contains one row per timestep with columns for episode_index, frame_index, timestamp, observation.state (joint positions), action, and references to the corresponding video frame index.

Loading LeRobot Data

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Load a dataset from Hugging Face Hub
dataset = LeRobotDataset("lerobot/aloha_sim_transfer_cube_human")

# Access a single frame (returns a dict of tensors)
frame = dataset[0]
print(f"State: {frame['observation.state'].shape}")      # [D]
print(f"Action: {frame['action'].shape}")                 # [D]
print(f"Image: {frame['observation.images.cam_high'].shape}")  # [C, H, W]

# Get episode-level info
print(f"Number of episodes: {dataset.num_episodes}")
print(f"Number of frames: {dataset.num_frames}")
print(f"FPS: {dataset.fps}")

LeRobot Pros and Cons

Pros
  • One-command upload to Hugging Face Hub
  • Built-in web visualization at hf.co/datasets/
  • Compact storage (MP4 video 5-10x smaller than raw)
  • 300+ public datasets and growing fast
  • Native ACT and Diffusion Policy training support
  • Statistics (mean/std) computed automatically
Cons
  • MP4 compression is lossy — not source-of-truth quality
  • Video decoding adds latency during training
  • Parquet not ideal for variable-length episodes
  • Schema changes require full dataset rebuild
  • Newer format with evolving tooling
  • No random frame access without decoding video

Format Comparison Table

Feature                         | HDF5                             | RLDS / TFRecord                  | LeRobot / Parquet
--------------------------------|----------------------------------|----------------------------------|-------------------------------------
Native frameworks               | ACT, Diffusion Policy, custom    | Octo, RT-2, OXE data mix         | LeRobot, ACT (via lib), DP (via lib)
Storage size (500 eps, 3 cams)  | 15-30 GB (compressed)            | 20-40 GB                         | 3-8 GB (MP4)
Image fidelity                  | Lossless (raw uint8)             | Lossless (raw uint8)             | Lossy (MP4 H.264/H.265)
Random frame access             | Efficient (chunked)              | Inefficient (sequential)         | Requires video decode
Cloud streaming                 | No (download required)           | Yes (tf.data from GCS/S3)        | Yes (HF Hub streaming)
Schema flexibility              | High (any structure)             | Low (fixed DatasetBuilder)       | Medium (Parquet columns)
Sharing platform                | None (manual hosting)            | TFDS catalog                     | Hugging Face Hub
Community datasets              | Many (no central catalog)        | 50+ (Open X-Embodiment)          | 300+ (Hugging Face Hub)
Python tooling                  | h5py (mature, lightweight)       | tensorflow-datasets (heavy)      | lerobot, datasets (growing)
Recommended for                 | Primary storage, ACT/DP training | Cross-embodiment, Octo training  | Sharing, community, quick start

Converting Between Formats

You will eventually need data in multiple formats. Here is the practical guide to conversion, with the tools and estimated effort for each path.

HDF5 to LeRobot

The LeRobot library provides native conversion for ALOHA-style HDF5 datasets:

# Convert ALOHA HDF5 to LeRobot format and push to Hub
python -m lerobot.scripts.push_dataset_to_hub \
    --raw-dir /path/to/hdf5/episodes \
    --raw-format aloha_hdf5 \
    --repo-id your-org/dataset-name \
    --push-to-hub 1

For custom HDF5 schemas (not ALOHA), you need to write a small adapter function that maps your key names to LeRobot's expected schema. This typically takes 30-60 minutes.

HDF5 to RLDS

Converting to RLDS requires writing a custom TensorFlow DatasetBuilder. This is the most labor-intensive conversion (2-4 hours for a new schema) but is a one-time cost per dataset format:

# Skeleton RLDS DatasetBuilder (simplified)
import h5py
import tensorflow as tf
import tensorflow_datasets as tfds

class MyRobotDataset(tfds.core.GeneratorBasedBuilder):
    VERSION = tfds.core.Version("1.0.0")

    def _info(self):
        return tfds.core.DatasetInfo(
            builder=self,
            features=tfds.features.FeaturesDict({
                "steps": tfds.features.Dataset({
                    "observation": tfds.features.FeaturesDict({
                        "image": tfds.features.Image(shape=(480, 640, 3)),
                        "state": tfds.features.Tensor(shape=(14,), dtype=tf.float32),
                    }),
                    "action": tfds.features.Tensor(shape=(14,), dtype=tf.float32),
                    "is_terminal": tf.bool,
                    "is_first": tf.bool,
                    "is_last": tf.bool,
                    "language_instruction": tfds.features.Text(),
                }),
            }),
        )

    def _generate_examples(self, path):
        # Read from your HDF5 files and yield one (key, episode) pair per file
        for episode_path in sorted(path.glob("*.hdf5")):
            with h5py.File(episode_path, "r") as f:
                steps = []  # map each HDF5 timestep to an RLDS step dict here
                yield episode_path.stem, {"steps": steps}

RLDS to LeRobot

LeRobot provides a built-in converter for RLDS datasets, including all Open X-Embodiment datasets:

# Convert any RLDS dataset to LeRobot format
python -m lerobot.scripts.push_dataset_to_hub \
    --raw-dir /path/to/rlds/dataset \
    --raw-format rlds \
    --repo-id your-org/converted-dataset \
    --push-to-hub 1

LeRobot to HDF5

There is no official tool for this direction, but it is straightforward to write (30-60 minutes):

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
import h5py
import numpy as np

dataset = LeRobotDataset("your-org/dataset-name")

for ep_idx in range(dataset.num_episodes):
    # Gather this episode's frames in a single pass (fine for small datasets;
    # indexing the dataset decodes video frames, so avoid doing it twice)
    ep_frames = [frame for frame in (dataset[i] for i in range(len(dataset)))
                 if frame["episode_index"].item() == ep_idx]

    with h5py.File(f"episode_{ep_idx:05d}.hdf5", "w") as f:
        qpos = np.stack([fr["observation.state"].numpy() for fr in ep_frames])
        action = np.stack([fr["action"].numpy() for fr in ep_frames])
        f.create_dataset("observations/qpos", data=qpos, chunks=(1, qpos.shape[1]))
        f.create_dataset("action", data=action, chunks=(1, action.shape[1]))
        # Decode and store video frames as image arrays
        # ... (video decode step adds complexity)

Important caveat: Converting from LeRobot back to HDF5 cannot recover the original pixel-level fidelity because LeRobot stores images as lossy MP4 video. The converted HDF5 will contain decoded MP4 frames, not the original raw images.
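
If you need to quantify the loss, a simple check is to compare a decoded frame against its raw original, for example via PSNR. A minimal sketch (the toy noise below stands in for real compression artifacts; in practice you would compare a frame from the HDF5 master against the same frame decoded from the MP4):

```python
import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB for uint8 images (higher is better)."""
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * np.log10(255.0 ** 2 / mse)

# Toy example: a raw frame vs. the same frame with mild compression-like noise
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
noisy = np.clip(raw.astype(np.int16) + rng.integers(-2, 3, raw.shape),
                0, 255).astype(np.uint8)
print(f"PSNR: {psnr(raw, noisy):.1f} dB")
```

As a rough reference point, good-quality H.264 output typically lands in the high-30s to 40s of dB against the source.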

Conversion Summary Table

From → To       | Tool                                                  | Effort     | Notes
----------------|-------------------------------------------------------|------------|--------------------------------------------------
HDF5 → LeRobot  | lerobot.scripts.push_dataset_to_hub                   | 30 min     | Native ALOHA support; custom schemas need adapter
HDF5 → RLDS     | Custom DatasetBuilder                                 | 2-4 hours  | One-time per schema; requires TF knowledge
RLDS → LeRobot  | lerobot.scripts.push_dataset_to_hub --raw-format rlds | 15 min     | Works for all OXE datasets
LeRobot → HDF5  | Custom script                                         | 30-60 min  | Lossy: MP4 frames, not original raw images
Any → Any       | SVRC Platform                                         | 5 min      | Upload once, export to any format via UI

How SVRC Delivers Your Data

When you engage SVRC for a data collection campaign, here is how we handle format delivery:

Collection Format

We always collect in HDF5 as our source of truth. Raw sensor data is stored losslessly with per-frame timestamps, full metadata, and chunked datasets for efficient access. This master copy is retained for the duration of your project.

Delivery Format

You specify your target format in the project brief. We support:

  • HDF5: Direct delivery of the source-of-truth files. Includes schema documentation and a Python example script for loading.
  • RLDS / TFRecord: Converted with a custom DatasetBuilder matched to your schema. Includes the DatasetBuilder source code so you can re-run the conversion yourself.
  • LeRobot / Parquet: Pushed to a private Hugging Face Hub repository under your organization. Includes dataset card with full metadata, statistics, and visualization.
  • Custom formats: ROS bag, CSV, JSON-lines, or proprietary schemas. We write the export adapter and include it in the delivery.

What Is Included

Every dataset delivery includes:

  • The dataset files in your requested format
  • A data manifest (JSON) listing all episodes with metadata, quality scores, and statistics
  • Schema documentation describing every field, data type, and unit
  • A Python example script that loads one episode and prints shapes and ranges
  • Per-feature statistics (mean, std, min, max) for normalization during training
  • QA report summarizing quality metrics across the full dataset
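
The per-feature statistics in the list above take only a few lines of NumPy to compute (a sketch, assuming each episode is a [T, D] array and episodes may differ in length):

```python
import numpy as np

def feature_stats(episodes: list) -> dict:
    """Compute normalization statistics over all timesteps of all episodes."""
    stacked = np.concatenate(episodes, axis=0)  # [sum(T_i), D]
    return {
        "mean": stacked.mean(axis=0),
        "std": stacked.std(axis=0),
        "min": stacked.min(axis=0),
        "max": stacked.max(axis=0),
    }

# Example: two episodes of different lengths, 14-dim joint state
eps = [np.random.randn(450, 14).astype(np.float32),
       np.random.randn(300, 14).astype(np.float32)]
stats = feature_stats(eps)
print({k: v.shape for k, v in stats.items()})  # each value is a [14] vector
```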

SVRC Platform Export

If you use the SVRC Fearless Platform, you can upload datasets in any format and export to any other format through the web UI. The platform handles schema normalization, statistics computation, and format-specific encoding (MP4 for LeRobot, TFRecord for RLDS) automatically. Upload once, export as many times as you need.

Frequently Asked Questions

Which format should I use if I am just getting started?

Start with HDF5. It has the simplest tooling (just h5py), the most flexible schema, and is native to the most popular training frameworks (ACT, Diffusion Policy). You can always convert to LeRobot or RLDS later. If you want to share your dataset immediately on Hugging Face Hub, use LeRobot from the start — but keep the raw HDF5 as your backup.

Is LeRobot's MP4 compression a problem for training?

For most manipulation tasks, no. The visual artifacts from H.264 compression at reasonable quality settings (CRF 20-23) are below the noise level of typical camera sensors. However, for tasks where pixel-level accuracy matters — visual servoing to sub-millimeter targets, detecting thin wires or threads, or research that analyzes compression artifacts — use lossless HDF5 as your training source. The LeRobot team is exploring lossless video codecs (FFV1) for future versions.

Can I mix datasets from different formats for training?

Yes, but you need to normalize them to a common format first. The most practical approach is to convert everything to a single format before training. If you are training with Octo or doing cross-embodiment experiments, convert everything to RLDS. If you are training with the LeRobot library, convert everything to LeRobot format. The SVRC Platform can normalize and export mixed-format uploads into a unified dataset.

How do I version-control my robot datasets?

For LeRobot datasets on Hugging Face Hub, versioning is built in via git-lfs. For HDF5, use a data manifest file (JSON) alongside your HDF5 files that records schema_version, creation date, episodes list, and statistics. Bump the schema version when you change sensor configuration. For production workflows, the SVRC Platform provides full dataset versioning with rollback.
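
For the HDF5 manifest approach, a minimal sketch (the field names here are illustrative suggestions, not a standard):

```python
import json
from datetime import datetime, timezone

# Illustrative manifest schema for versioning an HDF5 dataset
manifest = {
    "schema_version": "1.2.0",  # bump when the sensor configuration changes
    "created": datetime.now(timezone.utc).isoformat(),
    "episodes": [
        {"file": "episode_00000.hdf5", "task": "pick_cube_bimanual",
         "success": True, "num_timesteps": 450},
    ],
    "statistics": {"qpos_mean": [0.0] * 14},  # per-feature stats for normalization
}

with open("manifest.json", "w") as fh:
    json.dump(manifest, fh, indent=2)
```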

What about ROS bag format?

ROS bag (rosbag2 in ROS2) is excellent for data recording during collection because it captures all ROS topics with timestamps natively. However, it is not well-suited as a training format because it requires ROS2 libraries to read, has no random access, and stores data in a format optimized for replay rather than ML training. The standard workflow is: record in ROS bag during collection, then convert to HDF5 (or LeRobot/RLDS) for training and sharing. This conversion step also serves as a data cleaning and validation checkpoint.