What Is RoboMimic?
RoboMimic is an open-source framework for studying and benchmarking offline imitation learning algorithms for robot manipulation. Developed by Ajay Mandlekar and colleagues at Stanford and NVIDIA, it was first released alongside the paper "What Matters in Learning from Offline Human Demonstrations for Robot Manipulation" (CoRL 2021). The project provides three things that did not exist in a unified package before: standardized demonstration datasets collected in robosuite simulation, reference implementations of six imitation learning algorithms, and reproducible evaluation protocols for fair comparison.
The six supported algorithms span the major families of offline learning:
- Behavioral Cloning (BC) — supervised regression from observations to actions. The simplest baseline and often surprisingly competitive.
- BC-RNN — BC with an LSTM backbone that conditions on observation history. Captures temporal dependencies that feedforward BC misses.
- BCQ (Batch Constrained Q-Learning) — an offline RL method that constrains the policy to stay close to the demonstration distribution while optimizing Q-values.
- CQL (Conservative Q-Learning) — offline RL that penalizes Q-values for out-of-distribution actions, preventing overestimation.
- IQL (Implicit Q-Learning) — avoids querying out-of-distribution actions entirely by learning a value function through expectile regression.
- TD3-BC — TD3 with a behavioral cloning regularization term that keeps the policy near the data while allowing RL improvement.
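To make the BC bullet concrete: behavioral cloning is nothing more than supervised regression from observations to actions. Here is a minimal sketch on synthetic data with a linear policy trained by gradient descent on the MSE objective — illustrative only; RoboMimic's BC uses neural-network policies, and all names here are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "demonstrations": 500 steps of 10-D observations
# paired with 4-D expert actions from an unknown linear expert
obs = rng.normal(size=(500, 10))
W_true = rng.normal(size=(10, 4))
actions = obs @ W_true

# BC objective: minimize mean squared error between policy
# output and the demonstrated actions
W = np.zeros((10, 4))
lr = 0.01
for _ in range(2000):
    pred = obs @ W
    grad = obs.T @ (pred - actions) / len(obs)  # gradient of the MSE
    W -= lr * grad

mse = float(np.mean((obs @ W - actions) ** 2))
print(f"final BC loss: {mse:.6f}")
```

The same objective applies unchanged when the linear map is swapped for an MLP or RNN; that swap is essentially the difference between BC and BC-RNN.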
RoboMimic matters because it answers a practical question: given a fixed dataset of human demonstrations, which algorithm extracts the best policy? Before RoboMimic, every paper used different environments, different data, and different evaluation — making apples-to-apples comparison impossible. The framework also supports real robot data in the same HDF5 format, so findings transfer from simulation benchmarks to hardware deployment.
Installation
RoboMimic requires Python 3.8+ and depends on robosuite for simulation environments. The recommended approach is a dedicated conda environment to isolate dependencies.
# Create and activate conda environment
conda create -n robomimic python=3.10
conda activate robomimic
# Install robomimic and robosuite
pip install robomimic
pip install robosuite
# For development (editable install with source)
git clone https://github.com/ARISE-Initiative/robomimic.git
cd robomimic
pip install -e .
# Verify installation
python -c "import robomimic; print(robomimic.__version__)"
For GPU-accelerated training (required for image-based policies), install PyTorch with CUDA support first:
# PyTorch with CUDA 12.1 (adjust for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Then install robomimic
pip install robomimic
Verify your GPU is visible to PyTorch before proceeding:
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}, Device: {torch.cuda.get_device_name(0)}')"
Datasets
RoboMimic provides pre-collected demonstration datasets for four robosuite manipulation tasks. Each dataset comes in two operator quality levels (proficient-human and multi-human) and two observation types (low-dimensional state and image).
Download datasets using the built-in script:
# Download the Lift task dataset (low-dim, proficient-human, 200 demos)
python robomimic/scripts/download_datasets.py --tasks lift
# Download all four tasks
python robomimic/scripts/download_datasets.py --tasks all
# Download image observation datasets (larger, ~10 GB per task)
python robomimic/scripts/download_datasets.py --tasks lift --hdf5_types image
Each dataset is stored as an HDF5 file with the following structure:
- data/demo_0/obs/ — observation arrays (joint positions, end-effector poses, gripper states, optional images)
- data/demo_0/actions — action arrays (joint velocity or position targets)
- data/demo_0/rewards — sparse reward signal (1 on success, 0 otherwise)
- data/demo_0/dones — episode termination flags
- mask/ — train/validation split, stored as lists of demo keys (e.g. demo_0) at the top level of the file, alongside data/
Low-dimensional observations include robot joint positions (7D), joint velocities (7D), end-effector position (3D), end-effector orientation (4D quaternion), and gripper finger positions (2D for the Panda's parallel-jaw gripper). For bimanual tasks like Transport, these arrays are duplicated (one set per arm). Image observations add 84x84 or 128x128 RGB arrays from the agentview and robot0_eye_in_hand cameras.
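To see exactly what a given dataset contains, you can walk the HDF5 tree with h5py. The sketch below builds a tiny stand-in file with the structure described above (not a downloaded RoboMimic dataset) and then lists every group and dataset with its shape:

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file mirroring the structure above
path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(path, "w") as f:
    demo = f.create_group("data/demo_0")
    demo.create_group("obs").create_dataset(
        "robot0_eef_pos", data=np.zeros((25, 3)))
    demo.create_dataset("actions", data=np.zeros((25, 7)))

# Collect (name, shape) for every node; groups have no shape
entries = []

def walk(name, obj):
    entries.append((name, getattr(obj, "shape", None)))

with h5py.File(path, "r") as f:
    f.visititems(walk)

for name, shape in entries:
    print(name, shape if shape is not None else "")
```

Running the same walker against a real downloaded dataset (point `path` at it) is the quickest way to find the observation key names your config must reference.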
Training Your First Policy
The fastest path to a trained policy is BC on the Lift task with low-dimensional observations. This trains in under 30 minutes on a modern CPU and should reach 95%+ success rate.
Step 1: Generate a training config.
# Generate a default BC config for the Lift task
python robomimic/scripts/generate_config.py \
--algo bc \
--task lift \
--dataset_path datasets/lift/ph/low_dim.hdf5 \
--output_dir configs/
Step 2: Review and modify the config. The generated JSON config controls every training parameter. Key fields to understand:
{
    "algo_name": "bc",
    "experiment": {
        "name": "bc_lift_lowdim",
        "epoch_every_n_steps": 500,
        "validation_epoch_every_n_steps": 50,
        "save": {
            "enabled": true
        }
    },
    "train": {
        "data": "datasets/lift/ph/low_dim.hdf5",
        "batch_size": 100,
        "num_epochs": 2000,
        "seed": 1
    },
    "observation": {
        "modalities": {
            "obs": {
                "low_dim": ["robot0_eef_pos", "robot0_eef_quat", "robot0_gripper_qpos", "object"]
            }
        }
    }
}
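Rather than hand-editing JSON for every experiment variant, you can patch fields programmatically. A plain-stdlib sketch (the dict below mirrors the example config; in practice you would json.load the generated file instead of defining it inline):

```python
import json

# Inline stand-in for the generated config
config = {
    "algo_name": "bc",
    "experiment": {"name": "bc_lift_lowdim"},
    "train": {"batch_size": 100, "num_epochs": 2000, "seed": 1},
}

# Override a few fields for a second training run
config["train"]["seed"] = 2
config["experiment"]["name"] = "bc_lift_lowdim_seed2"

# Serialize; in practice, write this next to the original config
# and pass it to train.py via --config
print(json.dumps(config, indent=2))
```

Sweeping seeds and batch sizes this way keeps every run reproducible from its config file alone.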
Step 3: Run training.
python robomimic/scripts/train.py --config configs/bc_lift_lowdim.json
# Training logs to stdout and TensorBoard
# Monitor: tensorboard --logdir trained_models/
Step 4: Evaluate the trained policy.
# Run 50 evaluation rollouts
python robomimic/scripts/run_trained_agent.py \
--agent trained_models/bc_lift_lowdim/models/model_best.pth \
--n_rollouts 50 \
--render
You should see success rates above 95% on Lift with proficient-human data. If your success rate is below 80%, check that you are using the correct dataset path and that training ran for the full 2000 epochs.
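Keep in mind that 50 rollouts is a small sample, so the reported success rate carries real statistical noise. A quick back-of-envelope standard error makes the uncertainty concrete (stdlib sketch; the outcomes below are hypothetical):

```python
import math

# Hypothetical rollout outcomes: 1 = success, 0 = failure
outcomes = [1] * 48 + [0] * 2          # 48 of 50 rollouts succeed

n = len(outcomes)
p = sum(outcomes) / n                  # empirical success rate
stderr = math.sqrt(p * (1 - p) / n)    # binomial standard error

print(f"success rate: {p:.2f} +/- {stderr:.3f}")
```

With p = 0.96 and n = 50 the standard error is roughly 0.028, i.e. about three percentage points of noise. Differences smaller than that between two checkpoints are not meaningful at 50 rollouts; the benchmark protocol's 100-rollout evaluations tighten this somewhat.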
Algorithm Comparison
Choosing the right algorithm depends on your data budget, compute budget, and task complexity. This table summarizes the practical tradeoffs:
| Algorithm | Type | Data Efficiency | Compute Cost | Best For | Key Limitation |
|---|---|---|---|---|---|
| BC | Supervised | High | Low | Simple tasks, good data | Compounding errors on long horizons |
| BC-RNN | Supervised | High | Medium | Multi-step tasks, temporal reasoning | Slower inference, harder to tune |
| BCQ | Offline RL | Medium | High | Suboptimal data, needs improvement | Sensitive to hyperparameters |
| CQL | Offline RL | Medium | High | Conservative policies from noisy data | Can be overly conservative |
| IQL | Offline RL | Medium | Medium | Mixed-quality datasets | Requires careful expectile tuning |
| TD3-BC | Offline RL | Medium | High | Refining near-optimal demonstrations | Unstable with small datasets |
General guidance: Start with BC. If BC fails due to long-horizon compounding error, try BC-RNN. If your data is mixed quality (some demonstrations are suboptimal), try IQL. Only use BCQ/CQL/TD3-BC if you have a specific reason to believe offline RL improvement over the data distribution is both necessary and achievable. In practice, BC-RNN is the most reliable algorithm across the RoboMimic benchmark tasks.
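That decision process can be written down as a rough helper — purely illustrative pseudocode-made-runnable, not a RoboMimic API, and the boolean inputs are judgment calls you make about your own data:

```python
def choose_algorithm(long_horizon=False, mixed_quality=False,
                     need_rl_improvement=False):
    """Illustrative encoding of the guidance above."""
    if need_rl_improvement:
        # Reach for offline RL only when improving beyond the
        # demonstration distribution is both necessary and achievable
        return "BCQ/CQL/TD3-BC"
    if mixed_quality:
        return "IQL"
    if long_horizon:
        return "BC-RNN"
    return "BC"

print(choose_algorithm())                   # BC
print(choose_algorithm(long_horizon=True))  # BC-RNN
```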
Running the Benchmarks
RoboMimic defines four manipulation tasks of increasing difficulty, all implemented in robosuite:
- Lift — pick up a cube from the table. Single arm, 1 object. The easiest task, useful for verifying your training pipeline works. BC success: 95-100%.
- Can — pick up a can and place it in a target bin. Requires accurate grasping and placement. BC success: 80-95%. BC-RNN success: 90-98%.
- Square — pick up a square nut and place it on a peg. Requires precise alignment and insertion. BC success: 40-70%. BC-RNN success: 60-85%.
- Transport — bimanual task: one arm picks an object from a bin, hands it to the other arm, which places it in a target location. The hardest task, requiring coordination between two arms. BC-RNN success: 30-60%.
To run the full benchmark:
# Generate configs for all tasks and algorithms
python robomimic/scripts/generate_paper_configs.py --output_dir benchmark_configs/
# Run all experiments (use a cluster or run overnight)
python robomimic/scripts/train.py --config benchmark_configs/lift/bc.json
python robomimic/scripts/train.py --config benchmark_configs/can/bc.json
python robomimic/scripts/train.py --config benchmark_configs/square/bc_rnn.json
python robomimic/scripts/train.py --config benchmark_configs/transport/bc_rnn.json
# Evaluate all trained models
python robomimic/scripts/run_trained_agent.py \
--agent trained_models/lift/bc/models/model_best.pth \
--n_rollouts 100
Running the full benchmark across all algorithms and tasks takes approximately 3-5 days on a single RTX 3090. For image-based experiments, expect 2-3x longer training times.
Using Image Observations
Image-based policy training is where RoboMimic becomes most valuable for real-world transfer. Real robots do not have access to ground-truth object positions — they have cameras. Training visual policies in simulation and transferring to real hardware is the primary pipeline for imitation learning deployment.
To train a visual BC policy, change the observation modality in your config:
{
    "observation": {
        "modalities": {
            "obs": {
                "low_dim": ["robot0_eef_pos", "robot0_gripper_qpos"],
                "rgb": ["agentview_image", "robot0_eye_in_hand_image"]
            }
        },
        "encoder": {
            "rgb": {
                "core_class": "VisualCore",
                "core_kwargs": {
                    "backbone_class": "ResNet18Conv",
                    "pool_class": "SpatialSoftmax"
                }
            }
        }
    }
}
For better visual representations, use a pre-trained encoder like R3M (from Nair et al., 2022) instead of training from scratch:
# Install R3M
pip install r3m
# In your config, set the encoder to use R3M
"backbone_class": "R3MConv",
"backbone_kwargs": {"r3m_model_class": "resnet18"}
GPU requirements: Image-based training with ResNet18 requires at least 8 GB VRAM. With R3M and batch size 16, expect 10-12 GB usage. An RTX 3080 (10 GB) or RTX 4090 (24 GB) is recommended. Training time increases from ~30 minutes (low-dim) to 4-8 hours (image-based) on a single GPU.
Collecting Your Own Data
RoboMimic accepts any HDF5 dataset that follows its format, which means you can collect demonstrations in robosuite or on real hardware and train with the same algorithms.
Collecting in robosuite via teleoperation:
# Launch robosuite teleoperation data collection
python robosuite/scripts/collect_human_demonstrations.py \
--environment Lift \
--robots Panda \
--controller OSC_POSE \
--device keyboard
Supported input devices include keyboard, SpaceMouse (3Dconnexion), and VR controllers via the device bridge. For large-scale collection, SpaceMouse provides the best tradeoff between data quality and operator fatigue.
Converting custom data to RoboMimic format: Your data must be organized as episodes in an HDF5 file. Each episode needs: observations (dict of arrays), actions (array), rewards (array), and dones (array). Use the conversion utility:
import h5py
import numpy as np

with h5py.File("my_dataset.hdf5", "w") as f:
    grp = f.create_group("data")
    for i, episode in enumerate(episodes):
        demo = grp.create_group(f"demo_{i}")
        obs = demo.create_group("obs")
        obs.create_dataset("robot0_eef_pos", data=episode["eef_positions"])
        obs.create_dataset("robot0_gripper_qpos", data=episode["gripper_states"])
        demo.create_dataset("actions", data=episode["actions"])
        demo.create_dataset("rewards", data=episode["rewards"])
        demo.create_dataset("dones", data=episode["dones"])
        # RoboMimic's data loader reads an episode-length attribute
        demo.attrs["num_samples"] = len(episode["actions"])
    # Train/val split: a top-level mask group (sibling of data)
    # holding lists of demo keys, not integer indices
    mask = f.create_group("mask")
    n = len(episodes)
    split = int(n * 0.9)
    train_keys = [f"demo_{i}" for i in range(split)]
    valid_keys = [f"demo_{i}" for i in range(split, n)]
    mask.create_dataset("train", data=np.array(train_keys, dtype="S"))
    mask.create_dataset("valid", data=np.array(valid_keys, dtype="S"))
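Before writing the file, it is worth verifying that each episode's arrays agree in length — a mismatch surfaces later as a confusing indexing error deep inside training. A small hypothetical helper (not part of RoboMimic) using the same episode keys as the conversion snippet:

```python
import numpy as np

def validate_episode(episode: dict) -> int:
    """Return the episode length T, raising if any array disagrees."""
    T = len(episode["actions"])
    for key in ("eef_positions", "gripper_states", "rewards", "dones"):
        if len(episode[key]) != T:
            raise ValueError(
                f"{key} has length {len(episode[key])}, expected {T}")
    return T

# A consistent 25-step episode passes and returns its length
episode = {
    "actions": np.zeros((25, 7)),
    "eef_positions": np.zeros((25, 3)),
    "gripper_states": np.zeros((25, 2)),
    "rewards": np.zeros(25),
    "dones": np.zeros(25),
}
print(validate_episode(episode))  # 25
```

Calling this on every episode before the h5py loop turns silent data bugs into immediate, named errors.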
Transferring to Real Hardware
Sim-to-real transfer with RoboMimic-trained policies requires attention to three critical gaps:
- Observation gap: Real camera images differ from simulation renders. Use domain randomization in robosuite (texture, lighting, camera pose) during training, or fine-tune with a small number of real demonstrations (10-50 episodes often suffice).
- Action space gap: Ensure your real robot controller accepts the same action type (joint velocity vs. OSC position) at the same control frequency (typically 20 Hz for RoboMimic) as the training environment. Mismatched action spaces are the number one cause of failed transfers.
- Dynamics gap: Object friction, robot joint dynamics, and gripper behavior differ between simulation and reality. The practical solution is to collect a small dataset of real demonstrations and either fine-tune the sim-trained policy or train directly on real data.
For deployment on a real Franka Panda (the default RoboMimic robot), use the deoxys controller interface, which provides the same OSC_POSE action space as robosuite. For other robot arms, you will need to implement an action wrapper that maps RoboMimic actions to your robot's controller.
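The shape of such a wrapper is simple: clip the normalized policy action, scale it into the controller's units, and hand off the gripper command. A hypothetical sketch — class name, scale values, and the gripper sign convention are all illustrative, not a deoxys or SVRC API:

```python
import numpy as np

class OSCActionWrapper:
    """Map normalized [-1, 1] policy actions to controller commands."""

    def __init__(self, pos_scale=0.05, rot_scale=0.15, control_hz=20):
        self.pos_scale = pos_scale   # meters per unit action (assumed)
        self.rot_scale = rot_scale   # radians per unit action (assumed)
        self.dt = 1.0 / control_hz   # match the 20 Hz training rate

    def __call__(self, action):
        # robosuite OSC_POSE actions are 7-D:
        # 3 position deltas, 3 rotation deltas, 1 gripper command
        a = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
        delta_pos = a[:3] * self.pos_scale
        delta_rot = a[3:6] * self.rot_scale
        gripper = a[6]   # sign convention varies per gripper driver
        return delta_pos, delta_rot, gripper

wrapper = OSCActionWrapper()
pos, rot, grip = wrapper([0.5, 0, 0, 0, 0, 0, 1.0])
print(pos, rot, grip)  # pos[0] = 0.025 m
```

The clipping step matters on hardware: a policy that occasionally emits out-of-range actions is harmless in simulation but can command dangerous jumps on a real arm.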
Common Errors and Fixes
- "KeyError: 'robot0_eef_pos'" during training — your dataset does not contain the observation keys expected by the config. Run python robomimic/scripts/get_dataset_info.py --dataset path/to/data.hdf5 to see available keys, then update your config to match.
- Training loss drops but evaluation success is 0% — the policy is overfitting to the training data. Reduce batch size, add dropout (0.1-0.3), or increase dataset size. Also verify that your evaluation environment uses the same observation normalization as training.
- "CUDA out of memory" during image training — reduce batch size (start with 8 or 16 for image policies) or use gradient accumulation. If using R3M, freeze the encoder backbone to reduce memory by ~40%.
- BC-RNN training is unstable (loss spikes) — reduce learning rate to 1e-4, use gradient clipping (max_norm=1.0), and increase sequence length gradually. Start with seq_length=10 and increase to 20 once training stabilizes.
- "MuJoCo not found" error on import — since MuJoCo went open-source (v2.1.2+), install it with pip install mujoco. Remove any old mujoco-py installation that conflicts: pip uninstall mujoco-py.
- Trained policy works in sim but fails on real robot — check for action space mismatch (joint velocity vs. OSC position), control frequency mismatch, and observation normalization differences. See the sim-to-real section above.
Integration with SVRC Hardware
SVRC provides pre-configured robot setups that are compatible with RoboMimic workflows, eliminating the hardware integration burden:
- OpenArm — SVRC's open-source 6-DOF robot arm. We provide a RoboMimic-compatible action wrapper that maps OSC_POSE actions to OpenArm's joint controller at 20 Hz. Dataset collection uses the same HDF5 format, so you can train directly with RoboMimic algorithms without format conversion.
- Franka Panda — the default RoboMimic robot. SVRC operates Franka systems with deoxys controllers pre-configured for RoboMimic-compatible data collection and policy deployment.
- Bimanual setups — for the Transport task on real hardware, SVRC provides dual-arm ALOHA-style systems with synchronized data collection at 50 Hz. The data format matches RoboMimic's bimanual structure directly.
All SVRC-collected datasets are validated against the RoboMimic HDF5 schema before delivery, ensuring they load directly into the training pipeline without modification.
Frequently Asked Questions
- What is RoboMimic used for? RoboMimic is a framework for benchmarking and developing offline imitation learning algorithms for robot manipulation. It provides standardized datasets, training pipelines, and evaluation protocols so researchers can fairly compare algorithms like BC, BC-RNN, BCQ, and IQL on the same tasks.
- What algorithms does RoboMimic support? Six algorithms: BC, BC-RNN, BCQ, CQL, IQL, and TD3-BC. Users can also add custom algorithms via the modular config system.
- Do I need a GPU to train RoboMimic policies? For low-dimensional observation policies, a modern CPU works but is slow. For image-based policies, an NVIDIA GPU with at least 8 GB VRAM is required. An RTX 3080 or better is recommended.
- Can I use RoboMimic with real robot data? Yes. RoboMimic accepts any HDF5 dataset that follows its format specification. Collect demonstrations on a real robot, convert them to the RoboMimic HDF5 format, and train with the same algorithms.
- How does RoboMimic compare to LeRobot? RoboMimic focuses on offline imitation learning benchmarks with robosuite. LeRobot (Hugging Face) is broader, covering data collection, multiple simulation backends, and real robot deployment. Many researchers use RoboMimic algorithms within LeRobot pipelines.
- What success rates should I expect on the benchmark tasks? Lift: 95-100% (BC). Can: 80-95% (BC). Square: 60-85% (BC-RNN). Transport: 30-60% (BC-RNN). These vary with data quality and hyperparameters.