REAL enables a quadrupedal robot to chain highly dynamic parkour maneuvers across complex terrains
with nominal vision (green box), and maintain stable locomotion even under severe visual degradation (red box).
- 📰 News
- ✨ Highlights
- 🏗️ Architecture
- 📊 Results
- 🌍 Real-World Deployment
- 🔬 Ablation Study
- ⚙️ Training Details
- 📝 TODO
- 📖 Citation
- 🙏 Acknowledgements
| Date | Update |
|---|---|
| 🔥 2026/03 | Paper submitted. Under review. |
| 📌 2026/03 | Repository created. Code will be released upon acceptance. |
- 🧠 **Spatio-Temporal Policy Learning**: A privileged teacher learns structured proprioception-terrain associations via cross-modal attention. The distilled student uses a FiLM-modulated Mamba backbone to suppress visual noise and build short-term terrain memory.
- ⚖️ **Physics-Guided Filtering**: An uncertainty-aware neural velocity estimator is fused with rigid-body dynamics through an Extended Kalman Filter (EKF), ensuring physically consistent state estimation during impacts and slippage.
- 🎯 **Consistency-Aware Loss Gating**: Adaptive gating between behavioral cloning and RL stabilizes policy distillation and improves sim-to-real transfer, preventing policy collapse under aggressive domain randomization.
- ⚡ **Real-Time Onboard Deployment**: Bounded O(1) inference at ~13.1 ms/step on a Unitree Go2, with zero-shot sim-to-real transfer and no fine-tuning required on the real robot.
**Stage 1 – Privileged Teacher Policy Learning:** The teacher policy learns precise proprioception-terrain associations through cross-modal attention. Proprioceptive states serve as Queries to selectively retrieve relevant terrain features encoded as Keys and Values from terrain scan dots.
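The Stage 1 attention pattern can be sketched as below. This is a minimal single-head NumPy illustration, not the actual implementation; the feature sizes (48-D proprioception, 132 terrain dots, 32-D heads) and random weights are assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(proprio, terrain, W_q, W_k, W_v):
    """Proprioceptive state forms the Query; terrain scan dots supply Keys/Values."""
    q = proprio @ W_q                       # (1, d) query from proprioception
    k = terrain @ W_k                       # (n_dots, d) keys from terrain dots
    v = terrain @ W_v                       # (n_dots, d) values from terrain dots
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)         # attention weights over terrain dots
    return attn @ v, attn                   # fused terrain feature + weights

# Hypothetical sizes: 48-D proprioceptive state, 132 terrain scan dots of 16-D each
rng = np.random.default_rng(0)
proprio = rng.standard_normal((1, 48))
terrain = rng.standard_normal((132, 16))
W_q = 0.1 * rng.standard_normal((48, 32))
W_k = 0.1 * rng.standard_normal((16, 32))
W_v = 0.1 * rng.standard_normal((16, 32))
feat, attn = cross_modal_attention(proprio, terrain, W_q, W_k, W_v)
```

The key design point is the asymmetry: proprioception only ever queries, so the retrieved feature is always conditioned on the robot's current physical state.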
**Stage 2 – Distilling the Student Policy with Spatio-Temporal Reasoning:** The deployable student integrates FiLM-based visual-proprioceptive fusion with a Mamba temporal backbone. A physics-guided Bayesian estimator and consistency-aware loss gating further stabilize training and deployment.
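The FiLM-style fusion can be sketched as follows. This is a minimal NumPy sketch with hypothetical feature sizes; in the actual network the modulation weights are learned end-to-end rather than sampled.

```python
import numpy as np

def film_modulate(visual_feat, proprio_feat, W_gamma, W_beta):
    """FiLM: proprioception predicts a per-channel scale (gamma) and shift (beta)
    applied to the visual features; gamma near zero suppresses unreliable channels."""
    gamma = proprio_feat @ W_gamma
    beta = proprio_feat @ W_beta
    return gamma * visual_feat + beta

# Hypothetical sizes: 64-D depth-encoder feature, 48-D proprioceptive embedding
rng = np.random.default_rng(1)
visual = rng.standard_normal((1, 64))
proprio = rng.standard_normal((1, 48))
W_gamma = 0.05 * rng.standard_normal((48, 64))
W_beta = 0.05 * rng.standard_normal((48, 64))
fused = film_modulate(visual, proprio, W_gamma, W_beta)
```

Because gamma is a multiplicative gate, the policy can drive individual visual channels toward zero when proprioception suggests the vision stream is unreliable.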
REAL achieves twice the overall success rate of the strongest prior baseline (0.78 vs. 0.39 for SoloParkour) across hurdles, steps, and gaps:
| Method | Hurdles SR ↑ | Steps SR ↑ | Gaps SR ↑ | Overall SR ↑ | Overall MXD ↑ | MEV ↓ |
|---|---|---|---|---|---|---|
| Extreme Parkour | 0.18 | 0.14 | 0.10 | 0.16 | 0.21 | 34.24 |
| RPL | 0.05 | 0.04 | 0.03 | 0.04 | 0.10 | 1.56 |
| SoloParkour | 0.42 | 0.49 | 0.36 | 0.39 | 0.34 | 96.93 |
| REAL (Ours) 🏆 | 0.82 | 0.94 | 0.28 | 0.78 | 0.45 | 18.41 |
📊 **SR** (Success Rate): how often the robot reaches all target goals. **MXD** (Mean X-Displacement) ∈ [0, 1]: normalized forward progress. **MEV** (Mean Edge Violations): average number of unsafe foot-edge contacts per episode.
We evaluate policy robustness under three simulated sensor degradation conditions: frame drops, Gaussian noise, and spatial FoV occlusion.
| Method | Nominal SR | Frame Drop SR | Gaussian Noise SR | FoV Occlusion SR |
|---|---|---|---|---|
| Extreme Parkour | 0.16 | 0.16 (↓0.00) | 0.11 (↓0.05) | 0.13 (↓0.03) |
| RPL | 0.04 | 0.01 (↓0.04) | 0.01 (↓0.03) | 0.01 (↓0.03) |
| SoloParkour | 0.39 | 0.20 (↓0.19) | 0.37 (↓0.03) | 0.41 (↑0.02) |
| REAL (Ours) 🏆 | 0.78 | 0.61 (↓0.17) | 0.51 (↓0.27) | 0.72 (↓0.06) |
💡 Under severe FoV occlusion, REAL retains 92% of its nominal performance (0.72 vs 0.78), while vision-reliant baselines suffer catastrophic failures.
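The three degradation conditions can be emulated roughly as below. This NumPy sketch is illustrative only: the drop probability, noise scale, and occluded fraction used in the actual evaluation are not specified here and are assumptions.

```python
import numpy as np

def frame_drop(depth, prev_depth, p, rng):
    """With probability p the current frame is lost and the stale frame repeats."""
    return prev_depth if rng.random() < p else depth

def gaussian_noise(depth, sigma, rng):
    """Additive per-pixel Gaussian noise on the depth image."""
    return depth + rng.normal(0.0, sigma, size=depth.shape)

def fov_occlusion(depth, frac):
    """Zero out a contiguous fraction of columns, emulating a blocked field of view."""
    out = depth.copy()
    out[:, : int(out.shape[1] * frac)] = 0.0
    return out
```

Applying these corruptions during training (as domain randomization) is what lets the student's temporal memory take over when the live frame is useless.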
Vision is completely masked 1 meter before each obstacle, forcing the policy to rely on spatio-temporal memory:
| Method | SR ↑ | MXD ↑ | MEV ↓ |
|---|---|---|---|
| Extreme Parkour | 0.11 | 0.20 | 44.03 |
| SoloParkour | 0.36 | 0.34 | 103.50 |
| REAL (Ours) 🏆 | 0.55 | 0.39 | 24.84 |
| ❌ Baseline | ✅ REAL (Ours) |
|---|---|
| Fails immediately upon losing visual input | Maintains robust blind traversal across obstacles |
Zero-shot sim-to-real transfer on a physical Unitree Go2 quadruped using only onboard perception and computing:
| Scenario | Task | Description |
|---|---|---|
| 📦 (a) | High Platform Leap | The robot dynamically jumps onto an elevated surface |
| 📦 (b) | Scattered Box Navigation | Traversing irregularly placed obstacles |
| 🪜 (c) | Steep Staircase Climb | Ascending a steep staircase with precise foot placement |
| Backbone | Avg. Latency | Meets 20 ms Budget? |
|---|---|---|
| Transformer | 23.07 ms | ❌ No |
| Mamba (Ours) | 13.14 ms | ✅ Yes |
⚡ Mamba's bounded O(1) per-step complexity eliminates the sequence-length scaling bottleneck of Transformers, enabling the high-frequency reactivity required for aggressive parkour.
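The constant per-step cost comes from the recurrent state-space form, which can be sketched as below. This is a toy linear recurrence with illustrative matrices, not Mamba's selective parameterization; the point is that the update touches only a fixed-size state, never the full history.

```python
import numpy as np

def ssm_step(h, x, A, B, C):
    """One recurrent update: the output depends on all past inputs only through
    the fixed-size state h, so per-step cost is O(1) in sequence length
    (attention, by contrast, revisits every past token each step)."""
    h = A @ h + B @ x
    return h, C @ h

# Toy sizes: 16-D hidden state, 8-D input, 4-D output
rng = np.random.default_rng(2)
A = 0.9 * np.eye(16)
B = 0.1 * rng.standard_normal((16, 8))
C = rng.standard_normal((4, 16))
h = np.zeros(16)
for _ in range(1000):          # cost per tick never grows with history length
    h, y = ssm_step(h, rng.standard_normal(8), A, B, C)
```

For a 50 Hz control loop this matters directly: latency stays flat no matter how long the episode runs.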
| Variant | SR ↑ | MXD ↑ | MEV ↓ | Time ↓ | Coll. ↓ |
|---|---|---|---|---|---|
| REAL (Full) 🏆 | 0.78 | 0.45 | 18.41 | 0.02 | 0.06 |
| w/ MLP Estimator | 0.73 | 0.43 | 19.34 | 0.02 | 0.06 |
| w/o FiLM | 0.44 | 0.51 | 93.43 | 0.28 | 0.06 |
| w/o Mamba | 0.51 | 0.47 | 89.96 | 0.26 | 0.05 |
📌 Removing Mamba increases MEV nearly 5× (18.41 → 89.96), and disabling FiLM drops SR from 0.78 to 0.44. Both components are critical for robust spatio-temporal reasoning.
| Estimator | RMSE ↓ |
|---|---|
| MLP (Baseline) | 0.52 |
| MLP + EKF | 0.40 |
| 1D ResNet (Single frame) | 0.33 |
| 1D ResNet (10 frames) | 0.28 |
| 1D ResNet + EKF (10 frames, Ours) 🏆 | 0.23 |
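The EKF fusion in the last row can be sketched in the scalar case as below. The real filter is vector-valued over the base velocity, and in our design the estimator's predicted uncertainty would supply the measurement noise `R`; the concrete numbers here are illustrative assumptions.

```python
def ekf_velocity_step(v, P, accel, dt, v_meas, R, Q):
    """Predict forward with rigid-body dynamics (v += a*dt), then correct with
    the uncertainty-aware neural velocity estimate. Scalar state, so F = H = 1."""
    v_pred = v + accel * dt            # dynamics prediction from IMU acceleration
    P_pred = P + Q                     # process noise inflates uncertainty
    K = P_pred / (P_pred + R)          # Kalman gain: trust in measurement vs. model
    v_new = v_pred + K * (v_meas - v_pred)
    P_new = (1.0 - K) * P_pred
    return v_new, P_new

# Illustrative step: 2 m/s^2 acceleration over 20 ms, neural estimate of 1.2 m/s
v, P = ekf_velocity_step(v=1.0, P=0.5, accel=2.0, dt=0.02, v_meas=1.2, R=0.1, Q=0.01)
```

During impacts the estimator's uncertainty (and hence `R`) grows, so the filter leans on the dynamics prediction instead of the noisy measurement.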
💡 Our consistency-aware loss gating accelerates early-stage convergence and achieves a lower final training loss compared to a fixed-weight baseline.
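One way such gating can work is sketched below. The exponential gate and temperature `tau` are illustrative assumptions, not the paper's exact rule; the idea is that the behavioral-cloning (BC) weight adapts to how consistent the student currently is with the teacher, instead of staying fixed.

```python
import numpy as np

def gated_distillation_loss(student_a, teacher_a, rl_loss, tau=0.5):
    """Blend BC and RL terms by student/teacher consistency: when the actions
    agree the BC term dominates; when they diverge (e.g. under aggressive
    domain randomization), the RL term takes over instead of a fixed mix."""
    bc = float(np.mean((student_a - teacher_a) ** 2))
    gate = float(np.exp(-bc / tau))            # -> 1 when consistent, -> 0 when not
    return gate * bc + (1.0 - gate) * rl_loss, gate
```

A fixed-weight mix keeps pulling the student toward teacher actions that may be unreachable from its degraded observations; the gate releases that pull exactly when it would destabilize training.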
| Item | Detail |
|---|---|
| 🖥️ Simulator | Isaac Gym |
| 🐕 Robot Platform | Unitree Go2 |
| 📏 Control Frequency | 50 Hz (policy) / 1 kHz (PD controller) |
| 🎮 Training Hardware | Single NVIDIA RTX 4080 GPU |
| ⏳ Training Time | ~30 hours (from scratch) |
| 📷 Depth Camera | Intel RealSense D435i |
| 💻 Onboard Compute | NVIDIA Jetson |
| 🚀 Deployment | Custom C++ + ONNX Runtime |
| 🎯 Reward Formulation | Same as Extreme Parkour |
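The 50 Hz / 1 kHz split means each policy action is held for 20 PD ticks. A toy sketch of that decimation (unit inertia and illustrative gains, not the deployed controller):

```python
def pd_torque(q_des, q, qd, kp=40.0, kd=12.0):
    """Joint-level PD law evaluated every 1 ms (1 kHz). Gains are illustrative."""
    return kp * (q_des - q) - kd * qd

def control_loop(policy, n_ticks, dt=0.001, decimation=20):
    """Policy updates the joint target every `decimation` ticks (50 Hz);
    the PD loop tracks that target at 1 kHz. Unit inertia for illustration."""
    q = qd = q_des = 0.0
    for tick in range(n_ticks):
        if tick % decimation == 0:
            q_des = policy(q, qd)        # 50 Hz action update
        tau = pd_torque(q_des, q, qd)
        qd += tau * dt                   # semi-implicit Euler, unit mass
        q += qd * dt
    return q

# Step target of 1.0 rad: the well-damped toy joint settles near the target
final_q = control_loop(lambda q, qd: 1.0, n_ticks=2000)
```

This split is why the ~13.1 ms policy latency must stay under the 20 ms budget: a 50 Hz policy tick that overruns would stall the target updates feeding the 1 kHz inner loop.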
We plan to release the full codebase upon paper acceptance. The following items are on our roadmap:
🏋️ Training
- Privileged teacher policy training code (Stage 1)
- Student distillation training code (Stage 2)
- Consistency-aware loss gating implementation
- Isaac Gym terrain environment and curriculum configs
- Domain randomization parameters and reward formulation
🧮 Models & Estimation
- Physics-guided filtering (EKF) module
- Uncertainty-aware velocity estimator (1D ResNet)
- Pre-trained model checkpoints (teacher & student)
If you find this work useful, please consider citing:

```bibtex
@article{real2026,
  title  = {REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering},
  author = {Liu, Jialong and Shen, Dehan and Wen, Yanbo and Jiang, Zeyu and Chen, Changhao},
  year   = {2026}
}
```

This work builds upon the simulation infrastructure of Isaac Gym and the terrain setup from Extreme Parkour. We thank the authors for their open-source contributions.
This project will be released under the MIT License.