REAL enables a quadrupedal robot to chain highly dynamic parkour maneuvers across complex terrains
with nominal vision (green box), and maintain stable locomotion even under severe visual degradation (red box).
- 📰 News
- ✨ Highlights
- 🏗️ Architecture
- 📊 Results
- 🌍 Real-World Deployment
- 🔬 Ablation Study
- ⚙️ Training Details
- 📝 TODO
- 🔖 Citation
- 🙏 Acknowledgements
## 📰 News

| Date | Update |
|---|---|
| 🔥 2026/03 | Paper submitted. Under review. |
| 🎉 2026/03 | Repository created. Code will be released upon acceptance. |
## ✨ Highlights

- 🧠 **Spatio-Temporal Policy Learning**: A privileged teacher learns structured proprioception–terrain associations via cross-modal attention. The distilled student uses a FiLM-modulated Mamba backbone to suppress visual noise and build short-term terrain memory.
- ⚛️ **Physics-Guided Filtering**: An uncertainty-aware neural velocity estimator is fused with rigid-body dynamics through an Extended Kalman Filter (EKF), ensuring physically consistent state estimation during impacts and slippage.
- 🎯 **Consistency-Aware Loss Gating**: Adaptive gating between behavioral cloning and RL stabilizes policy distillation and improves sim-to-real transfer, preventing policy collapse under aggressive domain randomization.
- ⚡ **Real-Time Onboard Deployment**: Bounded O(1) inference at ~13.1 ms/step on a Unitree Go2 with zero-shot sim-to-real transfer — no fine-tuning required on the real robot.
## 🏗️ Architecture

🎓 **Stage 1 — Privileged Teacher Policy Learning**: The teacher policy learns precise proprioception–terrain associations through cross-modal attention. Proprioceptive states serve as Queries to selectively retrieve relevant terrain features encoded as Keys and Values from terrain scan dots.
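The query/key/value roles above can be sketched as single-head scaled dot-product attention, with the proprioceptive state forming the query and each terrain scan dot contributing a key and a value. All sizes here (48-D proprioception, 132 scan dots, 64-D embedding) and the random projection weights are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def cross_modal_attention(proprio, terrain_dots, Wq, Wk, Wv):
    # Proprioceptive state -> Query; terrain scan dots -> Keys and Values.
    q = proprio @ Wq                         # (E,)
    k = terrain_dots @ Wk                    # (N, E)
    v = terrain_dots @ Wv                    # (N, E)
    scores = k @ q / np.sqrt(q.shape[-1])    # (N,) similarity per scan dot
    w = np.exp(scores - scores.max())
    w /= w.sum()                             # softmax attention over dots
    return w @ v, w                          # fused terrain feature, weights

rng = np.random.default_rng(0)
proprio = rng.standard_normal(48)            # joint states, IMU, etc. (size assumed)
dots = rng.standard_normal((132, 3))         # scan dots as (x, y, height) samples
Wq = rng.standard_normal((48, 64))
Wk = rng.standard_normal((3, 64))
Wv = rng.standard_normal((3, 64))
feat, w = cross_modal_attention(proprio, dots, Wq, Wk, Wv)
```

The attention weights make the association interpretable: they show which terrain samples the current body state is retrieving.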
🎒 **Stage 2 — Distilling Student Policy with Spatio-Temporal Reasoning**: The deployable student integrates FiLM-based visual–proprioceptive fusion with a Mamba temporal backbone. A physics-guided Bayesian estimator and consistency-aware loss gating further stabilize training and deployment.
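FiLM fusion reduces to a per-channel affine transform whose scale (gamma) and shift (beta) are predicted from the conditioning modality. In this minimal sketch the proprioceptive embedding modulates visual feature channels; the modulation direction, all sizes, and the linear conditioning network are our assumptions for illustration:

```python
import numpy as np

def film_modulate(features, cond, Wg, bg, Wb, bb):
    # FiLM: the conditioning vector predicts a per-channel scale (gamma)
    # and shift (beta) applied to the other modality's features.
    gamma = cond @ Wg + bg     # (D,)
    beta = cond @ Wb + bb      # (D,)
    return gamma * features + beta

rng = np.random.default_rng(1)
vision_feat = rng.standard_normal(64)      # visual feature channels (size assumed)
proprio_emb = rng.standard_normal(32)      # proprioceptive conditioning embedding
Wg = 0.1 * rng.standard_normal((32, 64)); bg = np.ones(64)    # init near identity
Wb = 0.1 * rng.standard_normal((32, 64)); bb = np.zeros(64)
out = film_modulate(vision_feat, proprio_emb, Wg, bg, Wb, bb)
```

Because gamma can be driven toward zero channel-wise, this gives the student an explicit mechanism for down-weighting unreliable visual features.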
## 📊 Results

REAL achieves 2x the overall success rate of the best prior baseline across hurdles, steps, and gaps:
| Method | Hurdles SR ↑ | Steps SR ↑ | Gaps SR ↑ | Overall SR ↑ | Overall MXD ↑ | MEV ↓ |
|---|---|---|---|---|---|---|
| Extreme Parkour | 0.18 | 0.14 | 0.10 | 0.16 | 0.21 | 34.24 |
| RPL | 0.05 | 0.04 | 0.03 | 0.04 | 0.10 | 1.56 |
| SoloParkour | 0.42 | 0.49 | 0.36 | 0.39 | 0.34 | 96.93 |
| REAL (Ours) 🏆 | 0.82 | 0.94 | 0.28 | 0.78 | 0.45 | 18.41 |
📌 SR: Success Rate — how often the robot reaches all target goals. MXD: Mean X-Displacement ∈ [0, 1], showing normalized forward progress. MEV: Mean Edge Violations — average number of unsafe foot-edge contacts per episode.
We evaluate policy robustness under three simulated sensor degradation conditions: frame drops, Gaussian noise, and spatial FoV occlusion.
| Method | Nominal SR | Frame Drop SR | Gaussian Noise SR | FoV Occlusion SR |
|---|---|---|---|---|
| Extreme Parkour | 0.16 | 0.16 (↓0.00) | 0.11 (↓0.05) | 0.13 (↓0.03) |
| RPL | 0.04 | 0.01 (↓0.04) | 0.01 (↓0.03) | 0.01 (↓0.03) |
| SoloParkour | 0.39 | 0.20 (↓0.19) | 0.37 (↓0.03) | 0.41 (↑0.02) |
| REAL (Ours) 🏆 | 0.78 | 0.61 (↓0.17) | 0.51 (↓0.27) | 0.72 (↓0.06) |
💡 Under severe FoV occlusion, REAL retains 92% of its nominal performance (0.72 vs 0.78), while vision-reliant baselines suffer catastrophic failures.
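The three degradation modes in the table above can be reproduced with simple corruptions of a depth frame. The drop probability, noise scale, occlusion fraction, and frame size below are illustrative values, not the evaluation protocol's exact settings:

```python
import numpy as np

def degrade_depth(depth, mode, rng, drop_p=0.3, sigma=0.1, occlude_frac=0.5):
    # Apply one of the three simulated sensor degradations to a depth frame.
    if mode == "frame_drop":         # entire frame lost with probability drop_p
        return None if rng.random() < drop_p else depth
    if mode == "gaussian_noise":     # additive per-pixel depth noise
        return depth + rng.normal(0.0, sigma, depth.shape)
    if mode == "fov_occlusion":      # mask a contiguous band of the field of view
        out = depth.copy()
        width = depth.shape[1]
        out[:, : int(width * occlude_frac)] = 0.0
        return out
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(2)
frame = rng.uniform(0.2, 3.0, (58, 87))   # depth image resolution is an assumption
occluded = degrade_depth(frame, "fov_occlusion", rng)
noisy = degrade_depth(frame, "gaussian_noise", rng)
```

Frame drops and occlusion stress temporal memory, while Gaussian noise stresses the fusion's ability to discount corrupted but present input.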
Vision is completely masked 1 meter before each obstacle, forcing the policy to rely on spatio-temporal memory:
| Method | SR ↑ | MXD ↑ | MEV ↓ |
|---|---|---|---|
| Extreme Parkour | 0.11 | 0.20 | 44.03 |
| SoloParkour | 0.36 | 0.34 | 103.50 |
| REAL (Ours) 🏆 | 0.55 | 0.39 | 24.84 |
| ❌ Baseline | ✅ REAL (Ours) |
|---|---|
| Fails immediately upon losing visual input | Maintains robust blind traversal across obstacles |
## 🌍 Real-World Deployment

Zero-shot sim-to-real transfer on a physical Unitree Go2 quadruped using only onboard perception and computing:
| Scenario | Task | Description |
|---|---|---|
| 🦘 (a) | High Platform Leap | The robot dynamically jumps onto an elevated surface |
| 📦 (b) | Scattered Box Navigation | Traversing irregularly placed obstacles |
| 🪜 (c) | Steep Staircase Climb | Ascending a steep staircase with precise foot placement |
## 🔬 Ablation Study

| Backbone | Avg. Latency | Meets 20 ms Budget? |
|---|---|---|
| Transformer | 23.07 ms | ❌ No |
| Mamba (Ours) | 13.14 ms | ✅ Yes |
⚡ Mamba's bounded O(1) complexity eliminates the sequence-scaling bottleneck of Transformers, enabling the high-frequency reactivity required for aggressive parkour.
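The constant per-step cost comes from the recurrent view of a state-space model: each control step folds the new observation into a fixed-size hidden state, so inference cost does not depend on how long the history is. A toy linear SSM step (the matrices are arbitrary stand-ins, not Mamba's selective parameterization):

```python
import numpy as np

def ssm_step(h, x, A, B, C):
    # One recurrent state-space update: cost is independent of how much
    # history h already summarizes, i.e. O(1) per control step.
    h = A @ h + B @ x      # fold the new observation into the fixed-size state
    y = C @ h              # emit the feature for the policy head
    return h, y

rng = np.random.default_rng(3)
d_state, d_in, d_out = 16, 8, 4              # toy sizes (assumptions)
A = 0.9 * np.eye(d_state)                    # stable stand-in dynamics
B = rng.standard_normal((d_state, d_in))
C = rng.standard_normal((d_out, d_state))
h = np.zeros(d_state)
for t in range(1000):                        # 1000 steps, constant memory and time
    h, y = ssm_step(h, rng.standard_normal(d_in), A, B, C)
```

A Transformer, by contrast, re-attends over its full context window every step, so per-step latency grows with the window length it must keep.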
| Variant | SR ↑ | MXD ↑ | MEV ↓ | Time ↓ | Coll. ↓ |
|---|---|---|---|---|---|
| REAL (Full) 🏆 | 0.78 | 0.45 | 18.41 | 0.02 | 0.06 |
| w/ MLP Estimator | 0.73 | 0.43 | 19.34 | 0.02 | 0.06 |
| w/o FiLM | 0.44 | 0.51 | 93.43 | 0.28 | 0.06 |
| w/o Mamba | 0.51 | 0.47 | 89.96 | 0.26 | 0.05 |
📌 Removing Mamba causes MEV to increase nearly 5x (18→90). Disabling FiLM drops SR by 44%. Both are critical for robust spatio-temporal reasoning.
| Estimator | RMSE ↓ |
|---|---|
| MLP (Baseline) | 0.52 |
| MLP + EKF | 0.40 |
| 1D ResNet (Single frame) | 0.33 |
| 1D ResNet (10 frames) | 0.28 |
| 1D ResNet + EKF (10 frames, Ours) 🏆 | 0.23 |
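The estimator-plus-EKF fusion can be illustrated with a scalar Kalman step (the linear special case of an EKF): predict base velocity by integrating measured acceleration (rigid-body dynamics), then correct with the network's velocity estimate, weighted by the uncertainty the estimator reports. The scalar state and all noise values below are simplifications for illustration:

```python
def ekf_velocity_update(v, P, a_meas, sigma_proc, v_nn, sigma_nn, dt):
    # Predict: integrate IMU acceleration, inflate covariance by process noise.
    v_pred = v + a_meas * dt
    P_pred = P + sigma_proc**2
    # Update: Kalman gain trades the dynamics prediction against the
    # learned estimate; large sigma_nn (e.g. during an impact) means
    # the filter leans on physics instead of the network.
    K = P_pred / (P_pred + sigma_nn**2)
    v_new = v_pred + K * (v_nn - v_pred)
    P_new = (1.0 - K) * P_pred
    return v_new, P_new

# Illustrative numbers only: a 50 Hz step where the estimator is unsure.
v, P = 1.0, 0.05
v, P = ekf_velocity_update(v, P, a_meas=0.5, sigma_proc=0.02,
                           v_nn=1.3, sigma_nn=0.4, dt=0.02)
```

The uncertainty-awareness is the key property: the gain K shrinks automatically when the 1D ResNet flags its own estimate as unreliable.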
💡 Our consistency-aware loss gating accelerates early-stage convergence and achieves a lower final training loss compared to a fixed-weight baseline.
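One way consistency-aware gating can be realized, sketched under our own assumptions about its exact form: weight behavioral cloning against the RL objective by how consistent the student's actions currently are with the teacher's.

```python
import numpy as np

def gated_loss(student_a, teacher_a, rl_loss, tau=0.5):
    # While the student disagrees with the teacher, cloning dominates;
    # as actions align, weight shifts toward the RL objective.
    # The exponential gate and tau are assumptions for illustration.
    bc_loss = np.mean((student_a - teacher_a) ** 2)
    gate = np.exp(-bc_loss / tau)        # in (0, 1]; 1 means fully consistent
    return (1.0 - gate) * bc_loss + gate * rl_loss, gate

teacher_a = np.array([0.2, -0.1, 0.4])
loss_far, g_far = gated_loss(teacher_a + 1.0, teacher_a, rl_loss=0.3)
loss_near, g_near = gated_loss(teacher_a + 0.01, teacher_a, rl_loss=0.3)
```

A fixed-weight baseline keeps both terms active throughout training, which is what the adaptive gate is meant to improve on.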
## ⚙️ Training Details

| Item | Detail |
|---|---|
| 🖥️ Simulator | Isaac Gym |
| 🐕 Robot Platform | Unitree Go2 |
| 🔄 Control Frequency | 50 Hz (policy) / 1 kHz (PD controller) |
| 🎮 Training Hardware | Single NVIDIA RTX 4080 GPU |
| ⏳ Training Time | ~30 hours (from scratch) |
| 📷 Depth Camera | Intel RealSense D435i |
| 💻 Onboard Compute | NVIDIA Jetson |
| 🚀 Deployment | Custom C++ + ONNX Runtime |
| 🎯 Reward Formulation | Same as Extreme Parkour |
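The 50 Hz policy / 1 kHz PD split implies a hard 20 ms budget per policy step, with a lower-level controller tracking the commanded targets between policy updates. This Python sketch only illustrates the timing structure; the actual deployment stack is custom C++ with ONNX Runtime, and the stand-in policy below is hypothetical:

```python
import time
import numpy as np

def run_control_loop(policy, get_obs, send_action, hz=50.0, steps=100):
    # Fixed-rate loop: the policy must finish within each 1/hz period;
    # a separate 1 kHz PD controller (not shown) tracks the joint targets.
    period = 1.0 / hz
    for _ in range(steps):
        t0 = time.perf_counter()
        action = policy(get_obs())              # inference inside the budget
        send_action(action)
        elapsed = time.perf_counter() - t0
        time.sleep(max(0.0, period - elapsed))  # hold the 20 ms period

actions = []
run_control_loop(policy=lambda obs: -0.1 * obs,   # stand-in for the ONNX policy
                 get_obs=lambda: np.ones(48),     # observation size assumed
                 send_action=actions.append,
                 steps=5)
```

With a ~13 ms backbone, the loop above leaves roughly 7 ms of headroom per step for observation processing and actuation under the 20 ms budget.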
## 📝 TODO

We plan to release the full codebase upon paper acceptance. The following items are on our roadmap:
**🏋️ Training**
- Privileged teacher policy training code (Stage 1)
- Student distillation training code (Stage 2)
- Consistency-aware loss gating implementation
- Isaac Gym terrain environment and curriculum configs
- Domain randomization parameters and reward formulation
**🧮 Models & Estimation**
- Physics-guided filtering (EKF) module
- Uncertainty-aware velocity estimator (1D ResNet)
- Pre-trained model checkpoints (teacher & student)
## 🔖 Citation

If you find this work useful, please consider citing:

```bibtex
@article{real2026,
  title  = {REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering},
  author = {Jialong Liu and Dehan Shen and Yanbo Wen and Zeyu Jiang and Changhao Chen},
  year   = {2026}
}
```

## 🙏 Acknowledgements

This work builds upon the simulation infrastructure of Isaac Gym and the terrain setup from Extreme Parkour. We thank the authors for their open-source contributions.
This project will be released under the MIT License.