
🤖 REAL: Robust Extreme Agility Learning

Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering



REAL enables a quadrupedal robot to chain highly dynamic parkour maneuvers across complex terrains
with nominal vision (green box), and maintain stable locomotion even under severe visual degradation (red box).



📰 News

| Date | Update |
|------|--------|
| 🔥 2026/03 | Paper submitted. Under review. |
| 🎉 2026/03 | Repository created. Code will be released upon acceptance. |

✨ Highlights

🧠 Spatio-Temporal Policy Learning

A privileged teacher learns structured proprioception–terrain associations via cross-modal attention. The distilled student uses a FiLM-modulated Mamba backbone to suppress visual noise and build short-term terrain memory.
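As a rough illustration of the FiLM mechanism only (the paper's network sizes and generator architecture are not given here; the dimensions and the linear FiLM generator below are assumptions), a proprioceptive embedding can produce per-channel scale and shift parameters that modulate visual features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 48-D proprioceptive embedding conditioning
# a 64-channel visual feature vector.
PROPRIO_DIM, VIS_CHANNELS = 48, 64

# Linear "FiLM generator" mapping proprioception to per-channel
# scale (gamma) and shift (beta) parameters.
W = rng.standard_normal((PROPRIO_DIM, 2 * VIS_CHANNELS)) * 0.01

def film(visual_feat, proprio):
    """Modulate visual features channel-wise, conditioned on proprioception."""
    gamma_beta = proprio @ W                    # (2 * VIS_CHANNELS,)
    gamma, beta = np.split(gamma_beta, 2)
    return (1.0 + gamma) * visual_feat + beta   # identity-centred scaling

visual = rng.standard_normal(VIS_CHANNELS)
proprio = rng.standard_normal(PROPRIO_DIM)
out = film(visual, proprio)
assert out.shape == (VIS_CHANNELS,)
```

The identity-centred form (`1 + gamma`) is a common FiLM convention: with a zero conditioning signal, the visual features pass through unchanged.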

⚛️ Physics-Guided Filtering

An uncertainty-aware neural velocity estimator is fused with rigid-body dynamics through an Extended Kalman Filter (EKF), ensuring physically consistent state estimation during impacts and slippage.
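A minimal 1-D sketch of this style of fusion, assuming the neural estimator reports both a velocity and a standard deviation (the class name, noise parameters, and scalar state are illustrative simplifications, not the paper's implementation):

```python
import numpy as np

class VelocityEKF:
    """Toy 1-D filter: propagate velocity with rigid-body dynamics
    (accelerometer integration) and correct it with a neural velocity
    measurement whose reported uncertainty sets the noise R."""

    def __init__(self, v0=0.0, p0=1.0, q=0.05):
        self.v, self.p, self.q = v0, p0, q   # state, covariance, process noise

    def predict(self, accel, dt):
        self.v += accel * dt                 # rigid-body propagation
        self.p += self.q * dt                # covariance grows over time
        return self.v

    def update(self, v_meas, sigma_meas):
        r = sigma_meas ** 2                  # uncertainty-aware measurement noise
        k = self.p / (self.p + r)            # Kalman gain
        self.v += k * (v_meas - self.v)      # correct toward the measurement
        self.p *= (1.0 - k)                  # shrink covariance
        return self.v
```

During an impact the estimator can report a large `sigma_meas`, so the filter trusts the dynamics prediction and the noisy neural measurement barely moves the state.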

🎯 Consistency-Aware Loss Gating

Adaptive gating between behavioral cloning and RL stabilizes policy distillation and improves sim-to-real transfer, preventing policy collapse under aggressive domain randomization.
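The exact gating rule is not given in this README; one plausible sketch (the exponential form, the `tau` scale, and the convex mixing are assumptions for illustration) weights the behavioral-cloning term by teacher–student action agreement:

```python
import numpy as np

def gated_loss(bc_loss, rl_loss, teacher_act, student_act, tau=0.5):
    """Hypothetical consistency gate: when teacher and student actions
    agree, lean on behavioral cloning; when they diverge (e.g. under
    aggressive domain randomization), fall back toward the RL term so
    inconsistent targets cannot destabilize distillation."""
    disagreement = np.linalg.norm(teacher_act - student_act)
    w_bc = np.exp(-disagreement / tau)        # in (0, 1]; 1 when identical
    return w_bc * bc_loss + (1.0 - w_bc) * rl_loss
```

With identical actions the gate returns the pure BC loss; with strongly divergent actions it approaches the pure RL loss.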

🚀 Real-Time Onboard Deployment

Bounded O(1) inference at ~13.1 ms/step on a Unitree Go2 with zero-shot sim-to-real transfer — no fine-tuning required on the real robot.


🏗️ Architecture


🎓 Stage 1 — Privileged Teacher Policy Learning: The teacher policy learns precise proprioception–terrain associations through cross-modal attention. Proprioceptive states serve as Queries to selectively retrieve relevant terrain features encoded as Keys and Values from terrain scan dots.
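The retrieval step above can be sketched as standard scaled dot-product attention, with proprioception projected to a single query over terrain scan-dot features (the shapes and random features below are illustrative, not the paper's dimensions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_modal_attention(q, K, V):
    """q: (d,) proprioceptive query; K, V: (n, d) terrain scan-dot
    features. Returns a terrain summary weighted by relevance to the
    current proprioceptive state, plus the attention weights."""
    scores = K @ q / np.sqrt(q.size)   # scaled dot-product similarity
    weights = softmax(scores)          # distribution over n scan dots
    return weights @ V, weights

rng = np.random.default_rng(1)
d, n = 32, 11                          # illustrative feature dim / dot count
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
ctx, w = cross_modal_attention(q, K, V)
assert ctx.shape == (d,)
```

The attention weights make the association interpretable: each weight says how much a given terrain scan dot contributes to the context the policy conditions on.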

🎒 Stage 2 — Distilling Student Policy with Spatio-Temporal Reasoning: The deployable student integrates FiLM-based visual–proprioceptive fusion with a Mamba temporal backbone. A physics-guided Bayesian estimator and consistency-aware loss gating further stabilize training and deployment.


📊 Results

🏔️ Extreme Terrain Traversability



REAL achieves roughly 2× the overall success rate of the best prior baseline (0.78 vs 0.39 for SoloParkour) across hurdles, steps, and gaps:

| Method | Hurdles SR ↑ | Steps SR ↑ | Gaps SR ↑ | Overall SR ↑ | Overall MXD ↑ | MEV ↓ |
|--------|--------------|------------|-----------|--------------|---------------|-------|
| Extreme Parkour | 0.18 | 0.14 | 0.10 | 0.16 | 0.21 | 34.24 |
| RPL | 0.05 | 0.04 | 0.03 | 0.04 | 0.10 | 1.56 |
| SoloParkour | 0.42 | 0.49 | 0.36 | 0.39 | 0.34 | 96.93 |
| REAL (Ours) 🏆 | 0.82 | 0.94 | 0.28 | 0.78 | 0.45 | 18.41 |

📌 SR: Success Rate — how often the robot reaches all target goals. MXD: Mean X-Displacement ∈ [0, 1], showing normalized forward progress. MEV: Mean Edge Violations — average number of unsafe foot-edge contacts per episode.


🛡️ Robustness Under Perceptual Degradation

We evaluate policy robustness under three simulated sensor degradation conditions: frame drops, Gaussian noise, and spatial FoV occlusion.

| Method | Nominal SR | Frame Drop SR | Gaussian Noise SR | FoV Occlusion SR |
|--------|------------|---------------|-------------------|------------------|
| Extreme Parkour | 0.16 | 0.16 (↓0.00) | 0.11 (↓0.05) | 0.13 (↓0.03) |
| RPL | 0.04 | 0.01 (↓0.04) | 0.01 (↓0.03) | 0.01 (↓0.03) |
| SoloParkour | 0.39 | 0.20 (↓0.19) | 0.37 (↓0.03) | 0.41 (↑0.02) |
| REAL (Ours) 🏆 | 0.78 | 0.61 (↓0.17) | 0.51 (↓0.27) | 0.72 (↓0.06) |

💡 Under severe FoV occlusion, REAL retains 92% of its nominal performance (0.72 vs 0.78), while vision-reliant baselines suffer catastrophic failures.


🙈 Blind-Zone Maneuvers

Vision is completely masked 1 meter before each obstacle, forcing the policy to rely on spatio-temporal memory:



| Method | SR ↑ | MXD ↑ | MEV ↓ |
|--------|------|-------|-------|
| Extreme Parkour | 0.11 | 0.20 | 44.03 |
| SoloParkour | 0.36 | 0.34 | 103.50 |
| REAL (Ours) 🏆 | 0.55 | 0.39 | 24.84 |

👀 Real-World Extreme Blind Test

| ❌ Baseline | ✅ REAL (Ours) |
|-------------|----------------|
| Fails immediately upon losing visual input | Maintains robust blind traversal across obstacles |

🌍 Real-World Deployment


Zero-shot sim-to-real transfer on a physical Unitree Go2 quadruped using only onboard perception and computing:

| Scenario | Description |
|----------|-------------|
| 🦘 (a) High Platform Leap | The robot dynamically jumps onto an elevated surface |
| 📦 (b) Scattered Box Navigation | Traversing irregularly placed obstacles |
| 🪜 (c) Steep Staircase Climb | Ascending a steep staircase with precise foot placement |

⏱️ Inference Latency


| Backbone | Avg. Latency | Meets 20 ms Budget? |
|----------|--------------|---------------------|
| Transformer | 23.07 ms | ❌ No |
| Mamba (Ours) | 13.14 ms | ✅ Yes |

⚡ Mamba's bounded O(1) complexity eliminates the sequence-scaling bottleneck of Transformers, enabling the high-frequency reactivity required for aggressive parkour.
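To illustrate why per-step cost stays constant, consider a toy diagonal linear state-space recurrence (a minimal stand-in for Mamba-style selective SSMs; the state size and all parameters below are random placeholders): each control tick updates a fixed-size hidden state, so the work per step is independent of how long the episode has run.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 16                                   # illustrative hidden-state size

# Diagonal discrete-time state-space parameters. Decay factors are
# kept strictly below 1 so the recurrence is stable.
A = np.exp(-rng.uniform(0.01, 0.5, D))   # per-channel decay in (0, 1)
B = rng.standard_normal(D) * 0.1         # input projection
C = rng.standard_normal(D) * 0.1         # output readout

def step(h, x):
    """One O(D) update: constant cost regardless of the step index,
    unlike self-attention, whose cost grows with the history length."""
    h = A * h + B * x
    return h, C @ h

h = np.zeros(D)
for t in range(1000):                    # 1000 control ticks, flat cost each
    h, y = step(h, np.sin(0.01 * t))
```

A Transformer at step `t` must attend over all `t` cached tokens, so its per-step latency drifts upward during long runs; the recurrence above touches only the `D` hidden values every tick.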


🔬 Ablation Study

🧩 Component-Level Ablation

| Variant | SR ↑ | MXD ↑ | MEV ↓ | Time ↓ | Coll. ↓ |
|---------|------|-------|-------|--------|---------|
| REAL (Full) 🏆 | 0.78 | 0.45 | 18.41 | 0.02 | 0.06 |
| w/ MLP Estimator | 0.73 | 0.43 | 19.34 | 0.02 | 0.06 |
| w/o FiLM | 0.44 | 0.51 | 93.43 | 0.28 | 0.06 |
| w/o Mamba | 0.51 | 0.47 | 89.96 | 0.26 | 0.05 |

📌 Removing Mamba increases MEV nearly 5× (18.41 → 89.96), and disabling FiLM drops SR from 0.78 to 0.44. Both components are critical for robust spatio-temporal reasoning.

📏 Velocity Estimation

| Estimator | RMSE ↓ |
|-----------|--------|
| MLP (Baseline) | 0.52 |
| MLP + EKF | 0.40 |
| 1D ResNet (single frame) | 0.33 |
| 1D ResNet (10 frames) | 0.28 |
| 1D ResNet + EKF (10 frames, Ours) 🏆 | 0.23 |

📈 Training Convergence

💡 Our consistency-aware loss gating accelerates early-stage convergence and achieves a lower final training loss compared to a fixed-weight baseline.


⚙️ Training Details

| Item | Detail |
|------|--------|
| 🖥️ Simulator | Isaac Gym |
| 🐕 Robot Platform | Unitree Go2 |
| 🔄 Control Frequency | 50 Hz (policy) / 1 kHz (PD controller) |
| 🎮 Training Hardware | Single NVIDIA RTX 4080 GPU |
| ⏳ Training Time | ~30 hours (from scratch) |
| 📷 Depth Camera | Intel RealSense D435i |
| 💻 Onboard Compute | NVIDIA Jetson |
| 🚀 Deployment | Custom C++ + ONNX Runtime |
| 🎯 Reward Formulation | Same as Extreme Parkour |

📝 TODO

We plan to release the full codebase upon paper acceptance. The following items are on our roadmap:

🏋️ Training

- [ ] Privileged teacher policy training code (Stage 1)
- [ ] Student distillation training code (Stage 2)
- [ ] Consistency-aware loss gating implementation
- [ ] Isaac Gym terrain environment and curriculum configs
- [ ] Domain randomization parameters and reward formulation

🧮 Models & Estimation

- [ ] Physics-guided filtering (EKF) module
- [ ] Uncertainty-aware velocity estimator (1D ResNet)
- [ ] Pre-trained model checkpoints (teacher & student)

🔖 Citation

If you find this work useful, please consider citing:

@article{real2026,
  title   = {REAL: Robust Extreme Agility via Spatio-Temporal
             Policy Learning and Physics-Guided Filtering},
  author  = {Jialong Liu and Dehan Shen and Yanbo Wen and
             Zeyu Jiang and Changhao Chen},
  year    = {2026}
}

🙏 Acknowledgements

This work builds upon the simulation infrastructure of Isaac Gym and the terrain setup from Extreme Parkour. We thank the authors for their open-source contributions.

📄 License

This project will be released under the MIT License.