Robot arm kinematics and manipulation learning in MuJoCo physics simulation, featuring Pinocchio-based FK/IK, Catmull-Rom trajectory planning, and a multi-robot RL/IL training framework supporting PPO, SAC, TD3, DDPG and Behavior Cloning.
| Trajectory | Impedance Control |
|---|---|
| ![]() | ![]() |
```
manipulation_mujoco/
├── models/
│   ├── franka_emika_panda/         # Franka Panda MJCF model + meshes
│   └── trs_so_arm100/              # TRS SO-ARM100 MJCF model + meshes
├── kinematic/
│   ├── panda_kinematics.py         # PandaKinematics — FK / IK / Jacobian (Pinocchio)
│   ├── trajectory.py               # TrajectoryGenerator — cubic, Catmull-Rom, Cartesian arc
│   ├── run_fk.py                   # Forward kinematics demo
│   ├── run_ik.py                   # Inverse kinematics demo
│   └── run_trajectory.py           # Figure-8 Lissajous trajectory demo
├── dynamics/
│   ├── impedance_controller.py     # Task-space impedance control (τ = J^T K e + τ_bias)
│   └── admittance_controller.py    # Task-space admittance control (virtual ODE + inner PD)
├── learning/
│   ├── envs/
│   │   ├── base_env.py             # MuJocoRobotEnv — robot-agnostic Gymnasium base class
│   │   ├── registry.py             # make_env("panda"/"so_arm", "reach"/"push"/"pick_place")
│   │   └── tasks/
│   │       ├── base_task.py        # BaseTask interface
│   │       ├── reach.py
│   │       ├── push.py
│   │       └── pick_place.py
│   ├── robots/
│   │   ├── panda.py                # FrankaPandaEnv (4D Cartesian delta + gripper)
│   │   └── so_arm.py               # SoArm100Env (6D joint velocity)
│   ├── algos/
│   │   ├── rl_trainer.py           # train_rl — PPO/SAC/TD3/DDPG + SuccessRateCallback
│   │   └── il/
│   │       └── bc.py               # Behavior Cloning (MLP + CosineAnnealingLR)
│   ├── utils/
│   │   └── visualize.py            # MuJoCo overlay helpers
│   ├── train.py                    # Unified training entry point
│   └── play.py                     # Real-time policy playback
├── requirements.txt
└── pyproject.toml
```
```bash
cd manipulation_mujoco
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
```
```bash
source .venv/bin/activate

# Forward kinematics
python kinematic/run_fk.py

# Inverse kinematics
python kinematic/run_ik.py

# Figure-8 continuous trajectory with live EE trail
python kinematic/run_trajectory.py
```

These demos use torque-controlled actuators (`panda_motor.xml` + `scene_torque.xml`). Interact via **Ctrl + drag** in the MuJoCo viewer.
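The Catmull-Rom mode of the trajectory demo interpolates a sequence of waypoints so that the path passes through every interior control point with a continuous velocity. As a hedged illustration (the function name below is invented, not `TrajectoryGenerator`'s actual API), the standard uniform Catmull-Rom segment formula looks like this:

```python
import numpy as np

def catmull_rom_point(p0, p1, p2, p3, t):
    """Evaluate the uniform Catmull-Rom segment between p1 and p2 at t in [0, 1].

    p0 and p3 are the neighboring control points that shape the tangents.
    """
    p0, p1, p2, p3 = map(np.asarray, (p0, p1, p2, p3))
    t2, t3 = t * t, t * t * t
    return 0.5 * ((2.0 * p1)
                  + (-p0 + p2) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3)

# The segment interpolates its interior control points:
pts = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
       np.array([1.0, 1.0]), np.array([0.0, 1.0])]
start = catmull_rom_point(*pts, 0.0)   # equals pts[1]
end = catmull_rom_point(*pts, 1.0)     # equals pts[2]
```

Chaining segments over a sliding window of four waypoints gives a smooth path through all of them, which is why it suits waypoint-based end-effector trajectories.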
```bash
source .venv/bin/activate

# Impedance control — drag and release, arm springs back
python dynamics/impedance_controller.py

# Admittance control — arm compliantly follows your applied force
python dynamics/admittance_controller.py
```

Control law summary:
| | Impedance | Admittance |
|---|---|---|
| Input | Displacement `e` | External force `F_ext` |
| Output | Joint torques directly | Virtual reference → inner PD |
| Feel | Stiff spring | Soft, mass-damper compliant |
| Equation | `τ = J^T(Kp·e − Kd·ẋ) + τ_bias` | `M_d·ẍ_v = F_ext − D_d·ẋ_v − K_d·(x_v − x_eq)` |
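The impedance law maps a task-space spring-damper force through the Jacobian transpose into joint torques. A minimal numpy sketch of that equation (gains, the toy Jacobian, and the function name are all illustrative, not the repo's values):

```python
import numpy as np

def impedance_torques(J, x_des, x, xdot, tau_bias, kp=300.0, kd=30.0):
    """tau = J^T (Kp*e - Kd*xdot) + tau_bias, with e = x_des - x."""
    f_task = kp * (x_des - x) - kd * xdot   # virtual spring-damper force in task space
    return J.T @ f_task + tau_bias          # map to joint space, add bias/gravity torques

# Toy 3x2 Jacobian for a 2-joint arm (numbers are made up for the example):
J = np.array([[0.0, 1.0],
              [1.0, 0.5],
              [0.6, 0.2]])
tau = impedance_torques(J,
                        x_des=np.array([0.3, 0.0, 0.5]),
                        x=np.array([0.3, 0.0, 0.4]),
                        xdot=np.zeros(3),
                        tau_bias=np.zeros(2))
# With zero position error and zero velocity, the output reduces to tau_bias.
```

The "stiff spring" feel comes directly from `kp`: displace the end effector and the restoring force, hence torque, grows linearly with the displacement.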
Tune the parameters at the top of each file:

| Parameter | File | Effect |
|---|---|---|
| `KP` / `KD` | `impedance_controller.py` | Spring stiffness / damping |
| `M_D` | `admittance_controller.py` | Lower → faster response |
| `D_D` | `admittance_controller.py` | Higher → smoother motion |
| `K_D` | `admittance_controller.py` | 0 = free drift, >0 = spring to equilibrium |
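The admittance side integrates the virtual mass-spring-damper ODE from the table to produce a compliant reference that the inner PD loop then tracks. A sketch of one semi-implicit Euler step (the gains and time step here are illustrative, not the file's defaults):

```python
import numpy as np

def admittance_step(x_v, xdot_v, f_ext, x_eq, dt, m_d=2.0, d_d=20.0, k_d=50.0):
    """One semi-implicit Euler step of M_d*xdd_v = F_ext - D_d*xd_v - K_d*(x_v - x_eq)."""
    xdd_v = (f_ext - d_d * xdot_v - k_d * (x_v - x_eq)) / m_d
    xdot_v = xdot_v + xdd_v * dt    # update velocity first (semi-implicit)
    x_v = x_v + xdot_v * dt         # then position, for better stability
    return x_v, xdot_v

# Push the virtual mass in +x for 1 s, then release: with K_D > 0 the
# reference drifts under the force and afterwards settles back to x_eq.
x, xd, x_eq = np.zeros(3), np.zeros(3), np.zeros(3)
for i in range(4000):
    f = np.array([5.0, 0.0, 0.0]) if i < 500 else np.zeros(3)
    x, xd = admittance_step(x, xd, f, x_eq, dt=0.002)
```

This also shows the table's `K_D` row concretely: with `k_d=0` the final position would stay wherever the force left it (free drift) instead of returning to `x_eq`.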
```bash
source .venv/bin/activate

# Headless (recommended for speed)
python learning/train.py --robot panda --task reach --algo sac --timesteps 200000
python learning/train.py --robot panda --task push --algo td3 --timesteps 300000
python learning/train.py --robot panda --task pick_place --algo sac --timesteps 500000

# Enable real-time viewer
python learning/train.py --robot panda --task reach --algo sac --timesteps 200000 --render

# Parallel environments (headless only)
python learning/train.py --robot panda --task reach --algo sac --timesteps 200000 --n-envs 4

# TensorBoard logging
python learning/train.py --robot panda --task reach --algo sac --timesteps 200000 --tensorboard
tensorboard --logdir learning/runs/tb

# SO-ARM100
python learning/train.py --robot so_arm --task reach --algo sac --timesteps 100000
```
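The `SuccessRateCallback` in `rl_trainer.py` suggests progress is tracked as an episode success rate; the usual way to compute such a metric is a rolling window over recent episodes. A small standalone sketch of that idea (this is not the repo's callback, which would hook into Stable-Baselines3's callback API):

```python
from collections import deque

class SuccessRateTracker:
    """Rolling success rate over the last `window` episodes."""

    def __init__(self, window: int = 100):
        self.results = deque(maxlen=window)  # old episodes fall off automatically

    def record(self, success: bool) -> None:
        self.results.append(bool(success))

    @property
    def rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

tracker = SuccessRateTracker(window=4)
for outcome in [True, False, True, True, False]:
    tracker.record(outcome)
# Only the last four outcomes (False, True, True, False) remain in the window.
```

A windowed rate reacts to recent policy changes instead of being diluted by the whole training history, which is why it is a common early-training signal alongside episode reward.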
```bash
source .venv/bin/activate

# Collect heuristic demos + train BC policy
python learning/train.py --robot panda --task reach --algo bc

# Visualize demo collection
python learning/train.py --robot panda --task reach --algo bc --render
```
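Behavior Cloning is supervised regression from demo observations to demo actions. The repo's `bc.py` trains an MLP with a cosine-annealed learning rate; as a much smaller stand-in for the same objective, here is the idea reduced to a linear least-squares fit on synthetic demos (everything here is illustrative, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend "expert": a fixed linear policy action = obs @ W_true.
W_true = rng.normal(size=(4, 2))
obs = rng.normal(size=(256, 4))   # demo observations
act = obs @ W_true                # demo actions (noise-free for clarity)

# BC objective: argmin_W ||obs @ W - act||^2, solved in closed form here.
W_bc, *_ = np.linalg.lstsq(obs, act, rcond=None)
```

An MLP replaces the matrix `W` with a nonlinear function and closed-form least squares with gradient descent, but the loss being minimized is the same imitation objective.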
```bash
source .venv/bin/activate

# RL policy (SAC reach)
python learning/play.py --robot panda --task reach --algo sac \
    --model learning/runs/panda_reach_sac.zip --episodes 5

# BC policy
python learning/play.py --robot panda --task reach --algo bc \
    --model learning/runs/panda_reach_bc.pt --episodes 5

# Push / Pick-Place
python learning/play.py --robot panda --task push --algo sac \
    --model learning/runs/panda_push_sac.zip
python learning/play.py --robot panda --task pick_place --algo sac \
    --model learning/runs/panda_pick_place_sac.zip

# SO-ARM100
python learning/play.py --robot so_arm --task reach --algo sac \
    --model learning/runs/so_arm_reach_sac.zip
```

| Argument | Default | Description |
|---|---|---|
| `--robot` | `panda` | `panda` \| `so_arm` |
| `--task` | `reach` | `reach` \| `push` \| `pick_place` |
| `--algo` | `sac` | `ppo` \| `sac` \| `td3` \| `ddpg` \| `bc` |
| `--timesteps` | `200000` | Total RL training steps |
| `--n-envs` | `1` | Number of parallel environments |
| `--render` | `False` | Enable MuJoCo real-time viewer |
| `--tensorboard` | `False` | Enable TensorBoard logging |
| `--save-dir` | `learning/runs` | Model save directory |
| `--model` | — | Path to saved model (play only) |
| `--episodes` | `10` | Rollout episodes (play only) |
```
learning/runs/{robot}_{task}_{algo}.zip   ← RL (Stable-Baselines3)
learning/runs/{robot}_{task}_{algo}.pt    ← IL (Behavior Cloning)
```
1. Create `learning/robots/my_robot.py` inheriting `MuJocoRobotEnv`
2. Implement `_build_spaces`, `_get_obs`, `_apply_action`, `get_ee_pos`, `get_ee_body`
3. Register it in `learning/envs/registry.py` under `make_env`
4. Use the same `train.py` / `play.py` commands with `--robot my_robot`
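A name-to-class registry of the kind `make_env` implies can be sketched as follows. All names below are hypothetical stand-ins to show the pattern, not the actual contents of `registry.py`:

```python
# Maps a robot name to its environment class; hypothetical sketch only.
ROBOTS = {}

def register_robot(name):
    """Class decorator that adds an env class to the registry under `name`."""
    def wrap(cls):
        ROBOTS[name] = cls
        return cls
    return wrap

@register_robot("my_robot")
class MyRobotEnv:
    """Placeholder; a real env would inherit the MuJocoRobotEnv base class."""
    def __init__(self, task="reach"):
        self.task = task

def make_env(robot, task="reach"):
    if robot not in ROBOTS:
        raise ValueError(f"unknown robot {robot!r}, known: {sorted(ROBOTS)}")
    return ROBOTS[robot](task=task)

env = make_env("my_robot", task="push")
```

The benefit of routing everything through one factory is exactly what step 4 promises: once the class is registered, `train.py` and `play.py` pick it up via `--robot my_robot` with no further changes.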

