This project explores end-to-end Reinforcement Learning for autonomous agricultural navigation, specifically targeting Task 1 of the Field Robot Event 2025 (FRE25). Using Politecnico di Milano's Rockerbot platform with its unique 4WIS4WID (4 Wheels Independent Steering 4 Wheels Independent Drive) kinematic configuration, the project implements a high-fidelity GPU-accelerated simulation environment in NVIDIA Isaac Lab.
Project Goal: Develop an RL policy that navigates crop rows by directly processing LIDAR sensor data and outputting control commands, without relying on traditional Sense-Plan-Act pipelines.
Current Status: Completed simulation development and curriculum level 4 training (Crab kinematics with command buffer control). Trained policies reach first row completion after ~2 minutes of training and competent full-row navigation after ~1 hour.
Keywords: reinforcement learning, agricultural robotics, Isaac Lab, autonomous navigation, 4WIS4WID, PPO, sim-to-real
- GPU-Accelerated Physics: Achieves up to 3000 simulation steps/second using Isaac Lab's PhysX backend
 - Massively Parallel Training: Supports 4096+ environments running simultaneously
 - Custom Ray Marching LIDAR: From-scratch implementation for dynamic plant detection using signed distance fields (see the sketch after this list)
 - Procedural Scene Generation: Randomized crop row layouts with configurable curvature, spacing, and plant density
 - High-Fidelity Robot Model: Full Rockerbot CAD model with independent wheel steering and drive control
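
To illustrate the ray marching idea, here is a minimal sphere-tracing sketch against circular plant SDFs. This is not the repository's RayMarcher implementation: the function name and batching layout are assumptions, though `max_distance`, `tolerance`, and `max_steps` mirror the `lidar_*` parameters documented in the Configuration section.

```python
import torch

def lidar_scan(origins, directions, plant_centers, plant_radius,
               max_distance=1.0, tolerance=0.01, max_steps=100):
    """Sphere-trace rays against circular plant SDFs in the ground plane.

    origins:       (R, 2) ray start points
    directions:    (R, 2) unit ray directions
    plant_centers: (P, 2) plant positions
    Returns (R,) hit distances, clamped to max_distance on a miss.
    """
    t = torch.zeros(origins.shape[0])  # distance marched along each ray
    for _ in range(max_steps):
        points = origins + t.unsqueeze(-1) * directions
        # SDF of a disc: distance to the nearest plant center minus its radius
        d = torch.cdist(points, plant_centers).min(dim=-1).values - plant_radius
        t = torch.clamp(t + d, max=max_distance)  # march by the safe distance
        if bool(((d < tolerance) | (t >= max_distance)).all()):
            break  # every ray has hit a plant or left sensing range
    return t

# Example: 40 rays fanned over 360°, one plant 0.6 m ahead of the sensor
angles = torch.linspace(0.0, 2.0 * torch.pi, 40)
rays = torch.stack([angles.cos(), angles.sin()], dim=-1)
print(lidar_scan(torch.zeros(40, 2), rays, torch.tensor([[0.6, 0.0]]), 0.22))
```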
 
- Discrete Action Space: Simplified differential control (steering, throttle, command buffer)
 - Accumulator-Based Memory: Novel memory architecture avoiding RNN/LSTM computational overhead
 - Curriculum Learning: Progressive difficulty scaling from simple navigation to complex multi-row tasks
 - Comprehensive Reward Shaping: Multi-component reward balancing waypoint reaching, velocity, and constraint satisfaction
 
- Stable Baselines 3 (SB3): Primary training framework with PPO implementation
 - SKRL: Alternative training backend with multi-algorithm support
 - Extensible architecture for additional RL frameworks
 
- Isaac Lab installation (see Installation Guide)
 - Python 3.10+
 - NVIDIA GPU with CUDA support
 - Recommended: Conda environment for isolated dependency management
 
- Install Isaac Lab following the official guide. Conda installation is recommended.

- Clone this repository outside the Isaac Lab directory:

  ```bash
  git clone https://github.com/AIRLab-POLIMI/FRE25_IsaacLabSym.git
  cd FRE25_IsaacLabSym
  ```

- Install the extension in editable mode:

  ```bash
  # Use isaaclab.sh -p instead of python if Isaac Lab is not in your venv/conda
  python -m pip install -e source/FRE25_IsaacLabSym
  ```

- Verify the installation by listing the available environments:

  ```bash
  python scripts/list_envs.py
  ```

  Expected output should include `Fre25-Isaaclabsym-Direct-v0`.
The repository provides convenient shell scripts for common operations:
```bash
# Train with Stable Baselines 3 (128 environments, headless mode)
./RUN_SB3_TRAIN.sh

# Train with custom number of environments
./RUN_SB3_TRAIN.sh --num_envs 512

# Train with Hydra configuration override
./RUN_SB3_TRAIN_HYDRA.sh
```

Training can also be launched directly through Isaac Lab:

```bash
# SB3 training
ISAAC_LAB_PATH="/path/to/IsaacLab"
${ISAAC_LAB_PATH}/isaaclab.sh -p scripts/sb3/train.py \
    --task Fre25-Isaaclabsym-Direct-v0 \
    --num_envs 128 \
    --headless
# SKRL training with Hydra config
${ISAAC_LAB_PATH}/isaaclab.sh -p scripts/skrl/train.py \
    --task Fre25-Isaaclabsym-Direct-v0 \
    --num_envs 128 \
    --headless
```
```bash
# Play trained policy
./RUN_SB3_PLAY.sh
# Or specify checkpoint manually
${ISAAC_LAB_PATH}/isaaclab.sh -p scripts/sb3/play.py \
    --task Fre25-Isaaclabsym-Direct-v0 \
    --num_envs 1 \
    --checkpoint /path/to/model.zip
```

Test the environment with keyboard control:

```bash
./RUN_TELEOP.sh
```

Controls:

- W/S: Throttle forward/backward
- A/D: Steer left/right
- E (hold): Step command buffer
- L: Reset controller state
See KEYBOARD_CONTROLS.md for detailed control documentation.
Monitor training with TensorBoard:

```bash
# Launch TensorBoard
./RUN_TENSORBOARD.sh

# Or manually specify log directory
tensorboard --logdir logs/sb3/Fre25-Isaaclabsym-Direct-v0
```

Sanity-check the environment with baseline agents:

```bash
# Random agent
python scripts/random_agent.py --task Fre25-Isaaclabsym-Direct-v0

# Zero-action agent
python scripts/zero_agent.py --task Fre25-Isaaclabsym-Direct-v0
```

All simulation and training parameters are centralized in `source/FRE25_IsaacLabSym/FRE25_IsaacLabSym/tasks/direct/fre25_isaaclabsym/fre25_isaaclabsym_env_cfg.py`.
Episode:

- `episode_length_s = 300.0`: Episode duration in seconds
- `decimation = 4`: Physics steps per control step (the policy acts once every 4 physics steps)
- `num_envs`: Number of parallel environments (set at runtime)

Actions:

- `steering_scale = 2`: Steering angle change per action step [deg/step]
- `wheels_effort_scale = 15`: Wheel motor velocity scale
- `steering_buffer_min/max = ±π/2`: Steering angle limits [rad]

Scene generation:

- `path_length = 3.0`: Row length [m]
- `paths_spacing = 1.2`: Row spacing (Δᵣ) [m]
- `n_control_points = 10`: Control points for spline generation (cᵣ)
- `n_plants_per_path = 10`: Plants per row (ρ)
- `plant_radius = 0.22`: Plant collision radius (rₚₗₐₙₜ) [m]

LIDAR:

- `lidar_rays_per_robot = 40`: Number of rays (nᵣₐᵧₛ)
- `lidar_max_distance = 1.0`: Max sensing range [m]
- `lidar_tolerance = 0.01`: Ray marching convergence tolerance (ε_lidar) [m]
- `lidar_max_steps = 100`: Max ray marching iterations (n_lidar)

Waypoints:

- `waypoint_reached_epsilon = 0.35`: Reached threshold (εw) [m]
- `waypoints_per_row = 3`: Waypoints per row

Rewards:

- `waypoint_reward_base = 100.0`: Waypoint completion reward
- `velocity_towards_waypoint_scale = 0.5`: Velocity projection reward weight
- `plant_collision_penalty = -100.0`: Plant collision penalty
- `total_reward_scale = 0.1`: Final reward scaling factor
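
As a rough illustration of how these components might combine (a minimal sketch using the parameter names above; the actual reward in `fre25_isaaclabsym_env.py` may include additional terms):

```python
import torch

def compute_reward(reached_waypoint, vel_towards_wp, collided,
                   waypoint_reward_base=100.0,
                   velocity_towards_waypoint_scale=0.5,
                   plant_collision_penalty=-100.0,
                   total_reward_scale=0.1):
    """Combine the reward components listed above for N parallel envs.

    reached_waypoint: (N,) bool, waypoint reached this step
    vel_towards_wp:   (N,) velocity projected toward the next waypoint [m/s]
    collided:         (N,) bool, plant collision this step
    """
    reward = (
        reached_waypoint.float() * waypoint_reward_base
        + vel_towards_wp * velocity_towards_waypoint_scale
        + collided.float() * plant_collision_penalty
    )
    return reward * total_reward_scale
```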
For complete parameter documentation with units and descriptions, see the configuration file.
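
For intuition, here is a sketch of how rows could be generated from the scene parameters above. A simple linear interpolation between randomized control points stands in for the actual spline, and all names are illustrative:

```python
import numpy as np

def generate_rows(n_rows=4, path_length=3.0, paths_spacing=1.2,
                  n_control_points=10, n_plants_per_path=10, max_offset=0.1):
    """Return (n_rows * n_plants_per_path, 2) plant positions for curved rows."""
    xs = np.linspace(0.0, path_length, n_control_points)
    # Shared lateral offsets so every row follows the same curvature
    offsets = np.random.uniform(-max_offset, max_offset, n_control_points)
    plants = []
    for r in range(n_rows):
        px = np.linspace(0.0, path_length, n_plants_per_path)
        # Interpolate the row centerline through its control points
        py = np.interp(px, xs, r * paths_spacing + offsets)
        plants.append(np.stack([px, py], axis=1))
    return np.concatenate(plants, axis=0)
```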
```
FRE25_IsaacLabSym/
├── scripts/
│   ├── sb3/                          # Stable Baselines 3 training/play scripts
│   │   ├── train.py                  # SB3 training entry point
│   │   ├── play.py                   # Policy evaluation
│   │   └── callbacks.py              # Custom logging callbacks
│   ├── skrl/                         # SKRL training/play scripts
│   ├── teleop/                       # Keyboard teleoperation
│   ├── random_agent.py               # Random action agent
│   └── zero_agent.py                 # Zero action agent
│
├── source/FRE25_IsaacLabSym/
│   └── FRE25_IsaacLabSym/
│       └── tasks/
│           ├── direct/               # Direct RL environment (main)
│           │   └── fre25_isaaclabsym/
│           │       ├── fre25_isaaclabsym_env.py         # Environment implementation
│           │       ├── fre25_isaaclabsym_env_cfg.py     # Configuration
│           │       ├── RockerBot.py                     # Robot model
│           │       ├── CommandBuffer/                   # Command buffer logic
│           │       ├── PathHandler.py                   # Row generation
│           │       ├── PlantRelated/                    # Plant spawning/collision
│           │       ├── RayMarcher/                      # Custom LIDAR implementation
│           │       ├── WaypointRelated/                 # Waypoint system
│           │       ├── Assets/                          # 3D models (robot, plants)
│           │       └── agents/                          # RL agent configs
│           │
│           └── manager_based/        # Manager-based environment (legacy)
│
├── logs/                             # Training logs
│   ├── sb3/                          # SB3 logs and checkpoints
│   └── skrl/                         # SKRL logs and checkpoints
│
├── outputs/                          # Hydra experiment outputs
│
├── Multidisciplinary_Project_Report/ # Academic report (LaTeX)
│   ├── executive_summary.tex         # Main report document
│   └── Images/                       # Report figures
│
├── ExperimentalNotebooks/            # Jupyter notebooks for analysis
│
├── RUN_*.sh                          # Convenience shell scripts
├── KEYBOARD_CONTROLS.md              # Teleoperation documentation
└── README.md                         # This file
```
The project follows a curriculum to progressively increase task difficulty:
- Level 5: Plain navigation - Crab kinematics, automatic command buffer, skip 1 row ✓
 - Level 4: Command buffer control - Crab kinematics, manual buffer control, skip 1 row ✓
 - Level 3: Multi-row skipping - Crab kinematics, manual buffer, skip up to 2 rows
 - Level 2: Full rigid body - FRBK kinematics with rotation, manual buffer
 - Level 1 (Goal): Full 4WIS4WID - Independent wheel control, complete task specification
 
Progress: Levels 5 and 4 completed. Level 4 trains to first row completion in ~2 minutes and reaches full competence in ~1 hour.
After 339 training runs (487.29 hours total experimentation):
- Simulation Performance: Up to 3000 steps/second on GPU
 - Training Time: ~1 hour from scratch to competent navigation
 - First Success: First row completion in ~2 minutes of training
 - Configuration: 4096 parallel environments, PPO with discrete actions
 - Observation Space: 47 dimensions (steering, LIDAR, command buffer, past actions)
 - Action Space: 3 discrete actions (steering, throttle, command step)
 
Key findings documented in Multidisciplinary_Project_Report/executive_summary.tex.
- Current steering angle: 1D
 - LIDAR readings: 40D (configurable via `lidar_rays_per_robot`)
 - Command buffer state: 3D (current command, parity, buffer state)
 - Past actions: 3D (previous steering, throttle, command step)
 - Total: 47D base observation space (see the sketch after this list)
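
Concretely, the observation can be pictured as a concatenation of these blocks (an illustrative sketch; the tensor names are assumptions):

```python
import torch

def build_observation(steering, lidar, command_buffer, past_actions):
    """Concatenate the observation blocks described above for N parallel envs.

    steering:       (N, 1)  current steering angle
    lidar:          (N, 40) LIDAR distances
    command_buffer: (N, 3)  current command, parity, buffer state
    past_actions:   (N, 3)  previous steering, throttle, command step
    """
    obs = torch.cat([steering, lidar, command_buffer, past_actions], dim=-1)
    assert obs.shape[-1] == 47  # 1 + 40 + 3 + 3
    return obs
```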
 
The action space is a MultiDiscrete space, chosen for stability and sample efficiency (a Gymnasium sketch follows the list):
- Steering: {-1, 0, +1} (left, neutral, right)
 - Throttle: {-1, 0, +1} (backward, stop, forward)
 - Command Step: {0, 1} (hold, advance buffer)
 - Hidden States (optional): {-1, +1}ⁿ for memory augmentation
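
In Gymnasium terms this corresponds to a `MultiDiscrete([3, 3, 2])` space whose indices are shifted onto the signed values above (a sketch ignoring the optional hidden-state heads; `decode_action` is an illustrative name):

```python
from gymnasium import spaces

# Three heads: steering {-1,0,+1}, throttle {-1,0,+1}, command step {0,1}
action_space = spaces.MultiDiscrete([3, 3, 2])

def decode_action(a):
    """Map raw indices to the signed values used by the environment."""
    steering = int(a[0]) - 1      # {0,1,2} -> {-1,0,+1}
    throttle = int(a[1]) - 1      # {0,1,2} -> {-1,0,+1}
    command_step = int(a[2])      # {0,1} unchanged
    return steering, throttle, command_step

print(decode_action(action_space.sample()))  # e.g. (1, -1, 0)
```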
 
Novel accumulator-based approach avoiding RNN computational overhead (a minimal update sketch follows the list):
- Differential hidden state control: $\dot{\vec{h}} \in \{-1, +1\}^{n_h}$
 - Continuous integration: $\vec{h}_{t+1} = \vec{h}_t + \alpha \cdot \dot{\vec{h}}$
 - Provides the policy with a "scratchpad" for state tracking
 - Fully parallel implementation compatible with massive batching
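
A minimal sketch of the update, with $\dot{\vec{h}}$ emitted by the policy as extra discrete heads; the value of `alpha` and the bounding of the accumulator are assumptions here:

```python
import torch

def update_hidden(h, h_dot, alpha=0.1):
    """Integrate the differential hidden-state action into the accumulator.

    h:     (N, n_h) current hidden state, fed back as part of the observation
    h_dot: (N, n_h) policy output in {-1, +1}
    """
    # Bounding the accumulator to [-1, 1] is an assumption, not stated above
    return torch.clamp(h + alpha * h_dot, -1.0, 1.0)
```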
 
Run VSCode task setup_python_env (Ctrl+Shift+P → Tasks: Run Task) to configure Python paths for Omniverse extensions.
```bash
pip install pre-commit
pre-commit run --all-files
```

Enable the UI extension in Isaac Sim:
- Add the `source/` directory to the Extensions search paths
 - Enable "FRE25_IsaacLabSym" under Third Party extensions
 
Add extension path to .vscode/settings.json:
```json
{
    "python.analysis.extraPaths": [
        "/path/to/FRE25_IsaacLabSym/source/FRE25_IsaacLabSym"
    ]
}
```

To speed up analysis, exclude unused Omniverse packages from the `python.analysis.extraPaths` list in `.vscode/settings.json`.
Reduce `num_envs` or adjust the PhysX GPU memory settings in `fre25_isaaclabsym_env_cfg.py`:

```python
gpu_max_rigid_contact_count = 2**23
gpu_collision_stack_size = 2**28
```

Ray marching parameters can be tuned to trade accuracy for performance:
- Reduce `lidar_rays_per_robot` for faster computation
 - Adjust `lidar_max_steps` for ray marching precision
If you use this work in your research, please cite:
```bibtex
@mastersthesis{ginefra2025fre25,
    title={4WIS4WID Mobile Robot Autonomous Navigation in Agricultural Setting using End-to-End Reinforcement Learning},
    author={Ginefra, Paolo},
    year={2025},
    school={Politecnico di Milano},
    type={Multidisciplinary Project},
    note={Supervisors: M. Restelli, S. Mentasti, M. Matteucci}
}
```

- Isaac Lab Documentation: isaac-sim.github.io/IsaacLab
 - Field Robot Event: fieldrobot.com
 - Rockerbot Platform: Politecnico di Milano AIRLab
 - GitHub Repository: github.com/AIRLab-POLIMI/FRE25_IsaacLabSym
 
Planned extensions and research directions:
- Sim-to-Real Transfer: Domain randomization, sensor noise modeling, real Rockerbot deployment
 - Curriculum Progression: Advance to full 4WIS4WID control (Level 1)
 - Ablation Studies: Systematic evaluation of memory architecture, action discretization, reward components
 - Generalization Testing: Evaluation on unseen row curvatures, plant densities, spacing variations
 - Alternative Approaches: Comparison with imitation learning, classical controllers, other RL algorithms
 
See Multidisciplinary_Project_Report/executive_summary.tex Section 6 for detailed future work proposals.
This project is licensed under the BSD-3-Clause License, consistent with Isaac Lab licensing.
- NVIDIA Isaac Lab Team for the simulation framework
 - Politecnico di Milano AIRLab for Rockerbot platform and support
 - Field Robot Event Organization for the challenge specification
 - Supervisors: Prof. M. Restelli, Prof. S. Mentasti, Prof. M. Matteucci
 
Author: Paolo Ginefra
Institution: Politecnico di Milano - School of Industrial and Information Engineering
Academic Year: 2024-2025
For questions or collaboration inquiries, please open an issue on the GitHub repository.
