
SP-SLAM3

A modified version of ORB-SLAM3 that replaces the entire handcrafted feature pipeline with learned deep learning components to improve accuracy and robustness.

What we changed:

  • Replaced ORB feature extractor with SuperPoint (CNN-based learned keypoint detector + 256-dim float descriptor)
  • Replaced Hamming distance matching with L2 norm matching (SPmatcher) adapted for float descriptors
  • Integrated LightGlue as the primary frame-to-frame matcher in the tracking loop, with automatic L2 fallback
  • Added optical flow fallback (Lucas-Kanade + PnP RANSAC) for tracking recovery when all descriptor-based methods fail
  • Replaced DBoW2 with DBoW3 and added optional NetVLAD/CosPlace for learned place recognition / loop closing
  • Added FP16 inference support and GPU/CPU fallback for all neural network components
  • Added thread-safe GPU inference with mutex protection for concurrent SuperPoint + LightGlue operations

Result: Up to 2.25x better trajectory accuracy and up to 97% tracking rate on difficult EuRoC sequences where the baseline tracked only 50-58%.

SP-SLAM3 vs Ground Truth Trajectory (3D)

SP-SLAM3 vs Ground Truth Trajectory (2D)

Absolute Trajectory Error Over Time


Forked from ORB-SLAM3. Pre-trained SuperPoint model from the official MagicLeap repository.


Key Changes from ORB-SLAM3

| Component | ORB-SLAM3 | SP-SLAM3 |
|---|---|---|
| Feature Extractor | ORB (handcrafted) | SuperPoint (learned, CNN-based) |
| Descriptor | 256-bit binary (Hamming distance) | 256-dim float (L2 distance) |
| Feature Matcher | ORBmatcher (Hamming) | LightGlue (primary) + SPmatcher L2 (fallback) |
| Tracking Recovery | Relocalization only | LightGlue + Optical Flow PnP + Relocalization |
| Place Recognition | DBoW2 | DBoW3 + NetVLAD/CosPlace (optional) |
| Inference Precision | N/A | FP32 / FP16 (CUDA) |
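Since SuperPoint descriptors are float vectors, matching uses L2 distance rather than Hamming. As an illustration only (not the project's actual SPmatcher code), a mutual nearest-neighbour L2 matcher can be sketched as:

```python
import math

def l2(a, b):
    """Euclidean distance between two float descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, pool):
    """Index of the pool descriptor closest to `query` (brute force)."""
    return min(range(len(pool)), key=lambda k: l2(query, pool[k]))

def match_mutual_l2(desc_a, desc_b):
    """Keep a pair (i, j) only if a[i]'s best match is b[j] AND
    b[j]'s best match is a[i] (mutual nearest-neighbour check)."""
    pairs = []
    for i, da in enumerate(desc_a):
        j = nearest(da, desc_b)
        if nearest(desc_b[j], desc_a) == i:
            pairs.append((i, j))
    return pairs
```

A real implementation vectorizes the distance computation and restricts the search to spatial windows; this brute-force version only shows the metric and the mutual check.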

Features

  • SuperPoint — Learned CNN-based keypoint detector and descriptor extractor with 256-dim float descriptors, replacing handcrafted ORB features
  • LightGlue Primary Matching — Attention-based learned matcher used as the primary frame-to-frame matcher in the tracking loop, with automatic fallback to brute-force L2 matching when unavailable
  • Multi-Level Tracking Recovery — Cascaded fallback chain: LightGlue matching -> L2 descriptor matching -> BoW reference keyframe matching -> Optical flow PnP pose recovery
  • Optical Flow + PnP — Lucas-Kanade optical flow with forward-backward consistency check and cv::solvePnPRansac for direct pose estimation when all descriptor-based methods fail
  • Thread-Safe GPU Inference — Mutex-protected GPU inference preventing race conditions between Tracking and LocalMapping threads
  • NetVLAD / CosPlace (optional) — Global descriptor-based loop closing using learned place recognition models, with automatic fallback to DBoW3
  • FP16 Inference — Half-precision inference support for SuperPoint, LightGlue, and PlaceRecognition on CUDA-capable GPUs
  • GPU/CPU Fallback — All neural network components gracefully fall back to CPU when CUDA is unavailable
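The cascaded recovery chain above amounts to trying matchers in priority order until one produces enough correspondences. A minimal sketch of that control flow (the matcher interface and the `min_matches` value are illustrative stand-ins, not the project's C++ API):

```python
def track_frame(frame, matchers, min_matches=15):
    """Try each matcher in priority order; stop at the first that yields
    enough correspondences. `matchers` is an ordered list of (name, fn)
    pairs where fn(frame) returns a list of matches."""
    for name, fn in matchers:
        matches = fn(frame)
        if len(matches) >= min_matches:
            return name, matches
    return "lost", []

# The order mirrors the chain described above:
#   LightGlue -> L2 descriptor -> BoW reference KF -> optical flow + PnP
```

In the real system each stage also produces a pose estimate (optical flow feeds cv::solvePnPRansac directly); only the fallback ordering is shown here.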

1. Prerequisites

Tested on Ubuntu 20.04.

C++17 Compiler

SP-SLAM3 uses C++17 thread and chrono functionalities.

OpenCV

We use OpenCV to manipulate images and features. Both OpenCV 3.x and 4.x are supported; tested with OpenCV 4.2.0 and 4.10.0.

Option A — Install from apt (recommended):

sudo apt-get install libopencv-dev

Option B — Build from source:

sudo apt-get update
sudo apt-get install build-essential cmake git pkg-config libgtk-3-dev \
    libavcodec-dev libavformat-dev libswscale-dev libv4l-dev \
    libxvidcore-dev libx264-dev libjpeg-dev libpng-dev libtiff-dev \
    gfortran openexr libatlas-base-dev python3-dev python3-numpy \
    libtbb2 libtbb-dev libdc1394-22-dev

cd ~
git clone https://github.com/opencv/opencv.git
cd opencv && git checkout 4.10.0

cd ~
git clone https://github.com/opencv/opencv_contrib.git
cd opencv_contrib && git checkout 4.10.0

cd ~/opencv
mkdir build && cd build

cmake -D CMAKE_BUILD_TYPE=Release \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
      -D BUILD_EXAMPLES=ON ..

make -j$(nproc)
sudo make install
sudo ldconfig

Eigen3

Required by g2o. At least version 3.1.0 is required. Tested with Eigen3 3.4.0.

sudo apt install libeigen3-dev

Pangolin

Required for 3D visualization. Tested with Pangolin 0.6+.

# Install dependencies
sudo apt install libgl1-mesa-dev libglew-dev libpython3-dev

# Build from source
cd ~
git clone --recursive https://github.com/stevenlovegrove/Pangolin.git
cd Pangolin
mkdir build && cd build
cmake ..
make -j$(nproc)
sudo make install

Boost & OpenSSL

sudo apt install libboost-serialization-dev libssl-dev

DBoW3 and g2o (Included in Thirdparty folder)

We use a BoW vocabulary based on the DBoW3 library to perform place recognition, and the g2o library to perform non-linear optimizations. Both libraries are included in the Thirdparty folder.

Vocabulary

The SuperPoint vocabulary files are included in this repository in two formats: Vocabulary/superpoint_voc.dbow3 (binary, fast loading) and Vocabulary/superpoint_voc.yml.gz (compressed YAML).

You can convert between formats using:

./tools/convert_vocab Vocabulary/superpoint_voc.yml.gz Vocabulary/superpoint_voc.dbow3

NVIDIA Driver & CUDA Toolkit 12.2 with cuDNN 8.9.1

Install CUDA Toolkit 12.2 following NVIDIA's official installation instructions.

If not installed during the CUDA Toolkit installation process, install the NVIDIA driver:

sudo apt-get install nvidia-driver-535

Export CUDA paths:

echo 'export PATH=/usr/local/cuda-12.2/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig

Verify NVIDIA driver availability:

nvidia-smi

LibTorch (with GPU | CUDA 12.1+)

If only CPU is available, install the CPU version of LibTorch. The system will automatically fall back to CPU mode.

# LibTorch 2.3.0 for CUDA 12.1/12.2
wget https://download.pytorch.org/libtorch/cu121/libtorch-cxx11-abi-shared-with-deps-2.3.0%2Bcu121.zip -O libtorch.zip
sudo unzip libtorch.zip -d /usr/local

Set the TORCH_DIR environment variable (optional, defaults to /usr/local/libtorch/share/cmake/Torch):

export TORCH_DIR=/usr/local/libtorch/share/cmake/Torch

2. Building SP-SLAM3

Clone the repository:

git clone --recursive https://github.com/fthbng77/SP_SLAM3.git

Build the project:

cd SP_SLAM3
chmod +x build.sh
./build.sh

If LibTorch is installed in a custom path:

TORCH_DIR=/path/to/libtorch/share/cmake/Torch ./build.sh

If you have multiple OpenCV versions installed, specify the correct one:

cd build
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DTorch_DIR=/usr/local/libtorch/share/cmake/Torch \
  -DOpenCV_DIR=/usr/lib/x86_64-linux-gnu/cmake/opencv4
make -j$(nproc)

3. Optional Model Export

SP-SLAM3 supports optional learned models for matching and place recognition. These are not required — the system falls back to brute-force L2 matching and DBoW3 when models are not provided.

LightGlue (Learned Matcher)

pip install git+https://github.com/cvg/LightGlue.git
python scripts/export_lightglue.py --output lightglue.pt

CosPlace / NetVLAD (Place Recognition)

# CosPlace (recommended — ResNet18 backbone, 512-dim descriptor)
pip install torch torchvision
python scripts/export_place_recognition.py --model cosplace --output cosplace.pt

# NetVLAD (4096-dim descriptor, requires hloc)
pip install hloc
python scripts/export_place_recognition.py --model netvlad --output netvlad.pt

4. Running (Monocular)

cd SP_SLAM3
export LD_LIBRARY_PATH=$(pwd)/lib:$LD_LIBRARY_PATH

Live Webcam

./Examples/Monocular/mono_webcam Vocabulary/superpoint_voc.dbow3 Examples/Monocular/EuRoC.yaml

EuRoC Dataset

Download the EuRoC MAV dataset zip from ETH Zurich Research Collection and extract the sequences (e.g., Examples/Monocular/MH_01_easy).

# Single sequence
./Examples/Monocular/mono_euroc \
  Vocabulary/superpoint_voc.dbow3 \
  Examples/Monocular/EuRoC.yaml \
  Examples/Monocular/MH_01_easy \
  Examples/Monocular/EuRoC_TimeStamps/MH01.txt

# Output: CameraTrajectory.txt, KeyFrameTrajectory.txt

Available sequences and timestamps:

| Difficulty | Sequence | Timestamp File |
|---|---|---|
| Easy | MH_01_easy | EuRoC_TimeStamps/MH01.txt |
| Easy | MH_02_easy | EuRoC_TimeStamps/MH02.txt |
| Medium | MH_03_medium | EuRoC_TimeStamps/MH03.txt |
| Difficult | MH_04_difficult | EuRoC_TimeStamps/MH04.txt |
| Difficult | MH_05_difficult | EuRoC_TimeStamps/MH05.txt |
| Easy | V1_01_easy | EuRoC_TimeStamps/V101.txt |
| Medium | V1_02_medium | EuRoC_TimeStamps/V102.txt |
| Difficult | V1_03_difficult | EuRoC_TimeStamps/V103.txt |
| Easy | V2_01_easy | EuRoC_TimeStamps/V201.txt |
| Medium | V2_02_medium | EuRoC_TimeStamps/V202.txt |
| Difficult | V2_03_difficult | EuRoC_TimeStamps/V203.txt |

Configuration

Edit Examples/Monocular/EuRoC.yaml to adjust parameters:

SuperPoint Parameters

| Parameter | Description | Default |
|---|---|---|
| ORBextractor.nFeatures | Number of features per image | 800 |
| ORBextractor.nLevels | Scale pyramid levels (1 = no pyramid, recommended) | 1 |
| ORBextractor.iniThFAST | SuperPoint confidence threshold | 0.155 |
| ORBextractor.minThFAST | Fallback threshold (if too few features detected) | 0.055 |
| SuperPoint.useFP16 | Enable FP16 inference on CUDA (0/1) | 0 |

Note: Parameter names use ORBextractor prefix for backward compatibility with ORB-SLAM3.
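The iniThFAST/minThFAST pair implements a two-stage detection fallback: if too few keypoints pass the primary confidence threshold, detection is retried at the lower one. A sketch of that logic (`detect` and `min_kps` are hypothetical stand-ins for the extractor internals, not the project's API):

```python
def detect_with_fallback(detect, image, ini_th=0.155, min_th=0.055, min_kps=50):
    """Run keypoint detection at the primary confidence threshold
    (iniThFAST); if too few keypoints come back, retry once at the
    lower fallback threshold (minThFAST)."""
    kps = detect(image, ini_th)
    if len(kps) < min_kps:
        kps = detect(image, min_th)
    return kps
```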

LightGlue Parameters (Optional)

| Parameter | Description | Default |
|---|---|---|
| LightGlue.model_path | Path to exported TorchScript model | (disabled) |
| LightGlue.useFP16 | Enable FP16 inference on CUDA (0/1) | 0 |

Place Recognition Parameters (Optional)

| Parameter | Description | Default |
|---|---|---|
| PlaceRecognition.model_path | Path to exported TorchScript model | (disabled) |
| PlaceRecognition.useFP16 | Enable FP16 inference on CUDA (0/1) | 0 |
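When a place-recognition model is configured, loop candidates are retrieved by comparing global descriptors of the current frame against stored keyframes. A minimal sketch of such a retrieval step (the cosine-similarity threshold of 0.8 is an illustrative assumption, not the project's default):

```python
import math

def cosine(a, b):
    """Cosine similarity between two global descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def loop_candidates(query, keyframe_descs, threshold=0.8):
    """Indices of keyframes whose global descriptor is similar enough
    to the query to be considered loop-closure candidates."""
    return [i for i, d in enumerate(keyframe_descs) if cosine(query, d) >= threshold]
```

CosPlace produces 512-dim descriptors and NetVLAD 4096-dim ones, but the retrieval step is metric-based either way.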

Example configuration with all features enabled:

# SuperPoint
ORBextractor.nFeatures: 800
ORBextractor.nLevels: 1
ORBextractor.iniThFAST: 0.155
ORBextractor.minThFAST: 0.055
SuperPoint.useFP16: 1

# LightGlue
LightGlue.model_path: "lightglue.pt"
LightGlue.useFP16: 1

# Place Recognition
PlaceRecognition.model_path: "cosplace.pt"
PlaceRecognition.useFP16: 1

5. Architecture

graph TD
    A["Input Image<br/>(Grayscale)"] --> B["<b>SuperPoint</b><br/>CNN Encoder + Dual Decoder<br/><i>FP32 / FP16</i>"]

    B --> K["Keypoints"]
    B --> D["256-dim Float<br/>Descriptors"]

    K --> M{Tracking Pipeline}
    D --> M

    M -- "1. primary" --> LG["<b>LightGlue</b><br/>Attention-based<br/>Learned Matcher"]
    M -- "2. fallback" --> SP["<b>SPmatcher</b><br/>Brute-force L2"]
    M -- "3. fallback" --> BOW["<b>BoW Matching</b><br/>Reference KeyFrame"]
    M -- "4. last resort" --> OF["<b>Optical Flow + PnP</b><br/>Lucas-Kanade + RANSAC"]

    LG --> T
    SP --> T
    BOW --> T
    OF --> T

    subgraph SLAM ["ORB-SLAM3 Backend"]
        T["<b>Tracking</b><br/>Pose Estimation<br/>+ Pose Optimization"]
        T --> LM["<b>Local Mapping</b><br/>Keyframe Processing<br/>Map Point Creation"]
        LM --> LC["<b>Loop Closing</b><br/>Loop Detection<br/>Pose Graph Optimization"]
    end

    A --> PR{Place Recognition}
    PR -- "default" --> DB["<b>DBoW3</b><br/>BoW Vocabulary"]
    PR -- "optional" --> NV["<b>NetVLAD / CosPlace</b><br/>Global Descriptor"]
    DB --> LC
    NV --> LC

    style B fill:#2d6a4f,color:#fff
    style LG fill:#1b4332,color:#fff
    style NV fill:#1b4332,color:#fff
    style SP fill:#40916c,color:#fff
    style BOW fill:#40916c,color:#fff
    style DB fill:#40916c,color:#fff
    style OF fill:#e76f51,color:#fff
    style SLAM fill:#f0f0f0,stroke:#333,color:#000
    style T fill:#264653,color:#fff
    style LM fill:#264653,color:#fff
    style LC fill:#264653,color:#fff

6. Evaluation

Ground truth trajectories for EuRoC sequences are included in evaluation/Ground_truth/EuRoC_left_cam/.

Benchmark: Comparison with State-of-the-Art

Absolute Trajectory Error (ATE RMSE) on the EuRoC MAV dataset (monocular, Sim(3) aligned, median of 5 runs):

| Sequence | SP-SLAM3 | ORB-SLAM3 | DSO | DSM | DROID-SLAM | GO-SLAM |
|---|---|---|---|---|---|---|
| MH_01_easy | 0.026 | 0.016 | 0.046 | 0.039 | 0.013 | 0.016 |
| MH_02_easy | 0.016 | 0.027 | 0.046 | 0.036 | 0.014 | 0.014 |
| MH_03_medium | 0.052 | 0.028 | 0.172 | 0.055 | 0.022 | 0.023 |
| MH_04_difficult | 0.085 | 0.138 | 3.810 | 0.057 | 0.043 | 0.045 |
| MH_05_difficult | 0.132 | 0.072 | 0.110 | 0.067 | 0.043 | 0.045 |
  • Feature-based (indirect): ORB-SLAM3, SP-SLAM3
  • Direct methods: DSO (Engel et al. 2018), DSM (Zubizarreta et al. 2020)
  • Deep learning SLAM: DROID-SLAM (Teed & Deng, NeurIPS 2021), GO-SLAM (Zhang et al. 2023)

Sources: ORB-SLAM3 — Campos et al. (2021) Table II; DSO, DSM — ORB-SLAM3 paper; DROID-SLAM, GO-SLAM — DPV-SLAM (Lipson et al., ECCV 2024) Table 3. All values median of 5 runs.
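For reference, ATE RMSE is the root-mean-square of per-pose translation errors between the estimated and ground-truth trajectories. A minimal sketch of the metric itself (omitting the Sim(3) alignment step, which evo performs before computing the error):

```python
import math

def ate_rmse(gt, est):
    """RMSE of per-pose translation errors between an (already aligned)
    estimated trajectory and ground truth. Each trajectory is a list of
    (x, y, z) positions with matched timestamps."""
    errs = [sum((g - e) ** 2 for g, e in zip(p, q)) for p, q in zip(gt, est)]
    return math.sqrt(sum(errs) / len(errs))
```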

Improvement over baseline SP-SLAM3 (L2-only matching):

| Sequence | Baseline ATE | Improved ATE | ATE Change | Tracking Improvement |
|---|---|---|---|---|
| MH_01_easy | 0.0218 m | 0.0255 m | +17% | 90.4% -> 98.8% |
| MH_02_easy | 0.0279 m | 0.0160 m | -43% | 85.9% -> 85.1% |
| MH_03_medium | 0.0665 m | 0.0523 m | -21% | 49.3% -> 72.9% |
| MH_04_difficult | 0.1236 m | 0.0847 m | -32% | 58.3% -> 96.8% |
| MH_05_difficult | 0.1082 m | 0.1317 m | +22% | 50.5% -> 95.0% |

Analysis:

  • Tracking robustness dramatically improved: Difficult sequences jumped from ~50-58% to ~95-97% tracking rate thanks to LightGlue primary matching and optical flow PnP fallback
  • Easy sequences: SP-SLAM3 outperforms ORB-SLAM3 by 1.3-2.25x on easy sequences with near-perfect tracking (99%)
  • Difficult sequences: Tracking rate is now high (~95-97%) but ATE remains higher than ORB-SLAM3 due to accumulated drift over longer tracked trajectories. Further improvement requires better loop closing or local BA
  • Tracking pipeline: LightGlue (primary) -> L2 projection matching (fallback) -> BoW reference keyframe matching -> Optical flow + PnP (last resort)

Using evo

Install the evo evaluation tool:

pip install evo

Evaluate trajectory accuracy after running a sequence:

# Absolute Trajectory Error (ATE)
evo_ape tum evaluation/Ground_truth/EuRoC_left_cam/MH01_GT.txt CameraTrajectory.txt -as

# Relative Pose Error (RPE)
evo_rpe tum evaluation/Ground_truth/EuRoC_left_cam/MH01_GT.txt CameraTrajectory.txt -as

# Compare multiple results
evo_res results/*.zip -p --plot

7. Roadmap

Tracking Robustness

  • LightGlue primary matching — LightGlue used as the primary frame-to-frame matcher in TrackWithMotionModel, with automatic L2 fallback
  • Multi-level tracking recovery — Cascaded fallback chain: LightGlue -> L2 projection -> BoW reference KF -> Optical flow PnP
  • Optical flow + PnP fallback — Lucas-Kanade optical flow with forward-backward consistency check and cv::solvePnPRansac for direct pose estimation as last resort
  • Thread-safe GPU inference — Mutex protection for concurrent LightGlue/SuperPoint GPU operations between Tracking and LocalMapping threads
  • LightGlue TorchScript export — Fixed export pipeline with disabled dynamic pruning/early stopping for JIT compatibility
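The mutex pattern behind the thread-safe GPU inference item above can be sketched as follows. Python's threading.Lock stands in for the C++ std::mutex that guards the LibTorch forward passes; the class and method names are illustrative, not the project's API:

```python
import threading

class GuardedModel:
    """Serialize access to a single GPU model shared by the Tracking
    and LocalMapping threads, so concurrent SuperPoint/LightGlue calls
    cannot race on the same device context."""
    def __init__(self, forward_fn):
        self._forward = forward_fn
        self._lock = threading.Lock()

    def infer(self, x):
        with self._lock:  # only one thread runs inference at a time
            return self._forward(x)
```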

Matching Improvement

  • LightGlue integration — Attention-based learned matcher for SearchForInitialization, SearchForTriangulation, and primary tracking
  • Adaptive matcher selection — Use L2 matching when tracking is stable (high match count), switch to LightGlue only when needed to reduce GPU overhead on easy sequences
  • Matching threshold calibration — Optimize TH_HIGH / TH_LOW on benchmark datasets for L2 descriptor matching

Place Recognition

  • NetVLAD / CosPlace — Global descriptor-based loop closing replacing DBoW3
  • SuperPoint vocabulary regeneration — Retrain vocabulary with a larger and more diverse dataset

Thermal Camera Support

  • Adaptive thermal preprocessing — Add ThermalPreprocessor module with CLAHE, histogram equalization, and median filtering that auto-adjusts based on image statistics (contrast, mean brightness, noise level)
  • YAML-configurable thermal mode — Thermal.enablePreprocessing: 1 flag to enable/disable thermal preprocessing without affecting standard camera usage
  • SuperPoint fine-tuning on thermal data — Retrain SuperPoint on thermal datasets (KAIST Multispectral, FLIR ADAS) using homographic adaptation for improved keypoint detection on infrared imagery
  • Thermal-specific threshold tuning — Lower default iniThFAST / minThFAST values for thermal images with adaptive threshold adjustment based on detected keypoint count
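The auto-adjusting preprocessing idea could look like the following statistics-driven selector. All thresholds and the returned step names are illustrative assumptions, not values from the codebase:

```python
def choose_preprocessing(pixels):
    """Pick a preprocessing step from simple image statistics.
    `pixels` is a flat list of 8-bit intensities."""
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    contrast = var ** 0.5  # standard deviation as a contrast proxy
    if contrast < 20:
        return "clahe"         # low contrast: local equalization
    if mean < 60 or mean > 195:
        return "histogram_eq"  # badly exposed: global equalization
    return "median_only"       # otherwise just denoise
```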

Performance Optimization

  • Half precision (FP16) inference — FP16 mode for SuperPoint, LightGlue, and PlaceRecognition on CUDA
  • TensorRT / ONNX Runtime — Replace LibTorch with TensorRT for 2-3x speedup on NVIDIA GPUs
  • Remove image pyramid — Fully remove pyramid code (currently using nLevels=1 workaround)

8. License

SP-SLAM3 is released under the GPLv3 license, same as ORB-SLAM3.

9. References

  • ORB-SLAM3 - C. Campos, R. Elvira, J. J. G. Rodriguez, J. M. M. Montiel and J. D. Tardos
  • SuperPoint - D. DeTone, T. Malisiewicz and A. Rabinovich (MagicLeap)
  • LightGlue - P. Lindenberger, P.-E. Sarlin and M. Pollefeys
  • NetVLAD - R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla and J. Sivic
  • CosPlace - G. Berton, C. Masone and B. Caputo
