SFT3D: Unsupervised Online 3D Instance Segmentation with Synthetic Sequences and Dynamic Loss (TMM2026)

This repository contains the official PyTorch implementation of SFT3D. For more details, please refer to the paper.

Citation

If you use SFT3D in your research, please cite the following paper:

@article{zhang2025SFT3D,
  title={{SFT3D}: Unsupervised Online {3D} Instance Segmentation with Synthetic Sequences and Dynamic Loss},
  author={Zhang, Yifan and Zhang, Wei and He, Chuangxin and Miao, Zhonghua and Hou, Junhui},
  journal={IEEE Transactions on Multimedia},
  year={2026}
}

Supported Datasets

Ensure that the datasets are placed in the datasets folder with lowercase names, structured as follows:

SFT3D
├── datasets
│   ├── nuscenes
│   │   ├── v1.0-trainval
│   │   ├── samples
│   │   ├── sweeps
│   ├── pandaset
│   │   ├── 001
│   │   ├── 002
│   │   └── ...
│   ├── semantickitti
│   │   └── sequences
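
Before preprocessing, it can help to confirm that the folders are where the scripts expect them. The following is a minimal sketch and is not part of the repository: the helper name `missing_paths` and the `EXPECTED_LAYOUT` table are illustrative, and the table only encodes the tree shown above (PandaSet sequence folders such as 001 are not checked individually).

```python
from pathlib import Path

# Expected sub-folders per dataset, copied from the directory tree above.
# PandaSet sequence folders (001, 002, ...) are not listed individually.
EXPECTED_LAYOUT = {
    "nuscenes": ["v1.0-trainval", "samples", "sweeps"],
    "pandaset": [],
    "semantickitti": ["sequences"],
}


def missing_paths(root="datasets", layout=EXPECTED_LAYOUT):
    """Return the expected dataset folders that do not exist under `root`."""
    root = Path(root)
    missing = []
    for name, subdirs in layout.items():
        base = root / name
        if not base.is_dir():
            # The whole dataset folder is absent; skip its sub-folders.
            missing.append(base)
            continue
        missing.extend(base / sub for sub in subdirs if not (base / sub).is_dir())
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print(f"missing: {path}")
```

Run it from the repository root; an empty output means every folder in the tree was found. Datasets you do not plan to use will be reported as missing, which is harmless.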

Note: PandaSet does not natively provide instance ground-truth. To generate it, run:

python create_instance_gt_pandaset.py --sensor pandargt

This script processes the PandarGT sensor data. You can create new sequences using create_pc_sequence.py.

Installation and Setup

Clone this repository:

git clone https://github.com/valeoai/SFT3D.git
cd SFT3D

Install dependencies:
```
pip install -r requirements.txt
```
Prepare the datasets as described in the Supported Datasets section.

Install Patchwork++ for dataset preprocessing (required for nuScenes and PandaSet):

git clone https://github.com/url-kaist/patchwork-plusplus.git
cd patchwork-plusplus
pip install -e .

Preprocessing

For SemanticKITTI, you can directly download the pre-processed segments:

wget https://github.com/valeoai/UNIT/releases/download/v1.0/segments_gridsample_sk.tar.gz

For nuScenes and PandaSet, you must run the preprocessing scripts:

python preprocess_nuscenes.py
python preprocess_pandaset.py

Note: Preprocessing may take several hours depending on your hardware.

Training the Model

1. Non-temporal Training (Single Frame)

Run the following commands based on your dataset:

SemanticKITTI:

python main_instance_segmentation.py general.experiment_name=sk_single data/datasets=semantic_kitti data.batch_size=3 general.num_frames=1 data.test_batch_size=3 trainer.max_epochs=50

PandaSet (PandarGT):

python main_instance_segmentation.py general.experiment_name=pdgt_single data/datasets=pandaset_pandargt data.batch_size=3 general.num_frames=1 data.test_batch_size=3 trainer.max_epochs=150

nuScenes:

python main_instance_segmentation.py general.experiment_name=ns_single data/datasets=nuscenes data.batch_size=3 general.num_frames=1 data.test_batch_size=3 trainer.max_epochs=4

2. Temporal Training (Multiple Frames)

To train the temporal models, use the following commands:

SemanticKITTI:

python main_instance_segmentation.py general.experiment_name=sk_temporal general.checkpoint=saved/sk_single/last-epoch.ckpt data/datasets=semantic_kitti data.batch_size=4 general.num_frames=2 general.consistency_loss=true data.test_batch_size=4 trainer.max_epochs=50

PandaSet (PandarGT):

python main_instance_segmentation.py general.experiment_name=pdgt_temporal general.checkpoint=saved/pdgt_single/last-epoch.ckpt data/datasets=pandaset_pandargt data.batch_size=5 general.num_frames=2 general.consistency_loss=true data.test_batch_size=5 trainer.max_epochs=150

nuScenes:

python main_instance_segmentation.py general.experiment_name=ns_temporal general.checkpoint=saved/ns_single/last-epoch.ckpt data/datasets=nuscenes data.batch_size=5 general.num_frames=2 general.consistency_loss=true data.test_batch_size=5 trainer.max_epochs=4
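
The six training commands differ only in a handful of values (experiment prefix, dataset config, batch size, number of epochs). As a rough sketch of how they could be generated rather than copied by hand, the helper below rebuilds them from a small table. The function name `training_command` and the `DATASETS` table are hypothetical; the values are taken from the commands in this section.

```python
# Per-dataset values copied from the commands in this section:
# (experiment prefix, data/datasets value, max epochs, temporal batch size)
DATASETS = {
    "semantickitti": ("sk", "semantic_kitti", 50, 4),
    "pandaset": ("pdgt", "pandaset_pandargt", 150, 5),
    "nuscenes": ("ns", "nuscenes", 4, 5),
}

SINGLE_FRAME_BATCH = 3  # all single-frame commands use a batch size of 3


def training_command(dataset, temporal=False):
    """Return the training command as a list of arguments."""
    prefix, config_name, epochs, temporal_batch = DATASETS[dataset]
    stage = "temporal" if temporal else "single"
    batch = temporal_batch if temporal else SINGLE_FRAME_BATCH

    args = ["python", "main_instance_segmentation.py",
            f"general.experiment_name={prefix}_{stage}"]
    if temporal:
        # Temporal training starts from the single-frame checkpoint.
        args.append(f"general.checkpoint=saved/{prefix}_single/last-epoch.ckpt")
    args += [f"data/datasets={config_name}",
             f"data.batch_size={batch}",
             f"general.num_frames={2 if temporal else 1}"]
    if temporal:
        args.append("general.consistency_loss=true")
    args += [f"data.test_batch_size={batch}",
             f"trainer.max_epochs={epochs}"]
    return args


if __name__ == "__main__":
    # Print both stages for one dataset without launching anything.
    for temporal in (False, True):
        print(" ".join(training_command("semantickitti", temporal)))
```

To launch a run instead of printing it, the list can be passed to `subprocess.run(training_command("nuscenes"), check=True)`. The single-frame stage has to finish first, since the temporal command loads its checkpoint.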

Inference

To run inference on a trained model, use the following commands:

SemanticKITTI:

python inference_instance_segmentation.py general.experiment_name=sk_temporal general.checkpoint=saved/sk_temporal/last-epoch.ckpt data.batch_size=1 general.forward_queries=true data/datasets=semantic_kitti data.predict_mode=validation

Adjust the command for other datasets or checkpoints. For non-temporal inference, set general.forward_queries=false.

Predictions will be saved in the assets directory of the dataset and are computed by default for the validation set only.
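
As an illustration of adjusting the inference command, the sketch below builds it for an arbitrary experiment. It assumes that other datasets accept the same overrides as the SemanticKITTI example and that checkpoints follow the saved/<experiment>/last-epoch.ckpt pattern used in this README; the function name `inference_command` is hypothetical.

```python
def inference_command(experiment, dataset_config, temporal=True):
    """Return the inference command as a list of arguments.

    `experiment` is the experiment name used at training time
    (for example "sk_temporal"); `dataset_config` is the value passed to
    data/datasets (for example "semantic_kitti").
    """
    return [
        "python", "inference_instance_segmentation.py",
        f"general.experiment_name={experiment}",
        # Assumption: the checkpoint sits under saved/<experiment>/.
        f"general.checkpoint=saved/{experiment}/last-epoch.ckpt",
        "data.batch_size=1",
        # forward_queries is true for temporal models, false otherwise.
        f"general.forward_queries={'true' if temporal else 'false'}",
        f"data/datasets={dataset_config}",
        "data.predict_mode=validation",
    ]


if __name__ == "__main__":
    print(" ".join(inference_command("sk_temporal", "semantic_kitti")))
    print(" ".join(inference_command("ns_single", "nuscenes", temporal=False)))
```

The first printed line reproduces the SemanticKITTI command above; the second shows a non-temporal variant for a single-frame nuScenes model.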

Evaluation

After inference, evaluate the segmentation results:

For SemanticKITTI, run:

python evaluate_4dpanoptic_sk.py -p ../datasets/semantickitti/assets/sk_temporal

Adapt the command for other datasets.
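
When adapting the path, it may help to check that inference actually wrote predictions before starting an evaluation. This is a small sketch under one assumption: that other datasets use the same <dataset>/assets/<experiment> layout as the SemanticKITTI example. The helper names are illustrative and not part of the repository.

```python
from pathlib import Path


def predictions_dir(dataset_folder, experiment, datasets_root="../datasets"):
    """Return the folder where predictions for `experiment` are expected.

    Assumption: every dataset follows the <dataset>/assets/<experiment>
    pattern shown for SemanticKITTI.
    """
    return Path(datasets_root) / dataset_folder / "assets" / experiment


def has_predictions(path):
    """Return True if `path` is an existing, non-empty directory."""
    path = Path(path)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    target = predictions_dir("semantickitti", "sk_temporal")
    status = "found" if has_predictions(target) else "not found"
    print(f"{target}: predictions {status}")
```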

Troubleshooting

Memory Issues: The code uses mixed precision (bf16) to save memory, which requires recent hardware. To disable mixed precision at the cost of higher memory usage, set general.precision=32 in your config.
Slow Inference: Evaluation on nuScenes can be slow due to optimizations in 4D-PLS, which may lead to overflow on large datasets.
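
To decide whether the general.precision=32 fallback is needed, the sketch below asks PyTorch whether the current GPU reports bf16 support. It is a best-effort check and not part of the repository: the function names are hypothetical, and passing general.precision=32 as a command-line override (rather than editing the config file) is an assumption based on how the other general.* options are passed in this README.

```python
def bf16_supported():
    """Best-effort check for bf16 support on the current CUDA device."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed in this environment
    # is_bf16_supported() is only meaningful when a CUDA device is present.
    return bool(torch.cuda.is_available() and torch.cuda.is_bf16_supported())


def precision_overrides(supported):
    """Return extra overrides: fall back to 32-bit when bf16 is unavailable."""
    return [] if supported else ["general.precision=32"]


if __name__ == "__main__":
    extra = precision_overrides(bf16_supported())
    print("extra overrides:", " ".join(extra) if extra else "(none)")
```

On hardware without bf16 support this prints general.precision=32, which can be appended to the training or inference commands.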

Acknowledgments

We sincerely appreciate the open-source project UNIT developed by valeoai.
Thanks to the authors of the SemanticKITTI, nuScenes, and PandaSet datasets.

Feel free to open issues or pull requests if you encounter problems or have suggestions!
