[SIGGRAPH Asia 2025] Official GitHub repo of SeqTex, an end-to-end 3D texture generation method using video diffusion priors.

SeqTex: Generate Mesh Textures in Video Sequence

(SIGGRAPH Asia 2025, Conference Track)

Ze Yuan* · Xin Yu*† · Yangtian Sun · Yuan-Chen Guo · Yan-Pei Cao · Ding Liang · Xiaojuan Qi
The University of Hong Kong   |   VAST
*Equal contribution    †Project lead    Corresponding author

arXiv Paper · Project Page · Model Checkpoint · Hugging Face Demo


SeqTex Teaser

News

  • 2025-12-12: Code release v1.0.0! 🎉

Requirements

This repository has been tested on a single NVIDIA A100 80GB GPU. For training, at least 8 GPUs with 80GB of memory each are recommended; the results presented in the paper were obtained by training on 4 nodes for one week. See the project page for a quick demo.

Environment

conda create -n seqtex python=3.10 -y
conda activate seqtex
conda install -c nvidia/label/cuda-11.8.0 cuda-toolkit -y
conda install -c conda-forge gxx=11 gcc=11 -y
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
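
To verify the installation, run a quick sanity check (the expected output assumes the CUDA 11.8 wheels above):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# should print something like: 2.6.0+cu118 11.8 True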

Quick Start

Overfit

python launch.py --config configs/train.yaml --train \
    --gpu 0,1,2,3,4,5,6,7 trainer.num_nodes=1 name="train" tag="overfit" \
    data.scene_list=["data/indices/train_3d.jsonl","data/indices/train_image.jsonl"] \
    data.train_indices=[[0,1],[0,1]] data.task_list=["img2tex","geo2mv"] \
    data.eval_scene_list=["data/indices/train_3d.jsonl","data/indices/train_image.jsonl"] \
    data.val_indices=[[0,1],[0,1]] data.eval_task_list=["img2tex","geo2mv"] \
    data.extra_prompt_db=["data/indices/prompt_extension.json"] \
    data.repeat=1000 trainer.val_check_interval=1.0 

    # to resume training, append the following overrides:
    # system.weights=<main_ckpt_path>.ckpt system.ema_kwargs.cloud_or_local_key=<ema_path>.pth

SeqTex training uses FSDP by default; without it, training does not fit in 80GB of GPU memory.
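
For reference, FSDP in lightning (which this repo builds on) is enabled through the trainer strategy. The snippet below is only a minimal sketch of that mechanism, not SeqTex's actual configuration; MySeqTexSystem is a hypothetical LightningModule.

import lightning as L
from lightning.pytorch.strategies import FSDPStrategy

# Minimal sketch: FULL_SHARD splits parameters, gradients, and optimizer
# state across all 8 GPUs, so a model too large for one 80GB card still fits.
# MySeqTexSystem is a placeholder LightningModule, not part of this repo.
trainer = L.Trainer(
    accelerator="gpu",
    devices=8,
    num_nodes=1,
    strategy=FSDPStrategy(sharding_strategy="FULL_SHARD"),
)
# trainer.fit(MySeqTexSystem())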

Testing

python launch.py --config configs/test.yaml --test \
    --gpu 0 trainer.num_nodes=1 name="test" tag="test_dtc" \
    data.eval_scene_list=["data/indices/test_one.jsonl"] \
    system.seqtex_transformer_name_or_path=VAST-AI/SeqTex-Transformer

Try the above command to perform a quick test for image-conditioned texture generation.

Alternatively, for more control, you can use SDXL to convert your text prompt into an image condition and then generate the texture map (a standalone sketch of this step follows below).

python launch.py --config configs/test.yaml --test \
    --gpu 0 trainer.num_nodes=1 name="test" tag="test_dtc_sdxl" \
    data.eval_scene_list=["data/indices/test_one.jsonl"] \
    system.seqtex_transformer_name_or_path=VAST-AI/SeqTex-Transformer \
    system.use_generated_img_cond=true

If everything goes well, you should get a result like the one shown here. You can use this as a sanity check.
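
The image-condition step can also be reproduced standalone with diffusers, e.g. to inspect or hand-pick the condition before texturing. A minimal sketch (prompt, sampling settings, and output path are illustrative, not the pipeline's internal defaults):

import torch
from diffusers import StableDiffusionXLPipeline

# Turn a text prompt into an image condition with SDXL. All settings here
# are illustrative and may differ from what SeqTex uses internally.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
image = pipe("a wooden treasure chest with brass fittings",
             num_inference_steps=30).images[0]
image.save("img_cond.png")  # use as the image condition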

Result Explanation

Each result contains 3-4 files:

outputs_wan/test/test_dtc_sdxl@20251013-121842/save
├── it0-test-0_0-img_cond.png           # (text2tex ONLY) image condition from SDXL/FLUX
├── it0-test-0_0-mv-taskimg2tex.png     # multi-view results; see assets/explain.png for the layout
├── it0-test-0_0-prompt.json            # the model ID and text prompt used
└── it0-test-0_0-uv-taskimg2tex.png     # 3 rows: position map, normal map, and the generated texture map
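
If you only need the generated texture map out of the stacked UV image, a minimal Pillow sketch (assuming the three rows have equal height; the filename comes from the example above):

from PIL import Image

# The UV output stacks position, normal, and texture maps in 3 equal rows;
# the generated texture map is the bottom row.
img = Image.open("it0-test-0_0-uv-taskimg2tex.png")
w, h = img.size
texture = img.crop((0, 2 * (h // 3), w, h))
texture.save("texture.png")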

Data Organization

Two types of datasets are supported: 3D datasets and image datasets. Prepare the data in the formats described below. Partial processing scripts can be found in our HF Space demo (it has some known issues, e.g., UV unwrapping fails for some cases).

[Note]: Every 3D model with multiple parts should be merged into a single mesh with a single UV map and a single texture map (one option is sketched below). The texture map should preferably be an albedo map without lighting/shading baked in. For image datasets, PBR-shaded images are also supported.
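
One way to merge a multi-part asset is with trimesh. A minimal sketch (this concatenates geometry only; producing a single UV map and texture map, e.g. by re-unwrapping with xatlas and baking in Blender, remains a separate step):

import trimesh

# force="mesh" concatenates every part of the GLB scene into one Trimesh.
# Note: this does NOT repack UVs into a single atlas or merge textures;
# unwrap and bake separately (e.g. with xatlas or Blender).
mesh = trimesh.load("model.glb", force="mesh")
mesh.export("model_merged.glb")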

3D Dataset

For 3D datasets, each directory contains a 3D model and its corresponding texture map. This is the primary data format and is used by the img2tex task.

data/examples
└── 99
    └── 999c2f55-0cf6-4d46-913a-7671085d03f6
        ├── model.glb
        └── model.jpeg # preferably an albedo texture map

Image Dataset

For image datasets, each directory contains a set of images. This alternative data format improves the generalization ability of the model and is used by the geo2mv task.

data/examples_train_image
└── f6
    └── f63290551cac423883742cfdb8acc9ff
        ├── meta.json # transform_matrixs for each image
        ├── color_0000.webp # PBR shaded images are also supported
        ...
        ├── depth_0000.exr
        ...
        └── normal_0005.webp
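
As an illustration of how such a directory could be read (the meta.json schema beyond the transform_matrixs key is an assumption, and reading EXR files through OpenCV requires the environment flag shown):

import json, os

os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # must be set before importing cv2
import cv2

root = "data/examples_train_image/f6/f63290551cac423883742cfdb8acc9ff"

# meta.json stores one camera transform matrix per rendered view
# (exact schema is an assumption based on the comment above).
with open(os.path.join(root, "meta.json")) as f:
    meta = json.load(f)

color = cv2.imread(os.path.join(root, "color_0000.webp"), cv2.IMREAD_COLOR)
depth = cv2.imread(os.path.join(root, "depth_0000.exr"), cv2.IMREAD_UNCHANGED)
print(color.shape, depth.shape)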

Citation

If SeqTex is used in your work, please cite:

@inproceedings{10.1145/3757377.3763863,
  author = {Yuan, Ze and Yu, Xin and Sun, Yangtian and Guo, Yuan-Chen and Cao, Yan-Pei and Liang, Ding and Qi, Xiaojuan},
  title = {SeqTex: Generate Mesh Textures in Video Sequence},
  year = {2025},
  isbn = {9798400721373},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://dl.acm.org/doi/10.1145/3757377.3763863},
  doi = {10.1145/3757377.3763863},
  booktitle = {Proceedings of the SIGGRAPH Asia 2025 Conference Papers},
  articleno = {25},
  numpages = {12},
  keywords = {Video Diffusion Models, Diffusion Techniques, Texture Generation},
  series = {SA Conference Papers '25}
}

Acknowledgement

We sincerely thank the following open-source projects:

  • MV-Adapter for the coding framework.
  • Wan2.1 for the 3D prior from video sequences.
  • SDXL and FLUX for high-fidelity text-to-image generation.
  • diffusers for the diffusion model implementation.
  • lightning for saving us time.

Additionally, we thank the following people for their help:

  • Special thanks to Toshihiro Hayashi for his valuable support and assistance in fixing bugs for our HF demo.
  • We thank EasyMode-AI for their efforts in integrating SeqTex into ComfyUI. See here.
