Ze Yuan* · Xin Yu*† · Yangtian Sun · Yuan-Chen Guo · Yan-Pei Cao · Ding Liang · Xiaojuan Qi✉
The University of Hong Kong | VAST
*Equal contribution †Project lead ✉Corresponding author
- 2025-12-12: Code release v1.0.0! 🎉
This repository has been tested on a single NVIDIA A100 80GB GPU. For training, at least 8 GPUs with 80GB of memory each are recommended; we trained on 4 nodes for one week to obtain the results presented in the paper. You can refer to the project page for a quick demo.
```bash
conda create -n seqtex python=3.10 -y
conda activate seqtex
conda install -c nvidia/label/cuda-11.8.0 cuda-toolkit -y
conda install -c conda-forge gxx=11 gcc=11 -y
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
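Optionally, verify the environment before moving on. This is a minimal check that PyTorch sees the GPU and the expected CUDA 11.8 build:

```python
# Quick environment sanity check (optional).
import torch

print(torch.__version__)          # expected: 2.6.0+cu118
print(torch.version.cuda)         # expected: 11.8
print(torch.cuda.is_available())  # should be True on a machine with an NVIDIA GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```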
To launch training on a single node with 8 GPUs:

```bash
python launch.py --config configs/train.yaml --train \
--gpu 0,1,2,3,4,5,6,7 trainer.num_nodes=1 name="train" tag="overfit" \
data.scene_list=["data/indices/train_3d.jsonl","data/indices/train_image.jsonl"] \
data.train_indices=[[0,1],[0,1]] data.task_list=["img2tex","geo2mv"] \
data.eval_scene_list=["data/indices/train_3d.jsonl","data/indices/train_image.jsonl"] \
data.val_indices=[[0,1],[0,1]] data.eval_task_list=["img2tex","geo2mv"] \
data.extra_prompt_db=["data/indices/prompt_extension.json"] \
data.repeat=1000 trainer.val_check_interval=1.0
# To resume training, append the following options:
# system.weights=<main_ckpt_path>.ckpt system.ema_kwargs.cloud_or_local_key=<ema_path>.pth
```

SeqTex training uses FSDP by default; otherwise, it does not fit in 80GB of GPU memory.
```bash
python launch.py --config configs/test.yaml --test \
--gpu 0 trainer.num_nodes=1 name="test" tag="test_dtc" \
data.eval_scene_list=["data/indices/test_one.jsonl"] \
system.seqtex_transformer_name_or_path=VAST-AI/SeqTex-Transformer
```

Try the above command to perform a quick test of image-conditioned texture generation.
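If the transformer weights live on the Hugging Face Hub (the `VAST-AI/SeqTex-Transformer` id in the command above suggests so), you can optionally pre-download them into the local cache with `huggingface_hub`; this is a convenience sketch, not a required step:

```python
# Optional: pre-fetch the SeqTex transformer weights into the local Hugging Face cache.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="VAST-AI/SeqTex-Transformer")  # repo id from the command above
print("weights cached at:", local_dir)
```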
Alternatively, if you want more control, you can use SDXL to convert a text prompt into an image condition and then generate the texture map.
```bash
python launch.py --config configs/test.yaml --test \
--gpu 0 trainer.num_nodes=1 name="test" tag="test_dtc_sdxl" \
data.eval_scene_list=["data/indices/test_one.jsonl"] \
system.seqtex_transformer_name_or_path=VAST-AI/SeqTex-Transformer \
system.use_generated_img_cond=true
```

If everything goes well, you should get a result like the one shown here. You can use this as a sanity check.
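With `system.use_generated_img_cond=true`, the image condition is produced from your text prompt internally (via SDXL/FLUX, see the output listing below). If you prefer to create the image condition yourself and feed it to the image-conditioned command above, a minimal diffusers SDXL sketch (model id and prompt are placeholders) would be:

```python
# Standalone sketch: generate an image condition from a text prompt with SDXL via diffusers.
# SeqTex performs this step for you when system.use_generated_img_cond=true is set.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe("a wooden treasure chest with brass fittings", num_inference_steps=30).images[0]
image.save("img_cond.png")  # use as the image condition for texture generation
```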
Each result contains 3-4 files:
```
outputs_wan/test/test_dtc_sdxl@20251013-121842/save
├── it0-test-0_0-img_cond.png        # (text2tex ONLY) image condition from SDXL/FLUX
├── it0-test-0_0-mv-taskimg2tex.png  # please refer to [assets/explain.png](assets/explain.png)
├── it0-test-0_0-prompt.json         # the model id and text prompt used
└── it0-test-0_0-uv-taskimg2tex.png  # 3 rows: position map, normal map, and the generated texture map
```
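If you only need the texture map itself, you can crop the bottom row out of the `*-uv-task*.png` image. A minimal sketch, assuming the three rows are stacked vertically with equal height (the filename is taken from the listing above):

```python
# Crop the generated texture map (bottom row) out of the saved UV image.
# Assumes position map / normal map / texture map are stacked vertically in equal-height rows.
from PIL import Image

uv_img = Image.open("outputs_wan/test/test_dtc_sdxl@20251013-121842/save/it0-test-0_0-uv-taskimg2tex.png")
w, h = uv_img.size
texture = uv_img.crop((0, 2 * h // 3, w, h))  # bottom third = generated texture map
texture.save("texture_map.png")
```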
Two types of datasets are supported: a 3D dataset and an image dataset. Prepare the data in the formats below. Partial processing scripts can be found in our HF space demo (it has some known issues, e.g., UV unwrapping fails for some cases).

[Note]: Every 3D model with multiple parts should be merged into a single mesh with a single UV map and a single texture map. The texture map should preferably be an albedo map without lighting/shading. For image datasets, PBR-shaded images are also supported.

For 3D datasets, each directory contains a 3D model and a corresponding texture map. This is the primary data format and is used by the img2tex task.
```
data/examples
└── 99
    └── 999c2f55-0cf6-4d46-913a-7671085d03f6
        ├── model.glb
        └── model.jpeg   # preferably an albedo texture map
```
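To sanity-check that a prepared `model.glb` satisfies the single-mesh requirement from the note above, a rough trimesh sketch could look like the following (path taken from the example tree; `trimesh` may not be part of this repository's requirements):

```python
# Rough check that a GLB satisfies the single-mesh / single-UV requirement described above.
import trimesh

loaded = trimesh.load("data/examples/99/999c2f55-0cf6-4d46-913a-7671085d03f6/model.glb")
meshes = list(loaded.geometry.values()) if isinstance(loaded, trimesh.Scene) else [loaded]
assert len(meshes) == 1, f"expected a single merged mesh, found {len(meshes)} parts"

mesh = meshes[0]
uv = getattr(mesh.visual, "uv", None)  # None if the mesh carries no texture coordinates
assert uv is not None, "mesh has no UV coordinates"
print("vertices:", len(mesh.vertices), "uv coords:", uv.shape)
```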
For image datasets, each directory contains a set of images. This is an alternative data format used to improve the generalization ability of the model and is used by the geo2mv task.

```
data/examples_train_image
└── f6
    └── f63290551cac423883742cfdb8acc9ff
        ├── meta.json         # transform matrices for each image
        ├── color_0000.webp   # PBR-shaded images are also supported
        ├── ...
        ├── depth_0000.exr
        ├── ...
        └── normal_0005.webp
```
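For reference, one view of such a directory can be loaded roughly as follows (paths follow the example above; OpenCV needs the `OPENCV_IO_ENABLE_OPENEXR` environment variable set before import to read the `.exr` depth maps):

```python
# Load one view (color / depth / normal) and the per-view metadata from an image-dataset directory.
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # must be set before importing cv2 to read .exr files
import json
import cv2

root = "data/examples_train_image/f6/f63290551cac423883742cfdb8acc9ff"
with open(os.path.join(root, "meta.json")) as f:
    meta = json.load(f)  # per-image transform matrices; the exact schema depends on your export

color = cv2.imread(os.path.join(root, "color_0000.webp"), cv2.IMREAD_UNCHANGED)
depth = cv2.imread(os.path.join(root, "depth_0000.exr"), cv2.IMREAD_UNCHANGED)
normal = cv2.imread(os.path.join(root, "normal_0005.webp"), cv2.IMREAD_UNCHANGED)
print(color.shape, depth.shape, normal.shape)
```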
If SeqTex is used in your work, please cite:

```bibtex
@inproceedings{10.1145/3757377.3763863,
author = {Yuan, Ze and Yu, Xin and Sun, Yangtian and Guo, Yuan-Chen and Cao, Yan-Pei and Liang, Ding and Qi, Xiaojuan},
title = {SeqTex: Generate Mesh Textures in Video Sequence},
year = {2025},
isbn = {9798400721373},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://dl.acm.org/doi/10.1145/3757377.3763863},
doi = {10.1145/3757377.3763863},
booktitle = {Proceedings of the SIGGRAPH Asia 2025 Conference Papers},
articleno = {25},
numpages = {12},
keywords = {Video Diffusion Models, Diffusion Techniques, Texture Generation},
series = {SA Conference Papers '25}
}
```

We sincerely thank the following open-source projects:
- MV-Adapter for the coding framework.
- Wan2.1 for the 3D prior from video sequences.
- SDXL and FLUX for high-fidelity text-to-image generation.
- diffusers for the diffusion model implementation.
- lightning for saving our time.
Additionally, we thank the following people for their help:
- Special thanks to Toshihiro Hayashi for his valuable support and assistance in fixing bugs for our HF demo.
- We thank EasyMode-AI for their efforts in integrating SeqTex into ComfyUI. See here.
