I have developed a project that replaces the UNet in AnimateDiff with the transformer block from Stable Diffusion 3.5. Given my limited hardware resources, I use only 20 videos as the training set, and to verify the model's ability to overfit, I sampled three videos from the training set as the validation set.
In the training phase, I first initialize the transformer block with the pretrained weights of the SD 3.5 transformer and then freeze it; only the motion module is trained, from scratch.
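For reference, the freeze-and-train pattern I describe looks roughly like the sketch below. This is a minimal illustration with toy `nn.Linear` stand-ins, not my actual model: the names `transformer` and `motion_module` are placeholders for the frozen SD 3.5 spatial block and the temporal module trained from scratch.

```python
import torch
from torch import nn

class ToyVideoModel(nn.Module):
    """Toy stand-in for the real architecture (module names are illustrative)."""
    def __init__(self):
        super().__init__()
        self.transformer = nn.Linear(8, 8)    # stands in for the pretrained SD 3.5 block
        self.motion_module = nn.Linear(8, 8)  # stands in for the motion module

model = ToyVideoModel()

# Freeze the spatial transformer; leave the motion module trainable.
for p in model.transformer.parameters():
    p.requires_grad_(False)

# Hand the optimizer only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Passing only the trainable parameters to the optimizer (rather than `model.parameters()`) avoids wasting optimizer state on frozen weights.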
The issue is that in the validation phase, the videos generated by the model look like they were pieced together from independent frames, with no temporal coherence.
I also trained the official AnimateDiff on the same training set with the UNet frozen, and the same issue occurred.