
The motion module cannot learn temporal information when trained on a small number of videos #430

@slippersman

Description


I have developed a project that replaces the UNet in AnimateDiff with the transformer block from Stable Diffusion 3.5. Given my hardware resources, I use only 20 videos as the training set, and to verify the model's ability to overfit, I sampled three videos from the training set as a validation set.
In the training phase, I first initialize the transformer block with the pretrained weights of the SD 3.5 transformer and then freeze it; only the motion module is trained from scratch.
The issue is that in the validation phase, the videos generated by the model appear to be pieced together from frames that lack temporal coherence.
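For reference, the freeze-and-train setup described above can be sketched roughly as follows. This is a minimal illustration, not the actual AnimateDiff code; `backbone` and `motion_module` are hypothetical stand-ins for the pretrained SD 3.5 transformer block and the temporal layers.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins (not the real AnimateDiff classes):
backbone = nn.Linear(8, 8)       # plays the role of the pretrained transformer block
motion_module = nn.Linear(8, 8)  # plays the role of the motion module, trained from scratch

# Freeze the pretrained backbone; only the motion module receives gradients.
for p in backbone.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(motion_module.parameters(), lr=1e-4)

# One dummy training step.
x = torch.randn(4, 8)
loss = motion_module(backbone(x)).pow(2).mean()
loss.backward()
optimizer.step()

# Sanity check: the frozen backbone accumulated no gradients.
assert all(p.grad is None for p in backbone.parameters())
assert all(p.grad is not None for p in motion_module.parameters())
```

With this pattern, the optimizer only ever sees the motion-module parameters, so the backbone weights are guaranteed to stay at their pretrained values.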

I also trained the official AnimateDiff on the same training set with the UNet frozen, and the same issue occurred.
