Some issues at a sampling rate of 16k #250

@zhNfly

I retrained a HiFi-GAN vocoder using data sampled at 16 kHz and replaced the original generator_universal.pth.tar in the FastSpeech2 project with the resulting checkpoint (g_00150000). Subsequently, FastSpeech2 was trained on the same dataset. However, during training, the audio samples logged in TensorBoard exhibited noticeable issues, including speaker drift and missing or dropped phonetic segments. Is there any insight into the possible causes of these issues?
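One sanity check for a setup like this is to confirm that the vocoder and the acoustic model were configured with the same audio and mel-spectrogram settings, since swapping in a checkpoint trained at a different sampling rate leaves room for a mismatch. The sketch below is only an illustration of that check, not a confirmed diagnosis. The key names in `KEY_MAP`, the helper `find_mismatches`, and all numeric values are assumptions for the example and are not taken from the actual runs; adjust them to match the real config files.

```python
# Hypothetical mapping: vocoder config key -> acoustic-model config key.
# These names are assumptions for illustration; adapt them to the real files.
KEY_MAP = {
    "sampling_rate": "sampling_rate",
    "n_fft": "filter_length",
    "hop_size": "hop_length",
    "win_size": "win_length",
    "num_mels": "n_mel_channels",
    "fmin": "mel_fmin",
    "fmax": "mel_fmax",
}


def find_mismatches(vocoder_cfg, acoustic_cfg, key_map=KEY_MAP):
    """Return (vocoder_key, vocoder_value, acoustic_value) for each differing setting."""
    mismatches = []
    for voc_key, ac_key in key_map.items():
        voc_val = vocoder_cfg.get(voc_key)
        ac_val = acoustic_cfg.get(ac_key)
        if voc_val != ac_val:
            mismatches.append((voc_key, voc_val, ac_val))
    return mismatches


# Illustrative values only (not from the issue): the two configs agree
# everywhere except the sampling rate.
vocoder_cfg = {
    "sampling_rate": 16000, "n_fft": 1024, "hop_size": 256,
    "win_size": 1024, "num_mels": 80, "fmin": 0, "fmax": 8000,
}
acoustic_cfg = {
    "sampling_rate": 22050, "filter_length": 1024, "hop_length": 256,
    "win_length": 1024, "n_mel_channels": 80, "mel_fmin": 0, "mel_fmax": 8000,
}

for key, voc_val, ac_val in find_mismatches(vocoder_cfg, acoustic_cfg):
    print(f"{key}: vocoder={voc_val} acoustic={ac_val}")
```

An empty result would only show that these listed settings agree; it would not rule out other causes such as alignment quality or speaker-embedding handling.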
