Some issues at a sampling rate of 16 kHz #250

I retrained a HiFi-GAN vocoder on data sampled at 16 kHz and replaced the original generator_universal.pth.tar in the FastSpeech2 project with the resulting checkpoint (g_00150000). I then trained FastSpeech2 on the same dataset. During training, however, the audio samples logged to TensorBoard have noticeable problems, including speaker drift and missing or dropped phonetic segments. Does anyone have insight into what might be causing this?
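
One thing that may be worth ruling out is a mismatch between the audio parameters the vocoder was trained with and the ones FastSpeech2 uses to extract mel spectrograms. The sketch below is only an illustration: the key names in KEY_MAP are assumptions about what the HiFi-GAN config.json and the FastSpeech2 preprocess.yaml contain, the helper names are hypothetical, and the example values are made up rather than taken from my actual setup.

```python
# Hypothetical mapping: HiFi-GAN config key -> nested path in the FastSpeech2
# preprocessing config. These names are assumptions; adjust to the real files.
KEY_MAP = {
    "sampling_rate": ("preprocessing", "audio", "sampling_rate"),
    "n_fft": ("preprocessing", "stft", "filter_length"),
    "hop_size": ("preprocessing", "stft", "hop_length"),
    "win_size": ("preprocessing", "stft", "win_length"),
    "num_mels": ("preprocessing", "mel", "n_mel_channels"),
    "fmin": ("preprocessing", "mel", "mel_fmin"),
    "fmax": ("preprocessing", "mel", "mel_fmax"),
}


def lookup(cfg, path):
    """Follow a sequence of keys into nested dicts; return None if any is missing."""
    for key in path:
        if not isinstance(cfg, dict) or key not in cfg:
            return None
        cfg = cfg[key]
    return cfg


def find_mismatches(vocoder_cfg, acoustic_cfg):
    """Return (vocoder_key, vocoder_value, acoustic_path, acoustic_value) per mismatch."""
    mismatches = []
    for voc_key, path in KEY_MAP.items():
        voc_val = vocoder_cfg.get(voc_key)
        ac_val = lookup(acoustic_cfg, path)
        if voc_val != ac_val:
            mismatches.append((voc_key, voc_val, ".".join(path), ac_val))
    return mismatches


# Illustrative values only (not my real configs): a 16 kHz vocoder config
# compared against an acoustic-model config still set to 22050 Hz.
example_vocoder_cfg = {
    "sampling_rate": 16000, "n_fft": 1024, "hop_size": 256,
    "win_size": 1024, "num_mels": 80, "fmin": 0, "fmax": 8000,
}
example_acoustic_cfg = {
    "preprocessing": {
        "audio": {"sampling_rate": 22050},
        "stft": {"filter_length": 1024, "hop_length": 256, "win_length": 1024},
        "mel": {"n_mel_channels": 80, "mel_fmin": 0, "mel_fmax": 8000},
    }
}

example_mismatches = find_mismatches(example_vocoder_cfg, example_acoustic_cfg)
for voc_key, voc_val, ac_path, ac_val in example_mismatches:
    print(f"{voc_key}={voc_val} (vocoder) vs {ac_path}={ac_val} (acoustic model)")
```

In practice the two dictionaries would be loaded from the actual config files instead of being written inline. An empty result would only mean these particular parameters agree; it would not by itself explain the speaker drift or the dropped segments.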
