
Inquiry about Continual Training with WavLM and Pre-Training Resources #54

@CantaoSu

Description


Hi,

I'm currently working on my master's thesis, which involves developing an Automatic Speech Recognition (ASR) model for Dutch dysarthric speech. My plan is to further pre-train the WavLM Large model (originally pre-trained on English) on 400 hours of typical Dutch speech, then fine-tune it on one hour of Dutch dysarthric speech, and finally compare the result against a Wav2Vec 2.0 baseline.
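To make the plan concrete, here is a minimal sketch of the final fine-tuning stage using the Hugging Face `transformers` WavLM classes rather than S3PRL. The tiny randomly initialised configuration (vocabulary size, layer counts, conv stack) is purely illustrative and stands in for the continually pre-trained Dutch checkpoint, which in practice would be loaded with `WavLMForCTC.from_pretrained(...)`:

```python
# Illustrative fine-tuning setup with Hugging Face `transformers` (not S3PRL).
# All hyperparameters below are placeholders for demonstration only.
import torch
from transformers import WavLMConfig, WavLMForCTC

config = WavLMConfig(
    vocab_size=40,                     # size of a Dutch character vocabulary (illustrative)
    hidden_size=32,                    # tiny model for the sketch; WavLM Large uses 1024
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32),                 # shrunken feature-extractor stack
    conv_stride=(5, 2),
    conv_kernel=(10, 3),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=2,
)
# In practice: model = WavLMForCTC.from_pretrained("<continued-pretraining-checkpoint>")
model = WavLMForCTC(config)
model.eval()

# One second of 16 kHz audio as a dummy batch.
wave = torch.randn(1, 16000)
with torch.no_grad():
    logits = model(wave).logits        # shape: (batch, frames, vocab_size)
```

During real fine-tuning, the CTC loss is computed by passing `labels` to the forward call, and the logits are decoded against the Dutch vocabulary.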

However, I've hit a roadblock: I couldn't find any pre-training resources for WavLM in the S3PRL toolkit or in any related documentation. Since WavLM was pre-trained on English, I want to explore continual pre-training on Dutch. The S3PRL team directed me to this repository, but I'm not sure whether it contains existing resources or examples for this kind of task.

Could you guide me on how to approach this? Specifically, I'd like to know whether it's possible to continue pre-training WavLM on a different dataset, and whether any recipes or scripts are available for this process. Any pointers to documentation, examples, or other resources would be greatly appreciated.

Thank you in advance for your assistance. I look forward to your response.

Best regards
