Sortformer does not pick up audio duration from the manifest #14977

@orech

Describe the bug

Hi team,

I was experimenting with the streaming Sortformer model and found that it does not pick up the duration parameter from the input manifest. According to the Hugging Face instructions, the model expects a manifest file in which the offset and duration of the slice to be diarized are set as follows:

# Example of a line in `multispeaker_manifest.json`
{
    "audio_filepath": "/path/to/multispeaker_audio1.wav",  # path to the input audio file 
    "offset": 0, # offset (start) time of the input audio
    "duration": 600,  # duration of the audio, can be set to `null` if using NeMo main branch
}
{
    "audio_filepath": "/path/to/multispeaker_audio2.wav",  
    "offset": 900,
    "duration": 580,  
}

https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2
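For reference, here is a minimal sketch of how such a manifest could be generated, assuming the usual layout of one JSON object per line. The helper names `write_manifest` and `read_manifest` are hypothetical and not part of NeMo. Note that the inline `#` comments and trailing commas in the example above are illustrative only; strict JSON allows neither, so the sketch writes plain JSON.

```python
import json
import os
import tempfile


def write_manifest(entries, path):
    """Write one JSON object per line (hypothetical helper, not a NeMo API)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


def read_manifest(path):
    """Read the manifest back, skipping blank lines (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Same values as the two example manifest lines above.
entries = [
    {"audio_filepath": "/path/to/multispeaker_audio1.wav", "offset": 0, "duration": 600},
    {"audio_filepath": "/path/to/multispeaker_audio2.wav", "offset": 900, "duration": 580},
]

manifest_path = os.path.join(tempfile.mkdtemp(), "multispeaker_manifest.json")
write_manifest(entries, manifest_path)
loaded = read_manifest(manifest_path)
```

Reading the file back this way is a quick check that offset and duration are actually present in every line before the manifest is passed to the model.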

I suspect this is because the code at this line ignores every field in the manifest except audio_filepath:

https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/asr/parts/mixins/diarization.py#L373

Later, _diarize_input_manifest_processing iterates over the audio files. Each item is a plain string, since audio_filepath is the only field preserved by the previous step, and the remaining fields are populated with hard-coded values:

entry = {'audio_filepath': audio_file, 'duration': 100000, 'text': ''}

https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/asr/parts/mixins/diarization.py#L405
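To make the suspected behaviour concrete, here is a simplified illustration of the two steps described above. This is my reading of the flow, not NeMo's actual code: `collect_audio_paths` and `rebuild_entries` are hypothetical stand-ins, and only the rebuilt entry dict is taken from the line quoted above.

```python
import json


# Hypothetical stand-ins for the two steps; these are NOT NeMo functions.
def collect_audio_paths(manifest_lines):
    """Step 1 (as suspected): keep only audio_filepath from each manifest line."""
    return [json.loads(line)["audio_filepath"] for line in manifest_lines]


def rebuild_entries(audio_files):
    """Step 2 (as quoted above): rebuild each entry with a hard-coded duration."""
    return [
        {"audio_filepath": audio_file, "duration": 100000, "text": ""}
        for audio_file in audio_files
    ]


manifest_lines = [
    '{"audio_filepath": "/path/to/multispeaker_audio2.wav", "offset": 900, "duration": 580}',
]

rebuilt = rebuild_entries(collect_audio_paths(manifest_lines))
print(rebuilt[0])
```

If the real code path behaves like this sketch, the offset of 900 and duration of 580 from the manifest never reach the model, which would match what I am seeing.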

I might be reading the code wrong. Please help me figure out whether my inputs to the model are invalid.
