Sortformer does not pick up audio duration from the manifest

**Describe the bug**

Hi team,

I was experimenting with streaming Sortformer and found that it does not pick up the `duration` parameter from the input manifest. According to the Hugging Face model card, the model expects a manifest file in which the offset and duration of the slice to be diarized are set as follows:

```
# Example of a line in `multispeaker_manifest.json`
{
    "audio_filepath": "/path/to/multispeaker_audio1.wav",  # path to the input audio file 
    "offset": 0, # offset (start) time of the input audio
    "duration": 600,  # duration of the audio, can be set to `null` if using NeMo main branch
}
{
    "audio_filepath": "/path/to/multispeaker_audio2.wav",  
    "offset": 900,
    "duration": 580,  
}
```

https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2
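
For anyone reproducing this, here is a minimal sketch of how I generate a manifest like the one above. It assumes the usual one-JSON-object-per-line layout with no comments or trailing commas. `write_manifest` is a hypothetical helper of mine, not part of NeMo, and the paths are placeholders.

```python
import json
import os
import tempfile


def write_manifest(entries, path):
    """Write one JSON object per line (hypothetical helper, not a NeMo API)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


# Same values as in the example above; the paths are placeholders.
entries = [
    {"audio_filepath": "/path/to/multispeaker_audio1.wav", "offset": 0, "duration": 600},
    {"audio_filepath": "/path/to/multispeaker_audio2.wav", "offset": 900, "duration": 580},
]

manifest_path = os.path.join(tempfile.mkdtemp(), "multispeaker_manifest.json")
write_manifest(entries, manifest_path)

# Read the file back to confirm that offset and duration survive the round trip.
with open(manifest_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]
```

Reading the file back shows that `offset` and `duration` are present on disk, so if they are lost it would have to happen after the manifest is loaded.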


I suspect this is because the code at the following line ignores all fields in the manifest except `audio_filepath`:

https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/asr/parts/mixins/diarization.py#L373

Later, `_diarize_input_manifest_processing` iterates over the audio files. Each item is a plain string (because `audio_filepath` is the only field preserved by the previous step), and the remaining fields of each entry are filled in with fixed values:

```
entry = {'audio_filepath': audio_file, 'duration': 100000, 'text': ''}
```

https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/asr/parts/mixins/diarization.py#L405
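
To make my reading of the code concrete, here is a simplified model of the flow I think is happening. This is not NeMo's code: `read_filepaths_only` and `build_entries` are hypothetical stand-ins for the two steps linked above, and the only value taken from the source is the hard-coded entry quoted earlier.

```python
import json

DEFAULT_DURATION = 100000  # fixed value from the entry quoted above


def read_filepaths_only(manifest_lines):
    """Stand-in for the suspected first step: keep only `audio_filepath`."""
    return [
        json.loads(line)["audio_filepath"]
        for line in manifest_lines
        if line.strip()
    ]


def build_entries(audio_files):
    """Stand-in for the second step: rebuild each entry with fixed values."""
    return [
        {"audio_filepath": path, "duration": DEFAULT_DURATION, "text": ""}
        for path in audio_files
    ]


manifest_lines = [
    '{"audio_filepath": "/path/to/multispeaker_audio2.wav", "offset": 900, "duration": 580}',
]

rebuilt = build_entries(read_filepaths_only(manifest_lines))
print(rebuilt)
```

If this model matches what the library does, the rebuilt entry no longer carries `offset`, and its `duration` is the fixed default rather than the 580 from my manifest.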



I might be reading the code wrong. Please help me figure out whether my inputs to the model are invalid.



Sortformer does not pick up audio duration from the manifest #14977
