Description
Describe the bug
Hi team,
I was experimenting with Streaming Sortformer and found that it does not pick up the `duration` parameter from the input manifest. According to the Hugging Face model card, the model expects a manifest file in which the `offset` and `duration` of the slice to be diarized are set as follows:
# Example of a line in `multispeaker_manifest.json`
{
"audio_filepath": "/path/to/multispeaker_audio1.wav", # path to the input audio file
"offset": 0, # offset (start) time of the input audio
"duration": 600, # duration of the audio, can be set to `null` if using NeMo main branch
}
{
"audio_filepath": "/path/to/multispeaker_audio2.wav",
"offset": 900,
"duration": 580,
}
https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2
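For reference, here is a minimal sketch of how manifest lines like the ones above could be generated. It assumes the manifest is read as one JSON object per line; the `#` annotations and the trailing comma in the example would not parse with a standard JSON parser, so they are left out. `manifest_lines` is a hypothetical helper for illustration, not part of NeMo.

```python
import json


def manifest_lines(entries):
    """Serialize each entry as a single-line JSON object (hypothetical helper)."""
    return [json.dumps(entry) for entry in entries]


# Same values as in the model card example, without comments.
entries = [
    {"audio_filepath": "/path/to/multispeaker_audio1.wav", "offset": 0, "duration": 600},
    {"audio_filepath": "/path/to/multispeaker_audio2.wav", "offset": 900, "duration": 580},
]

lines = manifest_lines(entries)

# Each element of `lines` can be written to the manifest file followed by "\n".
for line in lines:
    print(line)
```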
I suspect this happens because, at the line below, the code discards every field in the manifest except `audio_filepath`:
https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/asr/parts/mixins/diarization.py#L373
Later, `_diarize_input_manifest_processing` iterates over the audio files. Each entry is a plain string (the only field preserved by the previous step), and the remaining fields are populated with hardcoded values:
entry = {'audio_filepath': audio_file, 'duration': 100000, 'text': ''}
https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/asr/parts/mixins/diarization.py#L405
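To make the suspected behavior concrete, here is a small standalone sketch. It is not NeMo code: `paths_only`, `rebuild_entries`, and `preserve_entries` are hypothetical names that only mimic my reading of the two steps above, plus one possible alternative that keeps `offset` and `duration`.

```python
import json

# Placeholder duration quoted above from diarization.py.
DEFAULT_DURATION = 100000


def paths_only(lines):
    """Mimic the suspected first step: keep only audio_filepath per line."""
    return [json.loads(line)["audio_filepath"] for line in lines if line.strip()]


def rebuild_entries(audio_files):
    """Mimic the suspected second step: rebuild entries with fixed values."""
    return [
        {"audio_filepath": path, "duration": DEFAULT_DURATION, "text": ""}
        for path in audio_files
    ]


def preserve_entries(lines):
    """Hypothetical alternative: carry offset and duration through."""
    entries = []
    for line in lines:
        if not line.strip():
            continue
        item = json.loads(line)
        entries.append(
            {
                "audio_filepath": item["audio_filepath"],
                "offset": item.get("offset", 0),
                "duration": item.get("duration"),
                "text": item.get("text", ""),
            }
        )
    return entries


manifest = [
    '{"audio_filepath": "/path/to/multispeaker_audio2.wav", "offset": 900, "duration": 580}',
]

lossy = rebuild_entries(paths_only(manifest))
kept = preserve_entries(manifest)
```

With this input, `lossy[0]` has no `offset` key and carries the placeholder duration, while `kept[0]` retains the values from the manifest. That difference matches what I observe, if my reading of the code is right.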
I might be reading the code wrong. Please help me figure out whether my inputs to the model are invalid.