Hi all,
Thank you for this nice evaluation framework! When experimenting with different audio segmentations, we noticed some issues with the current delay evaluation within SLTev.
- If a word type occurs multiple times in a segment, all of these tokens get assigned the time stamp of the first occurrence. This means that even if the second (or later) occurrences appear with a delay, they do not count towards the delay measure:
example:
P 13.61 12.61 13.61 hello
P 14.61 12.61 14.61 hello,
P 15.61 12.61 15.61 hello,
C 16.61 12.61 16.61 hello, hello
vs.
P 13.61 12.61 13.61 hello
C 14.61 12.61 14.61 hello, hello
will result in exactly the same delay even though the second “hello” is generated 2 seconds later in the first translation.
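To make the effect concrete, here is a minimal Python sketch of first-occurrence time stamping as described in this point. It is not SLTev's actual code: the function name `first_occurrence_times`, the regex tokenizer (punctuation split off as separate tokens), and the reading of the fields as type, display time, segment start, segment end, text are assumptions made for illustration.

```python
import re

def first_occurrence_times(lines):
    """Map each token of the last hypothesis to the display time at which
    its word type first appeared in the segment (hypothetical helper)."""
    first_seen = {}
    tokens = []
    for line in lines:
        # Assumed field order: type, display time, segment start, segment end, text
        _kind, display, _start, _end, text = line.split(maxsplit=4)
        tokens = re.findall(r"\w+|[^\w\s]", text)  # assumed tokenization
        for tok in tokens:
            # Only the first time a word type is seen is remembered
            first_seen.setdefault(tok, float(display))
    return [(tok, first_seen[tok]) for tok in tokens]

slow = [
    "P 13.61 12.61 13.61 hello",
    "P 14.61 12.61 14.61 hello,",
    "P 15.61 12.61 15.61 hello,",
    "C 16.61 12.61 16.61 hello, hello",
]
fast = [
    "P 13.61 12.61 13.61 hello",
    "C 14.61 12.61 14.61 hello, hello",
]

print(first_occurrence_times(slow))
print(first_occurrence_times(fast))
```

Under these assumptions both calls return the same list, with each "hello" mapped to 13.61, so the two-second difference in when the second "hello" appears leaves no trace.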
- Some first token occurrences can also be hallucinations. As described above, the time stamps from the hallucinations would then also be assigned to the correct words that occur in a later partial or complete hypothesis. In this case, the translation model is rewarded for hallucinating:
example:
P 13.61 12.61 13.61 This is a hallucination!
P 14.61 12.61 14.61 Peter
P 15.61 12.61 15.61 Peter is a
C 16.61 12.61 16.61 Peter is a music teacher.
will get a shorter delay than:
P 13.61 12.61 13.61 Peter
P 14.61 12.61 14.61 Peter
P 15.61 12.61 15.61 Peter is a
C 16.61 12.61 16.61 Peter is a music teacher.
because in the first example “is” and “a” are assigned time stamp 13.61, whereas in the second example they are assigned time stamp 15.61.
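The same kind of sketch reproduces the hallucination case. Again, this only mimics the behavior described here and is not taken from SLTev: `assigned_times`, the tokenizer, and the field order (type, display time, segment start, segment end, text) are assumptions, and the sum of assigned time stamps is used only as a rough stand-in for the delay measure.

```python
import re

def assigned_times(lines):
    """Return {token: time stamp} for the last hypothesis, where every token
    takes the display time of its word type's first appearance (hypothetical)."""
    first_seen = {}
    tokens = []
    for line in lines:
        # Assumed field order: type, display time, segment start, segment end, text
        _kind, display, _start, _end, text = line.split(maxsplit=4)
        tokens = re.findall(r"\w+|[^\w\s]", text)  # assumed tokenization
        for tok in tokens:
            first_seen.setdefault(tok, float(display))
    return {tok: first_seen[tok] for tok in tokens}

hallucinated = assigned_times([
    "P 13.61 12.61 13.61 This is a hallucination!",
    "P 14.61 12.61 14.61 Peter",
    "P 15.61 12.61 15.61 Peter is a",
    "C 16.61 12.61 16.61 Peter is a music teacher.",
])
clean = assigned_times([
    "P 13.61 12.61 13.61 Peter",
    "P 14.61 12.61 14.61 Peter",
    "P 15.61 12.61 15.61 Peter is a",
    "C 16.61 12.61 16.61 Peter is a music teacher.",
])

for tok in clean:
    print(tok, hallucinated[tok], clean[tok])

# Rough proxy for the delay: total of the assigned time stamps
saving = sum(clean.values()) - sum(hallucinated.values())
print("hallucinated run is earlier by", round(saving, 2), "seconds in total")
```

In this sketch only "Peter" is stamped later in the hallucinated run (14.61 instead of 13.61), while "is" and "a" inherit 13.61 from the hallucination, so the hallucinated run comes out ahead overall.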
- These problems become even more prominent with longer segments that span across multiple sentences because the chance of repeated words is higher.
example:
P 13.61 12.61 13.61 Peter
P 14.61 12.61 14.61 Peter is a
C 15.61 12.61 15.61 Peter is a music teacher.
P 16.61 15.61 16.61 He is
P 17.61 15.61 17.61 He is doing a
C 18.61 15.61 18.61 He is doing a great job.
will get a higher delay than:
P 13.61 12.61 13.61 Peter
P 14.61 12.61 14.61 Peter is a
P 15.61 12.61 15.61 Peter is a music teacher.
P 16.61 12.61 16.61 Peter is a music teacher. He is
P 17.61 12.61 17.61 Peter is a music teacher. He is doing a
C 18.61 12.61 18.61 Peter is a music teacher. He is doing a great job.
even though the content and generation times are exactly the same, simply because in the longer segment the repeated words “is”, “a” and “.” are assigned the time stamp from their first occurrence in the previous sentence.
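The segmentation effect can be sketched in the same way. This is an illustration under assumptions, not SLTev's implementation: `token_times` is a hypothetical helper, segments are assumed to be identified by their start time, the fields are read as type, display time, segment start, segment end, text, and punctuation is assumed to be tokenized separately.

```python
import re
from collections import defaultdict

def token_times(lines):
    """For every complete (C) line, pair each token with the display time at
    which its word type first appeared within the same segment (hypothetical)."""
    first_seen = defaultdict(dict)  # segment start -> {token: first display time}
    result = []
    for line in lines:
        # Assumed field order: type, display time, segment start, segment end, text
        kind, display, start, _end, text = line.split(maxsplit=4)
        seen = first_seen[start]  # assumption: a segment is keyed by its start time
        tokens = re.findall(r"\w+|[^\w\s]", text)
        for tok in tokens:
            seen.setdefault(tok, float(display))
        if kind == "C":
            result.extend((tok, seen[tok]) for tok in tokens)
    return result

short = token_times([
    "P 13.61 12.61 13.61 Peter",
    "P 14.61 12.61 14.61 Peter is a",
    "C 15.61 12.61 15.61 Peter is a music teacher.",
    "P 16.61 15.61 16.61 He is",
    "P 17.61 15.61 17.61 He is doing a",
    "C 18.61 15.61 18.61 He is doing a great job.",
])
long = token_times([
    "P 13.61 12.61 13.61 Peter",
    "P 14.61 12.61 14.61 Peter is a",
    "P 15.61 12.61 15.61 Peter is a music teacher.",
    "P 16.61 12.61 16.61 Peter is a music teacher. He is",
    "P 17.61 12.61 17.61 Peter is a music teacher. He is doing a",
    "C 18.61 12.61 18.61 Peter is a music teacher. He is doing a great job.",
])

# Tokens whose time stamp differs between the two segmentations (short minus long)
diffs = {tok: round(s - l, 2) for (tok, s), (_, l) in zip(short, long) if s != l}
print(diffs)
```

With these inputs the only tokens that differ are the second-sentence "is", "a" and ".", which are stamped 2, 3 and 3 seconds earlier in the long segment.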
Best,
Chantal