problems with delay computation if words are repeated #72

Hi all,

Thank you for this nice evaluation framework! When experimenting with different audio segmentations, we noticed some issues with the current delay evaluation within SLTev.


1. If a word type occurs multiple times in a segment, all of its tokens are assigned the time stamp of its first occurrence. This means that even if the second (or a later) occurrence appears with a delay, that delay does not count towards the delay measure:

example:

P 13.61 12.61 13.61 hello
P 14.61 12.61 14.61 hello,
P 15.61 12.61 15.61 hello,
C 16.61 12.61 16.61 hello, hello

vs.
P 13.61 12.61 13.61 hello
C 14.61 12.61 14.61 hello, hello

will result in exactly the same delay even though the second “hello” is generated 2 seconds later in the first translation.
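To make point 1 concrete, here is a minimal Python sketch under a simplified model. It is not SLTev's actual code. Assumptions: each line has the form `P|C emission start end text`, punctuation counts as its own token, and only the emission time (second field) matters, because the source timing is identical in both examples. `first_occurrence_times` mimics the reported behavior. `occurrence_aware_times` is a hypothetical variant, included only for comparison, that matches the k-th occurrence of a word to the first hypothesis containing it at least k times.

```python
import re
from collections import Counter

# Assumption: words and punctuation marks are separate tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def parse(lines):
    """Turn 'P|C emitted start end text' lines into (emitted, tokens) pairs."""
    out = []
    for line in lines:
        _kind, emitted, _start, _end, text = line.split(maxsplit=4)
        out.append((float(emitted), TOKEN_RE.findall(text)))
    return out


def first_occurrence_times(lines):
    """Reported behavior (single segment): each token of the final hypothesis
    gets the emission time of the first hypothesis containing that word type."""
    hyps = parse(lines)
    first_seen = {}
    for emitted, tokens in hyps:
        for tok in tokens:
            first_seen.setdefault(tok, emitted)
    return [first_seen[tok] for tok in hyps[-1][1]]


def occurrence_aware_times(lines):
    """Hypothetical comparison: the k-th occurrence of a word gets the time of
    the first hypothesis that contains that word at least k times."""
    hyps = parse(lines)
    kth_seen = {}
    for emitted, tokens in hyps:
        for tok, count in Counter(tokens).items():
            for k in range(1, count + 1):
                kth_seen.setdefault((tok, k), emitted)
    seen, times = Counter(), []
    for tok in hyps[-1][1]:
        seen[tok] += 1
        times.append(kth_seen[(tok, seen[tok])])
    return times


late = ["P 13.61 12.61 13.61 hello",
        "P 14.61 12.61 14.61 hello,",
        "P 15.61 12.61 15.61 hello,",
        "C 16.61 12.61 16.61 hello, hello"]
early = ["P 13.61 12.61 13.61 hello",
         "C 14.61 12.61 14.61 hello, hello"]

print(first_occurrence_times(late))   # [13.61, 14.61, 13.61]
print(first_occurrence_times(early))  # [13.61, 14.61, 13.61]
print(occurrence_aware_times(late))   # [13.61, 14.61, 16.61]
print(occurrence_aware_times(early))  # [13.61, 14.61, 14.61]
```

Under the first-occurrence model, both transcripts produce identical time stamps for “hello , hello”, whereas the occurrence-aware variant places the second “hello” 2 seconds later in the first transcript.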


2. The first occurrence of a token can also be part of a hallucination. As described above, the time stamp of the hallucinated token is then also assigned to the matching correct word in a later partial or complete hypothesis. In this case, the translation model is rewarded for hallucinating:

example:

P 13.61 12.61 13.61 This is a hallucination!
P 14.61 12.61 14.61 Peter
P 15.61 12.61 15.61 Peter is a
C 16.61 12.61 16.61 Peter is a music teacher.

will get a shorter delay than:

P 13.61 12.61 13.61 Peter
P 14.61 12.61 14.61 Peter
P 15.61 12.61 15.61 Peter is a
C 16.61 12.61 16.61 Peter is a music teacher.

because “is” and “a” are assigned the time stamp 13.61 (taken from the hallucination), whereas in the second example they are assigned the time stamp 15.61.
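The same simplified model (a sketch, not SLTev's code) reproduces point 2. Assumptions as before: lines are `P|C emission start end text`, punctuation is its own token, and summing the emission times is enough to compare the two variants because their source timing is identical. The function name is hypothetical.

```python
import re

# Assumption: words and punctuation marks are separate tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def first_occurrence_times(lines):
    """Simplified model of the reported behavior for one segment: each token of
    the complete (C) hypothesis is paired with the emission time of the first
    hypothesis, hallucinated or not, that contains the same word type."""
    first_seen, final = {}, []
    for line in lines:
        kind, emitted, _start, _end, text = line.split(maxsplit=4)
        tokens = TOKEN_RE.findall(text)
        for tok in tokens:
            first_seen.setdefault(tok, float(emitted))
        if kind == "C":
            final = tokens
    return [(tok, first_seen[tok]) for tok in final]


hallucinating = [
    "P 13.61 12.61 13.61 This is a hallucination!",
    "P 14.61 12.61 14.61 Peter",
    "P 15.61 12.61 15.61 Peter is a",
    "C 16.61 12.61 16.61 Peter is a music teacher.",
]
clean = [
    "P 13.61 12.61 13.61 Peter",
    "P 14.61 12.61 14.61 Peter",
    "P 15.61 12.61 15.61 Peter is a",
    "C 16.61 12.61 16.61 Peter is a music teacher.",
]

h = first_occurrence_times(hallucinating)
c = first_occurrence_times(clean)
print(h[:3])  # [('Peter', 14.61), ('is', 13.61), ('a', 13.61)]
print(c[:3])  # [('Peter', 13.61), ('is', 15.61), ('a', 15.61)]
# Total time advantage of the hallucinating output, in seconds
print(round(sum(t for _, t in c) - sum(t for _, t in h), 2))  # 3.0
```

Even though “Peter” appears 1 second later in the hallucinating output, “is” and “a” each gain 2 seconds from the hallucination, so under this model its total is 3 seconds lower.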


3. These problems become even more pronounced with longer segments that span multiple sentences, because the chance of repeated words is higher.

example:

P 13.61 12.61 13.61 Peter
P 14.61 12.61 14.61 Peter is a
C 15.61 12.61 15.61 Peter is a music teacher.
P 16.61 15.61 16.61 He is
P 17.61 15.61 17.61 He is doing a
C 18.61 15.61 18.61 He is doing a great job.

will get a higher delay than:

P 13.61 12.61 13.61 Peter
P 14.61 12.61 14.61 Peter is a
P 15.61 12.61 15.61 Peter is a music teacher.
P 16.61 12.61 16.61 Peter is a music teacher. He is
P 17.61 12.61 17.61 Peter is a music teacher. He is doing a
C 18.61 12.61 18.61 Peter is a music teacher. He is doing a great job.

even though the content and generation times are exactly the same. This is simply because, in the longer segment, the repeated words “is”, “a”, and “.” are assigned the time stamps of their first occurrences in the previous sentence.
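A sketch of point 3 under the same simplified model (not SLTev's code). Additional assumption, labeled here as such: a `C` line closes a segment, and the first-occurrence lookup restarts at each new segment. The function name `token_times` is hypothetical.

```python
import re

# Assumption: punctuation such as "." is its own token, as in the issue.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def token_times(lines):
    """Simplified model of the reported behavior. A C line closes a segment;
    within a segment, every token of the complete hypothesis gets the emission
    time of the first hypothesis in that segment containing the word type."""
    times, first_seen = [], {}
    for line in lines:
        kind, emitted, _start, _end, text = line.split(maxsplit=4)
        tokens = TOKEN_RE.findall(text)
        for tok in tokens:
            first_seen.setdefault(tok, float(emitted))
        if kind == "C":
            times += [first_seen[tok] for tok in tokens]
            first_seen = {}  # the lookup does not carry across segments
    return times


split = [
    "P 13.61 12.61 13.61 Peter",
    "P 14.61 12.61 14.61 Peter is a",
    "C 15.61 12.61 15.61 Peter is a music teacher.",
    "P 16.61 15.61 16.61 He is",
    "P 17.61 15.61 17.61 He is doing a",
    "C 18.61 15.61 18.61 He is doing a great job.",
]
merged = [
    "P 13.61 12.61 13.61 Peter",
    "P 14.61 12.61 14.61 Peter is a",
    "P 15.61 12.61 15.61 Peter is a music teacher.",
    "P 16.61 12.61 16.61 Peter is a music teacher. He is",
    "P 17.61 12.61 17.61 Peter is a music teacher. He is doing a",
    "C 18.61 12.61 18.61 Peter is a music teacher. He is doing a great job.",
]

# Times for "He is doing a great job ." in each segmentation
print(token_times(split)[-7:])   # [16.61, 16.61, 17.61, 17.61, 18.61, 18.61, 18.61]
print(token_times(merged)[-7:])  # [16.61, 14.61, 17.61, 14.61, 18.61, 18.61, 15.61]
print(round(sum(token_times(split)) - sum(token_times(merged)), 2))  # 8.0
```

In the merged segment, “is” gains 2 seconds and “a” and “.” gain 3 seconds each, so the same output appears 8 seconds faster in total under this model.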

Best,
Chantal
