problems with delay computation if words are repeated #72

@chanberg

Hi all,

Thank you for this nice evaluation framework! When experimenting with different audio segmentations, we noticed some issues with the current delay evaluation within SLTev.

  1. If a word type occurs multiple times in a segment, every token of that type is assigned the time stamp of its first occurrence. As a result, a second (or later) occurrence that appears with a delay does not count towards the delay measure:

example:

P 13.61 12.61 13.61 hello
P 14.61 12.61 14.61 hello,
P 15.61 12.61 15.61 hello,
C 16.61 12.61 16.61 hello, hello

vs.
P 13.61 12.61 13.61 hello
C 14.61 12.61 14.61 hello, hello

will result in exactly the same delay even though the second “hello” is generated 2 seconds later in the first translation.
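
A minimal sketch of the behaviour described above, not SLTev's actual code. It assumes that the second field of each row is the time the hypothesis was displayed, that the last row is the final hypothesis, and that punctuation is split off into separate tokens; the function name `first_occurrence_times` is hypothetical.

```python
import re

def first_occurrence_times(rows):
    """Give each token of the final hypothesis the time its word type first appeared."""
    first_seen = {}
    tokens = []
    for row in rows:
        _kind, display, _start, _end, text = row.split(maxsplit=4)
        # Assumed tokenization: words and punctuation marks are separate tokens.
        tokens = re.findall(r"\w+|[^\w\s]", text)
        for tok in tokens:
            first_seen.setdefault(tok, float(display))  # keep the earliest time only
    # After the loop, `tokens` holds the tokens of the last (final) row.
    return [(tok, first_seen[tok]) for tok in tokens]

slow = [
    "P 13.61 12.61 13.61 hello",
    "P 14.61 12.61 14.61 hello,",
    "P 15.61 12.61 15.61 hello,",
    "C 16.61 12.61 16.61 hello, hello",
]
fast = [
    "P 13.61 12.61 13.61 hello",
    "C 14.61 12.61 14.61 hello, hello",
]

print(first_occurrence_times(slow))  # [('hello', 13.61), (',', 14.61), ('hello', 13.61)]
print(first_occurrence_times(slow) == first_occurrence_times(fast))  # True
```

Under these assumptions both hypotheses receive identical time stamps, so any delay computed from them is identical as well.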

  2. The first occurrence of a token can also be part of a hallucination. As described above, the time stamp of the hallucinated token is then also assigned to the correct words that occur in a later partial or complete hypothesis. In this case, the translation model is rewarded for hallucinating:

example:

P 13.61 12.61 13.61 This is a hallucination!
P 14.61 12.61 14.61 Peter
P 15.61 12.61 15.61 Peter is a
C 16.61 12.61 16.61 Peter is a music teacher.

will get a shorter delay than:

P 13.61 12.61 13.61 Peter
P 14.61 12.61 14.61 Peter
P 15.61 12.61 15.61 Peter is a
C 16.61 12.61 16.61 Peter is a music teacher.

because “is” and “a” get assigned time stamp 13.61 whereas in the second example they are assigned time stamp 15.61.
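
The effect can be reproduced with a small sketch of first-occurrence time stamping. This is an illustration under stated assumptions, not SLTev's implementation: the second field is taken as the display time, punctuation is tokenized separately, and `token_times` is a hypothetical helper.

```python
import re

def token_times(rows):
    """Map each token of the last row to the display time of the first row containing it."""
    first_seen = {}
    tokens = []
    for row in rows:
        _kind, display, _start, _end, text = row.split(maxsplit=4)
        tokens = re.findall(r"\w+|[^\w\s]", text)  # assumed tokenization
        for tok in tokens:
            first_seen.setdefault(tok, float(display))
    return {tok: first_seen[tok] for tok in tokens}

hallucinated = [
    "P 13.61 12.61 13.61 This is a hallucination!",
    "P 14.61 12.61 14.61 Peter",
    "P 15.61 12.61 15.61 Peter is a",
    "C 16.61 12.61 16.61 Peter is a music teacher.",
]
clean = [
    "P 13.61 12.61 13.61 Peter",
    "P 14.61 12.61 14.61 Peter",
    "P 15.61 12.61 15.61 Peter is a",
    "C 16.61 12.61 16.61 Peter is a music teacher.",
]

h, c = token_times(hallucinated), token_times(clean)
# Tokens whose time stamp differs: (with hallucination, without hallucination)
changed = {tok: (h[tok], c[tok]) for tok in c if h[tok] != c[tok]}
print(changed)
# {'Peter': (14.61, 13.61), 'is': (13.61, 15.61), 'a': (13.61, 15.61)}
```

In this sketch “is” and “a” are each credited 2 seconds too early in the hallucinated run, which outweighs the 1 second by which “Peter” is later.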

  3. These problems become even more prominent with longer segments that span multiple sentences, because the chance of repeated words is higher.

example:

P 13.61 12.61 13.61 Peter
P 14.61 12.61 14.61 Peter is a
C 15.61 12.61 15.61 Peter is a music teacher.
P 16.61 15.61 16.61 He is
P 17.61 15.61 17.61 He is doing a
C 18.61 15.61 18.61 He is doing a great job.

will get a higher delay than:

P 13.61 12.61 13.61 Peter
P 14.61 12.61 14.61 Peter is a
P 15.61 12.61 15.61 Peter is a music teacher.
P 16.61 12.61 16.61 Peter is a music teacher. He is
P 17.61 12.61 17.61 Peter is a music teacher. He is doing a
C 18.61 12.61 18.61 Peter is a music teacher. He is doing a great job.

even though the content and generation times are exactly the same, simply because in the longer segment the repeated words “is”, “a” and “.” are assigned the time stamp of their first occurrence in the previous sentence.
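
The segmentation effect can be sketched as follows. This is a simplified illustration, not SLTev's code: it assumes the second field is the display time, that a `C` row closes a segment, and that punctuation is tokenized separately; both function names are hypothetical.

```python
import re

def first_occurrence_times(rows):
    """Time-stamp the tokens of the last row by the first appearance of their word type."""
    first_seen = {}
    tokens = []
    for row in rows:
        _kind, display, _start, _end, text = row.split(maxsplit=4)
        tokens = re.findall(r"\w+|[^\w\s]", text)  # assumed tokenization
        for tok in tokens:
            first_seen.setdefault(tok, float(display))
    return [(tok, first_seen[tok]) for tok in tokens]

def times_per_segment(rows):
    """Apply first-occurrence time stamping separately to each segment (closed by a C row)."""
    out, segment = [], []
    for row in rows:
        segment.append(row)
        if row.startswith("C "):
            out.extend(first_occurrence_times(segment))
            segment = []
    return out

two_segments = [
    "P 13.61 12.61 13.61 Peter",
    "P 14.61 12.61 14.61 Peter is a",
    "C 15.61 12.61 15.61 Peter is a music teacher.",
    "P 16.61 15.61 16.61 He is",
    "P 17.61 15.61 17.61 He is doing a",
    "C 18.61 15.61 18.61 He is doing a great job.",
]
one_segment = [
    "P 13.61 12.61 13.61 Peter",
    "P 14.61 12.61 14.61 Peter is a",
    "P 15.61 12.61 15.61 Peter is a music teacher.",
    "P 16.61 12.61 16.61 Peter is a music teacher. He is",
    "P 17.61 12.61 17.61 Peter is a music teacher. He is doing a",
    "C 18.61 12.61 18.61 Peter is a music teacher. He is doing a great job.",
]

short = times_per_segment(two_segments)
merged = times_per_segment(one_segment)
# Tokens whose time stamp differs: (token, two segments, one long segment)
diff = [(tok, t_s, t_m) for (tok, t_s), (_, t_m) in zip(short, merged) if t_s != t_m]
print(diff)  # [('is', 16.61, 14.61), ('a', 17.61, 14.61), ('.', 18.61, 15.61)]
```

Only the repeated tokens of the second sentence change, and each of them moves to an earlier time stamp in the long segment.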

Best,
Chantal
