Felix (rightly) asks for more detail on the individual data sources, specifically with ASR in mind:
Something for all the datasets here: mention the level of preparation/rehearsal and whether the speakers are native or non-native. Maybe also something about audio quality: close-talking microphone or noisy conditions?
Please keep that in mind when adding any datasets, and update the entries already included where possible.