
yanirmr commented Sep 19, 2025

Issue #, if available:

Description of changes:
This PR adds several new Hebrew language dataset entries to the Registry of Open Data on AWS, provided by the ivrit.ai project. These datasets are made available for the purpose of AI research and model training, with licensing tailored to permit such uses.

Datasets included:

  • ivrit-ai Crowd-Transcribe v5: Large crowd-sourced Hebrew speech dataset for ASR and language technology development.

  • ivrit-ai Hebrew Audio v2: Curated Hebrew audio corpus designed to support open-source ASR research.

  • ivrit-ai Knesset Plenums: Aligned audio and transcriptions from Israeli Knesset (parliament) plenary proceedings, supporting parliamentary speech research and automatic recognition.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Added ivrit-ai Hebrew Audio v2 dataset with details about its composition, licensing, and resources.
The dataset includes aligned Hebrew speech and transcriptions from Israeli Knesset sessions, supporting research in political discourse and ASR.
Updated tags to include more specific terms related to natural language processing and speech recognition.

yanirmr (Author) commented Nov 27, 2025

Added notebooks demonstrating how to use the datasets to the YAML.

pschmied (Contributor) commented Dec 4, 2025

Hi @yanirmr, thank you for your submission. I looked through your notebooks, and I appreciate you providing one for each of the three main components of this data collection. Based on those drafts, this is ready to move forward. One thing we will ask you to include in at least one of the final notebooks is at least one community challenge: an unanswered research question or an unsolved problem that you think could be addressed with your data. It can be purely narrative and need not be solved by you.

pschmied marked this pull request as draft on December 4, 2025, at 15:56