Commit 3b1d85d

Add notes on file structure in Voxceleb1 based datasets (#2776)
Summary:
The file structure of VoxCeleb1 is as follows:

```
root/
└── wav/
    └── speaker_id folders
```

Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to get the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, file lists such as https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt make no such distinction, so it is not necessary to put the extracted files into separate folders. This PR adds notes to the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the expected file structure.

Pull Request resolved: #2776

Reviewed By: carolineechen

Differential Revision: D40483707

Pulled By: nateanl

fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d
1 parent 9a013fd commit 3b1d85d
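
To illustrate why the file lists need no "dev"/"test" level, here is a minimal sketch of resolving list entries against a single `root/wav/` directory. It assumes each line of a verification list has the form `label path1 path2`, with paths relative to `wav/`. That format is an assumption, not something stated in this commit. The helper name `resolve_verification_pairs` and the sample entry are hypothetical and are not part of torchaudio.

```python
from pathlib import Path
from typing import List, Tuple


def resolve_verification_pairs(root: str, lines: List[str]) -> List[Tuple[int, Path, Path]]:
    """Map "label path1 path2" lines to paths under root/wav (hypothetical helper)."""
    wav_dir = Path(root) / "wav"
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        label, first, second = parts
        # Both paths resolve under the same wav/ folder; no dev/test prefix is involved.
        pairs.append((int(label), wav_dir / first, wav_dir / second))
    return pairs


# Made-up entry for illustration only; real lists use actual speaker and clip IDs.
example = ["1 id00001/clipA/00001.wav id00001/clipB/00002.wav"]
pairs = resolve_verification_pairs("data/VoxCeleb1", example)
```

Because every path in the list starts at the speaker folder, any extra directory level between `root` and `wav` would make these lookups fail.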

1 file changed (+24, -0 lines)

torchaudio/datasets/voxceleb1.py

Lines changed: 24 additions & 0 deletions
@@ -135,6 +135,18 @@ class VoxCeleb1Identification(VoxCeleb1):
             (Default: ``"https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt"``)
         download (bool, optional):
             Whether to download the dataset if it is not found at root path. (Default: ``False``).
+
+    Note:
+        The file structure of `VoxCeleb1Identification` dataset is as follows:
+
+        └─ root/
+
+        └─ wav/
+
+        └─ speaker_id folders
+
+        Users who pre-downloaded the ``"vox1_dev_wav.zip"`` and ``"vox1_test_wav.zip"`` files need to move
+        the extracted files into the same ``root`` directory.
     """

     def __init__(
@@ -215,6 +227,18 @@ class VoxCeleb1Verification(VoxCeleb1):
             (Default: ``"https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt"``)
         download (bool, optional):
             Whether to download the dataset if it is not found at root path. (Default: ``False``).
+
+    Note:
+        The file structure of `VoxCeleb1Verification` dataset is as follows:
+
+        └─ root/
+
+        └─ wav/
+
+        └─ speaker_id folders
+
+        Users who pre-downloaded the ``"vox1_dev_wav.zip"`` and ``"vox1_test_wav.zip"`` files need to move
+        the extracted files into the same ``root`` directory.
     """

     def __init__(self, root: Union[str, Path], meta_url: str = _VERI_TEST_URL, download: bool = False) -> None:
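
For users whose extracted files ended up in separate `dev/wav/` and `test/wav/` folders (for example after following the Kaldi recipe), the move described in the note could be sketched as below. This uses only the Python standard library. The function name `merge_splits_into_root` and the assumed starting layout `root/<split>/wav/<speaker_id>/...` are illustrative assumptions, not part of torchaudio.

```python
import shutil
from pathlib import Path


def merge_splits_into_root(root: str, splits=("dev", "test")) -> int:
    """Move files from root/<split>/wav/ into root/wav/ and return the number moved.

    Hypothetical helper: assumes the layout root/<split>/wav/<speaker_id>/...
    """
    root_path = Path(root)
    target = root_path / "wav"
    target.mkdir(parents=True, exist_ok=True)
    moved = 0
    for split in splits:
        split_wav = root_path / split / "wav"
        if not split_wav.is_dir():
            continue  # this split was never extracted here
        # Collect the file list first so moving files does not disturb iteration.
        for src in sorted(p for p in split_wav.rglob("*") if p.is_file()):
            dst = target / src.relative_to(split_wav)
            if dst.exists():
                continue  # never overwrite a file that is already in place
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            moved += 1
    return moved
```

Calling `merge_splits_into_root("/path/to/VoxCeleb1")` once should leave every speaker folder under `root/wav/`, which is the layout the note describes. Existing files are skipped rather than overwritten, so running it a second time is harmless.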
