
[WIP] Dataset management #25

Open
gungui98 wants to merge 6 commits into main from dataset_management


@gungui98
Collaborator

UETAI has several datasets, some of which are very large, so it is not feasible to download them every time the runner is triggered.
I created this submodule as an initiative to manage all of those datasets in one place.

Usage

This module provides functions and classes for managing datasets across the multiple machines at UETAI.
We set up a shared NFS on our cluster. Every GPU machine accesses and reads data from it. The runner therefore has to mount the shared NFS in order to access the data.
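A pre-flight check like the following could let a runner job fail fast, with a clear message, when the shared NFS has not been mounted. This is a minimal sketch under stated assumptions: the mount point `/mnt/uetai_datasets` and the helper names `nfs_is_mounted` / `require_nfs` are hypothetical and do not come from this PR.

```python
import os

# Hypothetical mount point of the shared NFS on each GPU machine / runner;
# the real path used at UETAI is not stated in this PR.
NFS_ROOT = "/mnt/uetai_datasets"


def nfs_is_mounted(root: str = NFS_ROOT) -> bool:
    """Return True if `root` is an active mount point (e.g. the shared NFS)."""
    return os.path.ismount(root)


def require_nfs(root: str = NFS_ROOT) -> None:
    """Raise early if the runner has not mounted the shared dataset storage."""
    if not nfs_is_mounted(root):
        raise RuntimeError(
            f"Shared dataset storage is not mounted at {root!r}; "
            "mount the NFS share on this runner before running the job."
        )
```

Calling `require_nfs()` at the start of a CI job turns a confusing "file not found" deep inside training code into an immediate, explicit error.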

Developers use the data_path function from the logger to access a dataset, given its dataset_name
and alias.

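```python
from uetai.logger import SummaryWriter

logger = SummaryWriter("my_experiment")
# Returns "./my/local/path" when run locally, or the shared-NFS path of the
# "echo" dataset at alias "latest" when run on the GitHub runner.
data_path = logger.data_path(path="./my/local/path", dataset_name="echo", alias="latest")
```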

data_path handles two environments:

  1. If the developer runs the code on a local machine, it returns the original path.
  2. If the developer runs the code on the GitHub runner, it returns the full path to the dataset on the shared NFS, determined by dataset_name and alias.
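The two-scenario behaviour could be implemented roughly as follows. This is a minimal sketch, not the actual `SummaryWriter.data_path` implementation: the function name `resolve_data_path`, the `/mnt/uetai_datasets` root and the `<root>/<dataset_name>/<alias>` directory layout are all assumptions. It detects the runner through the `GITHUB_ACTIONS` environment variable, which GitHub Actions sets to `true` in every job.

```python
import os

# Hypothetical NFS mount point; not specified in this PR.
NFS_ROOT = "/mnt/uetai_datasets"


def running_on_github_actions() -> bool:
    # GitHub Actions sets GITHUB_ACTIONS=true in the environment of every job.
    return os.environ.get("GITHUB_ACTIONS") == "true"


def resolve_data_path(path: str, dataset_name: str, alias: str = "latest",
                      nfs_root: str = NFS_ROOT) -> str:
    """Sketch of the two-environment behaviour (hypothetical layout).

    - Local machine: return the developer's original path unchanged.
    - GitHub runner: return <nfs_root>/<dataset_name>/<alias> on the shared NFS.
    """
    if not running_on_github_actions():
        return path
    return os.path.join(nfs_root, dataset_name, alias)
```

Keying the decision on an environment variable keeps training code identical in both places: the developer always passes a local path, and only the CI environment redirects it to the shared storage.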

Registry

We maintain a registry that maps each dataset to its path and validates whether a dataset path is correct.
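A registry of this kind might look like the sketch below, which maps a `(dataset_name, alias)` pair to a path relative to the NFS root and checks that the resolved directory exists. The dictionary format, its entries, and the names `REGISTRY`, `lookup` and `DatasetNotRegistered` are hypothetical; the PR does not show the real registry.

```python
import os
from typing import Dict, Tuple

# Hypothetical registry: (dataset_name, alias) -> path relative to the NFS root.
# The real registry format and entries are not shown in this PR.
REGISTRY: Dict[Tuple[str, str], str] = {
    ("echo", "latest"): "echo/v2",
    ("echo", "v1"): "echo/v1",
}


class DatasetNotRegistered(KeyError):
    """Raised when a (dataset_name, alias) pair has no registry entry."""


def lookup(dataset_name: str, alias: str, nfs_root: str,
           registry: Dict[Tuple[str, str], str] = REGISTRY) -> str:
    """Map a dataset name and alias to an absolute path and validate that it exists."""
    try:
        relative = registry[(dataset_name, alias)]
    except KeyError:
        raise DatasetNotRegistered(f"{dataset_name}:{alias}") from None
    full_path = os.path.join(nfs_root, relative)
    if not os.path.isdir(full_path):
        raise FileNotFoundError(f"Registered path does not exist: {full_path}")
    return full_path
```

Separating "unknown dataset" from "registered but missing on disk" makes it easier to tell a typo in `dataset_name`/`alias` apart from an NFS share that is out of sync with the registry.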

Checklists

@gungui98 gungui98 added the enhancement New feature or request label Oct 10, 2021
@gungui98 gungui98 self-assigned this Oct 10, 2021
@gungui98
Collaborator Author

@manhdung20112000 ping.

