At the moment, geofetch can download, filter, and save metadata for specific GEO accessions. But metadata in GEO is stored in different, messy ways: some of the information is redundant, and the same information can be stored in different places.
For example, a sample's genome information may be stored under three (or more) different dictionary keys:
- 'Sample_description': ['assembly': 'hg19', ...]
- 'Sample_characteristics_ch1': ['genome build': 'hg19', ...]
- 'Sample_data_processing': ['Genome_build': 'hg19', ...]
To create a good, standardized PEP .csv metadata file, all of this information has to be carefully preprocessed. This would be especially useful for creating a new endpoint in pephub.
In my opinion, we should create a new class, or a set of functions, separate from geofetch, that standardizes all GEO metadata.
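A rough sketch of what one such function could look like, using the genome example above. Everything here is hypothetical and not part of geofetch: the names `GENOME_SOURCE_KEYS`, `GENOME_LABEL`, `extract_genome`, and `standardize_sample`, the priority order of the keys, and the output column name `genome`. It also assumes each entry is a plain "label: value" string, which may not hold for every GEO record.

```python
import re
from typing import Optional

# Hypothetical: GEO keys that may carry the genome build, checked in this order.
GENOME_SOURCE_KEYS = (
    "Sample_data_processing",
    "Sample_characteristics_ch1",
    "Sample_description",
)

# Hypothetical: label spellings treated as synonyms for the genome build.
# Matches "assembly: hg19", "genome build: hg19", "Genome_build: hg19".
GENOME_LABEL = re.compile(
    r"^\s*(?:genome[ _]build|assembly)\s*:\s*(.+?)\s*$", re.IGNORECASE
)


def extract_genome(sample: dict) -> Optional[str]:
    """Return the first genome build found in the known keys, or None."""
    for key in GENOME_SOURCE_KEYS:
        entries = sample.get(key, [])
        if isinstance(entries, str):
            # Treat a single string the same as a one-element list.
            entries = [entries]
        for entry in entries:
            match = GENOME_LABEL.match(entry)
            if match:
                return match.group(1)
    return None


def standardize_sample(sample: dict) -> dict:
    """Copy the sample and add a single 'genome' key when a build is found."""
    standardized = dict(sample)
    genome = extract_genome(sample)
    if genome is not None:
        standardized["genome"] = genome
    return standardized
```

The same pattern (a list of candidate keys plus a set of label synonyms) could be repeated for other attributes, so that each standardized column in the PEP .csv comes from one place regardless of where GEO stored it.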