Metadata standardization #47

@khoroshevskyi

At the moment, geofetch can download, filter, and save metadata for specific GEO accessions. However, metadata in GEO is stored in inconsistent, messy ways: some of the information is redundant, and the same information can be stored in different places.

For example, a sample's genome information may be stored under three (or more) different dictionary keys:

  • 'Sample_description': ['assembly': 'hg19', ...]
  • 'Sample_characteristics_ch1': ['genome build': 'hg19', ...]
  • 'Sample_data_processing': ['Genome_build': 'hg19', ...]
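
To make the redundancy concrete, here is a minimal Python sketch that looks for a genome value under the three keys listed above. It assumes each key holds a list of "name: value" strings, which is my guess at how a parsed sample looks and not something geofetch guarantees. The function name `collect_genome_values`, the alias set, and the sample values are all hypothetical.

```python
# Hypothetical aliases under which a genome assembly may appear (assumption).
GENOME_ALIASES = {"assembly", "genome build", "genome_build"}

# Keys from the example above that may carry genome information.
CANDIDATE_KEYS = (
    "Sample_description",
    "Sample_characteristics_ch1",
    "Sample_data_processing",
)


def collect_genome_values(sample):
    """Return {source_key: genome} for every key where a genome value is found."""
    found = {}
    for key in CANDIDATE_KEYS:
        for entry in sample.get(key, []):
            # Split "name: value" once; entries without ":" are skipped.
            name, sep, value = entry.partition(":")
            if sep and name.strip().lower() in GENOME_ALIASES:
                found[key] = value.strip()
                break  # first match per key is enough
    return found


# Illustrative sample only; the values are made up for this sketch.
sample = {
    "Sample_description": ["assembly: hg19"],
    "Sample_characteristics_ch1": ["cell type: K562", "genome build: hg19"],
    "Sample_data_processing": ["Genome_build: hg19"],
}
print(collect_genome_values(sample))
```

Here the same value comes back from all three places, which is exactly the redundancy a standardization step would need to collapse (or flag, if the values disagree).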

To create a good, standardized PEP .csv metadata file, all of this information has to be carefully preprocessed. This would be especially useful for creating a new endpoint in pephub.

In my opinion, we have to create a new class, or a set of functions, that is separate from geofetch and standardizes all GEO metadata.
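
As a starting point for discussion, such a standalone class could look roughly like the sketch below. Everything here is hypothetical: the class name `GeoMetadataStandardizer`, the rule format, and the assumption that each GEO key holds a list of "name: value" strings. It does not import or call geofetch, and it produces a single illustrative column, not a complete PEP sample table.

```python
import csv
import io


class GeoMetadataStandardizer:
    """Hypothetical standalone helper; not part of geofetch."""

    def __init__(self, rules):
        # rules: {standard_column: [(source_key, alias), ...]}, tried in order.
        self.rules = rules

    @staticmethod
    def _lookup(sample, source_key, alias):
        """Return the value of 'alias: value' under source_key, or None."""
        for entry in sample.get(source_key, []):
            name, sep, value = entry.partition(":")
            if sep and name.strip().lower() == alias.lower():
                return value.strip()
        return None

    def standardize(self, sample):
        """Map one raw sample dict to a flat row of standardized columns."""
        row = {}
        for column, sources in self.rules.items():
            values = (self._lookup(sample, key, alias) for key, alias in sources)
            # First source that yields a value wins; empty string if none do.
            row[column] = next((v for v in values if v is not None), "")
        return row

    def to_csv(self, samples):
        """Render standardized rows as CSV text with one column per rule."""
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=list(self.rules), lineterminator="\n"
        )
        writer.writeheader()
        for sample in samples:
            writer.writerow(self.standardize(sample))
        return buffer.getvalue()


# Rule set built from the genome example in this issue.
rules = {
    "genome": [
        ("Sample_description", "assembly"),
        ("Sample_characteristics_ch1", "genome build"),
        ("Sample_data_processing", "Genome_build"),
    ],
}
standardizer = GeoMetadataStandardizer(rules)
samples = [
    {"Sample_data_processing": ["Genome_build: hg19"]},
    {"Sample_description": ["assembly: hg19"]},
]
print(standardizer.to_csv(samples))
```

Keeping the rules as plain data means new columns or aliases can be added without touching the class, and the same rule set could be reused by geofetch and pephub.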
