Build a 3-star ChEBI dataset #61

@sfluegel05

Status

Currently, we use all of ChEBI in our dataset. However, not all ChEBI data is of equal quality. ChEBI distinguishes between 2-star and 3-star entities (see the ChEBI user manual). 3-star entities are manually added by the ChEBI team, while other entities have been added by external parties.

Goal

The goal is to investigate what effect using only 3-star data has on the classification task. The hypothesis is that, for some classes, the 2-star entities are not classified correctly or completely. For example, tripeptide has 220 subclasses, 195 of which are 3-star. However, about 8,000 peptides should be classified as tripeptide but are not. Most (if not all) of them are 2-star.

Task

Create a ChEBI dataset that selects only 3-star classes. Selecting the classes should be rather simple. The complicated part is the relations. We don't know which relations are 2-star, or whether a relation between two 3-star classes can be trusted if they are connected only via a 2-star class, so we need to make compromises. The easiest solution would be to treat all relations as 3-star.
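
A minimal sketch of the "treat all relations as 3-star" option, to make the compromise concrete. It is not tied to our existing dataset code: the function name `three_star_labels`, the `stars` mapping and the toy identifiers are all hypothetical, and it assumes the star rating of each class and the `is_a` edges have already been extracted from ChEBI. Ancestors are computed over the full graph, so a path through a 2-star class is still trusted, and only afterwards are the 2-star classes dropped.

```python
from collections import defaultdict


def three_star_labels(stars, is_a_edges):
    """Map each 3-star class to its 3-star ancestors, trusting every relation.

    stars:       dict of class id -> star rating (2 or 3); assumed input format
    is_a_edges:  iterable of (child, parent) pairs; assumed input format
    """
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)

    def ancestors(node):
        # Iterative traversal over the FULL graph, including 2-star classes.
        seen = set()
        stack = list(parents[node])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents[current])
        return seen

    keep = {cls for cls, rating in stars.items() if rating == 3}
    # Drop 2-star classes only after the transitive closure has been built.
    return {cls: ancestors(cls) & keep for cls in keep}


# Toy data with hypothetical identifiers: C is_a B is_a A, D is_a A.
stars = {"A": 3, "B": 2, "C": 3, "D": 2}
is_a_edges = [("C", "B"), ("B", "A"), ("D", "A")]
labels = three_star_labels(stars, is_a_edges)
```

In this toy example, `C` keeps `A` as a label even though the only path between them runs through the 2-star class `B`. A stricter variant would build the closure only over edges whose endpoints are both 3-star, at the cost of losing such labels.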
