Text and data mining
rick70002 edited this page Sep 18, 2015 · 55 revisions
- Text mining phenotypes (Rob H, Hongyan Wu, Yas; interested: S Kawashima, Tudor, R Vos, Mark W, S Kumagai, MichelD, Joe Miyamoto)
- Drug interactions and phenotypes
- complex phenotypes
- environment-phenotype interactions
- host-pathogen interactions and phenotypes for infectious diseases
- microbial phenotypes, environments, genetic information (add to MicrobeDB?)
- Create an X Phenotype Ontology, X ∈ {frog, chicken, mosquito}
- Natural language processing, literature annotation and QA system
- Interoperability of text mining/annotation resources (Tudor, interested: Takatomo, E. Bolton, Takeru Nakazato, Raoul, MarkT)
- Goal
- to collect annotations to Biomedical Literature
- to align them
- to make them publicly accessible
- REST API
- SPARQL endpoint
- to find and develop applications
- Resources
- PubTator REST API to become compatible with PubAnnotation Annotation Server API (Wei)
- Phenotype annotation to be registered to PubAnnotation (Tudor)
- TextAE to be used in Patient Archive and Orphanet Knowledge Management (Tudor)
- DisGeNet annotation to be registered to PubAnnotation (Núria, Tudor)
- Europe PMC - PubAnnotation Interoperability (Jee-Hyub)
- Nanotea (Alex)
- PubAnnotation - an open repository of literature annotation
- Goal
- Knowledge Graph Annotator for human curation (MarkT, Jee-Hyub, interested: E. Bolton)
- Finished: bug fixes and working nanopub store interface
- New feature ideas (many thanks to Nuria and Erick):
- more authentication options, e.g. LinkedIn, Twitter, Scopus, ResearcherID
- options to store annotation (nanopub) in different locations
- register type of evidence with ECO ontology, or
- register source of evidence with PubAnnotation URL
- In progress: RDFa bookmarklet to connect to the annotator from any html page
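The ideas of storing annotations as nanopubs, with an ECO evidence type and a PubAnnotation source URL, can be pictured with a small TriG serializer. This is a minimal sketch, assuming the standard nanopublication layout (a head graph pointing to assertion, provenance and publication-info graphs) and PROV-O for provenance. Typing a generating activity with the ECO class is only one possible modelling choice, not necessarily what the Knowledge Graph Annotator stores, and `nanopub_trig` is a hypothetical helper.

```python
from datetime import datetime, timezone

NP = "http://www.nanopub.org/nschema#"
PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def nanopub_trig(base, subject, predicate, obj, eco_class, source_url, created=None):
    """Hypothetical helper: serialize one assertion as a nanopublication in TriG.

    base:       URI chosen for the nanopub itself (assumption: caller-supplied)
    eco_class:  URI of an ECO evidence class (assumption: modelled as an activity type)
    source_url: evidence source, e.g. a PubAnnotation annotations URL
    """
    created = created or datetime.now(timezone.utc).isoformat()
    lines = [
        f"@prefix this: <{base}> .",
        f"@prefix sub: <{base}#> .",
        f"@prefix np: <{NP}> .",
        f"@prefix prov: <{PROV}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        # Head graph: ties the nanopub to its three content graphs
        "sub:Head {",
        "  this: a np:Nanopublication ;",
        "    np:hasAssertion sub:assertion ;",
        "    np:hasProvenance sub:provenance ;",
        "    np:hasPublicationInfo sub:pubinfo .",
        "}",
        # Assertion graph: the curated or text-mined statement
        "sub:assertion {",
        f"  <{subject}> <{predicate}> <{obj}> .",
        "}",
        # Provenance graph: evidence source and evidence type
        "sub:provenance {",
        f"  sub:assertion prov:wasDerivedFrom <{source_url}> ;",
        "    prov:wasGeneratedBy sub:activity .",
        f"  sub:activity a <{eco_class}> .",
        "}",
        # Publication info graph: metadata about the nanopub itself
        "sub:pubinfo {",
        f'  this: prov:generatedAtTime "{created}"^^xsd:dateTime .',
        "}",
    ]
    return "\n".join(lines) + "\n"
```

Storing the nanopub "in different locations" would then amount to posting this TriG text to whichever store the user selects.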
- (Graph-based) data analytics on top of integrated text mining data sources (disease - gene - phenotype - chemical entities - species) (Tudor, interested: E. Bolton, Atsuko, Joe Miyamoto)
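One simple form of graph-based analytics treats gene-phenotype associations, merged from the text-mined sources, as a bipartite graph. The sketch below links two genes when they share at least `min_shared` phenotypes and reports connected components, which amounts to single-linkage clustering. The function name and the threshold are hypothetical and illustrative, not the analysis used in the project.

```python
from collections import defaultdict
from itertools import combinations


def cluster_genes_by_phenotype(pairs, min_shared=2):
    """Hypothetical sketch: group genes connected through shared phenotypes.

    pairs: iterable of (gene, phenotype) associations.
    Genes sharing >= min_shared phenotypes get an edge; clusters are the
    connected components of the resulting gene-gene graph.
    """
    phenotypes = defaultdict(set)
    for gene, phenotype in pairs:
        phenotypes[gene].add(phenotype)

    # Project the bipartite gene-phenotype graph onto genes
    neighbours = {gene: set() for gene in phenotypes}
    for g1, g2 in combinations(phenotypes, 2):
        if len(phenotypes[g1] & phenotypes[g2]) >= min_shared:
            neighbours[g1].add(g2)
            neighbours[g2].add(g1)

    # Connected components via iterative depth-first search
    seen, clusters = set(), []
    for start in sorted(phenotypes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbours[gene] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters
```

The same projection works in the other direction (phenotypes linked through shared genes) by swapping the tuple order.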
- QA over LOD
- Bio2RDF+UniProt setup for LODQA (Michel, Hongyan, Jin-Dong, Jerven, interested: E. Bolton)
- Interoperability of text mining/annotation resources (Tudor, interested: Takatomo, E. Bolton, Takeru Nakazato, Raoul, MarkT)
- Motivation
- Various annotation projects share the same targets, PubMed and PMC.
- They are maintained in silos.
- Goal
- To collect annotations to literature and align them
- To establish interoperability between text annotation resources
- To make them publicly accessible through dereferenceable URIs, REST API, RDF and SPARQL endpoints
- To find and develop applications
- Participants
- Wei, Tudor, Núria, Mark T., Jee-Hyub, Kevin, Alex G, Jin-Dong ...
- Started integration of several text mined data sets:
- DisGeNET (diseases + genes)
- PubTator (diseases + genes + variants + species)
- HPO Pubmed annotations
- DisGeRel? (diseases + genes)
- PubAnnotation to provide alignment and storage of annotation resources
- API-level interoperability
- PubTator is interoperable with PubAnnotation at API level.
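To illustrate what API-level interoperability involves, the sketch below converts one document in the PubTator text format into a PubAnnotation-style JSON object (`text` plus `denotations` with character `span`s). It assumes the usual PubTator layout, with offsets counted over the title, one space, and the abstract. Putting the concept ID into `obj` is an assumption, and `pubtator_to_pubannotation` is a hypothetical name, not part of either service's API.

```python
def pubtator_to_pubannotation(block, sourcedb="PubMed"):
    """Hypothetical converter: one PubTator-format document -> PubAnnotation-style dict.

    Assumes 'PMID|t|title', 'PMID|a|abstract', then tab-separated lines
    'PMID, start, end, mention, type[, concept ID]' with offsets over title + ' ' + abstract.
    """
    pmid, title, abstract, rows = None, "", "", []
    for line in block.strip().splitlines():
        parts = line.split("|", 2)
        if len(parts) == 3 and parts[1] in ("t", "a"):
            # Title ('t') or abstract ('a') line
            pmid = parts[0]
            if parts[1] == "t":
                title = parts[2]
            else:
                abstract = parts[2]
        elif line.strip():
            # Annotation line
            rows.append(line.split("\t"))
    text = f"{title} {abstract}" if abstract else title
    denotations = []
    for cols in rows:
        begin, end, mention, category = int(cols[1]), int(cols[2]), cols[3], cols[4]
        if text[begin:end] != mention:
            continue  # offsets do not reproduce the mention; skip rather than guess
        # Assumption: use the concept ID as the label when present, else the entity type
        obj = cols[5] if len(cols) > 5 and cols[5] else category
        denotations.append({"id": f"T{len(denotations) + 1}",
                            "span": {"begin": begin, "end": end},
                            "obj": obj})
    return {"sourcedb": sourcedb, "sourceid": pmid, "text": text, "denotations": denotations}
```

Checking each mention against its offsets is a cheap safeguard: a disagreement usually means the two resources are using different base texts, which is exactly the alignment problem PubAnnotation is meant to handle.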
- Phenotype CR
- Issues
- How to represent document-level annotation.
- How to align concept labels (ontology alignment problem).
- How to do the quality assessment.
- Goals:
- Naive comparison of text-mined concepts, for quality assessment
- Detection of gaps between mined data and curated domain knowledge with focus on disease - gene - phenotype associations (e.g., inferring new phenotypes for rare disorders)
- Clustering of genes based on phenotypes, and/or of phenotypes based on genes, possibly extending towards biological processes (BPs) and molecular functions (MFs) through integration with GO
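The naive comparison goal can be sketched as a per-document set comparison. This assumes both sources have already been mapped to the same identifiers (the label-alignment issue is not addressed here). The hypothetical `compare_sources` function reports, for every PMID, the shared concepts, the concepts unique to each source, and their Jaccard overlap.

```python
from collections import defaultdict


def by_document(pairs):
    """Group (pmid, concept_id) pairs into pmid -> set of concept IDs."""
    docs = defaultdict(set)
    for pmid, concept in pairs:
        docs[pmid].add(concept)
    return docs


def compare_sources(pairs_a, pairs_b):
    """Hypothetical sketch: compare two text-mined concept sets document by document.

    Returns {pmid: (shared, only_a, only_b, jaccard)} for every PMID seen in either source.
    Concepts are compared as exact ID strings (assumes a shared vocabulary, e.g. MeSH IDs).
    """
    a, b = by_document(pairs_a), by_document(pairs_b)
    report = {}
    for pmid in sorted(set(a) | set(b)):
        ca, cb = a.get(pmid, set()), b.get(pmid, set())
        union = ca | cb
        jaccard = len(ca & cb) / len(union) if union else 0.0
        report[pmid] = (ca & cb, ca - cb, cb - ca, jaccard)
    return report
```

Documents with low overlap are natural candidates for manual review, and concepts found by only one source point to possible gaps in the other.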
- Manually curated MeSH term annotation
- Integrated the manually curated MeSH-PMID annotations into PubAnnotation.
- E-utilities for accessing the title/abstract and MeSH terms.
- Diseases and chemicals are recognized as two separate categories; all other terms fall under the general category "MESH".
- Gang Fu used SPARQL to extract MeSH synonyms for string matching.
- Exact match with simple pre- and post-processing.
- To-Do: separate genes and species from the general category into two individual categories.
- Ex. http://textae.pubannotation.org/editor.html?mode=edit&target=http://pubannotation.org/projects/Test_MeSH/docs/sourcedb/PubMed/sourceid/9916105/annotations.json
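The example link follows a URL pattern built from the project name, source database and source ID. A minimal sketch, assuming that pattern generalizes to other projects and documents, and that the returned JSON has a `text` field and a `denotations` list with `span` offsets and an `obj` label. `annotations_url` and `denotation_spans` are hypothetical helpers.

```python
import json
from urllib.parse import quote


def annotations_url(project, sourcedb, sourceid, base="http://pubannotation.org"):
    """Hypothetical helper: build an annotations.json URL following the example link's pattern."""
    return (f"{base}/projects/{quote(project)}/docs/sourcedb/{quote(sourcedb)}"
            f"/sourceid/{quote(str(sourceid))}/annotations.json")


def denotation_spans(annotation_json):
    """Return (covered_text, begin, end, obj) for each denotation in a PubAnnotation-style document."""
    doc = json.loads(annotation_json) if isinstance(annotation_json, str) else annotation_json
    text = doc["text"]
    return [
        (text[d["span"]["begin"]:d["span"]["end"]], d["span"]["begin"], d["span"]["end"], d["obj"])
        for d in doc.get("denotations", [])
    ]
```

A live response could be retrieved with `urllib.request.urlopen(url).read()` and passed to `denotation_spans`; the sketch does not assume the service is reachable or that its response fields are unchanged.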
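The MeSH workflow above (E-utilities for the text and MeSH terms, synonyms from SPARQL, exact matching) can be sketched end to end. This assumes the PubMed XML returned by `efetch` with `retmode=xml` and that the synonyms have already been retrieved into a dictionary. The page does not specify the pre- and post-processing, so case-insensitive whole-word matching and removal of nested matches stand in for them here. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid):
    """E-utilities efetch request for one PubMed record as XML."""
    return EFETCH + "?" + urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})


def parse_pubmed_xml(xml_text):
    """Return (title + ' ' + abstract, [MeSH descriptor names]) from efetch PubMed XML."""
    root = ET.fromstring(xml_text)
    node = root.find(".//ArticleTitle")
    title = "".join(node.itertext()) if node is not None else ""
    abstract = " ".join("".join(n.itertext()) for n in root.iter("AbstractText"))
    mesh = [n.text for n in root.iter("DescriptorName") if n.text]
    return f"{title} {abstract}".strip(), mesh


def exact_match(text, synonyms, categories):
    """Locate MeSH terms by exact, case-insensitive, whole-word matching.

    synonyms:   {descriptor: [synonym, ...]} (assumed to come from a SPARQL query)
    categories: {descriptor: "Disease" | "Chemical"}; anything else becomes "MESH"
    """
    found = []
    for descriptor, names in synonyms.items():
        for name in set(names) | {descriptor}:
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append({"span": {"begin": m.start(), "end": m.end()},
                              "obj": categories.get(descriptor, "MESH")})
    # Post-processing: keep the longest span and drop spans nested inside it
    found.sort(key=lambda d: (d["span"]["begin"], -d["span"]["end"]))
    kept, last_end = [], -1
    for d in found:
        if d["span"]["end"] <= last_end:
            continue
        kept.append(d)
        last_end = d["span"]["end"]
    for n, d in enumerate(kept, start=1):
        d["id"] = f"T{n}"
    return kept


def locate_curated_mesh(xml_text, synonyms, categories):
    """Match only the MeSH terms curated for this record."""
    text, mesh = parse_pubmed_xml(xml_text)
    wanted = {d: synonyms.get(d, []) for d in mesh}
    return text, exact_match(text, wanted, categories)
```

Restricting matching to the descriptors curated for each PMID keeps the exact-match step precise; terms that never appear literally in the title or abstract (such as "Humans") simply yield no span.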
- Drug labels and human/mouse/rat phenotypes: first attempt at http://aber-owl.net/aber-owl/diseasephenotypes/drugs/