A tool to identify, standardize, and export complex tabular data from scientific papers to facilitate use by other tools, formats, and applications.
Scientific papers often contain complex tabular data related to the pharmacological activities, uses, and effects of drugs, plant extracts, and their chemical constituents. This data includes Minimum Effective Concentrations, Maximum Safe Concentrations, and toxicity measurements, as well as information on plant names, extract types, extraction methods, and the concentration of each detected chemical constituent.
However, there is no standardized method for displaying this data, making it difficult to extract and use for knowledge transfer.
The big challenge in using this data, spread across hundreds of thousands of papers, is that no single standard exists for how authors present it. Authors often combine different types of data that should be kept separate in the same table. Or they display data for many tested plants (for example) in one table, sometimes giving each plant its own columns and sometimes its own horizontal section of rows. All of this makes data extraction (let alone knowledge extraction) a very difficult task. Currently, no available tool can do this without human intervention, and given the number of papers involved and the amount of interpretation each table requires, finding enough people to do this accurately and cost-effectively is next to impossible. Additionally, extracting, interpreting, and normalizing data from PDFs, rather than from machine-readable formats such as XML, is a further impediment that currently makes this task impossible.
The purpose of this project is to develop a tool that can identify, standardize, and export complex tabular data from scientific papers, starting with PDF and XML formats. This will enable the efficient transfer of knowledge to other tools and applications.
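To make the "standardize and export" step concrete, here is a minimal sketch of how one common layout (several plants in one table, each with its own horizontal section of rows) could be flattened into a uniform long format and written out as CSV. It assumes the table has already been extracted into rows of text cells (the PDF/XML extraction itself is out of scope here) and that a row whose only non-empty cell is the first one marks a new section, such as a plant name. The function names, the `section`/`item`/`variable`/`value` schema, and the sample rows are hypothetical and for illustration only; the values are placeholders, not real measurements.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical table as it might look after cell extraction: rows of strings.
# Rows holding only a first-cell label (e.g. "Plant A") are treated as section
# headers. Values are placeholders, not real measurements.
RAW_ROWS: List[List[str]] = [
    ["Compound", "Extract X (mg/g)", "Extract Y (mg/g)"],
    ["Plant A", "", ""],
    ["Compound 1", "1.2", "0.4"],
    ["Compound 2", "0.8", ""],
    ["Plant B", "", ""],
    ["Compound 1", "2.5", "0.9"],
]

# Hypothetical long-format schema: one row per reported measurement.
FIELDS = ["section", "item", "variable", "value"]


def is_section_header(row: List[str]) -> bool:
    """Assumed rule: text in the first cell and every other cell blank."""
    return bool(row[0].strip()) and all(not cell.strip() for cell in row[1:])


def normalize(rows: List[List[str]]) -> List[Dict[str, object]]:
    """Flatten a sectioned, wide table into one record per measurement."""
    header = rows[0]
    section: Optional[str] = None  # current section label (None until one is seen)
    records: List[Dict[str, object]] = []
    for row in rows[1:]:
        if is_section_header(row):
            section = row[0].strip()
            continue
        item = row[0].strip()
        # Each data column becomes a separate (variable, value) record.
        for column, cell in zip(header[1:], row[1:]):
            if not cell.strip():
                continue  # blank cell: no measurement reported
            records.append(
                {"section": section, "item": item,
                 "variable": column, "value": float(cell)}
            )
    return records


def to_csv(records: List[Dict[str, object]]) -> str:
    """Export the normalized records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(normalize(RAW_ROWS)))
```

Running the script prints one CSV row per plant, compound, and extract column, with blank cells dropped. Real tables would need richer rules than this sketch assumes, for example for values written as ranges or with ± uncertainties, which `float` cannot parse, or for plants laid out as separate columns rather than row sections.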
- DBT-Cambridge Lecturer at Department of Plant Sciences, University of Cambridge
- Staff Scientist at National Institute of Plant Genome Research (NIPGR)
- Citations:
- Phytomedicine Detective
- Founder/Formulator at Verriclear Natural Skin Essentials Ltd.
- Verriclear participates in non-profit science to create skincare products
- https://www.verriclear.com