
Linereck/tabletamer


tabletamer

A tool to identify, standardize, and export complex tabular data from scientific papers to facilitate use by other tools, formats, and applications.

Background:

Scientific papers often contain complex tabular data related to the pharmacological activities, uses, and effects of drugs, plant extracts, and their chemical constituents. This data includes Minimum Effective Concentrations, Maximum Safe Concentrations, and toxicity measurements, as well as information on plant names, extract types, extraction methods, and the concentration of each detected chemical constituent.
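To make the standardization target concrete, a normalized record could hold one measured value together with the plant and extract context it belongs to. This is a minimal sketch under stated assumptions: the class name `ConstituentMeasurement`, its field names, and the sample values are hypothetical and are not taken from the project.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ConstituentMeasurement:
    """Hypothetical normalized row: one value with its plant/extract context."""
    plant_name: str
    extract_type: str           # e.g. "ethanolic", "aqueous"
    extraction_method: str      # e.g. "maceration"
    constituent: Optional[str]  # None when the value describes the whole extract
    measure: str                # e.g. "minimum_effective_concentration", "toxicity"
    value: float
    unit: str


# Placeholder values for illustration only, not real data.
row = ConstituentMeasurement(
    plant_name="Plant A",
    extract_type="ethanolic",
    extraction_method="maceration",
    constituent="compound X",
    measure="concentration",
    value=12.5,
    unit="mg/g",
)
print(asdict(row))
```

Keeping one value per record, rather than one plant per record, lets tables that mix several measure types be split into uniform rows.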

In practice, there is no standard way to present this data, which makes it difficult to extract and reuse for knowledge transfer.

The challenge grows with scale: the data is spread across hundreds of thousands of papers, and each author lays out tables differently. Authors often combine different types of data in the same table even though they should be kept separate. They may also present data for many tested plants (for example) in a single table, sometimes giving each plant its own columns and sometimes its own horizontal section of rows. All of this makes data extraction, let alone knowledge extraction, very difficult. No currently available tool can do this without human intervention, and given the number of papers involved and the variety of layouts that must be deciphered, finding enough people to do the work accurately and cost-effectively is next to impossible. Extracting, interpreting, and normalizing data from PDFs, rather than from machine-readable formats such as XML, is a further obstacle that currently makes this task impossible.
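As one illustration of the "horizontal sections" problem, the following sketch flattens a table in which a plant name occupies its own row and the data rows beneath it belong to that plant. It assumes a simple, hypothetical heuristic (a section header is a row whose first cell is filled and all other cells are empty); real tables will need more robust detection, and the function name `flatten_sectioned_table` and the sample data are illustrative only.

```python
from typing import Dict, List, Optional


def flatten_sectioned_table(
    header: List[str], rows: List[List[str]]
) -> List[Dict[str, Optional[str]]]:
    """Attach each data row to the most recent section-header row.

    Hypothetical heuristic: a section header is a row whose first cell is
    non-empty and whose remaining cells are all empty.
    """
    records: List[Dict[str, Optional[str]]] = []
    current_section: Optional[str] = None
    for row in rows:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip fully blank rows
        if cells[0] and not any(cells[1:]):
            current_section = cells[0]  # e.g. a plant name spanning the table
            continue
        record: Dict[str, Optional[str]] = {"section": current_section}
        record.update(zip(header, cells))
        records.append(record)
    return records


# Placeholder table for illustration only.
header = ["constituent", "concentration (mg/g)"]
rows = [
    ["Plant A", ""],
    ["compound X", "1.2"],
    ["compound Y", "0.4"],
    ["Plant B", ""],
    ["compound X", "3.1"],
]
for rec in flatten_sectioned_table(header, rows):
    print(rec)
```

Each output row now carries its plant explicitly, for example `{'section': 'Plant B', 'constituent': 'compound X', 'concentration (mg/g)': '3.1'}`, so rows from different sections can be filtered or merged without relying on their position in the original table.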

Our Purpose:

The purpose of this project is to develop a tool that can identify, standardize, and export complex tabular data from scientific papers, starting with PDF and XML formats. This will enable the efficient transfer of knowledge to other tools and applications.
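Once rows are normalized, exporting them to common machine-readable formats is straightforward. A minimal sketch using only the Python standard library, assuming each record is a flat dictionary of strings; `export_records` is a hypothetical helper, not part of the project.

```python
import csv
import io
import json
from typing import Dict, List


def export_records(records: List[Dict[str, str]]) -> Dict[str, str]:
    """Serialize flat records to CSV and JSON strings (hypothetical helper)."""
    # Union of keys across all records, sorted so the column order is stable.
    fieldnames = sorted({key for rec in records for key in rec})
    buf = io.StringIO()
    # Keys missing from a record are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return {"csv": buf.getvalue(), "json": json.dumps(records, indent=2)}


# Placeholder records for illustration only.
out = export_records([
    {"plant_name": "Plant A", "constituent": "compound X", "value": "1.2"},
    {"plant_name": "Plant B", "value": "3.1"},
])
print(out["csv"])
```

CSV suits spreadsheet and database imports, while JSON preserves per-record structure for other tools; producing both from the same normalized records keeps them consistent.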

Participants:

Dr Gitanjali (Gita) Yadav

Emanuel Faria

  • Phytomedicine Detective
  • Founder/Formulator at Verriclear Natural Skin Essentials Ltd.
  • Verriclear participates in non-profit science to create skincare products
  • https://www.verriclear.com
