episphere/datascience

Where life becomes numbers, and numbers come to life

Division of Cancer Epidemiology and Genetics (DCEG)
National Institutes of Health (NIH) / National Cancer Institute (NCI)

Established in 2019 with the recruitment of DCEG's inaugural Chief Data Scientist, Jonas Almeida, the Data Science Group seeks to advance research and infrastructure for data-intensive Precision Prevention studies.

Mission statement

To advance Data Science and Engineering for Precision Epidemiology through the development of Computational Commons.

Goals

The main goals of the Data Science Group are to accelerate the investigation of epidemiologic and genetic causes of cancer, and to advance Cloud Computing infrastructure for Precision Prevention. These two aims are pursued as a multidisciplinary research program that combines systems biology, computational statistics, artificial intelligence, and software engineering for biomedical applications.

Training

Outreach through education and the development of trans-disciplinary human resources is the third aim of the Data Science Group, pursued through weekly Cloud4Bio Hackathons at NCI's Shady Grove campus.

EpiSphere

The evolution of the Web towards a global data space is creating new opportunities for preventing cancer and understanding its etiology. This technology development is particularly well suited to Epidemiology research, which is challenged by a widening diversity of data types and increasingly sensitive governance of data sources. The data types now range from digital pathology to wearable devices, while their governance needs to traverse environments stretching from federal- and state-sponsored reference data sources to consumer-facing cloud-hosted services. EpiSphere is therefore conceived as an epidemiology approach to the NIH Data Commons initiative, with the goal of advancing interoperable data ecosystems in a manner that is driven by specific data-intensive projects at DCEG. Specifically, this practical focus drives the development of Data Science as computational infrastructure, enabled by scalable Cloud Computing and Artificial Intelligence (AI) made available by the NIH STRIDES initiative. As such, EpiSphere was conceived as an umbrella computational epidemiology framework that is informed, and validated, by the data science infrastructure projects it develops.

People

Projects we're involved in

  • EpiSphere - Web tools to operate Cancer Epidemiology Commons.
  • FeatureScape - Interactive representation and analysis of feature landscapes.
  • Serverless OpenHealth - live demo at bit.ly/loadsparcs.
  • Connect for Cancer Prevention Study - a next-generation cohort study design that interoperates with integrated Health Care Systems (~200,000 participants).
  • Confluence - a research resource to uncover breast cancer genetics through genome-wide association studies (GWAS). The resource will include at least 300,000 breast cancer cases.
  • mortalityTracker - Web-based aggregation of CDC data services on causes of death, collated with real-time data on the ongoing COVID-19 pandemic.
  • PLCOjs - SDK for GWAS data exploration of the PLCO clinical trial data.
