Carnival Gex Preprocess Building Block

This package provides the Carnival Gex Preprocess Building Block (BB).

Description

This building block processes (reshapes and scales) gene expression data from the Genomics of Drug Sensitivity in Cancer (GDSC) database for use by other building blocks.

User instructions

Requirements

  • Python >= 3.6
  • Singularity
  • permedcoe base package: python3 -m pip install permedcoe

In addition to these dependencies, you must generate the associated Singularity image (toolset.singularity), located in the Resources folder of this repository.

The folder containing the generated image MUST be exported in the following environment variable before using the building block:

export PERMEDCOE_IMAGES="/path/to/images/"
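Before launching a workflow, it can help to confirm that the variable is set and points to an existing folder. The following is a minimal sketch of such a check; the helper name `check_images_dir` is hypothetical and is not part of the permedcoe package.

```python
import os
from pathlib import Path


def check_images_dir(env=None):
    """Return the folder named by PERMEDCOE_IMAGES, or raise RuntimeError.

    `env` defaults to the process environment; passing a dict makes the
    check easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    value = env.get("PERMEDCOE_IMAGES")
    if not value:
        raise RuntimeError("PERMEDCOE_IMAGES is not set; export it first")
    path = Path(value)
    if not path.is_dir():
        raise RuntimeError(f"PERMEDCOE_IMAGES points to a missing folder: {path}")
    return path
```

Calling `check_images_dir()` at the start of a script fails early with a readable message instead of failing later inside the container launch.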

Installation

This package provides an automatic installation script:

./install.sh

Usage

The Carnival_gex_preprocess_BB package provides a clear interface that allows it to be used with multiple workflow managers (e.g. PyCOMPSs, Nextflow and Snakemake).

It can be imported from Python and invoked directly from a PyCOMPSs application, or run through the command line from other workflow managers (e.g. Snakemake and Nextflow).

The command line is:

Carnival_gex_preprocess_BB -d \
    --tmpdir <working_directory> \
    --input_file <input_file> \
    --col_genes <col_genes> \
    --scale <scale> \
    --exclude_cols <exclude_cols> \
    --tsv <tsv> \
    --remove <remove> \
    --verbose <verbose> \
    --output_file <output_file>

Where the parameters are:

|        | Flag           | Parameter           | Type   | Description                                                            |
|--------|----------------|---------------------|--------|------------------------------------------------------------------------|
|        | --tmpdir       | <working_directory> | Folder | Working directory (temporary files)                                    |
| Input  | --input_file   | <input_file>        | File   | csv/url with the GDSC gene expression data                             |
| Input  | --col_genes    | <col_genes>         | String | Name of the column containing the gene symbols. Default = GENE_SYMBOLS |
| Input  | --scale        | <scale>             | String | Normalize genes across samples (TRUE/FALSE)                            |
| Input  | --exclude_cols | <exclude_cols>      | String | Exclude columns containing the given string. Default = GENE_title      |
| Input  | --tsv          | <tsv>               | String | Import as TSV instead of CSV (TRUE/FALSE)                              |
| Input  | --remove       | <remove>            | String | Remove the given substring from columns. Default = .DATA               |
| Input  | --verbose      | <verbose>           | String | Verbose output (TRUE/FALSE)                                            |
| Output | --output_file  | <output_file>       | File   | Processed csv file                                                     |
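When the command line is driven from a script, the flags listed above can be assembled programmatically. The sketch below builds the argument list using only the documented flags; the function `build_command` is a hypothetical helper, and its `False` defaults for `scale`, `tsv` and `verbose` are choices made for this example, not documented defaults of the building block.

```python
import subprocess  # only needed if the command is actually executed


def build_command(input_file, output_file, tmpdir=None,
                  col_genes="GENE_SYMBOLS", scale=False,
                  exclude_cols="GENE_title", tsv=False,
                  remove=".DATA", verbose=False):
    """Return the Carnival_gex_preprocess_BB call as a list of arguments."""

    def flag(value):
        # The CLI expects the literal strings TRUE or FALSE.
        return "TRUE" if value else "FALSE"

    cmd = ["Carnival_gex_preprocess_BB"]
    if tmpdir is not None:
        cmd += ["--tmpdir", str(tmpdir)]
    cmd += [
        "--input_file", str(input_file),
        "--col_genes", col_genes,
        "--scale", flag(scale),
        "--exclude_cols", exclude_cols,
        "--tsv", flag(tsv),
        "--remove", remove,
        "--verbose", flag(verbose),
        "--output_file", str(output_file),
    ]
    return cmd


# To run it (requires the building block to be installed):
# subprocess.run(build_command("expr.txt", "gex.csv", tsv=True), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.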

The following example, taken from https://github.com/saezlab/permedcoe/blob/master/containers/workflow_bb.sh, downloads and preprocesses GDSC data:

wget -O gdsc_gex.zip https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip
unzip gdsc_gex.zip
Carnival_gex_preprocess_BB \
    --input_file Cell_line_RMA_proc_basalExp.txt \
    --col_genes GENE_SYMBOLS \
    --scale FALSE \
    --exclude_cols GENE_title \
    --tsv TRUE \
    --remove DATA. \
    --verbose TRUE \
    --output_file gex.csv
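To make the effect of these options more concrete, here is a rough Python sketch of what the parameter descriptions suggest happens to the table. It is an illustration only, not the building block's implementation: the function `preprocess`, the toy input, the use of a z-score for `scale`, and the returned layout are all assumptions, and the real tool may differ (for example in output orientation or scaling method).

```python
import csv
import io
import statistics


def preprocess(text, col_genes="GENE_SYMBOLS", exclude_cols="GENE_title",
               remove="DATA.", tsv=True, scale=False):
    """Return (sample_names, {gene: values}) from delimited text."""
    delimiter = "\t" if tsv else ","
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    gene_idx = header.index(col_genes)

    # Keep every column except the gene column and those whose name
    # contains the exclude_cols string.
    keep = [i for i, name in enumerate(header)
            if i != gene_idx and exclude_cols not in name]

    # Strip the unwanted substring from the remaining column names.
    samples = [header[i].replace(remove, "") for i in keep]

    table = {}
    for row in body:
        values = [float(row[i]) for i in keep]
        if scale and len(values) > 1:
            # Assumed normalization: z-score of each gene across samples.
            mean = statistics.mean(values)
            sd = statistics.stdev(values)
            values = [(v - mean) / sd if sd else 0.0 for v in values]
        table[row[gene_idx]] = values
    return samples, table


# Toy input with made-up sample names and values (not real GDSC data).
raw = ("GENE_SYMBOLS\tGENE_title\tDATA.1001\tDATA.1002\n"
       "TP53\ttumor protein p53\t1.0\t3.0\n")
samples, table = preprocess(raw)
# samples is ["1001", "1002"] and table is {"TP53": [1.0, 3.0]}
```

With `scale=True`, each gene's values are centred and divided by their sample standard deviation, which is one common reading of "normalize genes across samples".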

Uninstall

To uninstall the package, run the following scripts:

./uninstall.sh
./clean.sh

License

Apache 2.0

Contact

https://permedcoe.eu/contact/

This software has been developed for the PerMedCoE project, funded by the European Commission (EU H2020 951773).