This package provides the Carnival Gex Preprocess Building Block (BB).
This building block processes (reshapes and scales) gene expression data from the Genomics of Drug Sensitivity in Cancer (GDSC) database for use by other building blocks.
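
As a rough illustration of what "scaling" can mean here, the sketch below standardizes each gene across samples (mean 0, standard deviation 1). This is an assumption made for illustration only: the building block's actual scaling method is not documented in this README, and the gene names and values are hypothetical.

```python
from statistics import mean, stdev


def scale_gene(values):
    """Standardize one gene's expression values across samples (z-score).

    Assumption: "scaling" is treated here as z-scoring with the sample
    standard deviation; the building block itself may do it differently.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A gene with constant expression carries no variation to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical expression matrix: gene symbol -> one value per sample.
expression = {
    "GENE_A": [2.0, 4.0, 6.0],
    "GENE_B": [5.0, 5.0, 5.0],
}

scaled = {gene: scale_gene(vals) for gene, vals in expression.items()}
print(scaled)
```
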
- Python >= 3.6
- Singularity
- permedcoe base package: `python3 -m pip install permedcoe`
In addition to the dependencies, it is necessary to generate the associated
singularity image (toolset.singularity),
located in the Resources folder of this repository.
The image MUST be available in a directory whose path is exported in the following environment variable before use:

    export PERMEDCOE_IMAGES="/path/to/images/"

This package provides an automatic installation script:

    ./install.sh

The Carnival_gex_preprocess_BB package provides a clear interface that allows it to be used with multiple workflow managers (e.g. PyCOMPSs, NextFlow and Snakemake).

It can be imported from Python and invoked directly from a PyCOMPSs application, or through the command line for other workflow managers (e.g. Snakemake and NextFlow).
The command line is:

    Carnival_gex_preprocess_BB -d \
        --tmpdir <working_directory> \
        --input_file <input_file> \
        --col_genes <col_genes> \
        --scale <scale> \
        --exclude_cols <exclude_cols> \
        --tsv <tsv> \
        --remove <remove> \
        --verbose <verbose> \
        --output_file <output_file>

Where the parameters are:

| | Flag | Parameter | Type | Description |
|---|---|---|---|---|
| | --tmpdir | <working_directory> | Folder | Working directory (temporary files) |
| Input | --input_file | <input_file> | File | csv/url with the GDSC gene expression data |
| Input | --col_genes | <col_genes> | String | Name of the column containing the gene symbols. Default = GENE_SYMBOLS |
| Input | --scale | <scale> | String | Normalize genes across samples (TRUE/FALSE) |
| Input | --exclude_cols | <exclude_cols> | String | Exclude columns containing the given string. Default = GENE_title |
| Input | --tsv | <tsv> | String | Import as TSV instead of CSV (TRUE/FALSE) |
| Input | --remove | <remove> | String | Remove the given substring from columns. Default = .DATA |
| Input | --verbose | <verbose> | String | Verbose output (TRUE/FALSE) |
| Output | --output_file | <output_file> | File | Processed csv file |
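
When driving the tool from a Python script rather than a shell, the command line can be assembled as an argument list. The sketch below is a hypothetical helper, not part of this package: the function name `build_command` is an assumption, the defaults are taken from the table above, and treating `--tmpdir` as optional is also an assumption. It only builds and prints the command; it does not run it.

```python
import shlex


def build_command(input_file, output_file, scale, tsv, verbose,
                  col_genes="GENE_SYMBOLS", exclude_cols="GENE_title",
                  remove=".DATA", tmpdir=None):
    """Build the argument list for Carnival_gex_preprocess_BB.

    Hypothetical helper: it uses only the flags documented in the table above.
    """
    def as_flag(value):
        # The CLI expects the literal strings TRUE/FALSE for boolean options.
        return "TRUE" if value else "FALSE"

    cmd = ["Carnival_gex_preprocess_BB"]
    if tmpdir is not None:
        # Assumption: --tmpdir may be omitted.
        cmd += ["--tmpdir", tmpdir]
    cmd += [
        "--input_file", input_file,
        "--col_genes", col_genes,
        "--scale", as_flag(scale),
        "--exclude_cols", exclude_cols,
        "--tsv", as_flag(tsv),
        "--remove", remove,
        "--verbose", as_flag(verbose),
        "--output_file", output_file,
    ]
    return cmd


cmd = build_command(
    "Cell_line_RMA_proc_basalExp.txt", "gex.csv",
    scale=False, tsv=True, verbose=True, remove="DATA.",
)
# Print a shell-quoted version for inspection (works on Python 3.6+).
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Where the building block is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.
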
Here is an example from https://github.com/saezlab/permedcoe/blob/master/containers/workflow_bb.sh preprocessing GDSC data:

    wget -O gdsc_gex.zip https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip
    unzip gdsc_gex.zip
    Carnival_gex_preprocess_BB \
        --input_file Cell_line_RMA_proc_basalExp.txt \
        --col_genes GENE_SYMBOLS \
        --scale FALSE \
        --exclude_cols GENE_title \
        --tsv TRUE \
        --remove DATA. \
        --verbose TRUE \
        --output_file gex.csv

Uninstall can be achieved by executing the following scripts:

    ./uninstall.sh
    ./clean.sh

This software has been developed for the PerMedCoE project, funded by the European Commission (EU H2020 951773).
