Project

This package helps with the analysis of bulk RNA sequencing data. It is built on scanpy.

State

This project is currently under development.

Installation of this package in another project

To install this library in another Python project, simply run:

pip install git+https://github.com/idiap/bulkanalysis.git

Scripts

`genes_preprocessing`

genes_preprocessing.py performs basic filtering and normalization on bulk RNA sequencing data. To run it, update the file config/genes_preprocessing_template.yaml with the right paths. You can then run the script as follows:

python3 scripts/genes_preprocessing.py --config_file config/genes_preprocessing_template.yaml

The config file should contain:

data_origin: Origin of the transcripts matrix. For now, the only supported option is kallisto_whole_transcriptome, meaning that the transcripts matrix must come from a kallisto quantification on a whole transcriptome. Other options might be supported in the future.
df_counts_path: Path to the matrix of transcripts.
gtf_file_no_focus: Original GTF used for the whole transcriptome quantification.
gene_names_path: GENCODE gene names with gene symbols. Example: "Gencode_geneNames_hg38V44.txt"
gene_info_path: Gene symbols with their information, in particular whether they are protein-coding. Example: "Homo_sapiens.gene_info"
treatments: Dictionary with the treatment names as keys and the lists of corresponding sample names as values.
path_to_results: Directory where to save the results.
figures_extension: Extension to save the figures with, e.g. "pdf" or "png".
pct_in_treatment: Percentage of samples within a treatment group in which a gene should be reliably expressed to be kept.
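
Before launching the script, it can help to check that the config contains every key listed above. The snippet below is a minimal sketch of such a check and is not part of the package: the function name `check_config`, the validation rules, and all example values are assumptions made for illustration. Only the key names and the `kallisto_whole_transcriptome` option come from this README.

```python
# Keys listed in this README. The checks below are illustrative assumptions,
# not the validation performed by genes_preprocessing.py.
REQUIRED_KEYS = [
    "data_origin",
    "df_counts_path",
    "gtf_file_no_focus",
    "gene_names_path",
    "gene_info_path",
    "treatments",
    "path_to_results",
    "figures_extension",
    "pct_in_treatment",
]


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    # The README names a single supported origin for now.
    if config.get("data_origin") not in (None, "kallisto_whole_transcriptome"):
        problems.append("unsupported data_origin")

    # treatments maps each treatment name to a list of sample names.
    treatments = config.get("treatments", {})
    if not isinstance(treatments, dict):
        problems.append("treatments must be a dictionary")
    else:
        for name, samples in treatments.items():
            if not isinstance(samples, list) or not samples:
                problems.append(f"treatment {name!r} needs a non-empty list of sample names")
    return problems


# Hypothetical values, for illustration only.
example_config = {
    "data_origin": "kallisto_whole_transcriptome",
    "df_counts_path": "data/transcripts_matrix.csv",
    "gtf_file_no_focus": "data/annotation.gtf",
    "gene_names_path": "Gencode_geneNames_hg38V44.txt",
    "gene_info_path": "Homo_sapiens.gene_info",
    "treatments": {"control": ["sample1", "sample2"], "treated": ["sample3", "sample4"]},
    "path_to_results": "results/",
    "figures_extension": "pdf",
    "pct_in_treatment": 50,  # hypothetical; the expected scale is not documented here
}

print(check_config(example_config))
```

The same dictionary structure is what a YAML loader would return for the template file, so the check could be applied to the loaded config before the script runs.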

`aggregate_featureCounts_output`

aggregate_featureCounts_output.py merges the outputs from featureCounts for multiple samples. Run the script as follows:

python3 scripts/aggregate_featureCounts_output.py -f sample1.txt sample2.txt sample3.txt -n sample_name1 sample_name2 sample_name3 -s df_counts.csv

with:

sample1.txt sample2.txt sample3.txt: the output files produced by featureCounts.
sample_name1 sample_name2 sample_name3: the sample names that should appear in the final matrix.
df_counts.csv: name of the file in which to save the final matrix.
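
To make the merge concrete, here is a minimal sketch of how several featureCounts outputs could be combined into one matrix using only the standard library. It is not the implementation of aggregate_featureCounts_output.py: the function names `read_counts` and `merge_counts` are hypothetical, and the sketch assumes the usual featureCounts layout (a leading `#` comment line, a header row, the gene identifier in the first column, and the counts in the last column), that files and sample names are given in the same order, and that a gene missing from one file gets a count of 0.

```python
import csv


def read_counts(path):
    """Read one featureCounts output file into {gene_id: count_as_text}."""
    counts = {}
    with open(path, newline="") as handle:
        # Skip the leading comment line(s) that start with "#".
        data_lines = (line for line in handle if not line.startswith("#"))
        rows = csv.reader(data_lines, delimiter="\t")
        next(rows)  # skip the header row (Geneid, Chr, Start, End, Strand, Length, <bam>)
        for row in rows:
            if row:
                # First column is the gene identifier, last column holds the counts.
                counts[row[0]] = row[-1]
    return counts


def merge_counts(paths, sample_names):
    """Build a header and rows with one column of counts per sample."""
    if len(paths) != len(sample_names):
        raise ValueError("need exactly one sample name per input file")
    per_sample = [read_counts(path) for path in paths]
    gene_ids = sorted(set().union(*per_sample))
    header = ["Geneid", *sample_names]
    # Assumption: a gene absent from a file is reported with a count of "0".
    rows = [[gene, *(counts.get(gene, "0") for counts in per_sample)] for gene in gene_ids]
    return header, rows


def write_matrix(header, rows, output_path):
    """Write the merged matrix as a comma-separated file."""
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

With these helpers, `write_matrix(*merge_counts(["sample1.txt", "sample2.txt"], ["sample_name1", "sample_name2"]), "df_counts.csv")` would produce a matrix with one row per gene and one column per sample.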
