This is an analysis pipeline for GENEVA datasets. Follow the steps below to process your dataset.
- The dataset consists of the gene expression of each single cell (identified by a cell barcode).
- Each cell barcode is also linked to an HTO barcode (present in a different fastq file).
- Once you process the HTO fastq files, you use the cell barcodes to map the cells to their correct drug labels.
- Lastly, we perform 3'-end mRNA sequencing of the cells to collect their genotypes.
- We then use the genotypes to map each cell barcode to its corresponding cell line.
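
The two mappings described above (cell barcode to drug label via the HTO barcode, and cell barcode to cell line via genotype) can be pictured with a minimal Python sketch. The barcodes, HTO names, drug labels, cell line names, and the `annotate_cells` helper are all hypothetical placeholders for illustration; in the actual pipeline these tables come out of the notebooks and scripts in this repository.

```python
# Hypothetical lookup tables, for illustration only.
# In the real pipeline they are produced by the HTO demultiplexing
# and genotyping steps, not written by hand.
hto_by_cell = {"AAACCTGA": "HTO_1", "AAACGGGT": "HTO_2", "AAAGATGC": "HTO_1"}
drug_by_hto = {"HTO_1": "DMSO", "HTO_2": "drug_A"}
cell_line_by_cell = {"AAACCTGA": "line_X", "AAACGGGT": "line_Y"}


def annotate_cells(hto_by_cell, drug_by_hto, cell_line_by_cell):
    """Return {cell_barcode: {"drug": ..., "cell_line": ...}}.

    A label that cannot be resolved is stored as None, so cells that
    were not genotyped or not assigned an HTO remain visible.
    """
    annotations = {}
    for barcode, hto in hto_by_cell.items():
        annotations[barcode] = {
            "drug": drug_by_hto.get(hto),
            "cell_line": cell_line_by_cell.get(barcode),
        }
    return annotations


annotations = annotate_cells(hto_by_cell, drug_by_hto, cell_line_by_cell)
for barcode, labels in sorted(annotations.items()):
    print(barcode, labels["drug"], labels["cell_line"])
```

Keeping unresolved labels as `None` rather than dropping the cell is a design choice of this sketch; it makes it easy to count how many cells lack a drug label or a cell line assignment.
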
- Follow the script called `cellranger_processing.sh` in the `scripts` folder to convert fastq files into count matrices.
- Follow the notebook called `10x_processing.ipynb` in the `notebooks` folder to process your raw table into a processed matrix.
- Follow the notebook called `pymulti_processing.ipynb` in the `notebooks` folder to process your HTO fastq files.
- Follow the notebook called `10x_pymulti_merging.ipynb` in the `notebooks` folder to merge your processed matrix with the demultiplexed data.
- Follow the script called `3_end_mrna_processing.sh` in the `scripts` folder to process all your mRNA data into deduplicated bam files.
- Follow the script called `process_3_end_vcfs.sh` in the `scripts` folder to process all your bam files into a final concatenated vcf file ready for demuxlet.
- Follow the script called `demuxlet_processing.sh` in the `scripts` folder to process the merged vcf files along with the 10x outputs into demultiplexed vcf files.
- Follow the script called `freemux_processing.sh` in the `scripts` folder to process the demux outputs into clustered vcf outputs that sceasymode can analyze.
- Follow the notebook called `assign_genotypes_sceasymode.ipynb` in the `notebooks` folder to process the final vcf files into dictionaries with the genotyping module of sceasymode.
- Follow the notebooks in the `analysis_notebooks` directory.
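
Before starting, it can help to confirm that every script and notebook named in the steps above is present. The following is a minimal sketch, assuming the `scripts` and `notebooks` folders sit directly under the repository root; the `PIPELINE_STEPS` list and the `missing_steps` helper are hypothetical names introduced here and are not part of the pipeline itself.

```python
from pathlib import Path

# Steps in the order listed above, as (folder, file name) pairs.
# Assumption: both folders live directly under the repository root.
PIPELINE_STEPS = [
    ("scripts", "cellranger_processing.sh"),
    ("notebooks", "10x_processing.ipynb"),
    ("notebooks", "pymulti_processing.ipynb"),
    ("notebooks", "10x_pymulti_merging.ipynb"),
    ("scripts", "3_end_mrna_processing.sh"),
    ("scripts", "process_3_end_vcfs.sh"),
    ("scripts", "demuxlet_processing.sh"),
    ("scripts", "freemux_processing.sh"),
    ("notebooks", "assign_genotypes_sceasymode.ipynb"),
]


def missing_steps(repo_root="."):
    """Return the 'folder/file' entries that are not found under repo_root."""
    root = Path(repo_root)
    return [
        f"{folder}/{name}"
        for folder, name in PIPELINE_STEPS
        if not (root / folder / name).is_file()
    ]


missing = missing_steps()
if missing:
    print("Missing pipeline files:")
    for entry in missing:
        print("  " + entry)
else:
    print("All pipeline scripts and notebooks were found.")
```

Run it from the repository root, or pass the path to your checkout as `repo_root`. The sketch only checks that the files exist; it does not run any step.
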