Use str_q.sh to quantify transcript usage (requires 3 GB, 14 h). Inputs: BAM files and a GTF.
With bsn12, version str_q8.
First runs StringTie with -eB and saves the per-sample results in intermediates/231208_str_q_outs/quantifications.
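A minimal sketch of how the per-sample StringTie call could be assembled, not the actual content of str_q.sh. The function name, the one-subdirectory-per-sample layout and the sample naming (BAM file name without extension) are assumptions; only the -e, -B, -G and -o flags are standard StringTie options.

```python
from pathlib import Path


def stringtie_cmd(bam, ref_gtf, out_dir):
    """Build the argument list for one StringTie -e -B run on a single BAM file."""
    # Assumption: the sample name is the BAM file name without its extension.
    sample = Path(bam).stem
    # Assumption: one subdirectory per sample; with -B the Ballgown tables
    # (*.ctab) are written next to the output GTF.
    sample_dir = Path(out_dir) / sample
    return [
        "stringtie",
        "-e",  # quantify only the transcripts given with -G
        "-B",  # also write Ballgown table files
        "-G", str(ref_gtf),
        "-o", str(sample_dir / f"{sample}.gtf"),
        str(bam),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where stringtie is on the PATH.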
Then calls R/summarize_stringtie_q_output.R, which compiles matrices of coverage, FPKM, and TPM and saves them in .../summaries.
It also saves the TPMs in long format as .../t_exp.tsv, for use with the Shiny app. The SLURM log should give the scp command to transfer this file (see repo isoforms_compare).
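To illustrate what "long format" means here, a small Python sketch (the real conversion is done in the R script). It assumes a wide TPM matrix with transcript IDs in the first column and one column per sample; the output column names transcript_id, sample_id and TPM are hypothetical and may not match what the Shiny app expects.

```python
import csv
import io


def wide_to_long(wide_tsv, long_tsv):
    """Reshape a wide TPM matrix (transcripts x samples) into one row per pair."""
    reader = csv.reader(wide_tsv, delimiter="\t")
    writer = csv.writer(long_tsv, delimiter="\t", lineterminator="\n")
    header = next(reader)
    samples = header[1:]  # first column holds the transcript IDs
    # Assumption: these output column names are illustrative only.
    writer.writerow(["transcript_id", "sample_id", "TPM"])
    for row in reader:
        for sample, tpm in zip(samples, row[1:]):
            writer.writerow([row[0], sample, tpm])


if __name__ == "__main__":
    # Toy input with made-up transcript and sample names.
    wide = io.StringIO("transcript_id\tS1\tS2\nT1\t1.5\t0\nT2\t3\t4.25\n")
    out = io.StringIO()
    wide_to_long(wide, out)
    print(out.getvalue(), end="")
```

The function takes open text streams, so it works the same way with files opened with open(path, newline="").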
Finally, calls ~/.utilities/prepDE.py3, provided by the StringTie authors, to save .../summaries/gene_count_matrix.csv and transcripts_count_matrix.csv for other downstream uses (such as R/drimseq_load_data and R/drimseq_test). Note (231208): had to switch to the Python 3 version on McCleary.
For other projects, run StringTie with the -A option (to create a gene_abundance.tab file for each sample), then use summarize_gene_abundance.R to gather the gene TPMs from all samples and save them as gene_TPMs.tsv.
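The gathering step can be sketched in Python as follows; this is not a port of summarize_gene_abundance.R. It relies on the "Gene ID" and "TPM" columns of StringTie's -A output. Summing rows that share a gene ID and filling genes missing from a sample with 0 are assumptions of this sketch, as is the function name.

```python
import csv
import io


def gather_gene_tpm(abundance_by_sample):
    """Collect gene TPMs from per-sample gene_abundance.tab streams.

    abundance_by_sample maps a sample name to an open text stream.
    Returns (samples, matrix), where matrix maps gene ID to a list of TPMs
    in the same order as samples.
    """
    tpm = {}
    for sample, handle in abundance_by_sample.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            per_gene = tpm.setdefault(row["Gene ID"], {})
            # Assumption: rows sharing a gene ID within a sample are summed.
            per_gene[sample] = per_gene.get(sample, 0.0) + float(row["TPM"])
    samples = list(abundance_by_sample)
    # Assumption: a gene absent from a sample gets a TPM of 0.
    matrix = {g: [v.get(s, 0.0) for s in samples] for g, v in sorted(tpm.items())}
    return samples, matrix


if __name__ == "__main__":
    # Toy input with made-up gene and sample names.
    header = "Gene ID\tGene Name\tReference\tStrand\tStart\tEnd\tCoverage\tFPKM\tTPM\n"
    s1 = io.StringIO(header + "g1\tn1\tI\t+\t1\t100\t2.0\t3.0\t10.0\n")
    print(gather_gene_tpm({"S1": s1}))
```

Writing the result as gene_TPMs.tsv is then one csv.writer row per gene, with the sample names as header.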
No definitive approach for DTU (differential transcript usage) at this point. See R/drimseq_load_data, which loads and pre-filters the data and saves the object intermediates/2023-03-30_drimseq_fitdms.qs; that object can then be loaded in R/drimseq_test to perform DTU.
Older versions used a GTF augmented with novel transcripts (ref_gtf="intermediates/2022-03-22_str_sc_n/220322_novel_filt_sorted.gtf"):
str_sc_n: novel isoforms using mix of short and long reads
str_n: novel isoforms only with short reads (to compare with str_sc_n)
str_n_scOnly: novel isoforms only with long reads (to compare with str_sc_n)
Older versions required preparing the BAM files with prep_alignments.sh (and saving them in scratch60).