The list of datasets used in this analysis is provided in the supplementary table `NeuroblastomaSI.xlsx` of the paper.
To retrieve and pre-process gene-expression data:
- For microarray datasets, follow instructions from EMT_score_calculation.
- For RNA-sequencing datasets, refer to EMT_Scoring_RNASeq.
To explore PC1 variance using random gene combinations:
- Use `PC1_Variance_Histogram.r` to generate histograms of PC1 variance across random combinations of NOR/MES gene lists and housekeeping genes.
- Add pre-processed gene-expression matrix files as tab-delimited `.txt` files in the `datasets/` folder.
- Use `PC1_Swap.r` to generate boxplots by swapping NOR/MES genes with housekeeping genes (one at a time).
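
For orientation, the idea behind the swap analysis can be sketched in Python. This is an illustrative sketch only, not the code in `PC1_Swap.r`: the function names `pc1_variance_fraction` and `swap_genes` are hypothetical, NumPy is assumed to be installed, and the sketch assumes that the expression matrix has genes as rows and samples as columns and that PC1 variance means the fraction of total variance explained by the first principal component of the mean-centred data (no scaling).

```python
import numpy as np


def pc1_variance_fraction(expr):
    """Return the fraction of total variance captured by PC1.

    expr: 2-D array-like, rows = genes, columns = samples (assumed layout).
    """
    # Treat samples as observations and genes as features.
    x = np.asarray(expr, dtype=float).T
    # Centre each gene at zero mean across samples.
    x = x - x.mean(axis=0)
    # Squared singular values are proportional to the variance per component.
    singular_values = np.linalg.svd(x, compute_uv=False)
    variances = singular_values ** 2
    return float(variances[0] / variances.sum())


def swap_genes(signature, housekeeping, n_swaps, rng):
    """Replace n_swaps randomly chosen signature genes with housekeeping genes.

    rng: a numpy.random.Generator, e.g. np.random.default_rng(0).
    """
    swapped = list(signature)
    # Pick distinct positions in the signature and distinct replacements.
    positions = rng.choice(len(swapped), size=n_swaps, replace=False)
    replacements = rng.choice(housekeeping, size=n_swaps, replace=False)
    for position, gene in zip(positions, replacements):
        swapped[position] = str(gene)
    return swapped
```

Under these assumptions, one would subset the expression matrix to the swapped gene list, call `pc1_variance_fraction` on the subset, and repeat for an increasing number of swaps to see how PC1 variance changes as signature genes are replaced.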
To visualize the relationship between gene swaps and PC1 variance:
- Create a `GSEID.csv` file in the `PC1_Means/` folder with the following structure:
  - Column 1: Number of Swaps
  - Column 2: Mean Variance
- Run `Linear_fit.py` to generate the linear fit and obtain:
  - R-squared value
  - Mean squared error
  - Slope and intercept
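
As a reference for what these quantities mean, here is a minimal standard-library sketch of an ordinary least-squares fit on the two-column table. It is not the implementation in `Linear_fit.py`; the function names `read_swap_table` and `linear_fit` are hypothetical, and the reader assumes the CSV has exactly one header row.

```python
import csv


def read_swap_table(path):
    """Read a two-column CSV: number of swaps, mean variance."""
    swaps, variances = [], []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # assumption: the first row is a header
        for row in reader:
            if not row:
                continue  # skip blank lines
            swaps.append(float(row[0]))
            variances.append(float(row[1]))
    return swaps, variances


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give R-squared and the MSE.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1.0 - ss_res / ss_tot,
        "mse": ss_res / n,
    }
```

For example, `linear_fit(*read_swap_table("PC1_Means/GSEID.csv"))` would return the four values for one dataset. The sketch does not guard against degenerate input such as a single row or a constant column.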
To perform dimensionality reduction and enrichment analysis:
- Add pre-processed gene-expression matrix files as tab-delimited `.txt` files to the `data/` folder.
- Run `PCA.py` for Principal Component Analysis and K-Means clustering.
- Use `GSEA.py` to perform Gene Set Enrichment Analysis on the resulting clusters.
- Provide a gene signature `signature.gmt` file as input for GSEA. Gene signatures can be obtained from MSigDB.
- Create an `Output/` folder to store:
  - PCA plots
  - GSEA results
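
Before running GSEA, it can help to check that the signature file parses as expected. The sketch below reads the GMT layout used by MSigDB, where each line holds a gene set name, a description, and then the gene symbols, all tab-separated. The function names `parse_gmt` and `read_gmt` are hypothetical helpers and are not part of `GSEA.py`.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into a dict of gene set name -> gene list."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets without any genes
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


def read_gmt(path):
    """Read a .gmt file from disk and parse it."""
    with open(path) as handle:
        return parse_gmt(handle)
```

Calling `read_gmt("signature.gmt")` and printing the number of genes per set is a quick way to confirm that the file is tab-delimited and that gene symbols match those in the expression matrix.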
If you use this repository or its contents in your work, please cite the following publication:
Mutually exclusive teams-like patterns of gene regulation characterize phenotypic heterogeneity along the noradrenergic-mesenchymal axis in neuroblastoma
Cancer Biology & Therapy (2024)
DOI: 10.1080/15384047.2024.2301802