This repository contains instructions and programming code for analyses of the data presented in the preprint "Intrinsic Heterogeneity of Primary Cilia Revealed Through Spatial Proteomics (DOI: 10.1101/2024.10.20.619273)". Please cite our preprint if you reuse data or code.
We also provide a license in this repository for all code where the license is not otherwise specified in the code.
If you do not see this folder system on GitHub, check the following GitHub repository for potential updates: https://github.com/CellProfiling/HPA_Cilia_Study_Code/tree/main.
- Query the XML data for the ENSEMBL gene ID of your interest through a link composed of `https://www.proteinatlas.org/` + ENSEMBL ID (e.g., `ENSG00000137691`) + `.xml`, e.g., `https://www.proteinatlas.org/ENSG00000137691.xml` for CFAP300.
- In the XML, search for the subassay with subtype ("ciliated cell lines"): `<subAssay type="human" subtype="ciliated cell lines"> ... </subAssay>`
- Nested in the subAssay element you will find different "data" elements, which each represent the images for a specific cell line.
- This allows you to see the links for all z planes. E.g., for the image in the screenshot, this reveals 21 different z planes with their corresponding links:
  - For z_index=1: `https://images.proteinatlas.org/38585/2146_D7_42_blue_red_green.jpg`
  - For z_index=2: `https://images.proteinatlas.org/38585/2146_D7_41_blue_red_green.jpg`
  - ...
  - For z_index=20: `https://images.proteinatlas.org/38585/2146_D7_24_blue_red_green.jpg`
  - For z_index=21: `https://images.proteinatlas.org/38585/2146_D7_23_blue_red_green.jpg`
- Extract a download id for each z plane from the link:
  - Remove the beginning (`https://images.proteinatlas.org`) from the link, keeping the leading `/`. E.g., `https://images.proteinatlas.org/38585/2146_D7_42_blue_red_green.jpg` becomes `/38585/2146_D7_42_blue_red_green.jpg`
  - Remove the ending (`blue_red_green.jpg`) from the remaining link. E.g., `/38585/2146_D7_42_blue_red_green.jpg` becomes `/38585/2146_D7_42_`
  - The final image id (`/38585/2146_D7_42_`) remains
- Recombine the image id for each z plane to create links for downloading the individual tif images for each channel. Because the image id already ends with `_`, the appended part starts directly with the channel color.
  - For the blue DAPI / Nuclei channel, the download link will be: `https://www.proteinatlas.org/download_file.php?filename=` + the image id (e.g., `/38585/2146_D7_42_`) + `blue&format=tif.gz`, so e.g., `https://www.proteinatlas.org/download_file.php?filename=/38585/2146_D7_42_blue&format=tif.gz`
  - For the red Cilia marker channel, the download link will be: `https://www.proteinatlas.org/download_file.php?filename=` + the image id (e.g., `/38585/2146_D7_42_`) + `red&format=tif.gz`, so e.g., `https://www.proteinatlas.org/download_file.php?filename=/38585/2146_D7_42_red&format=tif.gz`
  - For the yellow Basal body marker channel, the download link will be: `https://www.proteinatlas.org/download_file.php?filename=` + the image id (e.g., `/38585/2146_D7_42_`) + `yellow&format=tif.gz`, so e.g., `https://www.proteinatlas.org/download_file.php?filename=/38585/2146_D7_42_yellow&format=tif.gz`
  - For the green Protein of interest marker channel, the download link will be: `https://www.proteinatlas.org/download_file.php?filename=` + the image id (e.g., `/38585/2146_D7_42_`) + `green&format=tif.gz`, so e.g., `https://www.proteinatlas.org/download_file.php?filename=/38585/2146_D7_42_green&format=tif.gz`
- For all z planes, download the .tif.gz files of all four channels and place them together in one folder.
- Extract all .tif.gz files so they become .tif files and rename each file based on the z plane and the channel to match the following scheme: `<image id after removing the front part between / and />` + `c` + `<channel number>` + `_z` + `<z_index>` + `.tif`, e.g., `2146_D7_42_c0_z0.tif` for the DAPI channel image of the first plane for the example images above.
  - Use channel number 0 for the DAPI channel
  - Use channel number 1 for the Cilia channel
  - Use channel number 3 for the Protein of interest channel
  - Use channel number 4 for the Basal body channel
- Assemble all images into a multi-channel, multi-plane tif stack.
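The string handling in the steps above can be sketched in Python as follows. This is a minimal sketch and not part of the repository: all function names are hypothetical, the XML lookup only relies on the `subAssay` and `data` elements mentioned above, and the file naming follows the worked example `2146_D7_42_c0_z0.tif`, so mapping the first plane (z_index=1) to `z0` is an assumption.

```python
import xml.etree.ElementTree as ET

HPA_BASE = "https://www.proteinatlas.org"
IMAGE_BASE = "https://images.proteinatlas.org"
JPG_ENDING = "blue_red_green.jpg"
# Channel numbers from the renaming step above, keyed by the color in the link.
COLOR_TO_CHANNEL = {"blue": 0, "red": 1, "green": 3, "yellow": 4}


def xml_url(ensembl_id):
    """Link to the XML data of one gene, e.g. ENSG00000137691."""
    return f"{HPA_BASE}/{ensembl_id}.xml"


def ciliated_data_elements(xml_text):
    """Return the <data> elements nested in the ciliated-cell-line subAssay."""
    root = ET.fromstring(xml_text)
    found = []
    for sub in root.iter("subAssay"):
        if sub.get("type") == "human" and sub.get("subtype") == "ciliated cell lines":
            found.extend(sub.iter("data"))
    return found


def image_id(link):
    """Strip the host and the blue_red_green.jpg ending from a z-plane link."""
    if not (link.startswith(IMAGE_BASE + "/") and link.endswith(JPG_ENDING)):
        raise ValueError(f"unexpected image link: {link}")
    return link[len(IMAGE_BASE):-len(JPG_ENDING)]


def download_url(img_id, color):
    """Download link of the .tif.gz file for one channel color."""
    return f"{HPA_BASE}/download_file.php?filename={img_id}{color}&format=tif.gz"


def tif_name(img_id, color, z_index):
    """Target file name; assumes the first plane (z_index=1) is written as z0."""
    stem = img_id.rsplit("/", 1)[1]  # e.g. '2146_D7_42_'
    return f"{stem}c{COLOR_TO_CHANNEL[color]}_z{z_index - 1}.tif"
```

For example, `image_id("https://images.proteinatlas.org/38585/2146_D7_42_blue_red_green.jpg")` yields `/38585/2146_D7_42_`, which `download_url` and `tif_name` then turn into the per-channel download link and the target file name.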
Images downloaded and assembled as described above can be subjected to the segmentation script to create cilia, basal body, and nucleus segmentations.
- Run CiliaQ analysis (CiliaQ version V0.2.1, which needs to be manually installed (not through ImageJ update sites)) on all 7-channel images with the following CiliaQ settings
- Collect all CiliaQ output files ending with `_CQs.txt` in a folder.
- Add the table legend file provided here to the folder with all of CiliaQ's `_CQs.txt` output files. Make sure that this legend file is the first item when all files in the folder are sorted alphabetically (if not, rename it so that it becomes the first item while keeping the file ending `_CQs.txt` intact).
- Finally, concatenate all files ending with `_CQs.txt` into a single text file, e.g., by adding this Windows batch file to the folder and executing it. It will produce a new file called `AllCQsFilesMerged.txt` containing all concatenated `_CQs.txt` files, which can then be used to analyze cilia statistics such as the length or orientation angle (as shown in Figure 2 in the preprint).
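On systems where the Windows batch file cannot be run, the concatenation could be approximated in Python. The sketch below is not the batch file from this repository. The function name is hypothetical, the case-insensitive sort is an assumption about what "alphabetically" means for the legend file, and appending a missing final newline between files is an added safeguard.

```python
from pathlib import Path


def merge_cqs_files(folder, output_name="AllCQsFilesMerged.txt"):
    """Concatenate all *_CQs.txt files in `folder`, in alphabetical order."""
    folder = Path(folder)
    # Assumption: sort by lower-cased name so the legend file comes first
    # in the same way it would in a typical file browser.
    inputs = sorted(folder.glob("*_CQs.txt"), key=lambda p: p.name.lower())
    with open(folder / output_name, "wb") as out:
        for path in inputs:
            data = path.read_bytes()  # copy bytes as-is, no re-encoding
            out.write(data)
            # Safeguard (assumption): keep rows of adjacent files separate.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return [p.name for p in inputs]
```

The returned list shows the order in which the files were merged, which makes it easy to confirm that the legend file really came first.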
- Use the `AllCQsFilesMerged.txt` file from the previous step for this analysis.
- Collect all `_CQl.txt` files created by CiliaQ in the previous steps in a folder.
- Run the analysis as described in the readme file and use the scripts in this repository
- Note that you need a specific Excel file for this that lists all the images you want to include and has specific columns available, as explained in the readme file.
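If the CiliaQ output is spread over several subfolders, collecting the `_CQl.txt` files could be scripted. This is a minimal sketch under that assumption: the function name and both folder arguments are hypothetical, and files are copied rather than moved so the original CiliaQ output stays untouched.

```python
import shutil
from pathlib import Path


def collect_cql_files(source_root, target_folder):
    """Copy every *_CQl.txt file found below `source_root` into one folder."""
    source_root = Path(source_root).resolve()
    target = Path(target_folder).resolve()
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() lists all matches before anything is copied.
    for path in sorted(source_root.rglob("*_CQl.txt")):
        if target in path.parents:
            continue  # already inside the target folder
        destination = target / path.name
        if destination.exists():
            raise FileExistsError(f"duplicate file name: {path.name}")
        shutil.copy2(path, destination)
        copied.append(destination.name)
    return copied
```

Raising an error on duplicate names is a deliberate choice here: silently overwriting a file would drop one image from the downstream analysis.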
- The output files from this analysis can be further used to cluster and analyze intensity profiles.
- To plot intensity profiles, follow this readme file and use the scripts in this repository
- To cluster profiles follow this notebook
- See readme file with instructions in this subfolder.
- A pretrained model file is available here.
- Validate GMNN prediction values by predicting on images with other cell cycle markers stained in an additional channel (related to the preprint Figure 4H). A Jupyter notebook showing how to normalize intensities, plot the predicted nuclear GMNN intensity versus the real intensity in the protein channel, and run statistics is shown here
- Combine measures of nuclear predicted GMNN intensity with CiliaQ measurements of ciliary intensity (or other cilia parameters). This code produces the preprint Figure 4I. A Jupyter notebook showing how to normalize intensities, plot the predicted nuclear GMNN intensity versus the real intensity in the protein channel, and run statistics is shown here; if you prefer a Python script, you can find the same code as in the Jupyter notebook here.
Note: HTML files provided in this section represent knitted R Markdown files. Download them and open them in a browser to explore them.
- Figures 1I and 2E (and related supplemental tables)
- Figures 3A, 4C, 4D (and related supplemental figures / tables): Functional enrichment analysis of protein lists
- Figure S1C: Test if the distribution of the number of subcellular locations of ciliary proteins is significantly different from all proteins in the whole cell




