7 changes: 7 additions & 0 deletions .markdownlint.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
default: true

MD013: false # Line length
MD022: false # Headings should be surrounded by blank lines
MD033: false # Inline HTML
MD051: false # Non-resolving link
MD029: false # Ordered list item prefix
34 changes: 34 additions & 0 deletions 00_index.md
@@ -0,0 +1,34 @@
---
title: 'Using autoencoders to select representative microscopy images'
short_title: NucleusNet
numbering:
heading_2: false
---

+++ {"part": "abstract"}

Add your abstract here.
Avoid complicated equations / citations here, for crossref compatibility.

+++

+++ {"part": "epigraph"}
:::{warning} Pre-print
This article has not yet been peer-reviewed.
_Updated 2025 September 27_
:::

+++

+++ {"part": "acknowledgements"}

Add your acknowledgments, if any, here.

+++

+++ {"part": "competing interests"}

## Competing Interests

Add your competing interests, if any, here.
+++
36 changes: 0 additions & 36 deletions 01_example_markdown.md

This file was deleted.

69 changes: 69 additions & 0 deletions 01_introduction.md
@@ -0,0 +1,69 @@
---
title: Introduction
numbering:
enumerator: 1.%s
label : introduction_page
---

Representative images are visual communication tools used by microscopists to present their research to other scientists.
The earliest representative microscopy images were hand drawings in [Micrographia](<wiki:Micrographia>) in 1665 by Robert Hooke.
Nowadays, roughly three-quarters of publications in biomedical journals report at least one microscopy image [@doi:10.7554/eLife.55133].
In scientific journals, microscopists must choose a representative image to report in a static figure, but image selection is vulnerable to bias and deception [@doi:10.1242/jcs.261567].
Blinding and automation are strategies that can reduce subjective biases in microscopy experimentation [@doi:10.1083/jcb.201812109], but while automated imaging is trivial and accessible with modern microscopes [@doi:10.1101/861856], the process of representative image selection remains a subjective, non-repeatable step in the scientific process.
To address this, Markey et al. [@doi:10.1016/s0006-3495(99)77379-0] developed automated methods for objective representative image selection from microscopy datasets.
The authors implemented a web server that chose typical images from uploaded data, but it is now unsupported and there is no modern equivalent.
It is important to study methods of objective representative image selection because these tools promote research integrity.

The task of objective representative microscopy image selection is a novel use case for artificial intelligence.
It was first demonstrated in a study that used principal component analysis (PCA) and K-means clustering to select representative images from medical ultrasound video series [@doi:10.3389/fonc.2021.673775].
Another study tested a method for objective representative image selection from real-world datasets [@doi:10.1109/BIP60195.2023.10379342], though it did not involve neural networks.
Their proposed method first generated theoretical average images using per-pixel measures of central tendency, then selected practical images from the dataset by proximity in a vector space computed with singular value decomposition (SVD).
We reproduced these results and adapted their approach using the latent space of a convolutional [autoencoder](<wiki:Autoencoder>) model.
Autoencoders are unsupervised deep learning models that compress and reconstruct images through a vector bottleneck referred to as _latent space_.
The structure of latent space is a black box, though it can be shaped to be more useful with the art of representation learning [@doi:10.1109/TPAMI.2013.50].
For example, it was shown that the latent space of autoencoders trained on microscopy data was sensitive to cell orientation, so multi-encoder [@doi:10.1038/s42003-022-03218-x] and orientation-invariant [@doi:10.1038/s41467-024-45362-4] models were engineered to disentangle cell orientation in latent space.
Autoencoders are commonly used for anomaly detection, which is based on the assumption that the autoencoder learns an optimal latent space to describe the normal data, so that when images are reconstructed, anomalous data will have a higher reconstruction error than normal data [@doi:10.1109/WTS.2018.8363930].
Though this assumption is flawed [@doi:10.48550/arXiv.2501.13864] and autoencoders can be unreliable anomaly detectors [@doi:10.1109/ICUFN57995.2023.10199315], it suggests that autoencoders can identify the normal data in a dataset and thus select a typical image.
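The reconstruction-error heuristic can be sketched in a few lines of NumPy. Here `encode` and `decode` are hypothetical stand-ins for a trained encoder and decoder pair, not the model described later in this article:

```python
import numpy as np

def reconstruction_errors(images, encode, decode):
    """Per-image mean squared reconstruction error through an autoencoder.

    Higher error is taken as a (fallible) anomaly signal; `encode` and
    `decode` are placeholders for a trained encoder/decoder pair.
    """
    recon = decode(encode(images))
    return ((images - recon) ** 2).reshape(len(images), -1).mean(axis=1)

# Toy stand-ins: an identity encoder and a slightly lossy decoder.
rng = np.random.default_rng(0)
imgs = rng.random((5, 28, 28))
errors = reconstruction_errors(imgs, lambda x: x, lambda z: 0.99 * z)
```

Images with the lowest reconstruction error would be treated as the most "normal" members of the dataset under this heuristic.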

There is no standard definition of a _representative image_ in the literature.
Markey et al. [@doi:10.1016/s0006-3495(99)77379-0] defined it as the image that is most similar to all other images in the dataset.
Soto-Quiros et al. [@doi:10.1109/BIP60195.2023.10379342] referred to it as an image with the overall content and characteristics of the dataset.
In either case, the assumption is that the representative image is neither an outlier nor anomalous.
Markey et al. [@doi:10.1016/s0006-3495(99)77379-0] established an important criterion for evaluating methods of representative image selection: ideal methods will always pick a member of the majority class as the most typical image.
This criterion was established in experiments using contaminated datasets where normal data was the majority class and anomalies were the minority class, but this criterion can also extend to datasets with discrete phenotypes like the [cell cycle](<wiki:Cell cycle>).
For example, <https://doi.org/10.1016/0014-4827(79)90553-6> found that the majority class of asynchronous populations of CV-1 cells was [](#interphase), therefore an objective method to determine the typical image of a [cell nucleus](<wiki:Cell_nucleus>) should always pick a cell in interphase.

To automate the process of sample collection and image selection from traditional fluorescence confocal microscopy experiments, we generated a dataset of one million images of fixed DAPI-stained cell nuclei, sampled from one-hundred glass coverslips on an Olympus Fluoview FV3000 confocal microscope equipped with a motorized stage ([](#fig1)a).
We then trained a convolutional autoencoder model to embed and reconstruct these images ([](#fig1)b) and we analyzed the compressed latent vectors to define representative microscopy images ([](#fig1)c) near theoretical measures of central tendency in latent space, as a method of objective representative microscopy image selection.

```{figure} ./figures/fig1.png
:label: fig1
:align: center
:width: 100%

An overview of data collection and representative image selection in our fluorescence confocal microscopy experiment.
A) Automated grid-collection imaging with a motorized stage on a confocal microscope covered large areas at high resolution.
B) An autoencoder was trained to embed and reconstruct the collection of single-cell images of nuclei.
C) Average latent vectors were calculated to define representative images in latent space.
```

We sought to generate a large dataset because the performance of neural networks tends to scale with dataset size [@doi:10.48550/arXiv.1712.00409].
Further, a large dataset presents a conceptual challenge to the task of representative image selection, because it is unreasonable for a human to evaluate 1,000,000 images and only choose one to summarize the dataset.
To address these limitations, we used interactive visualization strategies to present large panoramas and single-cell fluorescence microscopy images in a dynamic format.
Interactive figures allow for comprehensive evaluation and simulate the process that microscopists undergo to select a representative image.
We implement an autoencoder-based method of automatic image selection to report average images of cell nuclei from a single-cell microscopy dataset.
Consider that the criteria used to make subjective determinations about "representativeness" are unknown, as discussed by Markey et al. [@doi:10.1016/s0006-3495(99)77379-0].
Similarly, with insufficient representation learning, the criteria used to make objective determinations with an autoencoder model are also unknown.
Prototypical images were selected based on distance metrics that defined each image with respect to the centroid of latent space in the bottleneck of an autoencoder model.
The intuition behind this approach is consistent with previous reports on representative image selection, and we found that it reliably identified normal data from our collection.

---

## Contributions

1. A confocal microscopy image dataset of over 1,000,000 masked cell nuclei.
Large regions of coverslips were sampled with automated imaging methods on a confocal microscope, yielding over 250,000 high-magnification fields that were stitched into 1,600 panoramas to mask and crop 1,061,277 ROIs.
See [data availability](#data-availability) for repositories and archives.

2. A novel use case for autoencoders: representative image selection.
We show that autoencoders can select representative images from datasets.
To our knowledge, this is among the first described methods of representative image selection to use a neural network [@doi:10.3389/fonc.2021.673775], in particular an unsupervised deep learning model.
72 changes: 72 additions & 0 deletions 02_preliminaries.md
@@ -0,0 +1,72 @@
---
title: Preliminaries
numbering:
enumerator: 2.%s
label : preliminaries_page
---

## Definitions

A _grayscale image_ is defined as a matrix of $M \times N$ dimensions where each pixel holds a single intensity value in the range $[0, 1]$ that represents the amount of light at a specific point [@doi:10.1109/BIP60195.2023.10379342].

A _latent vector_ $z$ is an $n$-dimensional vector in the bottleneck of an autoencoder model, between the encoder and decoder, that is a compressed representation of an image.

A _latent space_ is a collection of latent vectors that form a reduced-dimensionality vector embedding of the data, fit by a machine learning model [@doi:10.1111/cgf.13672].

A _representative image_ is defined as an image with the overall quality and characteristics of the dataset [@doi:10.1109/BIP60195.2023.10379342].

## Representative images from the MNIST collection

### Literature reproduction

Objective representative image selection was recently demonstrated with real-world data including the [MNIST database](<wiki:MNIST_database>) [@doi:10.1109/BIP60195.2023.10379342].
The authors proposed a two-step approach to objective representative image selection.
First, theoretical representative images were calculated using measures of central tendency, then representative images were selected from the dataset based on distance to theoretical images in vector space using SVD.

The MNIST dataset consists of 70,000 images of handwritten numbers that were manually annotated into ten classes corresponding to the digits 0-9.
Soto-Quiros et al. tested their approach to representative image selection on a subset of n=720 images labelled "four".
The three chosen measures of central tendency were the arithmetic mean and the median, each computed independently per pixel, and the geometric median.
We reproduced the theoretical average images of the MNIST digit "four" with all N=6824 images labelled "four" ([](#fig2a)) and we found the results to be consistent with the primary literature [@doi:10.1109/BIP60195.2023.10379342], though the exemplars were not the same ([](#fig2b)).

:::{figure} #fig2a_data
:label: fig2a
:placeholder: ./figures/fig2a.png
Computation of theoretical representative images of the MNIST digit '4'.
N=6824 grayscale images with the label '4' were flattened to 784-dimensional vectors to compute, then reshape into images, the arithmetic mean (left), median (middle) and geometric median (right).
:::

:::{figure} #fig2b_data
:label: fig2b
:placeholder: ./figures/fig2b.png
Computation of practical representative images of the MNIST digit '4' using the arithmetic mean (left), median (middle) and geometric median (right).
:::
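The per-pixel averages and the geometric median described above can be sketched with NumPy. The Weiszfeld iteration used here for the geometric median is one standard choice, not necessarily the solver used by the original authors:

```python
import numpy as np

def theoretical_averages(images, iters=200, eps=1e-8):
    """Three theoretical average images from flattened grayscale images
    (shape: n_images x n_pixels): per-pixel arithmetic mean, per-pixel
    median, and the geometric median via Weiszfeld's iteration."""
    mean_img = images.mean(axis=0)
    median_img = np.median(images, axis=0)
    gm = mean_img.copy()  # initialize the geometric median at the mean
    for _ in range(iters):
        d = np.linalg.norm(images - gm, axis=1)
        w = 1.0 / np.maximum(d, eps)  # inverse-distance re-weighting
        gm = (w[:, None] * images).sum(axis=0) / w.sum()
    return mean_img, median_img, gm

# Stand-in data shaped like flattened 28x28 MNIST digits.
X = np.random.default_rng(1).random((50, 784))
mean_img, median_img, gm = theoretical_averages(X)
```

Unlike the per-pixel mean and median, the geometric median minimizes the sum of Euclidean distances to all images in the flattened vector space, which is why it requires an iterative solver.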

### Proposed method of representative image selection

We adapted the two-step method to representative image selection [@doi:10.1109/BIP60195.2023.10379342] using the latent space of an autoencoder.
First, theoretical average latent vectors were calculated using measures of central tendency, like the arithmetic mean, median and geometric median.
Then, practical examples were determined in latent space by ranking latent vectors by distance to the calculated centroids.

1. Calculation of average latent vectors

We trained a convolutional autoencoder model ({ref}`mnist-AE`) on the MNIST dataset (Supplementary [](#sfig1a) and [](#sfig1b)), saved the encoder and decoder weights after one-hundred training epochs, and encoded a latent space.
6824 latent vectors with the label "four" were averaged by arithmetic mean, median and geometric median, then the latent vectors were reconstructed using the decoder weights to synthesize theoretical representative images of the digit "four" ([](#fig3a)).

:::{figure} #fig3a_data
:label: fig3a
:placeholder: ./figures/fig3a.png
Decoded latent vectors: arithmetic mean (left), median (middle) and geometric median (right).
:::
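This decoding step can be sketched as follows; `decode` is a toy stand-in for the trained decoder, and the geometric median is omitted for brevity since its computation is shown earlier:

```python
import numpy as np

def decoded_latent_averages(latents, decode):
    """Average a set of latent vectors by mean and median, then decode
    each average back to image space. `decode` maps a batch of latent
    vectors to a batch of images."""
    averages = {
        "mean": latents.mean(axis=0),
        "median": np.median(latents, axis=0),
    }
    return {name: decode(vec[None, :])[0] for name, vec in averages.items()}

# Toy decoder: a fixed random linear map from 16-D latents to 28x28 images.
rng = np.random.default_rng(3)
W = rng.random((16, 784))
decode = lambda z: (z @ W).reshape(-1, 28, 28)
Z = rng.random((100, 16))
theoretical = decoded_latent_averages(Z, decode)
```

The decoded averages are synthetic images: they summarize the latent space but need not match any actual image in the dataset.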

2. Define a practical representative image

The theoretical average image generally does not correspond to a distinct image from the dataset, therefore it is not considered the final representative image [@doi:10.1109/BIP60195.2023.10379342].
However, the theoretical average latent vectors can be used to determine practical representative images from the dataset.
The closest latent vector to each average was measured by Euclidean distance in the vector embedding ([](#fig3b)).
It was found to be the same image for all three measures, and it is remarkably similar to the practical representative images chosen in [](#fig2b), which suggests that the two methods are comparable.

:::{figure} #fig3b_data
:label: fig3b
:placeholder: ./figures/fig3b.png
Closest examples to the arithmetic mean (left), median (middle) and geometric median (right).
:::
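The selection step reduces to ranking latent vectors by Euclidean distance to a centroid. A minimal sketch, using arbitrary stand-in data in place of the trained encoder's latent space:

```python
import numpy as np

def nearest_to_centroid(latents, centroid):
    """Index of the latent vector closest to a centroid (Euclidean)."""
    dists = np.linalg.norm(latents - centroid, axis=1)
    return int(np.argmin(dists))

# Stand-in latent space; in practice these come from the trained encoder.
Z = np.random.default_rng(2).random((1000, 32))
centroids = {"mean": Z.mean(axis=0), "median": np.median(Z, axis=0)}
picks = {name: nearest_to_centroid(Z, c) for name, c in centroids.items()}
```

Because the pick is always an index into the dataset, the selected image is guaranteed to be a real member of the collection rather than a synthetic average.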