Maximum likelihood and variational Bayes estimation for mixtures of regular Brownian motions
Clone the repo, then
cd emdiff
pip install -e .
Then make sure you can run the example:
cd samples
python sample_script.py
emdiff is a Python analysis tool for trajectories generated by
stroboscopic single particle tracking (SPT) experiments.
It takes a set of trajectories as a pandas.DataFrame, then returns the
estimated fraction of molecules in distinct states and the
diffusion coefficient corresponding to each state.
There are actually two commands in emdiff:
emdiffis an expectation-maximization routine that only estimates the maximum likelihood state occupancies and diffusion coefficients of each state.vbdiffis a variational Bayes treatment of the same problem. It estimates the full posterior distribution over occupancies, diffusion coefficients, and the assignment of each trajectory to a state.
emdiff is simpler, but vbdiff is highly recommended. vbdiff performs
better when the number of states in the model doesn't match the true
number of states in the data. Both run at about the same speed.
emdiff and vbdiff do not check the quality of your raw data.
emdiff and vbdiff do not filter out any data. They will consider whatever
data you give them. There is a single optional filter to exclude
trajectories before a start frame, but that's it.
emdiff does not decide how many diffusive states are present
in your data. You decide that.
emdiff and vbdiff do not consider transitions between states. For that,
we highly recommend vbSPT
from the Elf group.
emdiff and vbdiff do not provide any visualizations of the result beyond
some simple histograms provided by the plot option to emdiff.
emdiff and vbdiff expect you to understand your own experiment. You need to
know the frame interval, approximate localization error, and pixel size.
emdiff and vbdiff only launch the estimator from a single initial parameter
guess. They do not identify when the estimator has converged to a
local rather than global maximum. The user can set the initial guess
via the guess argument.
emdiff and vbdiff do not deal with identifiability issues arising from
interchangeability of the states. For instance, if we fit
with n states, there will be at least n! maxima in parameter space
that come from permutations of the state identities. emdiff and vbdiff
will simply converge to one of these maxima, then order the output
by increasing diffusion coefficient.
No. emdiff and vbdiff place no limit on the number of states.
Of course, more states mean more error involved in the estimation.
No, with one fat caveat. emdiff and vbdiff support analysis in any
number of spatial dimensions. Exactly how many is set by the pos_cols
argument, which tells the tools where to look for the spatial coordinates.
See the ``Example usage'' section below.
The fat caveat is that both tools model localization error as identical along each spatial dimension. This is generally true for 1D and 2D analyses. But for most SPT experiments in 3D, tracking the position of moving objects in the axial (Z) direction is harder than in the lateral (XY) directions. So the assumption of identical localization error along each axis will be wrong for those setups.
numpy, pandas, scipy, matplotlib, seaborn. These are all
part of the standard Anaconda distribution. If you have
matplotlib_scalebar,
the plots will have scalebars. So that's recommended too.
Both emdiff and vbdiff take trajectories as a pandas.DataFrame. Each row of
the dataframe should represent one detection from an SPT experiment.
The dataframe must contain at minimum:
- a
trajectorycolumn, with the index of the trajectory to which that localization corresponds; - a
framecolumn, with the index of the corresponding frame; - columns such as
yandxwith the spatial coordinates of the localization. These must be specified in pixels.
See samples/sample_tracks.csv for a sample input.
from emdiff import emdiff
state_occs, diff_coefs = emdiff(
tracks,
n_states=2,
pos_cols=["y", "x"], # columns in *tracks* with
# the spatial coordinates
pixel_size_um=0.16, # microns
frame_interval=0.00748, # seconds
loc_error=0.035 # microns
)
state_occs[j] and diff_coefs[j] are the fractional occupancy
and diffusion coefficient (in microns squared per second)
of the jth state.
The states are always ordered in terms of increasing diffusion coefficient.
You can find another example at samples/sample_script.py.
from emdiff import vbdiff
state_occs_mean, diff_coefs_mean = vbdiff(
tracks,
n_states=2,
pos_cols=["y", "x"], # columns in *tracks* with
# the spatial coordinates
pixel_size_um=0.16, # microns
frame_interval=0.00748, # seconds
loc_error=0.035 # microns
)
state_occs_mean[j] and diff_coefs_mean[j] are the mean fractional
occupancy and mean diffusion coefficient for the jth state under the
posterior model.
The states are always ordered in terms of increasing diffusion coefficient.
Yes. This can be useful, for instance, if you want the probability that each
trajectory is in each diffusive state. Use the return_posterior argument:
from emdiff import vbdiff
r, n, A, B, occs_mean, diff_coefs_mean, elbo, likelihood = vbdiff(
tracks,
n_states=2,
pos_cols=["y", "x"], # columns in *tracks* with
# the spatial coordinates
pixel_size_um=0.16, # microns
frame_interval=0.00748, # seconds
loc_error=0.035, # microns
return_posterior=True
)
As you can see, we get some more outputs. What do these mean?
ris a 2Dndarrayof shape(n_states, n_tracks), wherer[j,i]is the posterior probability for trajectoryito inhabit diffusive statej.nis a 1Dndarrayof shape(n_states,). This is the parameter for the posterior (Dirichlet) distribution over state occupancies.AandBare the parameters for the posterior (inverse gamma) distribution over the spatial variance of each state. Both are 1Dndarrays, whereA[j],B[j]are the parameters for the jth state. The spatial variance is defined as4 * (D * frame_interval + loc_error**2).elbois the evidence lower bound for the posterior model.likelihoodis the likelihood of the posterior model given the data.
from emdiff import emdiff
# The output prefix for the plots, which are saved as PNGs
plot_prefix = "my_dataset_name_emdiff_fits"
# Run the core EM routine
state_occupations, diff_coefs = emdiff(
tracks,
n_states=2,
pos_cols=["y", "x"],
pixel_size_um=0.16,
frame_interval=0.00748,
loc_error=0.035,
plot=True,
plot_prefix=plot_prefix
)
Up to you.
There are various ways to estimate the localization
error in an SPT experiment. emdiff provides one:
from emdiff import calc_le
# Use the covariance method to calculate localization error
loc_error = calc_le(
tracks,
pixel_size_um=0.16,
pos_cols=['y', 'x'],
method="covariance"
)
# Use the MSD method to calculate localization error
loc_error = calc_le(
tracks,
pixel_size_um=0.16,
pos_cols=['y', 'x'],
method="msd"
)
The localization error is defined as the root 1D variance associated
with estimating a particle's position at any given frame. The two
methods (covariance and msd) are two different ways to go about
estimating it.
If this localization error seems large, probably that's because it actually is. Moving molecules produce a lot more error than in traditional fixed cell PALM/STORM experiments where all the molecules are stationary.
Does emdiff have a defocalization correction similar to Spot-On?
Yes. Use the dz argument to emdiff to set the microscope's focal
depth in microns.
For example:
from emdiff import vbdiff
state_occs_mean, diff_coefs_mean = vbdiff(
tracks,
n_states=2,
pos_cols=["y", "x"], # columns in *tracks* with
# the spatial coordinates
pixel_size_um=0.16, # microns
frame_interval=0.00748, # seconds
loc_error=0.035, # microns
dz=0.7 # depth of field in microns
)
For now, see the docstrings to the functions emdiff.em.emdiff and
emdiff.vb.vbdiff.
In the future, you'll find a user guide in the doc folder to
this repository.