RePsychLing · dmbates · Feb 24, 2021 · Feb 25, 2021 · Feb 28, 2021 · Mar 1, 2021
diff --git a/docs/Project.toml b/docs/Project.toml
@@ -1,4 +1,7 @@
 [deps]
+AlgebraOfGraphics = "cbdf2221-f076-402e-a563-3d30da359d67"
+CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
+CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Gadfly = "c91e804a-d5a3-530f-b6f0-dfbca275c004"

diff --git a/docs/notebook.jl b/docs/notebook.jl
@@ -0,0 +1,339 @@
+### A Pluto.jl notebook ###
+# v0.12.21
+
+using Markdown
+using InteractiveUtils
+
+# ╔═╡ 40bf1d04-76d0-11eb-0d98-cf2945aa3fd4
+begin   # set up the local package environment
+	import Pkg
+	Pkg.activate(".")
+	Pkg.instantiate()
+	using AlgebraOfGraphics, CairoMakie, DataFrames, MixedModels
+	using MixedModelsSim, Random
+end
+
+# ╔═╡ 09276d18-76d2-11eb-3b91-b77f2d532e82
+md"""
+# Simulation of linear mixed models
+
+This notebook shows the process of simulating responses from a mixed-effects model with crossed random effects for subject (`subj`) and item (`item`).
+The general approach is to create a data frame with the appropriate experimental and grouping factors and with an _inert_ dependent variable (`dv`), meaning that, at this stage `dv` is just random noise.
+
+This model is then used in a parametric bootstrap simulation with non-inert coefficient values (β) and random-effects variances and covariances.
+
+Begin by loading the packages to be used.
+"""
+
+# ╔═╡ 654920fa-79de-11eb-0dbb-15a85e6d3c71
+const AOG = AlgebraOfGraphics;  # short form for expressions like AOG.density
+
+# ╔═╡ 28a121e2-76d3-11eb-0896-43da695be0a4
+md"""
+And initialize a random number generator.
+"""
+
+# ╔═╡ b64a889c-76d0-11eb-1936-5545db219def
+rng = MersenneTwister(8675309);
+
+# ╔═╡ ce3be2ec-76e0-11eb-3fb5-67bd31816bca
+md"""
+Next we set the number of subjects, `sub_n`, and the number of items, `item_n`
+"""
+
+# ╔═╡ 21958b24-76ed-11eb-11a2-41223c4864f8
+begin
+	sub_n = 200
+	item_n = 50
+end;
+
+# ╔═╡ b8fe4dd2-76e2-11eb-3a7b-21e2b17043f7
+md"A between-subject factor, `cond`, with two levels, `easy` and `hard` is specified as"
+
+# ╔═╡ feb89de4-76d0-11eb-115e-89e7ed2fc483
+subj_bt = Dict(:cond => ["easy", "hard"]);
+
+# ╔═╡ eb911720-76e2-11eb-002d-21a35fd7d4b2
+md"These settings are used to create a Table with an _inert_ dependent variable, `dv`, as"
+
+# ╔═╡ 286561a4-76d1-11eb-2211-61d84bc1b9a9
+frm = simdat_crossed(rng, sub_n, item_n, subj_btwn = subj_bt);
+
+# ╔═╡ aa915fd2-76ec-11eb-2cec-bff137f10b18
+md"This data table is in the form of a _row table_, which is a vector of `NamedTuple`s"
+
+# ╔═╡ cb5f962a-76ec-11eb-3b1e-6f7f1f68bca5
+typeof(frm)
+
+# ╔═╡ 6914997e-76e3-11eb-2f2b-0501a82bee66
+md"It is generally best to convert this table to a `DataFrame` for display or summary"
+
+# ╔═╡ 9f327062-76e3-11eb-16dc-7be3a2b1c09f
+df = DataFrame(frm)
+
+# ╔═╡ b0202450-76e3-11eb-3218-8bddeed1c50d
+describe(df)
+
+# ╔═╡ 34b8677c-76e4-11eb-1807-9306323a7c9f
+md"""
+## Fitting a model to the inert data
+
+In the `MixedModels` package the formula language is similar to that in the `lme4` package for `R` with the exception that the formula must be wrapped in a call to the macro `@formula`
+"""
+
+# ╔═╡ 48e92618-76d1-11eb-053c-e77449325b9c
+m0form = @formula dv ~ 1 + cond + (1|subj) + (1|item);
+
+# ╔═╡ 7292d5fa-76e4-11eb-243f-8f7ad404d683
+md"Also, contrasts are specified in the call to fit the model instead of being part of the metadata for the factor"
+
+# ╔═╡ 761fb192-76d1-11eb-2f8e-890af7ae08d8
+m0 = fit(MixedModel, m0form, frm, contrasts=Dict(:cond => EffectsCoding()))
+
+# ╔═╡ 91169880-76d1-11eb-28da-0f4135f0e995
+md"""
+As this dependent variable is inert (i.e. simulated from a standard Normal distribution without any fixed- or random-effects contributions) it is not surprising the the parameter estimates for the variance components for subject and item are close to zero as are the estimates for the fixed-effects.
+
+## Simulated model fits for non-inert data
+
+To simulate responses from non-intert data we must specify values for the parameters in the form of a scalar, `σ`, the standard deviation of the per-observation noise, and two vectors: `β`, the fixed-effects parameter vector and `θ`, the vector of relative standard deviations.
+
+The fixed-effects parameter is relative to the model matrix, `X`, which, in this case, is the overall mean DV for a typical condition and half the difference between the `hard` and `easy` conditions.  It is half the difference because we use a +/- 1 coding for the `cond` factor, producing a model matrix, `X`, of
+"""
+
+# ╔═╡ d217cc16-76e8-11eb-2f64-6ff4931c87e9
+Int.(m0.X')   # convert to integers to save space in the display
+
+# ╔═╡ 9d46c892-76e9-11eb-008e-67d5100eb3fa
+md"Suppose we set the population mean response time to 400 ms. and the difference between `hard` and `easy` to be 50 ms.
+The β vector is then
+"
+
+# ╔═╡ dbd15424-76e9-11eb-2306-436929c4e756
+β = [400.0, 25.0]   # Include the decimal point to force Float64 values
+
+# ╔═╡ fbdfaf02-76e9-11eb-2404-d9a288cf1d31
+md"We also set"
+
+# ╔═╡ 167234ae-76ea-11eb-2589-93a3e6940453
+begin
+	σ  =  25.0  # s.d. of per-observation noise
+	σ₁ = 100.0  # s.d. of random effects for subject
+	σ₂ =  50.0  # s.d. of random effects for item
+	θ = [σ₁, σ₂] ./ σ  # relative standard deviations
+end;
+
+# ╔═╡ 5d1b6440-7783-11eb-3bae-45d008439b01
+md"""
+To simulate a single response vector from `m0` with these parameters and re-estimate the parameters we use
+"""
+
+# ╔═╡ ac43c242-7783-11eb-17d4-19939f282c52
+simulate!(MersenneTwister(12321), m0; β=β, σ=σ, θ=θ) |> refit!
+
+# ╔═╡ fabb4eae-7783-11eb-13de-157f033b8c12
+md"""
+We see that the estimates of the standard deviations of the random effects and the residual standard deviation are very close to the values used in the simulation: 100.0, 50.0, and 25.0.
+
+The fixed-effects coefficients are also close to those used in the simulation, [400.0, 25.0].
+The standard errors of these estimates indicate that this design and data will have considerable power in testing whether the main effect for `cond` could be zero.
+
+In general we are not interested in the power of a test of whether the `(Intercept)` could be zero.
+We do not expect response times of zero or, even more peculiar, negative response times.
+"""
+
+# ╔═╡ b75c8580-76eb-11eb-2cca-37a0702e44a5
+md"""
+## Simulation of many cases
+
+We could repeat this process of `simulate!` and `refit!` to create a collection of parameter estimates and derived statistics but it is easier to use the `parametricbootstrap` function to do this simulation and package the results.
+
+We start the random number generator at the same seed as before so we can check that the individual parameter estimates are as expected.
+
+We will simulate 2000 cases.
+"""
+
+# ╔═╡ cd924f92-76eb-11eb-3275-eb4f55282d5d
+sim = parametricbootstrap(MersenneTwister(12321), 2000, m0; β=β, σ=σ, θ=θ);
+
+# ╔═╡ 91c8aa1a-777c-11eb-0989-45fd3982ce8c
+md"""
+`sim` contains results of the simulation in a compact form with several "properties" defined that allow for extraction of quantities of interest.
+"""
+
+# ╔═╡ 0a8b0560-777d-11eb-2a90-25e0581a9407
+typeof(sim)
+
+# ╔═╡ 10ce3212-777d-11eb-13dd-2b02eaa97ef7
+propertynames(sim)
+
+# ╔═╡ 761e4f92-792d-11eb-2792-8fc2ef350344
+md"""
+A bootstrap sample like this can be used to create confidence intervals on the parameters.
+"""
+
+# ╔═╡ 97789364-792d-11eb-2289-af7b5c20812a
+combine(groupby(DataFrame(sim.allpars), [:type, :group, :names]),
+	:value => shortestcovint => :interval)
+
+# ╔═╡ d31d6872-792d-11eb-1864-2ba47afd2c2f
+md"""
+The name "shortestcovint" is a contraction of _shortest coverage interval_.
+We have simulated responses from the model and extracted the parameter estimates then, for each parameter, find the shortest interval that contains 95% of the parameter values.
+
+If we were doing this in a Bayesian context where we sample from the posterior density, these would be _Highest Posterior Density (HPD)_ intervals.
+
+Here we just regard them as 95% individual confidence intervals on the parameters.
+
+We could also consider a density plot of the estimates of individual parameters.
+"""
+
+# ╔═╡ d6b9b05c-792e-11eb-2672-6bbd417ecef9
+data(sim.β) * mapping(:β; layout_x = :coefname) * AOG.density |> draw
+
+# ╔═╡ 22151cfa-777d-11eb-2769-75cc409a9902
+md"""
+If, for example, we wanted to plot the p-values for the fixed-effects coefficients, we extract them as
+"""
+
+# ╔═╡ 598ccad6-777d-11eb-3cac-3fcd0f0f3969
+pvals = sim.coefpvalues;
+
+# ╔═╡ 6a6a3e0e-777d-11eb-09f7-99de6d4a20be
+typeof(pvals)
+
+# ╔═╡ 6fe7f0c6-777d-11eb-0d78-dd28ca754de3
+length(pvals)
+
+# ╔═╡ 802d53a4-777d-11eb-25cc-efc4d9dcf292
+pvalscond = last(groupby(select(DataFrame(pvals), :iter, :coefname, :p), :coefname))
+
+# ╔═╡ 49df45ee-7782-11eb-2363-d562aa91ca14
+extrema(pvalscond.p)
+
+# ╔═╡ 6c74cd86-7928-11eb-1f7f-a3bb10421811
+data(pvalscond) * mapping(:p) * AOG.density |> draw
+
+# ╔═╡ 203b821c-79dd-11eb-2eb0-ebc06dd7e0d5
+md"The _power_ of the z-test at β = 25.0 is the proportion of these p-values that are less than 0.05"
+
+# ╔═╡ 3f7a0d3e-76ec-11eb-368e-17428098bc72
+DataFrame(power_table(sim))  # wrap in DataFrame for display
+
+# ╔═╡ 9622a4f4-79dd-11eb-269c-6dba4d3a708f
+md"""
+To check whether the nominal level of 0.05 for such a test is appropriate we create another simulation with the "true" value of $\beta_1$ set to zero.
+"""
+
+# ╔═╡ ef406a26-79dd-11eb-16fc-ad985d357b0e
+sim1 = parametricbootstrap(MersenneTwister(32123), 2000, m0; σ=σ, θ=θ, β=[400.0,0.0]);
+
+# ╔═╡ 1708c210-79de-11eb-249f-bd0d341cacff
+combine(
+	groupby(DataFrame(sim1.allpars), [:type, :group, :names]),
+	:value => shortestcovint => :interval,
+)
+
+# ╔═╡ 37defa2c-79de-11eb-3932-5d34875a8926
+data(sim1.β) * mapping(:β) * mapping(layout_x = :coefname) * AOG.density |> draw
+
+# ╔═╡ 0b022d3e-79df-11eb-3c67-c91a878166ca
+begin
+	pvals1last = last(groupby(DataFrame(sim1.coefpvalues), :coefname))
+	data(pvals1last) * mapping(:p) * AOG.density |> draw
+end
+
+# ╔═╡ ad5023fc-79df-11eb-1b86-2f8e770e8489
+md"""
+This shape is a good sign because the simulated p-values should have a uniform [0, 1] distribution under the null hypothesis.
+
+A better check than a density plot, which suffers from "edge effects" on a finite range, is a histogram or a quantile-quantile plot versus the $\mathcal{U}[0,1]$ distribution.
+"""
+
+# ╔═╡ b4fc6398-79e1-11eb-2f58-e546d806fe3e
+data(pvals1last) * mapping(:p) * AOG.histogram |> draw
+
+# ╔═╡ f0d2fa88-79e1-11eb-1b84-3116cdcac338
+data((observed = sort(pvals1last.p), expected = collect(inv(4000):inv(2000):1))) * mapping(:observed, :expected) * visual(:QQPlot) |> draw
+
+# ╔═╡ 3651a920-79e8-11eb-080e-49db3336a2fa
+md"""
+Finally, the observed power for the test of $H_0: \beta_1=0$ versus $H_1: \beta_1\ne0$ should be close to 0.05.
+"""
+
+# ╔═╡ 73e34db6-79e8-11eb-261d-878e68e4cfb1
+DataFrame(power_table(sim1))
+
+# ╔═╡ 91e30ad6-79e8-11eb-24af-91147d0ddbb6
+md"""
+## Summary
+
+Determining power and related quantities associated with an experimental design is performed in the following steps:
+
+1. Specify the number of subjects, the number of items, and the names and levels of any between-subject, between-item and within-subject and item experimental factors.
+2. Create the design table using `simdat_crossed`.  A random number generator (rng) can be specified at this point but is not important because the response, `dv`, is _inert_ in this table.
+3. Specify the model formula from which to simulate and create that model.  Recall that the contrasts for the experimental factors are specified in the call to `LinearMixedModel`.
+4. At this point it is a good idea to examine the model matrix for the fixed-effects parameters using, e.g. `Int.(m0.X')` to ensure that the contrasts are coded as expected.
+5. Determine the values of β, σ and θ for the simulation and obtain the simulation object using `parametricbootstrap`.
+6. Perform various diagnostic plots and examine the `power_table` output.
+"""
+
+# ╔═╡ Cell order:
+# ╟─09276d18-76d2-11eb-3b91-b77f2d532e82
+# ╠═40bf1d04-76d0-11eb-0d98-cf2945aa3fd4
+# ╠═654920fa-79de-11eb-0dbb-15a85e6d3c71
+# ╟─28a121e2-76d3-11eb-0896-43da695be0a4
+# ╠═b64a889c-76d0-11eb-1936-5545db219def
+# ╟─ce3be2ec-76e0-11eb-3fb5-67bd31816bca
+# ╠═21958b24-76ed-11eb-11a2-41223c4864f8
+# ╟─b8fe4dd2-76e2-11eb-3a7b-21e2b17043f7
+# ╠═feb89de4-76d0-11eb-115e-89e7ed2fc483
+# ╟─eb911720-76e2-11eb-002d-21a35fd7d4b2
+# ╠═286561a4-76d1-11eb-2211-61d84bc1b9a9
+# ╟─aa915fd2-76ec-11eb-2cec-bff137f10b18
+# ╠═cb5f962a-76ec-11eb-3b1e-6f7f1f68bca5
+# ╟─6914997e-76e3-11eb-2f2b-0501a82bee66
+# ╠═9f327062-76e3-11eb-16dc-7be3a2b1c09f
+# ╠═b0202450-76e3-11eb-3218-8bddeed1c50d
+# ╟─34b8677c-76e4-11eb-1807-9306323a7c9f
+# ╠═48e92618-76d1-11eb-053c-e77449325b9c
+# ╟─7292d5fa-76e4-11eb-243f-8f7ad404d683
+# ╠═761fb192-76d1-11eb-2f8e-890af7ae08d8
+# ╟─91169880-76d1-11eb-28da-0f4135f0e995
+# ╠═d217cc16-76e8-11eb-2f64-6ff4931c87e9
+# ╟─9d46c892-76e9-11eb-008e-67d5100eb3fa
+# ╠═dbd15424-76e9-11eb-2306-436929c4e756
+# ╟─fbdfaf02-76e9-11eb-2404-d9a288cf1d31
+# ╠═167234ae-76ea-11eb-2589-93a3e6940453
+# ╟─5d1b6440-7783-11eb-3bae-45d008439b01
+# ╠═ac43c242-7783-11eb-17d4-19939f282c52
+# ╟─fabb4eae-7783-11eb-13de-157f033b8c12
+# ╟─b75c8580-76eb-11eb-2cca-37a0702e44a5
+# ╠═cd924f92-76eb-11eb-3275-eb4f55282d5d
+# ╟─91c8aa1a-777c-11eb-0989-45fd3982ce8c
+# ╠═0a8b0560-777d-11eb-2a90-25e0581a9407
+# ╠═10ce3212-777d-11eb-13dd-2b02eaa97ef7
+# ╟─761e4f92-792d-11eb-2792-8fc2ef350344
+# ╠═97789364-792d-11eb-2289-af7b5c20812a
+# ╟─d31d6872-792d-11eb-1864-2ba47afd2c2f
+# ╠═d6b9b05c-792e-11eb-2672-6bbd417ecef9
+# ╟─22151cfa-777d-11eb-2769-75cc409a9902
+# ╠═598ccad6-777d-11eb-3cac-3fcd0f0f3969
+# ╠═6a6a3e0e-777d-11eb-09f7-99de6d4a20be
+# ╠═6fe7f0c6-777d-11eb-0d78-dd28ca754de3
+# ╠═802d53a4-777d-11eb-25cc-efc4d9dcf292
+# ╠═49df45ee-7782-11eb-2363-d562aa91ca14
+# ╠═6c74cd86-7928-11eb-1f7f-a3bb10421811
+# ╟─203b821c-79dd-11eb-2eb0-ebc06dd7e0d5
+# ╠═3f7a0d3e-76ec-11eb-368e-17428098bc72
+# ╟─9622a4f4-79dd-11eb-269c-6dba4d3a708f
+# ╠═ef406a26-79dd-11eb-16fc-ad985d357b0e
+# ╠═1708c210-79de-11eb-249f-bd0d341cacff
+# ╠═37defa2c-79de-11eb-3932-5d34875a8926
+# ╠═0b022d3e-79df-11eb-3c67-c91a878166ca
+# ╟─ad5023fc-79df-11eb-1b86-2f8e770e8489
+# ╠═b4fc6398-79e1-11eb-2f58-e546d806fe3e
+# ╠═f0d2fa88-79e1-11eb-1b84-3116cdcac338
+# ╟─3651a920-79e8-11eb-080e-49db3336a2fa
+# ╠═73e34db6-79e8-11eb-261d-878e68e4cfb1
+# ╟─91e30ad6-79e8-11eb-24af-91147d0ddbb6