diff --git a/.RData b/.RData new file mode 100644 index 0000000..f1ef589 Binary files /dev/null and b/.RData differ diff --git a/DESCRIPTION b/DESCRIPTION index 83d08a9..2d5fb81 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -13,4 +13,6 @@ Imports: compositions, robCompositions, testthat, ggplot2 -RoxygenNote: 7.1.1 +RoxygenNote: 7.1.0 +Suggests: knitr, rmarkdown +VignetteBuilder: knitr diff --git a/README.md b/README.md index 7e168d1..e2eb59e 100644 --- a/README.md +++ b/README.md @@ -1,169 +1,37 @@ # The `deltacomp` package -Functions to analyse compositional data and produce predictions (with confidence intervals) for relative increases and decreases in the compositional components +The functions in the `deltacomp` package produce predictions (with confidence intervals) for relative increases and decreases in the compositional parts. -## 1. Background +The development of the package was initiated by Ty Stanford and Dorothea Dumuid in 2018 and is still under development. Changes and corrections are expected to be made during 2021. -For an outcome variable `Y`, *D* compositional variables (`x_1, ..., x_D`) and *C* covariates (`z_1, ..., z_C`); this package fits the compositional data analysis model (notation inexact): +## Installing `deltacomp` -`Y = b_0 + b_1 ilr_1 + ... + b_{D-1} ilr_{D-1} + a_1 z_1 + ... + a_C z_C + e` - -where `ilr_i` are the *D-1* isometric log ratio variables derived from the *D* compositional variables (`x_1, ..., x_D`), `b_0, ..., b_{D-1}, a_1, ..., a_C` are *D+C* parameters to be estimated and `e ~ N(0, sigma)` is the error. The package then makes predictions in alterations of the time-use variables (the linearly dependent set of compositional components) based on this model. - - -For a starting point to learn about compositional data analysis please see [Aitchison (1982)](https://doi.org/10.1111/j.2517-6161.1982.tb01195.x) or [van den Boogaart and Tolosana-Delgado (2013)](https://link.springer.com/book/10.1007%2F978-3-642-36809-7). 
However the articles [Dumuid et al. (2017a)](https://doi.org/10.1177/0962280217710835) and [Dumuid et al. (2017b)](https://doi.org/10.1177%2F0962280217737805) may be more approachable introductions. - - -## 2. Reallocation of time-use component options - -Please note that the use of 'mean composition' means the geometric mean on the compositional simplex and *not* the arithmetic mean. If these words have little meaning to you, that is no problems as these differently calculated means likely do not differ much in your dataset. `deltacomp` only uses the simplex geometric mean in its calculations from version 0.2.0 onwards. - -### 2.1. Option `comparisons = "prop-realloc"` - -Information on outcome prediction with time-use exchange between one component and the remaining compositional components proportionally (`comparisons = "prop-realloc"` option of the `predict_delta_comps()` function), please see [Dumuid et al. (2017a)](https://doi.org/10.1177/0962280217710835). - -### 2.1.1. Example - -Suppose you have three (predictor) components in a day summing to 1 (e.g., a day) to predict an outcome variable. The three components are `sedentary`, `sleep` and `activity`. 
Let's assume the mean sampled composition is: - -* `sedentary = 0.5` (i.e., half a day) -* `sleep = 0.3` (i.e., 30% a day) -* `activity = 0.2` (i.e., 20% a day) - -If you wanted to predict the change in the outcome variable from the above mean composition with `delta = +0.05` (5% of the day) is added to `sedentary`, the option `comparisons = "prop-realloc"` reduces the remaining components by the 5% proportionately based on their mean values, illustrated below: - -* `sedentary* = 0.5 + delta = 0.5 + 0.05 = 0.55` -* `sleep* = 0.3 - delta * sleep / (sleep + activity) = 0.3 - 0.05 * 0.3 / (0.3 + 0.2) = 0.3 - 0.03 = 0.27` -* `activity* = 0.2 - delta * activity / (sleep + activity) = 0.2 - 0.05 * 0.2 / (0.3 + 0.2) = 0.2 - 0.02 = 0.18` - -Noting that the new compsition: `sedentary* + sleep* + activity* = 0.55 + 0.27 + 0.18 = 1`. - -Note for the example above, the option `comparisons = "prop-realloc"` in `predict_delta_comps()` will actually automatically produce seperate predictions for a `delta = +0.05` on each of the components against the remaining components. i.e., not only the `sedentary* = 0.5 + delta` scenario as illustrated above but also `sleep* = 0.3 + delta` and `activity* = 0.2 + delta` cases. - -### 2.2. Option `comparisons = "one-v-one"` - -For information on outcome prediction with time-use exchange between two compositional components (i.e., the `comparisons = "one-v-one"` option of the `predict_delta_comps()` function), please see -[Dumuid et al. (2017b)](https://doi.org/10.1177%2F0962280217737805). - -### 2.2.1. Example - -Similarily to the previous example, suppose you have three (predictor) components in a day summing to 1 (i.e. a day) to predict an outcome variable. The three components are `sedentary`, `sleep` and `activity`. 
Let's assume the mean sampled composition is: - -* `sedentary = 0.5` (i.e., half a day) -* `sleep = 0.3` (i.e., 30% a day) -* `activity = 0.2` (i.e., 20% a day) - -If you wanted to predict the change in the outcome variable from the above mean composition with `delta = +0.05` (5% of the day), the option `comparisons = "one-v-one"` looks at all pairwise exchanges between the components `(sedentary*, sleep*, activity*)`: - -* `(0.5 + 0.05, 0.3 - 0.05, 0.2 )` -* `(0.5 + 0.05, 0.3 , 0.2 - 0.05)` -* `(0.5 , 0.3 + 0.05, 0.2 - 0.05)` -* `(0.5 - 0.05, 0.3 + 0.05, 0.2 )` -* `(0.5 - 0.05, 0.3 , 0.2 + 0.05)` -* `(0.5 , 0.3 - 0.05, 0.2 + 0.05)` - - -### 2.3. Option `comparisons = "one-v-all"` - -Depreciated. - - -## 3. Datasets in package - -Two datasets are supplied with the package: - -* `fairclough` and -* `fat_data`. - -The `fairclough` dataset was kindly provided by the authors of [Fairclough et al. (2017)](https://doi.org/10.1186/s12966-017-0521-z). `fat_data` is a randomly generated test dataset that might roughly mimic a real dataset. - -## 4. Example usage +Run the following code to install and load the `deltacomp` package ```R library(devtools) # see https://www.r-project.org/nosvn/pandoc/devtools.html devtools::install_github('tystan/deltacomp') library(deltacomp) -### see help file to run example -?predict_delta_comps - -predict_delta_comps( - dataf = fat_data, - y = "fat", - comps = c("sl", "sb", "lpa", "mvpa"), - covars = c("sibs", "parents", "ed"), - deltas = seq(-60, 60, by = 5) / (24 * 60), - comparisons = "prop-realloc", - alpha = 0.05 -) - -# OR - -predict_delta_comps( - dataf = fat_data, - y = "fat", - comps = c("sl", "sb", "lpa", "mvpa"), - covars = c("sibs", "parents", "ed"), - deltas = seq(-60, 60, by = 5) / (24 * 60), - comparisons = "one-v-one", - alpha = 0.05 -) ``` - -## 5. Output and plotting results - -Output is a `data.frame` that can be turned into the plot below using the following code. 
+Run the following code to see the help file: ```R - -pred_df <- - predict_delta_comps( - dataf = fairclough, - y = "z_bmi", - comps = c("sleep", "sed", "lpa", "mvpa"), - covars = c("decimal_age", "sex"), - # careful deltas greater than 25 min in magnitude induce negative compositions - # predict_delta_comps() will warn you about this :-) - deltas = seq(-20, 20, by = 5) / (24 * 60), - comparisons = "prop-realloc", # or try "one-v-one" - alpha = 0.05 - ) - -plot_delta_comp( - pred_df, # provide the returned object from predict_delta_comps() - # x-axis can be converted from propotion of composition to meaningful units - comp_total = 24 * 60, # minutes available in the composition - units_lab = "min" # just a label for plotting -) - - +?predict_delta_comps ``` +## How to use `deltacomp`? -![](https://github.com/tystan/deltacomp/blob/master/inst/img/delta_comps2.png) - - -### 5.1. Prediction for the mean composition - -The function `predict_delta_comps()` now outputs the predicted outcome value (with `100 * (1 - alpha)`% confidence interval). This data is printed to the console but also can be extracted from the output of `predict_delta_comps()` as per the below code: +Please see the package vignette for examples of what the `deltacomp` package can do. To view, run the following: ```R - -# produces a 1 line data.frame that contains -# the (simplex/geometric) mean composition, -# the "average" covariates (the median of the factor variables in order of the levels are taken as default), -# the ilr coords of the (simplex/geometric) mean composition, and -# the predicted outcome value with 100*(1-alpha)% confidence interval -attr(pred_df, "mean_pred") - - +vignette("deltacomp vignette") ``` - -## 6. Release notes +## Release notes See [/change-notes.md](https://github.com/tystan/deltacomp/blob/master/change-notes.md).
diff --git a/tests/testthat/test_create_seq_bin_part.R b/tests/testthat/test_create_seq_bin_part.R new file mode 100644 index 0000000..9a1a3e1 --- /dev/null +++ b/tests/testthat/test_create_seq_bin_part.R @@ -0,0 +1,10 @@ +context("create_seq_bin_part() checks") + +test_that("create_seq_bin_part() throws error if wrong inputs", { + + expect_error(create_seq_bin_part("b")) + expect_error(create_seq_bin_part(c(2,3))) + +}) + + diff --git a/tests/testthat/test_extract_lm_quantities.R b/tests/testthat/test_extract_lm_quantities.R new file mode 100644 index 0000000..03187d6 --- /dev/null +++ b/tests/testthat/test_extract_lm_quantities.R @@ -0,0 +1,28 @@ +context("extract_lm_quantities() checks") + +x <- runif(10) +y <- 3 * x + 7 + rnorm(10) +example_lm1 <- lm(y ~ x) + +test_that("extract_lm_quantities() correctly throws errors for bad input", { + + expect_error( + extract_lm_quantities(example_lm1, "alpha") + ) + + expect_error( + extract_lm_quantities(y~x, "alpha") + ) + +}) + +test_that("extract_lm_quantities() is a list", { + + expect_output(str(extract_lm_quantities(example_lm1)), "List of 5") + + expect_output(str(extract_lm_quantities(example_lm1)), "List of 5", fixed = TRUE) + +}) + + + diff --git a/vignettes/Deltacomp vignette.Rmd b/vignettes/Deltacomp vignette.Rmd new file mode 100644 index 0000000..5a45f85 --- /dev/null +++ b/vignettes/Deltacomp vignette.Rmd @@ -0,0 +1,203 @@ +--- +title: "deltacomp vignette" +author: "Ty Stanford, Charlotte Lund Rasmussen, Dorothea Dumuid" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{deltacomp vignette} + %\VignetteEngine{knitr::rmarkdown} + \usepackage[UTF-8]{inputenc} +--- + +```{r setup, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +_ADD NAME EXPLANATION AND WHY THIS NAME_ +CLR likes this name: codarealloclm (coda = compositional data analysis, reallocation = what we're doing, lm = model type) + +The goal of deltacomp is to provide
ready-to-use functions that analyse compositional data and produce predictions for relative increases and decreases in the compositional parts. + + +In the following, we provide examples of how to use the package's main functions: `predict_delta_comps()` and `plot_delta_comp()`. + +## 1. Background: compositional isotemporal substitution analysis + +For an outcome variable Y, D compositional parts (x_1, ..., x_D) and C covariates (z_1, ..., z_C), this package fits the compositional data analysis model (notation inexact): + +Y = b_0 + b_1 ilr_1 + ... + b_{D-1} ilr_{D-1} + a_1 z_1 + ... + a_C z_C + e + +where ilr_i are the D-1 isometric log ratio variables derived from the D compositional parts (x_1, ..., x_D), b_0, ..., b_{D-1}, a_1, ..., a_C are D+C parameters to be estimated and e ~ N(0, sigma) is the error. The package then makes predictions for alterations of the compositional variables (the linearly dependent set of compositional parts) based on this model. + +For a starting point to learn about compositional data analysis please see [Aitchison (1982)](https://doi.org/10.1111/j.2517-6161.1982.tb01195.x) or [van den Boogaart and Tolosana-Delgado (2013)](https://link.springer.com/book/10.1007%2F978-3-642-36809-7). + +However, the articles [Dumuid et al. (2017a)](https://doi.org/10.1177/0962280217710835), [Dumuid et al. (2017b)](https://doi.org/10.1177%2F0962280217737805), and [Dumuid et al. (2020)](https://doi.org/10.3390/ijerph17072220) may be more approachable introductions to both compositional data analysis and compositional isotemporal substitution analysis. + +## 2. Datasets in package + +Two datasets are supplied with the package: + +* `fairclough` and +* `fat_data`. + +The `fairclough` dataset was kindly provided by the authors of [Fairclough et al. (2017)](https://doi.org/10.1186/s12966-017-0521-z). `fat_data` is a randomly generated test dataset that might roughly mimic a real dataset. + +## 3.
Options for reallocating time between compositional parts + +The deltacomp package enables either one-from-remaining or one-to-one reallocation between the compositional parts. Both types of time-use reallocations are done proportionally. + +One-from-remaining reallocation allows for time-use exchange between one compositional part and the remaining compositional parts. In the `predict_delta_comps()` function, this type of reallocation is chosen when using the `comparisons = "prop-realloc"` option. +For a detailed description of one-from-remaining time-use reallocation, please see [Dumuid et al. (2017a)](https://doi.org/10.1177/0962280217710835). + +One-to-one reallocation enables time-use exchange between two compositional parts. Note that this reallocation can be done independently of the number of parts in the composition. This type of reallocation is chosen when using the `comparisons = "one-v-one"` option of the `predict_delta_comps()` function. +For a detailed description of one-to-one time-use reallocation, please see +[Dumuid et al. (2017b)](https://doi.org/10.1177%2F0962280217737805). + +Note that the `predict_delta_comps()` function removes rows with `NA` values in the input dataset (and warns when it does so). + +### 3.1. Example of one-from-remaining reallocation (`comparisons = "prop-realloc"` option) + +Suppose you have a 3-part composition summing to 1 (e.g. a day) to predict an outcome variable. The three compositional parts are time spent `active`, `sedentary`, and `sleeping`.
Let's assume the mean sampled composition is: + +* `active = 0.2` (i.e., 20% of a day) +* `sedentary = 0.5` (i.e., 50% of a day) +* `sleeping = 0.3` (i.e., 30% of a day) + +If you wanted to predict the change in the outcome variable from the above mean composition when `delta = +0.05` (5% of the day) is added to `sedentary`, the option `comparisons = "prop-realloc"` reduces the remaining parts by a total of 5%, split in proportion to their mean values, as illustrated below: + +* `sedentary* = 0.5 + delta = 0.5 + 0.05 = 0.55` +* `active* = 0.2 - delta * active / (sleeping + active) = 0.2 - 0.05 * 0.2 / (0.3 + 0.2) = 0.2 - 0.02 = 0.18` +* `sleeping* = 0.3 - delta * sleeping / (sleeping + active) = 0.3 - 0.05 * 0.3 / (0.3 + 0.2) = 0.3 - 0.03 = 0.27` + +Because the 5% is removed proportionally from the remaining parts, the new composition still sums to 1: `sedentary* + active* + sleeping* = 0.55 + 0.18 + 0.27 = 1`. + +Of note, in the example above, the option `comparisons = "prop-realloc"` in `predict_delta_comps()` will automatically produce separate predictions for a `delta = +0.05` on each of the parts against the remaining parts. That is, not only the `sedentary* = 0.5 + delta` scenario as illustrated above but also the `sleeping* = 0.3 + delta` and `active* = 0.2 + delta` cases. + +### 3.2. Example of one-to-one reallocation (`comparisons = "one-v-one"` option) + +Similarly to the previous example, suppose you have a 3-part composition summing to 1 (e.g. a day) to predict an outcome variable. The three compositional parts are time spent `active`, `sedentary`, and `sleeping`.
Let's assume the mean sampled composition is: + +* `active = 0.2` (i.e., 20% of a day) +* `sedentary = 0.5` (i.e., 50% of a day) +* `sleeping = 0.3` (i.e., 30% of a day) + +If you wanted to predict the change in the outcome variable from the above mean composition with `delta = +0.05` (5% of the day), the option `comparisons = "one-v-one"` looks at all pairwise exchanges between the parts `(sedentary*, sleeping*, active*)`: + +* `(0.5 + 0.05, 0.3 - 0.05, 0.2 )` +* `(0.5 + 0.05, 0.3 , 0.2 - 0.05)` +* `(0.5 , 0.3 + 0.05, 0.2 - 0.05)` +* `(0.5 - 0.05, 0.3 + 0.05, 0.2 )` +* `(0.5 - 0.05, 0.3 , 0.2 + 0.05)` +* `(0.5 , 0.3 - 0.05, 0.2 + 0.05)` + + +## 4. Example usage + +The following code will install and load the package. + +```{r} + +library(devtools) # see https://www.r-project.org/nosvn/pandoc/devtools.html +devtools::install_github('tystan/deltacomp') +library(deltacomp) + +``` + + +The following code runs a one-from-remaining reallocation and then a one-to-one reallocation. + +```{r} + + +# example of one-from-remaining reallocation + +predict_delta_comps( + dataf = fat_data, + y = "fat", + comps = c("sl", "sb", "lpa", "mvpa"), + covars = c("sibs", "parents", "ed"), + deltas = seq(-60, 60, by = 5) / (24 * 60), + comparisons = "prop-realloc", + alpha = 0.05 +) + + +# example of one-to-one reallocation + +predict_delta_comps( + dataf = fat_data, + y = "fat", + comps = c("sl", "sb", "lpa", "mvpa"), + covars = c("sibs", "parents", "ed"), + deltas = seq(-60, 60, by = 5) / (24 * 60), + comparisons = "one-v-one", + alpha = 0.05 +) +``` + +The following details are provided in the console output: +- quartiles of the summed composition +- details on 'average' case of the considered covariates used for prediction +- ilr transformation of the compositional parts +- summary of linear model results +- results of a statistical test for the ilrs being collectively significant in the model +- details on the input variables considered for the predictions (e.g.
geometric mean composition, 'average' covariates) as well as the predicted outcome variables. + +## 5. Output and plotting results + +The output of `predict_delta_comps()` is a `data.frame` that can be turned into the plot below using the following code. + +```{r} + +pred_df <- + predict_delta_comps( + dataf = fairclough, + y = "z_bmi", + comps = c("sleep", "sed", "lpa", "mvpa"), + covars = c("decimal_age", "sex"), + # careful deltas greater than 25 min in magnitude induce negative compositions + # predict_delta_comps() will warn you about this :-) + deltas = seq(-20, 20, by = 5) / (24 * 60), + comparisons = "prop-realloc", # or try "one-v-one" + alpha = 0.05 + ) + +plot_delta_comp( + pred_df, # provide the returned object from predict_delta_comps() + comp_total = 24 * 60, # minutes available in the composition + units_lab = "min" # just a label for plotting +) +``` +In this example the deltas are specified as proportions of the day (minutes divided by `24 * 60`). +The `comp_total` and `units_lab` arguments convert the x-axis of the plot from proportions of the composition to meaningful units (here, minutes). + +### 5.1. Prediction for the mean composition + +It is also possible to extract predictions for the mean composition from the output of `predict_delta_comps()` using the following code: + +```{r} +attr(pred_df, "mean_pred") +``` + +This will provide a one-row `data.frame` that contains: +- the (geometric) mean composition, +- the ilr coordinates of the (geometric) mean composition, +- the "average" covariates (i.e. for factor variables, the median of the ordered levels is taken by default), +- the predicted outcome value ("fit") +- the 100*(1-alpha)% confidence interval for the predicted outcome ("lwr" and "upr", respectively) + + +## 6. Future extensions + +We plan to build on the work done in developing the `deltacomp` package with future extensions.
+ +These will include: +- modelling of log-transformed outcomes +- modelling of binary outcomes +- non-linear modelling, and +- consideration of repeated measurements + +
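Note on section 3.1 of the vignette: the one-from-remaining arithmetic can be checked with a few lines of base R. This is only a minimal sketch under stated assumptions. `prop_realloc()` is a hypothetical helper written for illustration; it is not a function exported by `deltacomp` and may not match how the package performs the reallocation internally. It assumes the composition is a named numeric vector that sums to 1.

```R
# Hypothetical helper (not part of deltacomp): add `delta` to one part and
# remove it from the remaining parts in proportion to their current values.
prop_realloc <- function(comp, part, delta) {
  stopifnot(part %in% names(comp), abs(sum(comp) - 1) < 1e-8)
  others <- setdiff(names(comp), part)
  new_comp <- comp
  new_comp[part] <- comp[part] + delta
  # each remaining part gives up its share of delta
  new_comp[others] <- comp[others] - delta * comp[others] / sum(comp[others])
  new_comp
}

# mean composition from the vignette example
mean_comp <- c(active = 0.2, sedentary = 0.5, sleeping = 0.3)

# add 5% of the day to `sedentary`
realloc <- prop_realloc(mean_comp, "sedentary", 0.05)
print(round(realloc, 2))
```

With the example composition this should reproduce the values worked out by hand in section 3.1 (`active* = 0.18`, `sedentary* = 0.55`, `sleeping* = 0.27`), and the result still sums to 1. The sketch does not guard against a `delta` large enough to push a part below zero, which `predict_delta_comps()` is described as warning about.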