\documentclass[]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
  \usepackage[T1]{fontenc}
  \usepackage[utf8]{inputenc}
\else % if luatex or xelatex
  \ifxetex
    \usepackage{mathspec}
  \else
    \usepackage{fontspec}
  \fi
  \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\hypersetup{unicode=true,
            pdftitle={Introduction to Statistical Analysis},
            pdfauthor={D.-L. Couturier and M. Eldridge (with contributions of M. Dunning and S. Vowler)},
            pdfborder={0 0 0},
            breaklinks=true}
\urlstyle{same}  % don't use monospace font for urls
\usepackage{longtable,booktabs}
\usepackage{graphicx,grffile}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em}  % prevent overfull lines
\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{0}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi

%%% Use protect on footnotes to avoid problems with footnotes in titles
\let\rmarkdownfootnote\footnote%
\def\footnote{\protect\rmarkdownfootnote}

%%% Change title format to be more compact
\usepackage{titling}

% Create subtitle command for use in maketitle
\newcommand{\subtitle}[1]{
  \posttitle{
    \begin{center}\large#1\end{center}
    }
}

\setlength{\droptitle}{-2em}

  \title{Introduction to Statistical Analysis}
    \pretitle{\vspace{\droptitle}\centering\huge}
  \posttitle{\par}
    \author{D.-L. Couturier and M. Eldridge (with contributions of M. Dunning and S.
Vowler)}
    \preauthor{\centering\large\emph}
  \postauthor{\par}
    \date{}
    \predate{}\postdate{}


\begin{document}
\maketitle

{
\setcounter{tocdepth}{3}
\tableofcontents
}
\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

In this practical, we will use several `real-life' datasets to
demonstrate some of the concepts you have seen in the lectures. We will
guide you through how to analyse these datasets in Shiny and the kinds
of questions you should be asking yourself when faced with similar data.

To answer the questions in this practical we will be using apps that we
have developed using the \href{http://shiny.rstudio.com/gallery/}{Shiny}
add-on for the \emph{R} statistical package. \textbf{R} is freely
available, open-source software that is popular within academic
and commercial communities. The functionality within the software
compares favourably with other statistical packages (SAS, SPSS and
Stata). The downside is that \textbf{R} has a steep learning curve and
requires a basic familiarity with command-line software. To ease the
transition we have chosen to present this course using a series of
online tools that will allow you to perform statistical analysis without
having to worry about learning R. At the same time, the R code required
for the analysis will be recorded in the background. You will therefore
be able to repeat the analysis at a later date, or pass it on to others. As
you gain familiarity with R through other courses, you will see how the
code generated by Shiny can be adapted to your own needs.

The datasets you will need for this practical should be downloaded and
unzipped now:
\url{https://rawgit.com/bioinformatics-core-shared-training/IntroductionToStats/master/CourseData.zip}

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

\hypertarget{central-limit-theorem}{%
\section{Central limit theorem}\label{central-limit-theorem}}

{\textbf{Question (i):}}

The tab \textbf{Estimated coverage of Student's CI} in the Shiny app
\textbf{central-limit-theorem} displays the confidence intervals of 100
simulated datasets.
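
As a reminder, and assuming the app uses the usual Student's interval at
confidence level $1-\alpha$ (the notation below is ours, not the app's),
the interval computed from a sample $x_1,\dots,x_n$ is
\[
\bar{x} \pm t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2,
\]
where $t_{n-1,\,1-\alpha/2}$ denotes the $(1-\alpha/2)$ quantile of
Student's $t$ distribution with $n-1$ degrees of freedom.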

\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
  Assuming that the simulated data are normally distributed, what is the
  probability of the \textbf{true} mean belonging to a confidence
  interval?
\item
  Let $X$ denote a random variable that equals 1 if the \textbf{true mean
  belongs to the confidence interval} and 0 otherwise. What is the
  distribution of $X$?
\end{enumerate}

{\textbf{Question (ii):}}

Using the Shiny app \textbf{central limit theorem}
(\url{http://bioinformatics.cruk.cam.ac.uk/apps/stats/central-limit-theorem/}),
answer the following questions:

\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
  Simulate \textbf{1000 samples} of \textbf{size n=10} of
  \textbf{Poisson} random variates, first assuming a \textbf{mean of
  0.25}, and then assuming a \textbf{mean of 100}. Compare the coverage
  level of Student's confidence intervals for the mean in these 2
  simulation sets. How do you explain that the coverage is better in the
  latter than in the former?
\item
  Now consider \textbf{zero-inflated Poisson} variates with a
  \textbf{mean of 30} and a \textbf{10\% probability of belonging to the
  clump-at-zero}. Can you think of a random variable having such a
  distribution? How large should the sample size be for Student's
  confidence intervals to have good properties?
\item
  A student lost a few points in a statistics exam because the use of
  Student's confidence intervals for the probability of success of a
  \textbf{Bernoulli} variable with \textbf{$\pi$ = 40\%} and
  \textbf{n=100} was not considered suitable. Should the student contact
  the University to dispute the mark?
\end{enumerate}

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

~

\hypertarget{one-sample-tests}{%
\section{One-Sample Tests}\label{one-sample-tests}}

Use our Shiny app
\url{http://bioinformatics.cruk.cam.ac.uk/stats/OneSampleTest} to
perform one-sample location tests.

~

\hypertarget{the-effect-of-disease-on-height}{%
\subsection{The effect of disease on
height}\label{the-effect-of-disease-on-height}}

A scientist knows that the mean height of females in England is
\textbf{165cm} and wants to know whether her patients with a certain
disease ``X'' have heights that differ significantly from the population
mean - we will use a one-sample t-test to test this. The data are
contained in the file \textbf{\texttt{diseaseX.csv}}.
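
For reference, here is a sketch of the statistic behind this test, written
in our own notation and assuming the standard one-sample t-test. With $n$
observed heights of sample mean $\bar{x}$ and sample standard deviation
$s$, and a hypothesized mean $\mu_0$,
\[
t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}},
\]
which, under the null hypothesis and for normally distributed data,
follows a Student's $t$ distribution with $n-1$ degrees of freedom.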

To import the file \texttt{diseaseX.csv}, you will need to select the
\texttt{Choose\ File} option from the \texttt{Data\ Input} tab and
navigate to where the course data are located on your laptop. The
right-hand panel of the \texttt{Data\ Input} tab should update to show
the heights of the individuals in the study.

Also, on the \texttt{Data\ Input} tab you will need to change the value
of \textbf{Hypothesized mean} to the correct value.

{\textbf{Question:}} What are your null and alternative hypotheses?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

A histogram and boxplot of the \texttt{Height} variable will be
automatically generated for you. To view them, click on the
\textbf{\emph{Data Distribution}} tab. You can toggle whether to overlay a
density plot on top of the boxplot, or choose different bin sizes for
the histogram.

{\textbf{Question:}} Do the data look normally distributed? Based on the
plots, is the parametric one-sample t-test appropriate?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

We are interested in knowing whether the mean height in our sample of
patients with disease X is different from that of the general
population. Perform a \textbf{one-sample t-test} by clicking the
\textbf{\emph{Statistical Analysis}} tab.

{\textbf{Question:}} What is the mean height in your sample? What is
your value of t? What is the p-value? How do you interpret the p-value?

~

\hypertarget{blood-vessel-formation}{%
\subsection{Blood vessel formation}\label{blood-vessel-formation}}

In blood plasma cancer, there is an increase in blood vessel formation
in the bone marrow. A stem cell transplant can be used as a treatment
for blood plasma cancer. The column \emph{Difference} of the file
\textbf{\texttt{bloodplasmacancer1.csv}} reports the difference in bone
marrow micro vessel density after and before treatment for 7 patients.

We are interested in seeing whether there is a decrease in the bone
marrow micro vessel density after treatment with a stem cell transplant.

Import the file \textbf{\texttt{bloodplasmacancer1.csv}} as described
previously.

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

{\textbf{Question:}} What are your null and alternative hypotheses?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

View the histogram and boxplot of the paired differences on the
\textbf{\emph{Differences}} tab.

{\textbf{Question:}} Do the differences look normally distributed? Is
the parametric t-test appropriate?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

We are interested in seeing whether there is a decrease in the bone
marrow micro vessel density after treatment with a stem cell transplant.

{\textbf{Question:}} Is this a one-tailed or two-tailed test?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Now select the correct options in the \textbf{\emph{Statistical
Analysis}} tab in order to perform the analysis. Ensure you select the
one- or two-tailed test as appropriate.

{\textbf{Question:}} What is the mean difference? What is your value of
t? What is the p-value? How do you interpret the p-value?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

~

\hypertarget{two-sample-tests}{%
\section{Two-Sample Tests}\label{two-sample-tests}}

Use our Shiny app
\url{http://bioinformatics.cruk.cam.ac.uk/stats/TwoSampleTest} to
perform tests of equality of means/medians.

~

\hypertarget{biological-processes-duration}{%
\subsection{Biological processes
duration}\label{biological-processes-duration}}

In the file \textbf{\texttt{bp\_times.csv}}, we have the durations of a
biological process for two samples of wild-type and knock-out cells
(times in seconds). We are interested in seeing whether there is a
difference in the durations for the two types of cells -- we shall use
an \textbf{independent t-test} to compare the two cell-types.

Import the data using \textbf{\emph{Choose File}} as before. Make sure
that the \textbf{\emph{1st column is a factor?}} checkbox is ticked.

{\textbf{Question:}} What are your null and alternative hypotheses?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Histograms and boxplots to compare the two groups will be created for
you automatically. You can also see a basic numerical summary of the
data distribution.

{\textbf{Question:}} Do the data look normally distributed for each
cell-type? Is the independent t-test appropriate? What statistics are
appropriate to report the location (mean or median) and spread (sd or
IQR) of the data?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

In order to apply the correct statistical test, we need to test to see
if the variances of the two groups are comparable. This is tested for us
automatically in the Shiny app. Click the \textbf{\emph{Statistical
Analysis}} tab to see the result of the ``F-test''. However, it is often
easier to eye-ball the data to assess the variances.
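
As a reminder, and assuming the app relies on the classical variance-ratio
test (our notation, with $s_1^2$ and $s_2^2$ the sample variances of the
two groups and $n_1$, $n_2$ their sizes), the statistic is
\[
F = \frac{s_1^2}{s_2^2},
\]
which, for normally distributed data and under the null hypothesis of
equal variances, follows an $F$ distribution with $n_1-1$ and $n_2-1$
degrees of freedom.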

{\textbf{Question:}} What do you conclude from the p-value of this test?
Does this agree with your impression of the variances from the boxplot
and histograms? How does it influence which test to use?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Now use the appropriate two-sample t-test to compare the durations of
the two groups.
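
The two usual variants of the independent two-sample t-test can be
summarised as follows. The notation is ours, and which variant the app
applies in a given situation is an assumption to check on the
\textbf{\emph{Statistical Analysis}} tab. With group means $\bar{x}_1$,
$\bar{x}_2$, variances $s_1^2$, $s_2^2$ and sizes $n_1$, $n_2$, the
equal-variance (pooled) version is
\[
t = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2},
\]
with $n_1+n_2-2$ degrees of freedom, whereas the unequal-variance (Welch)
version is
\[
t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}
           {\dfrac{\left(s_1^2/n_1\right)^2}{n_1-1}+\dfrac{\left(s_2^2/n_2\right)^2}{n_2-1}},
\]
where $\nu$ is the approximate number of degrees of freedom.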

{\textbf{Question:}} What is your value of the test statistic? What is
the p-value? How do you interpret the p-value?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

~

\hypertarget{blood-vessel-formation-1}{%
\subsection{Blood vessel formation}\label{blood-vessel-formation-1}}

In blood plasma cancer, there is an increase in blood vessel formation
in the bone marrow. A stem cell transplant can be used as a treatment
for blood plasma cancer. The bone marrow micro vessel density was
measured before and after treatment for 7 patients with blood plasma
cancer.

We are interested in seeing whether there is a decrease in the bone
marrow micro vessel density after treatment with a stem cell transplant.
We will use a paired two-sample t-test to compare the before and after
bone marrow micro vessel densities.

The data are contained in the file
\textbf{\texttt{bloodplasmacancer2.csv}}. Import the data, making sure
that \textbf{\emph{\texttt{1st\ column\ is\ a\ factor}}} is \emph{not}
ticked. Now choose whether you will be performing a paired test or not
by ticking the \textbf{Paired Samples?} box under \textbf{\emph{Are your
samples paired?}}.

{\textbf{Question:}} What are your null and alternative hypotheses?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Check the test statistic and p-value of the test of interest.

{\textbf{Question:}} Why do they match those of \textbf{Exercise 2.2}
(Blood vessel formation, in the One-Sample Tests section)?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

~

\hypertarget{gene-expression-in-breast-cancer-patients}{%
\subsection{Gene Expression in Breast Cancer
patients}\label{gene-expression-in-breast-cancer-patients}}

A gene expression study was performed on patients categorised into
positive and negative Estrogen Receptor (ER) groups. It is well-known
that ER positive patients have more treatment options available and thus
have a better prognosis.

The gene NIBP was measured as part of this study and the results are
available in the file \texttt{NIBP.expression.csv}. We are interested to
see if the expression level of the gene is different between ER positive
and negative patients.

{\textbf{Question:}} What are your null and alternative hypotheses?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Now conduct an independent two-sample t-test to see if there is a
difference in expression between the two groups.

{\textbf{Question:}} What is the p-value from the test? Do we achieve
statistical significance at the 0.05 level?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Look closely at the data distribution, the calculated means for each group
and the estimated confidence interval.

{\textbf{Question:}} Is the finding likely to be of biological
significance? Would you be willing to put further resources into
validating the finding?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

~

\hypertarget{vitamin-d-levels}{%
\subsection{Vitamin D levels}\label{vitamin-d-levels}}

The file \textbf{\texttt{vitd.csv}} contains data on vitamin D levels
for subjects with (``Y''), and without (``N'') fibrosis. To import these
data, you will need to select the \textbf{\emph{1st column is a factor}}
option.

{\textbf{Question:}} State the null and alternative hypotheses.

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Examine the distribution of the data.

{\textbf{Question:}} Why doesn't a parametric analysis seem appropriate?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

By un-ticking the \textbf{\emph{Use Parametric Test?}} option in the
\textbf{\emph{Statistical Analysis}} tab you will see the results of a
Mann-Whitney U test (also known as the Wilcoxon rank-sum test).
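
As a reminder of what this test computes (in our own notation, and
assuming the usual rank-based definition): pool the $n_1+n_2$
observations, rank them from smallest to largest, and let $R_1$ be the
sum of the ranks of the first group. Then
\[
U_1 = R_1 - \frac{n_1(n_1+1)}{2},
\qquad
U_1 + U_2 = n_1 n_2,
\]
where $U_2$ is defined in the same way for the second group. Values of
$U_1$ close to $0$ or to $n_1 n_2$ indicate that one group tends to take
larger values than the other.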

{\textbf{Question:}} How do you interpret the value of the test?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

~

\hypertarget{birth-weight-of-twins}{%
\subsection{Birth-weight of twins}\label{birth-weight-of-twins}}

Dr D. R. Peterson of the Department of Epidemiology, University of
Washington, collected the data found in file
\textbf{\texttt{twins.csv}}. It consists of the birth-weights of each of
20 dizygous twins. In each pair, one twin suffered Sudden Infant Death
Syndrome (SIDS), and the other twin did not. The hypothesis to be tested
is that the SIDS child of each pair had a lower birth-weight.

{\textbf{Question:}} State the null and alternative hypotheses.

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Decide on the level of significance to be used and whether the test
should be one-sided or two-sided.

{\textbf{Question:}} Would it be appropriate to treat these as paired or
independent samples?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Recall from the lectures that for paired data we have to consider
whether the differences are symmetrical about zero. Carry out both the
sign-test and Wilcoxon signed rank tests on the data.
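
The two statistics can be sketched as follows, in our own notation and
assuming the standard definitions. Let $d_i$ be the within-pair
difference for pair $i$, and let $m$ be the number of non-zero
differences. The sign test uses
\[
S = \#\{\, i : d_i > 0 \,\},
\qquad
S \sim \mathrm{Binomial}\!\left(m, \tfrac{1}{2}\right)
\ \text{under the null hypothesis},
\]
whereas the Wilcoxon signed rank test uses
\[
V = \sum_{i \,:\, d_i > 0} \operatorname{rank}\left(|d_i|\right),
\]
where the ranks are taken among the non-zero $|d_i|$. The second
statistic uses the sizes of the differences as well as their signs,
which is why it relies on the symmetry assumption mentioned above.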

{\textbf{Question:}} Do both tests draw the same conclusion about the
data? Which test is the most appropriate?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

~

\hypertarget{disease-association}{%
\subsection{Disease association}\label{disease-association}}

The following table gives the frequencies of wild-type and knock-out
mice developing a disease thought to be associated with the absence of the
knock-out gene.

\begin{longtable}[]{@{}lrrr@{}}
\toprule
~ & WT & KO & Total\tabularnewline
\midrule
\endhead
Disease & 1 & 7 & 8\tabularnewline
No disease & 9 & 3 & 12\tabularnewline
Total & 10 & 10 & 20\tabularnewline
\bottomrule
\end{longtable}

{\textbf{Question:}} What are your null and alternative hypotheses?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

{\textbf{Question:}} What are your expected frequencies?
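
As a hint, and using our own notation: if $R_i$ is the total of row $i$,
$C_j$ the total of column $j$ and $N$ the overall total, the expected
frequency of cell $(i,j)$ under the null hypothesis of no association is
\[
E_{ij} = \frac{R_i \, C_j}{N}.
\]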

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

Enter the data into the
\href{http://bioinformatics.cruk.cam.ac.uk/stats/contingency-table/}{Shiny
app}. Select the \textbf{Fisher's exact test} option to compare the
proportion of mice in each group that developed the disease.

{\textbf{Question:}} What is your p-value? How do you interpret the
result?

\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}

~

\hypertarget{small-group-exercise-choosing-a-test}{%
\section{Small-Group Exercise: Choosing a
test}\label{small-group-exercise-choosing-a-test}}

In this section, we invite you to form small groups. Each group will be
assigned one of the exercises.

At the end of the time assigned for the exercise we will go through each
of the problems in turn and invite a representative of each group to
present the problem to the rest of the class along with the analysis
(descriptive analysis, statistical tests) the group felt was most
appropriate and any conclusions made.

If time allows, it would be beneficial for groups to familiarize
themselves with some of the other exercises so that they can contribute
to the presentations made by other groups.

You should use this
\href{https://public.etherpad-mozilla.org/p/2019-02-12-intro-to-stats}{interactive
document} to record your observations.

~

\hypertarget{group-exercise-1-plant-growth-data1.csv}{%
\subsection{\texorpdfstring{Group Exercise 1: Plant Growth
\texttt{data1.csv}}{Group Exercise 1: Plant Growth data1.csv}}\label{group-exercise-1-plant-growth-data1.csv}}

Darwin (1876) studied the growth of \emph{pairs} of \emph{Zea mays} (aka corn)
seedlings, one produced by cross-fertilization and the other produced by
self-fertilization, but otherwise grown under identical conditions. His
goal was to demonstrate the greater vigour of the cross-fertilized
plants. The data recorded are the final height (inches, to the nearest
1/8th) of the plants in each pair.

{\emph{Is there evidence to support the hypothesis of greater growth in
cross-fertilized plants?}}

~

\hypertarget{group-exercise-2-florence-nightingale-data2.csv}{%
\subsection{\texorpdfstring{Group Exercise 2: Florence Nightingale
\texttt{data2.csv}}{Group Exercise 2: Florence Nightingale data2.csv}}\label{group-exercise-2-florence-nightingale-data2.csv}}

In the history of data visualization, Florence Nightingale is best
remembered for her role as a social activist and her view that
statistical data, presented in charts and diagrams, could be used as
powerful arguments for medical reform.

After witnessing deplorable sanitary conditions in the Crimea, she wrote
several influential texts (Nightingale, 1858, 1859), including
polar-area graphs (sometimes called ``Coxcombs'' or rose diagrams),
showing the number of deaths in the Crimean War from battle compared to
disease or preventable causes that could be reduced by better
battlefield nursing care.

Her Diagram of the Causes of Mortality in the Army in the East showed
that most of the British soldiers who died during the Crimean War died
of sickness rather than of wounds or other causes. It also showed that
the death rate was higher in the first year of the war, before the
Sanitary Commissioners arrived in March 1855 to improve hygiene in the
camps and hospitals.

{\emph{Do the data support the claim that deaths due to avoidable causes
decreased after a change in regime?}}

~

\hypertarget{group-exercise-3-effect-of-bran-on-diet-data3.csv}{%
\subsection{\texorpdfstring{Group Exercise 3: Effect of bran on diet:
\texttt{data3.csv}}{Group Exercise 3: Effect of bran on diet: data3.csv}}\label{group-exercise-3-effect-of-bran-on-diet-data3.csv}}

The addition of bran to the diet has been reported to benefit patients
with diverticulosis. Several different bran preparations are available,
and a clinician wants to test the efficacy of two of them on patients,
since favourable claims have been made for each. Among the consequences
of administering bran that require testing is the transit time through
the alimentary canal. By random allocation the clinician selects two
groups of patients aged 40-64 with diverticulosis of comparable
severity. Sample 1 contains 15 patients who are given treatment A, and
sample 2 contains 12 patients who are given treatment B.

{\emph{Does transit time differ in the two groups of patients taking
these two preparations?}}

~

\hypertarget{group-exercise-4-effect-of-autism-drug-data4.csv}{%
\subsection{\texorpdfstring{Group Exercise 4: Effect of Autism drug
\texttt{data4.csv}}{Group Exercise 4: Effect of Autism drug data4.csv}}\label{group-exercise-4-effect-of-autism-drug-data4.csv}}

Consider a clinical investigation to assess the effectiveness of a new
drug designed to reduce repetitive behaviors in children affected with
autism. If the drug is effective, children will exhibit fewer repetitive
behaviors on treatment as compared to when they are untreated. A total
of 8 children with autism enroll in the study. Each child is observed by
the study psychologist for a period of 3 hours both before treatment and
then again after taking the new drug for 1 week. The time that each
child is engaged in repetitive behavior during each 3 hour observation
period is measured. Repetitive behavior is scored on a scale of 0 to 100
and scores represent the percent of the observation time in which the
child is engaged in repetitive behavior. For example, a score of 0
indicates that during the entire observation period the child did not
engage in repetitive behavior while a score of 100 indicates that the
child was constantly engaged in repetitive behavior.

{\emph{Is there statistically significant improvement in repetitive
behavior after 1 week of treatment?}}

~

\hypertarget{group-exercise-5-cd4-data5.csv}{%
\subsection{\texorpdfstring{Group Exercise 5: CD4
\texttt{data5.csv}}{Group Exercise 5: CD4 data5.csv}}\label{group-exercise-5-cd4-data5.csv}}

CD4 cells are carried in the blood as part of the human immune system.
One of the effects of the HIV virus is that these cells die. The count
of CD4 cells is used in determining the onset of full-blown AIDS in a
patient. In this study of the effectiveness of a new anti-viral drug on
HIV, 20 HIV-positive patients had their CD4 counts recorded and then
were put on a course of treatment with this drug. After using the drug
for one year, their CD4 counts were again recorded.

{\emph{Do patients taking the drug have increased CD4 counts?}}

~

\hypertarget{group-exercise-6-drink-driving-data6.csv}{%
\subsection{\texorpdfstring{Group Exercise 6: Drink Driving
\texttt{data6.csv}}{Group Exercise 6: Drink Driving data6.csv}}\label{group-exercise-6-drink-driving-data6.csv}}

Drunk driving is one of the main causes of car accidents. Interviews
with drunk drivers who were involved in accidents and survived revealed
that one of the main problems is that drivers do not realize that they
are impaired, thinking ``I only had 1-2 drinks \ldots{} I am OK to
drive.''

A sample of 100 drivers was chosen, and their reaction times in an
obstacle course were measured \emph{before} and \emph{after} drinking
two beers. The purpose of this study was to check whether drivers are
impaired after drinking two beers.

{\emph{Does drinking beer alter the reaction time of the driver?}}

~

\hypertarget{group-exercise-7-pollution-in-trees-data7.csv}{%
\subsection{\texorpdfstring{Group Exercise 7: Pollution in Trees
\texttt{data7.csv}}{Group Exercise 7: Pollution in Trees data7.csv}}\label{group-exercise-7-pollution-in-trees-data7.csv}}

Laureysens et al. (2004) measured metal content in the wood of 13 poplar
clones growing in a polluted area, once in August and once in November.
Concentrations of aluminum (in micrograms of Al per gram of wood) are
given in \texttt{data7.csv}.

{\emph{Is there any evidence for an increase in pollution between
August and November?}}

~

\hypertarget{group-exercise-8-salaries-for-professors-data8.csv}{%
\subsection{\texorpdfstring{Group Exercise 8: Salaries for Professors
\texttt{data8.csv}}{Group Exercise 8: Salaries for Professors data8.csv}}\label{group-exercise-8-salaries-for-professors-data8.csv}}

This dataset contains the 2008-09 nine-month academic salaries for
Assistant Professors, Associate Professors and Professors in a college
in the U.S. The data were collected as part of the ongoing effort of the
college's administration to monitor salary differences between male and
female faculty members. (Salaries are given as nine-month salaries, in
dollars.)

{\emph{Is there evidence that female professors are paid differently to
their male counterparts?}}


\end{document}