From 2c321134f69c285bad7a8cb93c287c2c0c1a6e15 Mon Sep 17 00:00:00 2001
From: maibennett <mb3863@tc.columbia.edu>
Date: Thu, 29 Sep 2016 00:38:27 -0400
Subject: [PATCH] Assignment for class 7

---
 Class 7 Instructions.Rmd | 285 ++++++++++++++++++++-------------------
 Rplot.png                | Bin 0 -> 3065 bytes
 2 files changed, 148 insertions(+), 137 deletions(-)
 create mode 100644 Rplot.png

diff --git a/Class 7 Instructions.Rmd b/Class 7 Instructions.Rmd
index 5ae641a..fbe0051 100644
--- a/Class 7 Instructions.Rmd	
+++ b/Class 7 Instructions.Rmd	
@@ -1,137 +1,148 @@
----
-title: "Assignment 3"
-author: "Charles Lang"
-date: "February 13, 2016"
----
-##In this assignment you will be practising data tidying. You will be using the data we have collected from class and data generated from the instructor wearing a wristband activity tracker.
-
-##First, you need to import into R a data set containing information about Charles' activity for the last three weeks. You can find this data set within the Assignment 3 repository you cloned to create this project.
-
-##Install packages for manipulating data
-We will use two packages: tidyr and dplyr
-```{r}
-#Insall packages
-install.packages("tidyr", "dplyr")
-#Load packages
-library(tidyr, dplyr)
-```
-
-##Upload wide format instructor data (instructor_activity_wide.csv)
-```{r}
-data_wide <- read.table("~/Documents/NYU/EDCT2550/Assignments/Assignment 3/instructor_activity_wide.csv", sep = ",", header = TRUE)
-
-#Now view the data you have uploaded and notice how its structure: each variable is a date and each row is a type of measure.
-View(data_wide)
-
-#R doesn't like having variable names that consist only of numbers so, as you can see, every variable starts with the letter "X". The numbers represent dates in the format year-month-day.
-
-
-```
-
-##This is not a convenient format for us to analyze. What we need is for each type of measure to be a column. Your fisrt task is to convert wide format to long format data. To do this we will use the "gather" function: gather(data, time, variables)
-
-The gather command requires the following input arguments:
-
-- data: Data object
-- key: Name of new key column (made from names of data columns)
-- value: Name of new value column
-- ...: Names of source columns that contain values
-
-```{r}
-data_long <- gather(data_wide, date, variables)
-#Rename the variables so we don't get confused about what is what!
-names(data_long) <- c("variables", "date", "measure")
-#Take a look at your new data, looks weird huh?
-View(data_long)
-```
-##Now convert this long format into separate columns using the "spread" function to separate by the type of measure
-
-The spread function requires the following input:
-
-- data: Data object
-- key: Name of column containing the new column names
-- value: Name of column containing values
-
-```{r}
-instructor_data <- spread(data_long, variables, measure)
-```
-
-##Now we have a workable instructor data set!The next step is to create a workable student data set. Upload the data "student_activity.csv". View your file once you have uploaded it and then draw on a piece of paper the structure that you want before you attempt to code it. Write the code you use in the chunk below. (Hint: you can do it in one step)
-
-```{r}
-
-```
-
-##Now that you have workable student data set, subset it to create a data set that only includes data from the second class. 
-
-To do this we will use the dplyr package (We will need to call dplyr in the command by writing dplyr:: because dplyr uses commands that exist in other packages but to do different operations.) 
-
-Notice that the way we subset is with a logical rule, in this case date == 20160204. In R, when we want to say that something "equals" something else we need to use a double equals sign "==". (A single equals sign means the same as <-).
-
-```{r}
-student_data_2 <- dplyr::filter(student_data, date == 20160204)
-```
-
-Now subset the student_activity data frame to create a data frame that only includes students who have sat at table 4. Write your code in the following chunk:
-
-```{r}
-
-```
-
-##Make a new variable
-
-It is useful to be able to make new variables for analysis. We can either apend a new variable to our dataframe or we can replace some variables with a new variable. Below we will use the "mutate" function to create a new variable "total_sleep" from the light and deep sleep variables in the instructor data.
-
-```{r}
-instructor_data <- dplyr::mutate(instructor_data, total_sleep = s_deep + s_light)
-```
-
-Now, refering to the cheat sheet, create a data frame called "instructor_sleep" that contains ONLY the total_sleep variable. Write your code in the following code chunk:
-
-```{r}
-
-```
-
-Now, we can combine several commands together to create a new variable that contains a grouping. The following code creates a weekly grouping variable called "week" in the instructor data set:
-
-```{r}
-instructor_data <- dplyr::mutate(instructor_data, week = dplyr::ntile(date, 3))
-```
-
-Create the same variables for the student data frame, write your code in the code chunk below:
-```{r}
-
-```
-
-##Sumaraizing
-Next we will summarize the student data. First we can simply take an average of one of our student variables such as motivation:
-
-```{r}
-student_data %>% dplyr::summarise(mean(motivation))
-
-#That isn't super interesting, so let's break it down by week:
-
-student_data %>% dplyr::group_by(date) %>% dplyr::summarise(mean(motivation))
-```
-
-Create two new data sets using this method. One that sumarizes average motivation for students for each week (student_week) and another than sumarizes "m_active_time" for the instructor per week (instructor_week). Write your code in the following chunk:
-
-```{r}
-
-```
-
-##Merging
-Now we will merge these two data frames using dplyr. 
-
-```{r}
-merge <- dplyr::full_join(instructor_week, student_week, "week")
-```
-
-##Visualize
-Visualize the relationship between these two variables (mean motivation and mean instructor activity) with the "plot" command and then run a Pearson correlation test (hint: cor.test()). Write the code for the these commands below:
-
-```{r}
-
-```
-
-Fnally save your markdown document and your plot to this folder and comit, push and pull your repo to submit.
+---
+title: "Assignment 3"
+author: "Charles Lang"
+date: "February 13, 2016"
+output: pdf_document
+---
+##In this assignment you will be practising data tidying. You will be using the data we have collected from class and data generated from the instructor wearing a wristband activity tracker.
+
+##First, you need to import into R a data set containing information about Charles' activity for the last three weeks. You can find this data set within the Assignment 3 repository you cloned to create this project.
+
+
+##Install packages for manipulating data
+We will use two packages: tidyr and dplyr
+```{r}
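+#Install packages (run once; the names must be passed as a single character vector)
+install.packages(c("tidyr", "dplyr"))
+#Load packages (library() attaches one package per call)
+library(tidyr); library(dplyr)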
+```
+
+##Upload wide format instructor data (instructor_activity_wide.csv)
+```{r}
+data_wide <- read.table("C:/Users/Magdalena Bennett/Dropbox/PhD Columbia/Fall 2016/Core Methods in EDM/class7-master/instructor_activity_wide.csv", sep = ",", header = TRUE)
+
+#Now view the data you have uploaded and notice its structure: each variable is a date and each row is a type of measure.
+View(data_wide)
+
+#R doesn't like having variable names that consist only of numbers so, as you can see, every variable starts with the letter "X". The numbers represent dates in the format year-month-day.
+
+
+```
+
+##This is not a convenient format for us to analyze. What we need is for each type of measure to be a column. Your first task is to convert wide format to long format data. To do this we will use the "gather" function: gather(data, key, value, ...)
+
+The gather command requires the following input arguments:
+
+- data: Data object
+- key: Name of new key column (made from names of data columns)
+- value: Name of new value column
+- ...: Names of source columns that contain values
+
+```{r}
+data_long <- gather(data_wide, date, variables)
+#Rename the variables so we don't get confused about what is what!
+names(data_long) <- c("variables", "date", "measure")
+#Take a look at your new data, looks weird huh?
+View(data_long)
+```
+##Now convert this long format into separate columns using the "spread" function to separate by the type of measure
+
+The spread function requires the following input:
+
+- data: Data object
+- key: Name of column containing the new column names
+- value: Name of column containing values
+
+```{r}
+instructor_data <- spread(data_long, variables, measure)
+```
+
+##Now we have a workable instructor data set! The next step is to create a workable student data set. Upload the data "student_activity.csv". View your file once you have uploaded it and then draw on a piece of paper the structure that you want before you attempt to code it. Write the code you use in the chunk below. (Hint: you can do it in one step)
+
+```{r}
+data_stu_raw <- read.table("C:/Users/Magdalena Bennett/Dropbox/PhD Columbia/Fall 2016/Core Methods in EDM/class7-master/student_activity.csv", sep = ",", header = TRUE)
+
+View(data_stu_raw)
+
+student_data<-spread(data_stu_raw, variable, measure)
+
+```
+
+##Now that you have a workable student data set, subset it to create a data set that only includes data from the second class.
+
+To do this we will use the dplyr package. (We call its functions with the dplyr:: prefix because some dplyr function names also exist in other packages, where they do different things.)
+
+Notice that the way we subset is with a logical rule, in this case date == 20160204. In R, when we want to say that something "equals" something else we need to use a double equals sign "==". (A single equals sign assigns a value, much like <-.)
+
+```{r}
+student_data_2 <- dplyr::filter(student_data, date == 20160204)
+```
+
+Now subset the student data frame (student_data) to create a data frame that only includes students who have sat at table 4. Write your code in the following chunk:
+
+```{r}
+student_data_3 <- dplyr::filter(student_data, table == 4)
+```
+
+##Make a new variable
+
+It is useful to be able to make new variables for analysis. We can either append a new variable to our data frame or replace some variables with a new variable. Below we will use the "mutate" function to create a new variable "total_sleep" from the light and deep sleep variables in the instructor data.
+
+```{r}
+instructor_data <- dplyr::mutate(instructor_data, total_sleep = s_deep + s_light)
+```
+
+Now, referring to the cheat sheet, create a data frame called "instructor_sleep" that contains ONLY the total_sleep variable. Write your code in the following code chunk:
+
+```{r}
+instructor_sleep<-dplyr::select(instructor_data,total_sleep)
+```
+
+Now, we can combine several commands together to create a new variable that contains a grouping. The following code creates a weekly grouping variable called "week" in the instructor data set:
+  
+```{r}
+instructor_data <- dplyr::mutate(instructor_data, week = dplyr::ntile(date, 3))
+```
+
+Create the same variable for the student data frame. Write your code in the code chunk below:
+
+```{r}
+student_data <- dplyr::mutate(student_data, week = dplyr::ntile(date, 3))
+```
+
+##Summarizing
+Next we will summarize the student data. First we can simply take an average of one of our student variables such as motivation:
+  
+```{r}
+student_data %>% dplyr::summarise(mean(motivation))
+
+#That isn't super interesting, so let's break it down by class date:
+
+student_data %>% dplyr::group_by(date) %>% dplyr::summarise(mean(motivation))
+```
+
+Create two new data sets using this method: one that summarizes average motivation for students for each week (student_week) and another that summarizes "m_active_time" for the instructor per week (instructor_week). Write your code in the following chunk:
+
+```{r}
+student_week<-student_data %>% dplyr::group_by(week) %>% dplyr::summarise(mean(motivation))
+instructor_week<-instructor_data %>% dplyr::group_by(week) %>% dplyr::summarise(mean(m_active_time))
+```
+
+##Merging
+Now we will merge these two data frames using dplyr. 
+
+```{r}
+merge <- dplyr::full_join(instructor_week, student_week, "week")
+names(merge)<-c("week","mean_act_time","mean_motiv")
+```
+
+##Visualize
+Visualize the relationship between these two variables (mean motivation and mean instructor activity) with the "plot" command and then run a Pearson correlation test (hint: cor.test()). Write the code for these commands below:
+
+```{r}
+plot(merge$mean_act_time,merge$mean_motiv,xlim=c(5500,7000),ylim=c(1.5,1.9))
+cor.test(merge$mean_act_time,merge$mean_motiv)
+```
+
+Finally, save your markdown document and your plot to this folder and commit, push and pull your repo to submit.
diff --git a/Rplot.png b/Rplot.png
new file mode 100644
index 0000000000000000000000000000000000000000..e5b5a1f58500a7789998165999f2b9f6ef96a0b0
GIT binary patch
literal 3065
zcmdT`c{J2}8=p{gZB0!^n`JDeaG7Z=gAvV)QFbAcb(-$jrb(2Hq__>j3>AZrNUp7m
z#xk<h#I&HuQbUS~GS+0z%=_a#_rLp|`=0au_5RNJp6B~4=lOk~=kq+z_Z#nEXC?c~
z?q5J4kgPQx;{*b2;sT2*B@Rfa>O3|;hz>R<umA;tPyho$(LqEOutR}RC=d|^d_*FO
z&LeK<fE|&=BC<qu6o`&O(dk4wPsAdE)@3|8OT=P{L_9i*$6FWkL?S>3@BjhC0GAru
z76$xh>p8r8C<r9?Y<+ENBR?d7K+?w67>g5^vM28*h2Li>;O=rr<Voh`kfY=g6V>D)
zkw&6=bBBbU)V|G)6Hi>7w@cprWVV7t1$C&jj8=(2cySo^NP!d$vWWpxagx9!T*ZU8
zX^H(`3A-_kYyFqxaM7|Dgf(A4wjZ0D6)t<85VhoIz`jqmb*xo9Y*gUo46EoX!0aQ3
z3Y3(+I?mYNE_h{oGoJeZ7jkqJRb!+W6VIKSozl8+t0>C58&zmjo0P;2bw_@<D6Rx%
z?p9;eC5mGbd~{M<>@?wh+;81VijuY^QnbB(SRF~2WPi5X_ZvB2W6tQfU|8QH@QF--
zt7!n1zdYGimQv9>v!K$&oH&)0cyKB$FXcjF*)v1Q(vdaacbROX|CE34Bm9Kk_r$zk
z&)XgvLeQage{|*(H}JXEfM4M@dQS{x_-{9~IDjh~Y&scx*OfW^)Mz-<wI(nGQQ`-K
z>%WcSPwx5B#fhUhs9`g7yNcf2OYz+8?elI0`6k!u3IOy|45MhReDL10hhf%$4NY-^
zDy`Pr$S1^)0IskLUj#Fdp2OXp&MU0%DsX*gtU|g@DO>0r{!K!<#?N8C7_n{_$OoZ|
zeZE0uv%B0WzI$<}I^}JZk7ah+aA?%1hClIENw4i=xm}CNZTECu1S`9r4Y>=3wt}mE
zQyTcugOtDDJHXM-lr(;=290w}vVlTdeY>r^RNBvK+ThB>3+bcXQVQ+sRSJe>$*b>r
ztQ()<C@B3r{;pOpc`1Sqf|1Z}*q^57rIHTkmca5;esn?_VsM$ANU+*>3dhGvt$i$k
zM)Ih$h=|FM33qCZqx8#!ZwH8?Y<So6I11-**#N2W$jQK+zIV7aBAj6*%|_;LJK&HB
z3`6^o6?!u8v?3Um=)R>0@WP?4w2U#NV@)J5pK|tr^I1I_4^sgT7o}*;rt2-$%tC?v
z6eN_ql6iB2ky2H8*p3}FDh}uddXK+5M@OnbD4Y-pK36ws?PA2o^WdN07`cR>BeSh2
zCBX8nRsB2T=OGVCVP(^lrA@!mY;0ctxZhvT5H#l@OH0y^$_p$Aem0|wNlA4LV~O*=
zhv2ppy=#gL7xv1GlT588;ff!6tv2$HEhvP!H;&*p{6IN`gIcPkT>!A6NQ>F=YdP$K
z<<}T$lD6NSulP9<_JXo8e0FLA$S>DXK%7FH1~V1WCl!DhkujipB(w7;=cWf)-pzWB
zx$nM7J3z8(BlyAiIVO{aPW8NSTph>H;ILl+<=D?r6XT6z9_^k`Pn1R)qJQ&wRJGk;
z?6YA~aG1a+bUy486qkNW24g4AfE7T$k!6hJOt(aOQ1nvYMe*N8<$1b#_q(PutMB)A
zU*TzQ#SB&<%ye)1=X3o++iz~wfbBrk;ifEK9ZT|5bYA5S3yrHwzhtk_xO^Ms#~}v3
z11R!6IYzv%()vi#d+(YK*S&c?I1Kiv%uYfO999KXFPvPGr37tl0YK_^-7hw`KL4KH
z%&H&`s$b=DI5Ob|s@(&=KDNzfWo^MaUni}>%#0-+@euw8L-ARq!O4mzHHB(${n=An
zUapQcA~$)vNKdmnkpOZ{XPSGsUW^-~eJh@TMd%7w80B7Wpj6B-j`!Mjj&v*1?Ua4w
zMJs${WM_7*o?J-;VVCBCC#ST4Y_oden{jSnzbz)tXqr0E3h?O^_4cBj+L?*K1gPET
zBoT50(!vkNcTq>%M%M|k?dt~DI=;dE&QrzuY=lkQ7O|=&t_6hZhbyZJpSDSo9{B`>
zS$p(IEC3gW4Zx*tzw4w55CR-%2fco9OiQwqOgM&1g%17z4%X}OU)&KsFnG{4og3(J
zOM>47W<~n~P*QPBu7|4iW`wV1<&`Z65*lHV1Zlw&(sorEE=t07_#@XIdVVyMHEn|a
zh0!AgYhZ2=${7XA)z==l{B<E+`}|Ulgbwv{R@<n#2e%LB*8MrIMt-my%TbKa#*~j+
zXZNQmI2cfzva7FF#-|MzY`$fi4cV`g7c^X8MA~#_$NuNm(g(7PwCp^NyysX`y2M^;
z+9jm<MEt|8XKBd@PKY^P9&_H=Jc@0GkBQvNTR2_Jq6AeGl}<;RzGmmd!gyB8)sls;
z3_l6!3o4txJUc%BF@&8^6YozoOJ43vb0B0z78y&W@D8%luR~&=CNFj>CvIq!pu~^!
z>X3*5;rVT;hg_olBk8c$g~QPYK58b<&c~)+46WFIk2`Gca=eL+k|Ja0r|Zn{;0Fz&
zvqR1ewtGhe6iE5Qh`uyCK54qrcPIOm#czWp6_XYTMr}#t-&y;f^EmpAIiXqMp}q43
zIafR{-olY*2A#cU4Jc1*h<%}rPM5rrxu|b>=8s4Tmg#TZ8*H;@%HGePfS{{}oHBcx
z=l61m=eCuXR9(t4`h2QNKtinQoO!e(|BmfaW#AS1HF|F4x~jFWyLaWejXAYBz~-Fs
zBjyn{<i2ow5>(>Dy6R5#t<5o^V1>+bx#X<Nam8kM3wQfa)V~uH%rhRleDW~#e}90`
z5=^6_mnY0s5t*<=VqdH0%V(?PrwU~nFs;g3(b%d;c}K&S=>{FQ{G*P1!N|P;k7ov-
zB2?hWij?Sj$R1tj?IF&d%#QAwCePO2|4@Ch`f3tg61^9*kkb>AW->YWYI?$xoW~0q
zjGr`@6t#CABYGF-z6!v&=2LTIYnX4okj@p8#;JRBix@oIrTLNE$K3L&pRF#7?IiEq
zcU=umO%1CKYW7})v&Ffu129v%q`2`Q&BackO3;ArRL%$G8f*ULj+F&SM?K1HMtD2a
zs(Uvk{C4K_ldnW1Og{ZrUj!=}D&L;fuZh^|XNT@{Ty!sQ=t3q6e|SftMaTSWtO&q~
zUL3?4EJM~-sX<t;F75gVf@2Y7>D(Pb&y;AfU!yXj@WmH>OTNn+`!ly;Ss`U5xZVnC
zH0gUW)u?3cn<QiMkaHkk?Bsp!!sNVU7<_l1Rj+yZ$AM#n`Nyl;<i?2rj|#6{6Kd>Z
zYeBsS;4<2YxIwmK;1UHHuvRy;kiGNZ2(j<9+S(!E#i1)fd+sSMwo5XCi<gJe7E7H9
za@j5`0b6_c(?YTvb6nZ&7t7X>sVd8kw9%oKil~C<!W^sm{E9m}6NpOh-N*M=ZD$xR
zpAl=$UtXE%89u#dNyEK~OCOACr9abAzJxNV|Lm52{VM+4KlJe5K4+gctzw_&Y-ZMP
QUVp+{W9=~ImOj`11%BasmH+?%

literal 0
HcmV?d00001