example_data/README.Rmd at master · messDiv/example_data · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
---
output:
    pdf_document:
        keep_tex: true
---

```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(echo = FALSE)
```

## Example data formats for the sDIV project *sEcoEvo: Biodiversity Dynamics – The Nexus Between Space & Time*

Here we provide example data for all the different data types we are interested in gathering for communities.

* Time calibrated trees (with branch lengths even if unresolved)

    > An example tree in Newick format is in the file **example.newick**. Please note that trees can include both sampled and unsampled lineages (the example tree includes 7 lineages that were not sampled in the local community). The structure of the tree looks like this:


```{r}
include_graphics('https://github.com/continuousity/example_data/raw/master/example_tree.jpeg',
                        auto_pdf = FALSE)
```

* Abundances

    > We provide an example site-by-species file including 3 sites and the 5 sampled species (B, D, E, F, & K) in the file **example_abundances.csv**. This is a simple CSV file with rows corresponding to sites, and columns corresponding to species, cells contain the abundance of each species at each site.

```{r}
x <- read.csv('https://raw.githubusercontent.com/continuousity/example_data/master/example_abundances.csv', row.names = 1)
kable(x, caption = '')
```

* Per location per taxon sequence data (including identical sequences).

    > Example sequence data is shown in the **fastqs/** directory, with one fastq file per site, including all sequences for all individuals sequenced at that site. We will use these sequences to calculate nucleotide diversity per species per site.

* Sample/species/site mapping file

    > Sequence data need to be matched to species and sites. We include an example file called **example_pops.csv** which does this. This is CSV where each row contains 3 items, the sample id (which should match the sample name in the fastq files), the species ID (which should match the ID in the newick tree), and the sample site (which should match the site name in the abundances file).

```{r}
seqMap <- read.csv('https://raw.githubusercontent.com/continuousity/example_data/master/example_pops.csv')
kable(head(seqMap), caption = '')
```

* Site metadata file

    > We need to know a few things about sites, most importantly their goegraphic locations, and hopefully also something about sampling methodology such as area, effort, sampling technique.  The included example file **site_metadata.csv** refers to arthropod pitfall trapping as an example.

```{r}
siteMap <- read.csv('https://raw.githubusercontent.com/continuousity/example_data/master/site_metadata.csv')
kable(siteMap, caption = '')
```