From 15f3fd6639be2184ca827f7fa1e38bd22a3f7d9e Mon Sep 17 00:00:00 2001 From: piotrblaut <59977962+piotrblaut@users.noreply.github.com> Date: Wed, 3 Mar 2021 20:32:31 +0100 Subject: [PATCH] Add files via upload --- Atacama_tutorial.ipynb | 769 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 769 insertions(+) create mode 100644 Atacama_tutorial.ipynb diff --git a/Atacama_tutorial.ipynb b/Atacama_tutorial.ipynb new file mode 100644 index 0000000..ed24a43 --- /dev/null +++ b/Atacama_tutorial.ipynb @@ -0,0 +1,769 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# “Atacama soil microbiome” tutorial" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "_Note:_ This guide assumes you have QIIME 2 installed (e.g. using this [procedure](https://docs.qiime2.org/2019.10/install/native/)). To run it properly, open this notebook in Jupyter Notebook from within a QIIME 2 conda environment." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "_Note:_ This tutorial is an adaptation of the tutorial found on the [official QIIME 2 docs website](https://docs.qiime2.org/2019.10/tutorials/moving-pictures/). The original tutorial uses the QIIME 2 command-line interface (CLI)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Instead of the CLI, this tutorial uses the [Artifact API](https://docs.qiime2.org/2019.10/interfaces/artifact-api/), a Python 3 application programmer interface (API) for QIIME 2. The Artifact API supports interactive computing with QIIME 2 using the Python 3 programming language. The API is automatically generated, and its availability depends on which QIIME 2 plugins are currently installed. It has been optimized for use in the Jupyter Notebook. The Artifact API is a part of the QIIME 2 framework; no additional software needs to be installed to use it."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The notebook was tested using the ` 2020.2 ` version of QIIME 2." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before you start: close this notebook and Jupyter session, and run `jupyter serverextension enable --py qiime2 --sys-prefix`. Then, restart this notebook." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This tutorial is designed to serve two purposes. First, it illustrates the initial processing steps of paired-end read analysis, up to the point where the analysis steps are identical to single-end read analysis. This includes the importing, demultiplexing, and denoising steps, and results in a feature table and the associated feature sequences. Second, this is intended to be a self-guided exercise that could be run after the [moving pictures tutorial](https://docs.qiime2.org/2020.2/tutorials/moving-pictures/) to gain more experience with QIIME 2. For this exercise, we provide some questions that can be used to guide your analysis, but do not provide commands that will allow you to address each. Instead, you should apply the commands that you learned in the [moving pictures tutorial](https://docs.qiime2.org/2020.2/tutorials/moving-pictures/)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this tutorial you’ll use QIIME 2 to perform an analysis of soil samples from the Atacama Desert in northern Chile. The Atacama Desert is one of the most arid locations on Earth, with some areas receiving less than a millimeter of rain per decade. Despite this extreme aridity, there are microbes living in the soil. The soil microbiomes profiled in this study follow two east-west transects, Baquedano and Yungay, across which average soil relative humidity is positively correlated with elevation (higher elevations are less arid and thus have higher average soil relative humidity). 
Along these transects, pits were dug at each site and soil samples were collected from three depths in each pit." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Importing necessary modules" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import qiime2" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "from qiime2.plugins import demux, dada2, metadata, feature_table" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "from qiime2.plugins.demux.methods import emp_paired" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Creating a new directory" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create a working directory called `qiime2-ATACAMA-tutorial`. Note that `!cd` runs in a subshell and does not change the notebook's working directory, so the cells below refer to files through the `workdir` variable:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "workdir='/home/user/Documents/qiime2-ATACAMA-tutorial/'" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "!mkdir -p $workdir\n", + "!cd $workdir" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Obtaining and importing data files" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before starting the analysis, explore the [sample metadata](https://docs.google.com/spreadsheets/d/1AFtHGlLIHy4-hwAyAL0EQUMLvZtONK5bgZ0JSInSRYc/edit#gid=0) to familiarize yourself with the samples used in this study. The sample metadata is available as a Google Sheet. The cell below downloads it as tab-separated text and saves it as `sample-metadata.tsv`, which is used throughout the rest of the tutorial."
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2021-03-03 18:49:06-- https://data.qiime2.org/2020.2/tutorials/atacama-soils/sample_metadata.tsv\n", + "Resolving data.qiime2.org... 52.35.38.247\n", + "Connecting to data.qiime2.org|52.35.38.247|:443... connected.\n", + "HTTP request sent, awaiting response... 302 FOUND\n", + "Location: https://docs.google.com/spreadsheets/d/1AFtHGlLIHy4-hwAyAL0EQUMLvZtONK5bgZ0JSInSRYc/export?gid=0&format=tsv [following]\n", + "--2021-03-03 18:49:06-- https://docs.google.com/spreadsheets/d/1AFtHGlLIHy4-hwAyAL0EQUMLvZtONK5bgZ0JSInSRYc/export?gid=0&format=tsv\n", + "Resolving docs.google.com... 216.58.215.78, 2a00:1450:401b:802::200e\n", + "Connecting to docs.google.com|216.58.215.78|:443... connected.\n", + "HTTP request sent, awaiting response... 307 Temporary Redirect\n", + "Location: https://doc-04-6o-sheets.googleusercontent.com/export/l5l039s6ni5uumqbsj9o11lmdc/bijtejtl2efmfir522bk555m2k/1614793745000/103995680502445084602/*/1AFtHGlLIHy4-hwAyAL0EQUMLvZtONK5bgZ0JSInSRYc?gid=0&format=tsv [following]\n", + "Warning: wildcards not supported in HTTP.\n", + "--2021-03-03 18:49:07-- https://doc-04-6o-sheets.googleusercontent.com/export/l5l039s6ni5uumqbsj9o11lmdc/bijtejtl2efmfir522bk555m2k/1614793745000/103995680502445084602/*/1AFtHGlLIHy4-hwAyAL0EQUMLvZtONK5bgZ0JSInSRYc?gid=0&format=tsv\n", + "Resolving doc-04-6o-sheets.googleusercontent.com... 172.217.20.193, 2a00:1450:401b:807::2001\n", + "Connecting to doc-04-6o-sheets.googleusercontent.com|172.217.20.193|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: unspecified [text/tab-separated-values]\n", + "Saving to: ‘/home/user/Documents/qiime2-ATACAMA-tutorial//sample-metadata.tsv’\n", + "\n", + "/home/user/Document [ <=> ] 9.21K --.-KB/s in 0s \n", + "\n", + "2021-03-03 18:49:08 (24.3 MB/s) - ‘/home/user/Documents/qiime2-ATACAMA-tutorial//sample-metadata.tsv’ saved [9433]\n", + "\n" + ] + } + ], + "source": [ + "!wget -O $workdir/\"sample-metadata.tsv\" \\\n", + " \"https://data.qiime2.org/2020.2/tutorials/atacama-soils/sample_metadata.tsv\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, you’ll download the multiplexed reads. You will download three `fastq.gz` files, corresponding to the forward, reverse, and barcode (i.e., index) reads. These files contain a subset of the reads in the full data set generated for this study, which allows for the following commands to be run relatively quickly. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create a directory to work in called `emp-paired-end-sequences` and change to that directory:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "mkdir: cannot create directory ‘/home/user/Documents/qiime2-ATACAMA-tutorial//emp-paired-end-sequences’: File exists\r\n" + ] + } + ], + "source": [ + "!mkdir $workdir/emp-paired-end-sequences\n", + "!cd $workdir" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2021-03-03 18:58:06-- https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/forward.fastq.gz\n", + "Resolving data.qiime2.org... 52.35.38.247\n", + "Connecting to data.qiime2.org|52.35.38.247|:443... connected.\n", + "HTTP request sent, awaiting response... 
302 FOUND\n", + "Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2020.11/tutorials/atacama-soils/10p/forward.fastq.gz [following]\n", + "--2021-03-03 18:58:07-- https://s3-us-west-2.amazonaws.com/qiime2-data/2020.11/tutorials/atacama-soils/10p/forward.fastq.gz\n", + "Resolving s3-us-west-2.amazonaws.com... 52.218.144.68\n", + "Connecting to s3-us-west-2.amazonaws.com|52.218.144.68|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 143193967 (137M) [binary/octet-stream]\n", + "Saving to: ‘/home/user/Documents/qiime2-ATACAMA-tutorial//emp-paired-end-sequences/forward.fastq.gz’\n", + "\n", + "/home/user/Document 100%[===================>] 136.56M 237KB/s in 9m 34s \n", + "\n", + "2021-03-03 19:07:43 (243 KB/s) - ‘/home/user/Documents/qiime2-ATACAMA-tutorial//emp-paired-end-sequences/forward.fastq.gz’ saved [143193967/143193967]\n", + "\n" + ] + } + ], + "source": [ + "!wget -O $workdir/\"emp-paired-end-sequences/forward.fastq.gz\" \\\n", + " \"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/forward.fastq.gz\"" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2021-03-03 19:07:53-- https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/reverse.fastq.gz\n", + "Resolving data.qiime2.org... 52.35.38.247\n", + "Connecting to data.qiime2.org|52.35.38.247|:443... connected.\n", + "HTTP request sent, awaiting response... 302 FOUND\n", + "Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2020.11/tutorials/atacama-soils/10p/reverse.fastq.gz [following]\n", + "--2021-03-03 19:07:54-- https://s3-us-west-2.amazonaws.com/qiime2-data/2020.11/tutorials/atacama-soils/10p/reverse.fastq.gz\n", + "Resolving s3-us-west-2.amazonaws.com... 52.218.136.232\n", + "Connecting to s3-us-west-2.amazonaws.com|52.218.136.232|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 161032441 (154M) [binary/octet-stream]\n", + "Saving to: ‘/home/user/Documents/qiime2-ATACAMA-tutorial//emp-paired-end-sequences/reverse.fastq.gz’\n", + "\n", + "/home/user/Document 100%[===================>] 153.57M 829KB/s in 5m 20s \n", + "\n", + "2021-03-03 19:13:15 (492 KB/s) - ‘/home/user/Documents/qiime2-ATACAMA-tutorial//emp-paired-end-sequences/reverse.fastq.gz’ saved [161032441/161032441]\n", + "\n" + ] + } + ], + "source": [ + "!wget -O $workdir/\"emp-paired-end-sequences/reverse.fastq.gz\" \\\n", + " \"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/reverse.fastq.gz\"" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2021-03-03 19:13:15-- https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/barcodes.fastq.gz\n", + "Resolving data.qiime2.org... 52.35.38.247\n", + "Connecting to data.qiime2.org|52.35.38.247|:443... connected.\n", + "HTTP request sent, awaiting response... 302 FOUND\n", + "Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2020.11/tutorials/atacama-soils/10p/barcodes.fastq.gz [following]\n", + "--2021-03-03 19:13:16-- https://s3-us-west-2.amazonaws.com/qiime2-data/2020.11/tutorials/atacama-soils/10p/barcodes.fastq.gz\n", + "Resolving s3-us-west-2.amazonaws.com... 52.218.185.120\n", + "Connecting to s3-us-west-2.amazonaws.com|52.218.185.120|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 19976093 (19M) [binary/octet-stream]\n", + "Saving to: ‘/home/user/Documents/qiime2-ATACAMA-tutorial//emp-paired-end-sequences/barcodes.fastq.gz’\n", + "\n", + "/home/user/Document 100%[===================>] 19.05M 850KB/s in 24s \n", + "\n", + "2021-03-03 19:13:41 (827 KB/s) - ‘/home/user/Documents/qiime2-ATACAMA-tutorial//emp-paired-end-sequences/barcodes.fastq.gz’ saved [19976093/19976093]\n", + "\n" + ] + } + ], + "source": [ + "!wget -O $workdir/\"emp-paired-end-sequences/barcodes.fastq.gz\" \\\n", + " \"https://data.qiime2.org/2020.11/tutorials/atacama-soils/10p/barcodes.fastq.gz\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Importing data as a QIIME 2 artifact" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All data that is used as input to QIIME 2 is in the form of QIIME 2 artifacts, which contain information about the type of data and the source of the data. So, the first thing we need to do is import these sequence data files into a QIIME 2 artifact." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "sample_metadata = qiime2.Metadata.load(workdir+'/sample-metadata.tsv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Paired-end read analysis commands" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To analyze these data, the sequences that you just downloaded must first be imported into an artifact of type `EMPPairedEndSequences`:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "paired_end_sequences = qiime2.Artifact.import_data('EMPPairedEndSequences', workdir+'/emp-paired-end-sequences/')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, you can demultiplex the sequence reads. 
This requires the sample metadata file, and you must indicate which column in that file contains the per-sample barcodes. In this case, that column name is `barcode-sequence`. In this data set, the barcode reads are the reverse complement of those included in the sample metadata file, so we additionally include the `rev_comp_mapping_barcodes` parameter." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After demultiplexing, we can generate and view a summary of how many sequences were obtained per sample." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "demux_sequences = demux.methods.emp_paired(paired_end_sequences,\n", + " sample_metadata.get_column('barcode-sequence'),\n", + " rev_comp_mapping_barcodes = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let’s subsample the data. We will perform this subsampling in this tutorial for two reasons - one, to speed up the tutorial run time, and two, to demonstrate the functionality." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "demux_subsample = demux.methods.subsample_paired(demux_sequences.per_sample_sequences,\n", + " fraction = 0.3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let’s take a look at the summary in `demux-subsample.qzv`. In the “Per-sample sequence counts” table on the “Overview” tab, there are 75 samples in the data. 
If we look at the last 20 or so rows in the table, though, we will observe that many samples have fewer than 100 reads in them - let’s filter those samples out of the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/user/miniconda2/envs/qiime2-2021.2/lib/python3.6/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).\n", + " warnings.warn(msg, FutureWarning)\n", + "/home/user/miniconda2/envs/qiime2-2021.2/lib/python3.6/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).\n", + " warnings.warn(msg, FutureWarning)\n" + ] + }, + { + "data": { + "text/html": [ + "