E. Alpha Diversity
...continued from Alignments and Trees
Goals
Perform multiple rarefactions at different depths
Analyze alpha diversity of individual samples
In this step we'll look into alpha diversity - a measure of species richness or diversity within an individual sample.
Perform Multiple Rarefactions
You can think of species richness as "the number of species" present in a system, which is the simplest notion of diversity. However, the deeper you sequence, the more species you will find. That is a problem if, say, you gathered 500 reads from one sample and only 100 reads from another: you would expect to find more species with 5x as many reads!
To account for this, we perform an in silico (i.e., on your computer) experiment called rarefaction. A rarefaction is a random subsample of the sequences from a sample, drawn at a specified depth (number of sequences). For example, a rarefaction with a depth of 75 reads per sample simulates what your sequencing results would look like if you had sequenced exactly 75 reads from each sample. To look at alpha diversity systematically, we perform many rarefactions: at multiple depths, repeated many times at each depth.
In QIIME, this task is performed on your OTU table. The QIIME script multiple_rarefactions.py takes your OTU table and makes a folder full of OTU tables, each of which is one replicate rarefaction at a specific depth. Let's use multiple_rarefactions.py as follows:
multiple_rarefactions.py -i otu_table.biom -m 20 -x 100 -s 20 -n 10 -o rare_20-100/
Some options for multiple_rarefactions.py:
-i [file] The input OTU table file
-m [number] The lowest rarefaction depth in the series of depths
-x [number] The highest rarefaction depth in the series of depths
-s [number] The step size to increment from the low to high depths
-n [number] The number of replicates to perform at each depth
-o [name] The name of the new directory to be created
From the command above, you can see that we are performing rarefactions starting at 20 seqs/sample and stepping up to 100 seqs/sample in increments of 20. In other words, we'll perform rarefactions at 20, 40, 60, 80, and 100 seqs/sample. The -n option specifies that we'll do 10 replicates at each of those depths. It is important to choose the rarefaction depths based on how many total sequences per sample you have. For example, it would not make sense to do rarefactions from 20 to 100 seqs/sample if I had a larger data set with an average of 5000 seqs/sample. If I did that, I would be throwing out a lot of my data and statistical power!
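As a quick sanity check on those numbers, the series of depths can be worked out in a few lines of Python. This is only an illustration of the arithmetic, not part of QIIME; the variable names are made up here to mirror the -m, -x, -s, and -n options.

```python
# Values taken from the multiple_rarefactions.py command above.
min_depth, max_depth, step, replicates = 20, 100, 20, 10  # -m, -x, -s, -n

# Depths run from the minimum up to and including the maximum.
depths = list(range(min_depth, max_depth + 1, step))

# One rarefied OTU table is written per depth per replicate.
total_tables = len(depths) * replicates

print(depths)        # [20, 40, 60, 80, 100]
print(total_tables)  # 50
```

Five depths times 10 replicates gives 50 rarefied tables, which is a handy number to check against once the command has run.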
Run the command, and it will make that directory named rare_20-100/. If you look inside that new directory with ls, you will see that we've created 50 new files. Each of those files is a new OTU table with all the samples rarefied at the specified level! We're going to look at alpha diversity in those rarefied OTU tables, not in our original OTU table.
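To make it concrete what one rarefied table holds for a single sample, here is a minimal Python sketch of rarefaction as random subsampling without replacement. It is not QIIME's own code. The function name rarefy, the OTU names, and the read counts are all hypothetical, and returning None for a sample that is too shallow is just this sketch's convention.

```python
import random

def rarefy(counts, depth, rng):
    """Randomly draw `depth` reads, without replacement, from one sample.

    `counts` maps OTU id -> number of reads. Returns a dict of the same
    shape, or None if the sample has fewer than `depth` reads.
    """
    # Expand the counts into one entry per read, e.g. {"A": 2} -> ["A", "A"].
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        return None
    rarefied = {}
    for otu in rng.sample(reads, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied

# Hypothetical samples: one sequenced 5x deeper than the other.
deep = {"OTU1": 300, "OTU2": 150, "OTU3": 40, "OTU4": 8, "OTU5": 2}  # 500 reads
shallow = {"OTU1": 60, "OTU2": 30, "OTU3": 10}                       # 100 reads

rng = random.Random(42)  # fixed seed so the sketch is repeatable
for name, sample in [("deep", deep), ("shallow", shallow)]:
    sub = rarefy(sample, 75, rng)
    print(name, "reads:", sum(sub.values()), "OTUs observed:", len(sub))
```

After rarefying, both samples contain exactly 75 reads, so any remaining difference in the number of OTUs observed is no longer just a side effect of sequencing effort.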
Calculate Alpha Diversity
There are many measures of alpha diversity. Depending on your ecological allegiances, you may have a preference for Chao1, Simpson's Diversity, Shannon Index, etc. These all measure different things, so it's important to think about what is most meaningful for your experiment and your question. The QIIME script for calculating alpha diversity in samples is called alpha_diversity.py. There are many options for which metrics to use, and you can choose to run several metrics at once if you like. All the alpha diversity metrics available in QIIME are listed in the QIIME documentation.
alpha_diversity.py -i rare_20-100/ -o alpha_rare/ -t rep_set_tree.tre -m observed_species,chao1,PD_whole_tree
Some options:
-i [directory] The name of the directory containing rarefied OTU tables
-o [name] The name of the directory to create for output
-t [file] The file for your phylogenetic tree
-m [list] The list of metrics, separated with commas and no spaces
If you run the above command, it will calculate alpha diversity metrics for all of your rarefied OTU tables and place the results in a new directory called alpha_rare. The metric PD_whole_tree is Faith's Phylogenetic Diversity, and it is based on the phylogenetic tree. Basically, it adds up all the branch lengths as a measure of diversity. So, if you find a new OTU and it's closely related to another OTU in the sample, it will be a small increase in diversity. However, if you find a new OTU and it comes from a totally different lineage than anything else in the sample, it will contribute a lot to increasing the diversity.
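The three metrics requested above can be sketched in plain Python to show what each one measures. This is a toy illustration under stated assumptions, not QIIME's implementation: the function names, the four-node tree, and the branch lengths are invented, Chao1 is written in its bias-corrected form, and Faith's PD is taken to include the branches leading back to the root.

```python
def observed_species(counts):
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for n in counts.values() if n > 0)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    s_obs = observed_species(counts)
    f1 = sum(1 for n in counts.values() if n == 1)
    f2 = sum(1 for n in counts.values() if n == 2)
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))

def faith_pd(parent, branch_length, counts):
    """Sum the lengths of all branches linking the present OTUs to the root.

    `parent` maps each non-root node to its parent; `branch_length` maps
    each non-root node to the length of the branch above it.
    """
    visited = set()
    for otu, n in counts.items():
        if n == 0:
            continue
        node = otu
        # Walk toward the root, stopping at branches already counted.
        while node in parent and node not in visited:
            visited.add(node)
            node = parent[node]
    return sum(branch_length[node] for node in visited)

# Hypothetical tree: A and B are close relatives under X; C is a distant lineage.
parent = {"A": "X", "B": "X", "X": "root", "C": "root"}
branch_length = {"A": 0.2, "B": 0.3, "X": 0.1, "C": 1.5}

for counts in ({"A": 5}, {"A": 5, "B": 1}, {"A": 5, "C": 1}):
    print(sorted(counts), "observed:", observed_species(counts),
          "PD: %.1f" % faith_pd(parent, branch_length, counts))
```

Adding the close relative B or the distant lineage C raises observed_species by one either way, but PD grows by only 0.3 for B and by 1.5 for C, which is the behavior described above.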
There are still a ton of separate files, and we need to "collate" them together into a nice, neat collection of results that are easy to graph.
Summarize the Alpha Diversity Data
The QIIME script collate_alpha.py takes the output directory from alpha_diversity.py as its input, and creates a new output directory containing files that are much easier to look at in a spreadsheet.
collate_alpha.py -i alpha_rare/ -o alpha_collated/
Take a look at the new files in the alpha_collated/ folder. Each one is organized quite nicely for spreadsheet analysis. The observed_species and chao1 metrics don't seem to show us much in this particular experiment, but check out the data in PD_whole_tree.txt (graphed below).
It looks like the fasted samples had a higher phylogenetic diversity than the control samples. I made this graph in the OpenOffice spreadsheet program by importing the alpha_collated/PD_whole_tree.txt file. The diversity, however, depends very strongly on sequencing depth. That reinforces the fact that you need to perform rarefactions if you're comparing diversity between samples.
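If you would rather summarize a collated file in code than in a spreadsheet, a sketch along these lines averages the replicates at each depth. It assumes the collated file is tab-delimited, with a first column naming the source file, then "sequences per sample" and "iteration" columns, then one column per sample, and with "n/a" marking samples too shallow for a given depth. Check that against your own file before relying on it. The function name, sample names, and values below are made up for illustration.

```python
import csv
import io
from collections import defaultdict

def mean_by_depth(handle):
    """Average each sample's metric over the replicates at every depth.

    Returns {depth: {sample: mean}}. Cells equal to "n/a" are skipped.
    The column layout is an assumption; see the paragraph above.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[3:]  # sample IDs follow the three bookkeeping columns
    values = defaultdict(lambda: defaultdict(list))
    for row in reader:
        if not row:
            continue
        depth = int(float(row[1]))
        for sample, cell in zip(samples, row[3:]):
            if cell != "n/a":
                values[depth][sample].append(float(cell))
    return {
        depth: {s: sum(v) / len(v) for s, v in per_sample.items()}
        for depth, per_sample in sorted(values.items())
    }

# Made-up stand-in for a collated file such as alpha_collated/PD_whole_tree.txt.
example = (
    "\tsequences per sample\titeration\tControl1\tFasted1\n"
    "alpha_rarefaction_20_0.txt\t20\t0\t2.0\t3.0\n"
    "alpha_rarefaction_20_1.txt\t20\t1\t4.0\t5.0\n"
    "alpha_rarefaction_40_0.txt\t40\t0\t6.0\tn/a\n"
)
means = mean_by_depth(io.StringIO(example))
for depth, per_sample in means.items():
    print(depth, per_sample)
```

To run it on real output, pass an open file handle instead of the in-memory example, for instance open("alpha_collated/PD_whole_tree.txt", newline="").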