D. Alignments and Trees

...continued from Taxonomy and the OTU Table

Goals
  • Align sequences to the Greengenes core reference alignment
  • Filter and Lane mask your alignment
  • Build an approximately-maximum-likelihood phylogenetic tree

Make a Multiple Sequence Alignment

Taxonomy assignment is not the best way to measure the evolutionary distance between your OTUs: it depends on whether the taxonomy database has representatives of everything in your sample, and on whether the taxonomic hierarchy accurately reflects evolution. It's better to build a real phylogenetic tree, and to do that we first need to align our sequences. We'll align them against a reference alignment. In MacQIIME, this reference alignment is located at /macqiime/greengenes/core_set_aligned.fasta.imputed, and QIIME already knows where it is, so the command line you need is quite simple. (The same is true in the VB, where the file is located in /data/greengenes_core_sets/.)

The script we're using is called align_seqs.py and we use it as follows:

align_seqs.py -i rep_set.fna -o alignment/

Some of the options:
    -i [file]  The fasta file of queries, usually your representative seqs
    -o [name]  The name of the new directory that should be created
    -t [file]  The template alignment to use (optional; our command above used the default GG core set)
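
Once align_seqs.py has finished, it can be worth sanity-checking the aligned FASTA file before moving on. The following is a minimal Python 3 sketch, not part of QIIME: the helper names read_fasta and alignment_summary are made up for illustration, and it assumes gaps are written as '-' or '.'. It confirms that every aligned sequence has the same length and reports the overall fraction of gap characters.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def alignment_summary(records, gap_chars="-."):
    """Return (number of sequences, alignment length, overall gap fraction)."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; this is not an alignment")
    length = lengths.pop()
    gaps = sum(seq.count(c) for _, seq in records for c in gap_chars)
    return len(records), length, gaps / (len(records) * length)


# Tiny made-up example; for real data, pass an open file handle instead,
# e.g. open("alignment/rep_set_aligned.fasta")
example = [">otu1", "AC--GT", ">otu2", "A-C-GT"]
n_seqs, aln_len, gap_frac = alignment_summary(read_fasta(example))
```

A high gap fraction is expected here, because every query is stretched to the full width of the reference alignment.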

You can see the alignment results in the alignment/ directory, in the file rep_set_aligned.fasta. This alignment contains lots of gaps, and it includes hypervariable regions that make it difficult to build an accurate tree, so we'll filter it. For 16S rRNA gene alignments, filtering can use a Lane mask, which marks the alignment columns to keep. In MacQIIME, the Lane mask for the GG core set is located at /macqiime/greengenes/lanemask_in_1s_and_0s, and MacQIIME already knows where it is. (The same is true in the VB, where it is located in /data/greengenes_core_sets/.) The script that filters an alignment is filter_alignment.py, used as follows:

filter_alignment.py -i alignment/rep_set_aligned.fasta -o alignment/

Some of the options:
    -i [file]  The aligned fasta file to filter (the output of align_seqs.py)
    -o [name]  The output directory (here we reuse the existing alignment/ directory)
    -m [file]  The Lane mask file to use (optional; our command above used the default GG Lane mask)
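
To make the idea of a mask "in 1s and 0s" concrete, here is a simplified Python 3 illustration. It is not QIIME's implementation: the function names apply_lane_mask and drop_all_gap_columns and the toy data are invented for this sketch, and the real script has its own options and thresholds. The assumption is that a '1' in the mask means "keep this column" and that columns consisting only of gaps are then dropped.

```python
def apply_lane_mask(aligned_seqs, mask):
    """Keep only alignment columns where the mask has a '1'."""
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    filtered = {}
    for name, seq in aligned_seqs.items():
        if len(seq) != len(mask):
            raise ValueError("mask and sequence %s differ in length" % name)
        filtered[name] = "".join(seq[i] for i in keep)
    return filtered


def drop_all_gap_columns(aligned_seqs, gap_chars="-."):
    """Remove columns that contain a gap character in every sequence."""
    seqs = list(aligned_seqs.values())
    n_cols = len(seqs[0]) if seqs else 0
    keep = [i for i in range(n_cols)
            if any(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in aligned_seqs.items()}


# Made-up toy data: 8 alignment columns; the mask keeps columns 0, 1, 4, 5, 6
toy_alignment = {"otu1": "AC-TG-AA", "otu2": "AG-CG-TA"}
toy_mask = "11001110"
masked = apply_lane_mask(toy_alignment, toy_mask)
filtered = drop_all_gap_columns(masked)
```

In this toy case the mask removes three columns, and the gap filter then removes the one remaining column that is a gap in both sequences.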

This created a new file in the alignment/ directory called rep_set_aligned_pfiltered.fasta -- this is the file we can use to build a phylogenetic tree!  If you want to visually check the alignment, I suggest using a free program called SeaView to open the rep_set_aligned_pfiltered.fasta file.

Build a phylogenetic tree

Okay, let's make a tree out of that alignment!  This is quite easy in QIIME using the make_phylogeny.py script, which by default runs FastTree, a program that builds approximately-maximum-likelihood trees and works well for 16S rRNA gene sequences. The input for this script is our filtered alignment.

make_phylogeny.py -i alignment/rep_set_aligned_pfiltered.fasta -o rep_set_tree.tre


How do you look at this tree?  You could try something like FigTree or TreeViewX, but Topiary Explorer may be a better option: it is meant to import your QIIME OTU table and mapping file so that it can display data alongside the tree. Try it out!
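
Before opening the tree in a viewer, you may want to check that every OTU made it into the tree. The following is a rough Python 3 sketch under stated assumptions: the tree file is plain Newick with unquoted tip labels, and the function names newick_tip_names and missing_from_tree are hypothetical helpers written for this example, not QIIME functions. For real data, the IDs to compare would typically be the first word of each header in the filtered alignment.

```python
import re


def newick_tip_names(newick):
    """Return tip labels from a Newick string (unquoted labels only)."""
    # A tip label directly follows "(" or ","; internal-node labels and
    # support values follow ")" and are therefore skipped.
    pattern = r"(?<=[(,])\s*([^(),:;\s][^(),:;]*)"
    return [name.strip() for name in re.findall(pattern, newick)]


def missing_from_tree(alignment_ids, newick):
    """Return alignment IDs that do not appear as tips in the tree."""
    return sorted(set(alignment_ids) - set(newick_tip_names(newick)))


# Made-up toy tree; for real data, read the text of rep_set_tree.tre instead
toy_tree = "((otu1:0.1,otu2:0.2)0.95:0.3,otu3:0.4);"
tips = newick_tip_names(toy_tree)
missing = missing_from_tree(["otu1", "otu2", "otu3", "otu4"], toy_tree)
```

An empty list from missing_from_tree means every ID you passed in is present as a tip.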


Next steps: Analyzing Diversity
