Software Resources

MacQIIME

Check out the MacQIIME page for more information on an easy way to install and use QIIME on your Mac OS X computer. QIIME is a Python-based software pipeline for analyzing high-throughput sequencing data (it goes all the way from raw SFF files to final figures!). It can be difficult to install unless you are very familiar with Unix and compiling code. To make things simpler for myself and my collaborators, I put together this pre-compiled package.

Some great open-source software that students should try out

Students: please let me know if you're interested in gaining experience analyzing complex bioinformatics data. The only prerequisites are an interest in biological questions and an enthusiasm for learning a little computer programming!

To make sense of high-throughput molecular data, and to know what to look for in it, researchers need computational techniques and bioinformatics. Students working with me have an opportunity to learn a number of bioinformatics approaches and pipelines, and to apply them to interesting questions about microbial communities. This page lists some free software that I recommend students try out.

Processing High-Throughput Sequencing Data

Quantifying and comparing the structures of microbial communities across many samples is a research approach that has expanded rapidly in the last five years, and many great software pipelines have been developed for it.

QIIME (Quantitative Insights Into Microbial Ecology) is a powerful pipeline for high-throughput analysis of 16S rRNA gene amplicons, developed by the Knight Lab in the Chemistry department at CU-Boulder. With QIIME, you have every step you need in one pipeline, with one consistent set of data formats. It can be difficult to install and only runs natively in Unix-like environments; however, students in my previous courses have used the VirtualBox version, and it has worked easily and reliably on their personal Windows machines. SUNY-Cortland students doing research in my lab have access to a high-performance compute cluster running QIIME, as well as many other great bioinformatics tools. I maintain this server along with several collaborators at Cornell University.

MacQIIME: I developed a custom compilation of QIIME and all its dependencies that is easy to install on Mac OS X (10.5 and 10.6). It is hosted on our shared computational resource at Cornell University. The MacQIIME package will install on a Mac running either version in about 2 minutes with very little work (compared to several days of troubleshooting if you install QIIME from scratch). If you are a Mac person, you should use MacQIIME instead of the VirtualBox version.
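To make "comparing the structures of microbial communities" a bit more concrete: pipelines like QIIME typically start from a table of counts (how many sequences of each OTU were found in each sample) and then compute a dissimilarity between every pair of samples. Below is a minimal Python sketch of one widely used metric, Bray-Curtis dissimilarity, on a tiny made-up table. The function names and sample data are hypothetical illustrations, not QIIME's own code or API.

```python
from itertools import combinations


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples' OTU count vectors.

    0.0 means identical composition; 1.0 means no OTUs in common.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must list the same OTUs in the same order")
    # Sum of the smaller count for each OTU = abundance the samples share.
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


def distance_matrix(table):
    """Pairwise Bray-Curtis dissimilarities for a {sample_id: counts} table."""
    ids = sorted(table)
    dm = {(s, s): 0.0 for s in ids}
    for s, t in combinations(ids, 2):
        d = bray_curtis(table[s], table[t])
        dm[(s, t)] = dm[(t, s)] = d  # the matrix is symmetric
    return dm


# Hypothetical toy OTU table: 3 samples x 4 OTUs (made-up counts).
otu_table = {
    "soil_A": [10, 0, 5, 5],
    "soil_B": [8, 2, 5, 5],
    "gut_C": [0, 20, 0, 0],
}
dm = distance_matrix(otu_table)
for (s, t), d in sorted(dm.items()):
    if s < t:  # print each pair once
        print(f"{s} vs {t}: {d:.3f}")
```

Here the two soil samples come out 0.100 apart, while the gut sample is 0.900 and 1.000 away from them. A real analysis would often normalize or rarefy the counts first, and would use a tested tool rather than hand-rolled code; the point is just to show what a "distance between communities" means.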

Recommended Ways to Learn About Data Mining

When you have thousands of variables and hundreds of samples, it can be daunting to try to find interesting or predictive relationships in the data. Here are some really fun visual tools for data mining that include example data and are easy to get started with:

Orange is a collection of some really nice data mining tools, including machine learning, principal component analysis, and pretty visualizations. It's all graphical, with logical workflows that you build by dragging around icons.
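Principal component analysis finds the direction along which your samples vary the most, which is a handy first look at a table with many variables. Orange does this for you through its graphical widgets; the sketch below only shows the underlying idea in plain Python, assuming a small numeric table with samples as rows and variables as columns. It finds just the first component, using power iteration on the covariance matrix. That is a simplification: real PCA implementations generally use an SVD or eigen-decomposition routine. The function names and data are hypothetical.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (direction, variance) of the first principal component.

    rows: list of samples, each a list of numeric variables.
    Uses power iteration on the sample covariance matrix, a teaching
    simplification of the solvers real PCA tools use.
    """
    n, p = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two samples")
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # p x p sample covariance matrix (divides by n - 1).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    # Start from a random positive vector and repeatedly multiply by cov;
    # it converges toward the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # every variable is constant: no variation to find
            break
        v = [x / norm for x in w]

    # Rayleigh quotient v^T C v = variance of the data along v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, variance


def scores(rows, direction):
    """Project each centered sample onto the component (its PC1 coordinate)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [sum((r[j] - means[j]) * direction[j] for j in range(p)) for r in rows]


# Made-up example: 4 samples, 2 variables that rise together.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
pc1, var1 = first_principal_component(data)
print("PC1 direction:", [round(x, 2) for x in pc1])
print("variance along PC1:", round(var1, 2))
print("PC1 scores:", [round(s, 2) for s in scores(data, pc1)])
```

Because the made-up points lie close to the line y = 2x, the first component comes out close to (0.44, 0.90), and nearly all of the variance lies along it. That is exactly the kind of pattern a PCA plot in Orange makes visible at a glance.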

GGobi has my favorite data-interaction innovation ever: "brushing." You can have dozens of graphs open at once (if you have a big enough screen), and when you highlight a group of samples in one graph, they light up in all the other graphs!
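Brushing works because all the graphs are "linked views": every plot shows the same samples, and one shared selection decides what gets highlighted. The toy Python sketch below illustrates that idea with a simple observer pattern. The class names and data are hypothetical, nothing is actually drawn, and this is not how GGobi itself is implemented.

```python
class SharedSelection:
    """One selection of sample IDs shared by every linked view."""

    def __init__(self):
        self.selected = set()
        self._views = []

    def register(self, view):
        self._views.append(view)

    def brush(self, sample_ids):
        # Replace the selection, then notify every view (including the
        # one that was brushed) so they all update together.
        self.selected = set(sample_ids)
        for view in self._views:
            view.on_selection_changed(self.selected)


class ScatterView:
    """Stand-in for one scatterplot; it only tracks highlighted points."""

    def __init__(self, name, points, selection):
        self.name = name
        self.points = points  # {sample_id: (x, y)}
        self.highlighted = set()
        self.selection = selection
        selection.register(self)

    def brush_rectangle(self, xmin, xmax, ymin, ymax):
        """Select every sample whose point falls inside the rectangle."""
        hits = {sid for sid, (x, y) in self.points.items()
                if xmin <= x <= xmax and ymin <= y <= ymax}
        self.selection.brush(hits)

    def on_selection_changed(self, selected):
        # Highlight only the samples this view actually plots.
        self.highlighted = selected & set(self.points)


# Made-up example: the same three samples shown in two different plots.
selection = SharedSelection()
chem = ScatterView("temperature vs pH",
                   {"s1": (10, 6.5), "s2": (25, 7.0), "s3": (30, 7.2)}, selection)
taxa = ScatterView("OTU 1 vs OTU 2",
                   {"s1": (0.1, 0.9), "s2": (0.5, 0.4), "s3": (0.7, 0.2)}, selection)

chem.brush_rectangle(20, 35, 6.8, 7.5)  # drag a box over the warm samples
print(sorted(taxa.highlighted))  # prints ['s2', 's3']
```

Brushing the two warm samples in the chemistry plot highlights those same samples in the OTU plot, even though they sit in a completely different corner there. Keeping the selection in one shared object, rather than in each plot, is what keeps every view in sync.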