Syllabus for Math 5233, Mathematical Foundations of Bioinformatics

This page will be updated throughout the semester. Comments on improvements or typos are appreciated.

Instructor: Marshall Hampton

Office: 172 SCC

Email: mhampton at d.umn.edu (preferred contact method)

Telephone: 726-6329

Office hours: M 4-5; Tu and W 3-5; and by appointment.

Class homepage: http://www.d.umn.edu/~mhampton/m5233s8.html

Lecture Times: 2 - 2:50 pm, M, W, F. On Wednesdays we will be working in a computer lab, although some lectures will be given then as well.

Prerequisites: Any two of the following: Biol 5233, Math 3355, CS 1511, Stat 3611, or equivalents. Please come see me if you have any questions about the preparation required for this course. Since Biol 5233 has not been offered recently, other biology coursework in genetics and biochemistry is acceptable.

Textbook: Bioinformatics and Molecular Evolution, by Paul Higgs and Teresa Attwood. Blackwell Publishing, ISBN-13: 978-1405106832.

Topics: The official course description is "Mathematical, algorithmic, and computational foundations of common tools used in genomics and proteomics. Topics include: sequence alignment algorithms and implementations (Needleman-Wunsch, Smith-Waterman, BLAST, Clustal), scoring matrices (PAM, BLOSUM), statistics of DNA sequences (SNPs, CpG islands, isochores, satellites), and phylogenetic tree methods (UPGMA, parsimony, maximum likelihood). Other topics will be covered as time permits: RNA and protein structure prediction, microarray analysis, post-translational modification prediction, gene regulatory dynamics, and whole-genome sequencing techniques." One thing not mentioned there that I would like to at least briefly cover is hidden Markov models (HMMs). We will be using the programming language Python as our primary computational tool (see below), with the biopython module. All of the software we will be using is free.

A common theme for our projects and examples will be the evolution of the class Mammalia (mammals). Sub-themes may include hibernation, disease (malaria, AIDS), vision, and speech, although these depend on the time available and your interest. Last year the theme was malaria and the organisms of the genus Plasmodium that cause it (particularly P. falciparum), and we may also use some examples from that topic.

Optional complementary texts: While our text is the overall best book I could find for the course, some aspects of the subject are covered better or differently elsewhere. I will probably use these to supplement examples in lectures and for assignments:

Grading: You will have the opportunity to be evaluated in a variety of ways: homework, class participation and presentations, and exams. The homework and class presentations will be the primary factors in determining grades (roughly 70% combined), with each of the two exams worth about 15%.

Exams: There will be a take-home team midterm (tentative dates: given March 10th, due March 12th) and an individual final exam. The final is 10 - 11:55 am, Wednesday, May 14th.

Projects: There will be several projects and presentations for small groups.

Note to site visitors: If you are not on the University of Minnesota network, some of the links below to articles may not work, since they require a subscription. If you are U of Mn affiliated, you should be able to access everything here with a VPN connection.

Homework and labs: Most of the assignments will be either readings or group lab assignments.

Web Resources:

Introductory biology material:

MIT OpenCourseWare Intro Biology Videos. Very nice video lectures. Lectures 9-14 and 31-32 are probably the most relevant to this course; lectures 24-27 are good if you would like a better understanding of recombinant DNA technology.

A biology primer

Wikipedia entries for molecular biology and biology. There are many links to related topics on these pages. In general, I find Wikipedia to be a pretty good place to get started or to look up specific terms.

Human Genome Project primer. This has some good basic background on DNA and the human genome project.

Intro to Bio, a 16-slide PDF for the Stanford course listed below.

Molecular Genetics and Sequencing primer. A nice overview with a focus on genome sequencing.

NCBI education site. This has many links to tutorials and primers on a variety of subjects.

Parables of the differences between geneticists and biochemists: geneticist version and biochemist version. Of course, they both need to learn more math.

General bioinformatics:

NCBI Entrez. Often the right place to start.

Ensembl. Genome browsers and other tools.

EMBL-EBI. The European Bioinformatics Institute main page, with links to many tools.

AmiGO. Gene Ontology search.

Bioinformatics: Building Bridges, a symposium at the Twin Cities campus, April 12-13.

Nature

Science

Nature reviews: genetics

Nature reviews: molecular biology

KEGG Pathway database

UCSC Genome Browser

ExPASy

Sequence-Evolution-Function, a free online bioinformatics text on NCBI. There are many other books there as well!

Jobs abound for the interdisciplinarians.

Phylogenetics

Phylogeny software. A huge list from J. Felsenstein, a leader in the field and author of PHYLIP and the text 'Inferring Phylogenies'.

Protein tools

ExPASy Uniprot

ExPASy Prosite

NCBI Structure Database

NCBI Protein Database

NCBI Homologene

EMBL-EBI Interpro

Sequence alignment:

Course notes from a sequence analysis course at McMaster University. A lot of useful information.

Sequence alignment applets. These do a nice job of illustrating some of the basic alignment algorithms.

BLAST statistics tutorial

NCBI BLAST page

Other somewhat similar course pages

MIT Computational Functional Genomics OpenCourseWare site

Stanford Representations and Algorithms for Computational Molecular Biology course page

University of Colorado Bioinformatics course, taught in Python.

Malaria and Plasmodium links

PlasmoDB, a database specializing in the genus Plasmodium.

PlasmoCyc, a Stanford University database on malarial Plasmodium.

Gene expression database for Plasmodium

Plasmodium review. This is a good one.

Plasmodium genome paper. Genome release papers usually have a wealth of information.

Python and biopython installation:

Official Python site. You can download Python from links here, as well as find good documentation and other resources. If possible, try to use version 2.5.

wxPython download page. The docs-demos download has a pre-built PyCrust shell; the main package also includes that but it may be less obvious. I recommend installing both (after python itself).

Numeric package. This is needed for a very small number of biopython pieces. The biopython developers are working on updating their code so that it works with the better-supported numpy project, which replaced Numeric. If you install Python 2.5 on a Windows computer, install numpy instead of Numeric, since there is a convenient installer for it.

Biopython download page. I am using version 1.43 or 1.44. The download page has good installation instructions.

mxTextTools. Required for biopython installation.

ReportLab. Optional for biopython installation. I never actually use this, but I always install it anyway. It supplies some PDF output routines.

SAGE. This is a Python-based open-source math project that now includes biopython as an optional package.
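As a quick check after installing the pieces above, a short script like the following can report which of them Python is actually able to import. This is a minimal sketch, not an official installation test: the import names (`Bio` for biopython, plus `numpy`, `Numeric`, and `reportlab`) are assumed to be the usual ones, and the helper `module_version` is a hypothetical name chosen for illustration. It uses only the built-in `__import__` and `getattr`, so it should run under Python 2.5 as well as later versions.

```python
import sys


def module_version(name):
    """Return a version string for module `name`, or None if it cannot be imported.

    Hypothetical helper: modules without a __version__ attribute
    report "unknown version" instead of failing.
    """
    try:
        module = __import__(name)
    except ImportError:
        return None
    # str() guards against modules whose __version__ is not a string
    return str(getattr(module, "__version__", "unknown version"))


if __name__ == "__main__":
    print("Python " + sys.version.split()[0])
    # Assumed import names for biopython and its helper packages
    for name in ("Bio", "numpy", "Numeric", "reportlab"):
        version = module_version(name)
        print(name + ": " + (version or "NOT FOUND"))
```

A "NOT FOUND" next to `Numeric` is expected if you followed the advice above and installed numpy instead.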

Python and biopython documentation and tutorials:

Python Tutorial. A short tutorial by the creator of Python, Guido van Rossum. In addition, the Python Library Reference has more complete documentation.

Python for beginners. A collection of tutorials for beginners.

Python video tutorials. Quality varies, but some of these are very helpful.

If you want hard-copy books, these three are among the best I know: Core Python Programming, Programming Python, and Learning Python. Personally I liked Core Python Programming the best, but the best one for you will depend on your taste and background.

Informatics in Biology Pasteur Institute Course

Python for Bioinformatics at the Pasteur Institute

Biopython Documentation. The "Tutorial and Cookbook" is extremely helpful, although sometimes the code examples get a little out of date as biopython evolves.

Dive Into Python Site. The text is pretty advanced, but very interesting. If you feel you have reached an intermediate level of programming with Python, this could help tip you into the expert category.

Text Processing in Python. Same comments as above: this is not for beginners, but there is a lot of great material that is relevant to bioinformatics applications.

Andrew Dalke Python and Bioinformatics Resources. Lots of good material here.

Student Conduct Code: see the full description at http://www.d.umn.edu/assl/conduct/code/.

Policy statement: The University of Minnesota is committed to the policy that all persons shall have equal access to its programs, facilities, and employment without regard to race, religion, color, sex, national origin, handicap, age, veteran status, or sexual orientation.

Disabilities: An individual who has a disability, either permanent or temporary, which might affect his/her ability to perform in this class should contact the instructor as soon as possible so that he can adapt methods, materials and/or tests as needed to provide for equitable participation.