The questions on the final will be drawn from the entire semester. For examples of questions covering the material presented before midterms 1 and 2, check out these links:

There will also be questions related to the papers presented by the students in class. At most 80 points of the final will relate to these papers. Below are some potential questions based on the papers presented by students. The questions regarding those papers will be drawn from THIS list:

1. How does the Sequential Minimal Optimization (SMO) method for training a Support Vector Machine work? Make sure to discuss how the notion of the support vectors figures into this, what is meant by the term "smallest possible optimization problems" (and what such a problem represents), and how the method chooses from amongst these problems (and how that choice affects the solution method). You should also compare the SMO method to the Chunking method for training SVMs.

2. How does the method for Support Vector Clustering introduced by Ben-Hur et al. work? What does the support vector machine attempt to do in this method? Also, how is the graph produced from the resulting machine, and what part does the connected components algorithm play in producing the clusters?

3. Kondor and Jebara introduced a kernel function that would be useful in what types of situations? Give an example of such a situation. What are the advantages of this kernel over other kernels for comparing data?

4. Explain the basic idea behind Principal Component Analysis (PCA). In what types of domains would PCA be very effective? What type of problem did Suykens et al. propose to solve to perform PCA, and what mechanism do they use in solving this problem?

5. Explain how Valentini and Dietterich proposed to control the bias-variance tradeoff in SVMs and how they proposed to use the resulting SVMs in an ensemble learning method.

6. In the Jaakkola et al. paper "A discriminative framework for detecting remote protein homologies," explain what is meant by the term "remote homology." In this paper Jaakkola et al. proposed to compare generative models: what type of model, and how would these models be compared (what mechanism would be used in the comparison, and what would be compared)?

7. Vinokourov et al. proposed a method that would allow (amongst other things) for cross-language queries (a query in one language that could find a relevant document in another language). Explain the basics of their method and how such a query would be possible.

8. In the "Bayes meets Bellman" paper, Engel et al. proposed to learn a policy function as a Gaussian process. What functions did they propose to learn to produce a policy (and what assumptions are built into this approach)? What are the advantages of this approach? What do they mean by the term "on-line sparsification," and how is it important to their approach?

9. What is meant by the term "model" in Wang and Dietterich's "Model-based policy gradient reinforcement learning"? How is this model learned, and how is it used in their approach? What are the advantages and disadvantages of this approach with respect to other reinforcement learning methods?

10. What is the difference between a Markov Decision Process (MDP) and a Semi-Markov Decision Process (SMDP)? Explain how this relates to the use of temporally extended actions such as the Options introduced by Sutton, Precup, and Singh.

11. What is an Option as introduced by Sutton, Precup, and Singh? How are Options used in a reinforcement learner? How can an Option help speed up learning?

12. What is a Hierarchical Abstract Machine (HAM) as introduced by Parr and Russell? Who creates HAMs? How does a reinforcement learner work in a system involving HAMs? What is a choice point?
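To help with question 2, here is a minimal sketch of the graph-building and connected-components step used by Support Vector Clustering. This is not the full Ben-Hur et al. method: the `inside` function below is a hypothetical stand-in (two toy disks) for the real test of whether a point's kernel-space distance from the learned sphere center is below the learned radius. The edge rule and the components step follow the idea in the paper: connect two points if every sampled point on the segment between them stays inside the sphere, then read clusters off as connected components.

```python
import itertools

# Hypothetical stand-in for the trained sphere: in real SVC, inside(x) would
# test whether the kernel-space distance of x from the sphere center is
# below the learned radius. Here two disks of radius 2 around (0,0) and
# (10,0) play that role.
def inside(x):
    return min(x[0]**2 + x[1]**2, (x[0] - 10)**2 + x[1]**2) <= 4.0

points = [(0, 0), (1, 0), (0, 1), (10, 0), (9, 0), (10, 1)]

# Edge rule: two points are adjacent if every sampled point on the segment
# between them lies inside the sphere.
def connected(a, b, samples=10):
    return all(inside((a[0] + (b[0] - a[0]) * k / samples,
                       a[1] + (b[1] - a[1]) * k / samples))
               for k in range(samples + 1))

adj = {i: set() for i in range(len(points))}
for i, j in itertools.combinations(range(len(points)), 2):
    if connected(points[i], points[j]):
        adj[i].add(j)
        adj[j].add(i)

# Clusters = connected components of the adjacency graph (depth-first search).
def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            stack, comp = [s], set()
            while stack:
                v = stack.pop()
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.extend(adj[v])
            comps.append(comp)
    return comps

clusters = components(adj)  # two clusters: {0,1,2} and {3,4,5}
```

Note how the clustering itself is purely graph-theoretic; the trained machine only supplies the `inside` test.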
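For question 4, the basic idea behind PCA can be sketched in a few lines of NumPy: center the data, eigendecompose the covariance matrix, and project onto the leading eigenvectors. This is the classical eigendecomposition view only, not the specific problem formulation of Suykens et al.; the toy data set here is made up for illustration.

```python
import numpy as np

# Toy data: 2-D points that mostly vary along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 0.5 * t + 0.1 * rng.normal(size=100)])

# Center the data, then eigendecompose the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# Principal components: eigenvectors sorted by decreasing variance.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]
explained = eigvals[order] / eigvals.sum()  # fraction of variance per component

# Dimensionality reduction: project onto the first principal component.
X_reduced = Xc @ components[:, :1]
```

Because the toy data lie close to a line, the first component captures almost all of the variance, which is exactly the kind of domain where PCA is very effective.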
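For questions 10 and 11, the key point is that an Option's multi-step model lets a learner do a single SMDP-style backup in place of many one-step MDP backups. The corridor below is a made-up toy problem, not from the Sutton, Precup, and Singh paper: a deterministic "run-to-goal" Option from state s takes k = 5 - s steps, so its backup folds the discounting of all k steps into one update.

```python
import numpy as np

gamma = 0.9
n = 6  # corridor states 0..5; state 5 is a terminal goal

# Primitive action "right": deterministic one-step move; reward 1 on
# reaching the goal, 0 otherwise.
# Option "run-to-goal" (hypothetical): from state s it executes "right"
# until the goal, taking k = 5 - s steps. Its SMDP model folds the
# discounting of those k steps into a single backup: the reward model is
# gamma**(k-1) * 1 and the transition model lands on the goal with
# weight gamma**k.

V = np.zeros(n)
for _ in range(100):
    for s in range(4, -1, -1):
        r = 1.0 if s + 1 == 5 else 0.0
        q_prim = r + gamma * V[s + 1]                    # one-step MDP backup
        k = 5 - s
        q_opt = gamma**(k - 1) * 1.0 + gamma**k * V[5]   # multi-step SMDP backup
        V[s] = max(q_prim, q_opt)
```

The SMDP backup gives the correct value of a distant state in one update, which is one intuition for how Options can speed up learning.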