Final Exam Sample Questions

The final for 8751 will be comprehensive and out of 300 points. The format of the final is 5 definitions, each worth 12 points, on the first two pages, followed by 8 pages, each with one question to give you plenty of room to write. Exam questions will be drawn from material related to your presentations, material presented after the second midterm, and some questions from the material covered in midterms 1 and 2. Below I give one question for each of the nine presentations made in class. Three of these questions will be repeated exactly on the final.

There will be questions covering material from the first two midterms (sample questions can be found at these links):

Additional sample questions for the remaining material covered:


Sample questions from class lecture:


1. Briefly define the following terms:

   Linear programming  
     
   Slack variable       

   Margin (of an SVM decision surface)
      
   Support vector 
      
   Domain theory  
     
   Bagging     
  
   Boosting     
  
   Stacking  

   Market Basket

   Itemset

   The Apriori property

2. Explain the fundamental difference between the Bagging and AdaBoost
    ensemble learning methods.  How does this difference relate to the concept
    of generating a good ensemble?  What are the advantages and disadvantages
    of each method?
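As a study aid, here is a minimal sketch of bootstrap aggregation (bagging); the 1-nearest-neighbor base learner and all names here are illustrative assumptions, not the exact version covered in lecture:

```python
import random

def bagged_predict(data, labels, query, n_models=25, seed=0):
    """Toy bagging: each 'model' is a 1-nearest-neighbor classifier
    trained on a bootstrap resample of the data; predictions are
    combined by majority vote.  (Illustrative sketch only.)"""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap: sample |data| points with replacement.
        idx = [rng.randrange(len(data)) for _ in range(len(data))]
        sample = [(data[i], labels[i]) for i in idx]
        # Base learner: 1-NN on the resample.
        nearest = min(sample,
                      key=lambda p: sum((a - b) ** 2
                                        for a, b in zip(p[0], query)))
        votes.append(nearest[1])
    return max(set(votes), key=votes.count)

# Two well-separated clusters; the bagged vote should label (0.9, 1.0)
# with class 1.
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
y = [0, 0, 1, 1]
print(bagged_predict(X, y, (0.9, 1.0)))
```

Note that each base model sees a different resample, which is what gives the ensemble its diversity; boosting instead reweights examples based on previous models' errors.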
    
3. How is a problem phrased as a linear program in a support vector machine?
    Give an example.  What are slack variables used for and how are they
    represented in the linear program?
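For reference, the standard textbook soft-margin formulation with slack variables (not necessarily the exact notation used in class) is:

```latex
\min_{w,\, b,\, \xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w \cdot x_i + b \right) \ge 1 - \xi_i,
\qquad \xi_i \ge 0,\ \ i = 1, \dots, n .
```

Strictly speaking this objective is quadratic in w; the 1-norm variant, which minimizes \(\sum_j |w_j| + C \sum_i \xi_i\) under the same constraints, is a genuine linear program.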

4. Explain the concept of a Kernel function in Support Vector Machines.
    Why are kernels so useful?  What properties should a kernel have to be
    used in an SVM?
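One concrete check worth being able to reproduce: a kernel computes an inner product in feature space without ever constructing the feature vectors. The sketch below verifies this for the homogeneous degree-2 polynomial kernel on 2-D inputs (the feature map phi is the standard one for this kernel):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-D input:
    phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

x, z = (1.0, 2.0), (3.0, 0.5)
# Both quantities are the same inner product; the kernel never builds phi.
print(poly2_kernel(x, z), dot(phi(x), phi(z)))
```

Here both values are 16.0, illustrating why kernels are useful: the implicit feature space can be high- (even infinite-) dimensional at no extra cost.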

5. How does the Apriori algorithm learn an association rule (give the
    algorithm)?  Give two examples of ways to speedup this algorithm.
    Show an example of how the algorithm works.
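As a worked example to practice against, here is a minimal sketch of the level-wise frequent-itemset phase of Apriori (rule generation would follow as a second phase); function and variable names are my own:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining (frequent sets only)."""
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # L1: frequent single items.
    frequent = [{frozenset([i]) for i in items
                 if support(frozenset([i])) >= min_support}]
    k = 2
    while frequent[-1]:
        # Candidate generation: join frequent (k-1)-sets...
        candidates = {a | b for a in frequent[-1] for b in frequent[-1]
                      if len(a | b) == k}
        # ...and prune any candidate with an infrequent (k-1)-subset
        # (the Apriori property: all subsets of a frequent set are frequent).
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent[-1]
                             for s in combinations(c, k - 1))}
        frequent.append({c for c in candidates
                         if support(c) >= min_support})
        k += 1
    return [s for level in frequent for s in level]

# A tiny market-basket example with min_support = 2 transactions.
T = [frozenset(t) for t in [{'bread', 'milk'},
                            {'bread', 'diapers', 'beer'},
                            {'milk', 'diapers', 'beer'},
                            {'bread', 'milk', 'diapers'}]]
print(sorted(sorted(s) for s in apriori(T, min_support=2)))
```

On this data the algorithm finds four frequent single items and four frequent pairs; the candidate triple {bread, milk, diapers} survives pruning but fails the support test, so mining stops at level 3.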


Questions from student presentations (the questions regarding student
presentations will be limited to this set):

1. In the paper on Netflix Prize prediction, Singular Value Decomposition
   was used to perform a feature transformation.  What does SVD do and why
   is it so useful in cases such as these?
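To build intuition for what SVD extracts, the sketch below uses power iteration on A^T A (a minimal stand-in; a real SVD computes all factors at once) to recover the leading singular direction of a rank-1 "ratings" matrix. All names are illustrative:

```python
import math

def top_singular(A, iters=200):
    """Power iteration on A^T A: returns the leading singular value
    and right singular vector -- the single direction that best
    explains the matrix (a minimal sketch, not a full SVD)."""
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # w = A^T (A v)
        Av = [sum(row[j] * v[j] for j in range(n)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(row[j] * v[j] for j in range(n)) for row in A]
    sigma = math.sqrt(sum(x * x for x in Av))
    return sigma, v

# Rank-1 matrix: every row ("user") is a multiple of the same taste
# profile (3, 4), so one singular direction captures everything.
A = [[3.0, 4.0], [6.0, 8.0], [9.0, 12.0]]
sigma, v = top_singular(A)
print(round(sigma, 3), [round(x, 3) for x in v])
```

The recovered direction is (0.6, 0.8), i.e. (3, 4) normalized. Keeping only the top few such directions is the feature transformation exploited in the Netflix Prize work: it compresses a huge sparse ratings matrix into a small number of latent factors.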
   
2. Give the DIET algorithm for performing feature weighting in nearest-
   neighbor algorithms.  What are the advantages and disadvantages of this
   algorithm?
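For context on what feature weighting buys you (this is a generic weighted-distance illustration, not the DIET algorithm itself), note that a weight of 0 removes a feature, so weighting subsumes feature selection:

```python
def weighted_dist(x, y, w):
    """Weighted squared Euclidean distance; a zero weight deletes a
    feature, so feature weighting generalizes feature selection."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y))

def predict_1nn(train, query, w):
    """1-NN prediction under a given feature-weight vector."""
    return min(train, key=lambda p: weighted_dist(p[0], query, w))[1]

# Feature 0 is informative; feature 1 is large-scale noise.  With
# uniform weights the noise dominates the distance; zeroing its
# weight recovers the right answer.
train = [((0.0, 9.0), 'neg'), ((1.0, 0.0), 'pos')]
query = (0.9, 8.5)
print(predict_1nn(train, query, (1.0, 1.0)))  # noise pulls toward 'neg'
print(predict_1nn(train, query, (1.0, 0.0)))  # informative feature -> 'pos'
```

Learning a good weight vector from data is exactly the problem algorithms like DIET address.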
   
3. The Osmot system is a search engine that allows researchers to gather
   data about users' online behavior.  In the paper presented in class,
   these researchers attempted to infer behavior and feedback from the
   searching done.  Explain how they proposed to do this and indicate any
   potential problems you see with this approach.
   
4. SVMTool proposes to learn a Part-Of-Speech tagger from a dictionary
   and samples of statements in that language.  Give five examples of the
   types of features SVMTool considers and samples of such features.

5. Explain how wrapper and filter algorithms work for variable elimination.
   How do Stracuzzi and Utgoff propose to use random samples of sets of
   variables to set key parameters for selecting a good sample of variables?
   
6. How do Semi-Supervised Support Vector Machines propose to make use of
   unlabeled data?  How might this lead to better generalization?
   
7. What is a "hard" learning problem for skewing?  How does skewing make it
   possible to effectively learn a decision tree for such a problem?
   
8. Caruana uses multiple metrics in evaluating supervised learning
   algorithms; why?  Define four of the metrics Caruana used in his work and
   explain why these metrics might be of interest.
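As a refresher, four of the standard threshold-style metrics (accuracy, precision, recall, F1) can be computed directly from a confusion matrix; the function below is my own sketch, and the skewed example shows why no single metric suffices:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# Skewed classes: accuracy looks good (0.8) while recall is poor (1/3),
# one reason multi-metric evaluation matters.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```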
   
9. What is meant by saliency in Optimal Brain Damage?  How is it defined?
   What are the potential problems with this approach?
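For orientation: under the diagonal, extremal approximation of the error function used by Optimal Brain Damage, the saliency of weight \(w_k\) is

```latex
s_k = \frac{1}{2}\, h_{kk}\, w_k^2,
\qquad h_{kk} = \frac{\partial^2 E}{\partial w_k^2},
```

i.e. the estimated increase in error E from deleting that weight; weights with the smallest saliency are pruned first. The cross-terms of the Hessian are dropped, which is one source of the approximation error worth discussing.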