Final Exam Sample Questions

The final for 8751 will be comprehensive and out of 300 points. The format of the final will be 6 or 7 pages similar to the midterms, where the first page will have 5 definitions and the remaining pages will each have two questions.

There will be between 170 and 220 points of questions covering material from the first two midterms (sample questions can be found at these links):

Additional sample questions for the remaining material covered:

Sample questions from class lecture:

1. Briefly define the following terms:

   Market Basket

   Itemset

   The Apriori Properties

2. How does the Apriori algorithm learn an association rule (give the
    algorithm)?  Give two examples of ways to speed up this algorithm.
    Show an example of how the algorithm works.


Questions from student presentations (the questions regarding student
presentations will be limited to this set):

3. What is a loss function?  Give examples of three loss functions to use
   in hierarchical classification.  What are the strengths and weaknesses
   of these functions?

4. How does cancer prognosis and prediction learning differ from cancer
   diagnosis and detection learning?  What aspects of the former problem(s)
   make these tasks harder than the latter tasks?

5. The NEAT system uses genetic algorithms to evolve a network.  Explain how
   this works (especially how mutation might work).  How does the Whiteson
   and Stone method alter the original NEAT system?

6. One approach to filtering spam involves compression models.  Explain
   how this method works.  How does the resulting system determine if a new
   message is spam or not?  How does this method adapt over time?

7. Tao et al. (2007) propose a method for generating new GO terms for
   annotating genes based on a KNN approach.  How does their method work
   (especially, how is similarity calculated in their method)?

8. What is a module network?  Explain how a module network can be used to
   organize a network of variables.  Give an example of how a module network
   might be used to model gene expression variables.

9. What is the schema matching approach?  How does it apply to the general
   problem of question answering?  Outline the Doan et al. approach to solving
   this problem.

10. How do Fumera et al. propose to recognize spam attached as images to email?
    What are the difficulties with this approach?

11. Lane and Brodley propose a method for detecting anomalous user behavior
    on a computer.  What types of anomalies were they looking for?  What
    behavior were they examining?  What type of learning method was used in
    the detection process?

12. Explain how a ratio template works.  How might such a template be used
    to recognize objects (like pedestrians)?  How could such a template be
    used to capture motion information?  How could that information be
    used to recognize objects?

13. How does the term version space apply in active learning as presented
    by Tong and Koller?  How do they propose to choose queries?  What are
    some of the difficulties of their approach?

14. What is the difference between primitive and non-primitive skills in
    ICARUS?  Give examples of each.  Explain what teleoreactive means and
    how it relates to the programs in ICARUS.

15. What is the difference between web content mining and web usage mining?
    Which of these notions is more closely related to machine learning (and
    why)?