Some sample exam 2 questions:

1. Briefly define the following terms:

   Unsupervised learning

   Clustering Algorithm

   Dendrogram

   Control Learning

   Delayed Reward

   Discounted Future Reward

   Markov Decision Process

   Analytical Learning

   Chunking

   Impasse
   
   Domain Theory

   Knowledge Based Artificial Neural Network

   Explanation Based Neural Network

   Bayes Theorem

   Maximum a posteriori hypothesis

   Maximum likelihood hypothesis

   Bayes optimal classifier

   Gibbs classifier

   Bayes network

   Bagging

   Boosting

   Stacking

   Linear programming

   Slack variable

   Margin (of an SVM decision surface)

   Support vector

   Association rule

   Market Basket

   Itemset

   The Apriori Properties

2. What are the two main approaches for generating clusters?  Explain in
   general terms how these approaches work.

3. List two methods that could be used to estimate the validity of the
   results of a clustering algorithm.  Explain how these methods work.

4. Explain how the following clustering methods work:

   Agglomerative Single Link

   Agglomerative Complete Link

   K-Means

5. A distance measure is important both in memory-based reasoning methods
   such as the k-nearest neighbor method and in clustering.  Why is it so
   critical in these methods?  In which of the two is it possible to
   "learn" to do a better job of measuring the distance between points?
   Why?

6. Give pseudo-code for the learning cycle of a Q learner.  What is the
   update rule for a deterministic world?  How about a non-deterministic
   world?

7. How are the V(s) and Q(s,a) functions related in Q learning?  What are
   the advantages of using the Q function over the V function?

8. How does Temporal Difference Learning relate to Q learning?

9. What is the difference between analytical (speedup) learning and
   inductive learning?  Give an example of each type of learning.

10. Give an example of an explanation of a concept that might be used by
    an explanation-based learner.  What could be learned from this explanation?
    How does the concept of operationalization relate to what is learned?

11. For the following EBL domain theory:

    A(?x,?y), B(?y,?x,?z) -> C(?x,?y,?z)
    D(?x,?y), E(?x,?y) -> B(?x,1,?y)
    F(?x,?x), F(?y,?y) -> D(?x,?y)

    Assume the following facts are asserted:

    A(1,2)  A(3,1)  E(2,3)  F(3,2)  F(2,2)
    A(2,2)  A(3,2)  E(1,2)  F(3,3)  F(2,3)

    Explain with a proof tree that C(1,2,3) is true.

    Assuming the predicates A, E, and F are operational, what rule would
    EGGS learn?
   
    Assuming that predicate B is also operational, what rule would be learned?

12. What are some of the problems that can result from using a domain theory
    in an explanation-based learner?  How can we address these problems?

13. What is the utility problem in analytical learning?  How can we define
    utility?

14. What does it mean when we say that PRODIGY learns control knowledge?
    What is the advantage of learning control knowledge over adding new rules
    to a domain theory?

15. How does chunking work in SOAR?  When does a SOAR system try to create
    a chunk?

16. Give an example of a domain theory expressed as predicates.  How would
    that domain theory be converted into a corresponding neural network
    by KBANN?  Show the structure and weights of the resulting network.
    What is the main advantage of the KBANN approach?

17. What is a hybrid learning algorithm?  Give three examples of hybrid learning
    methods, explain how each works, and discuss the advantages of each.

18. What is Bayes theorem?  Discuss two examples showing how Bayes theorem
    can be used to justify approaches to learning.  Also, discuss an example
    of a learning method based on Bayes theorem.

19. What is a Bayesian belief network?  Give an example of such a network.
    What are the advantages of a Bayesian network over a naive Bayes learner?

20. Explain the fundamental difference between the Bagging and AdaBoost
    ensemble learning methods.  How do these methods relate to the concept of
    generating a good ensemble?  What are the advantages and disadvantages of
    each method?

21. How is a problem phrased as a linear program in a support vector machine?
    Give an example.  What are slack variables used for and how are they
    represented in the linear program?

22. Explain the concept of a kernel function in Support Vector Machines.
    Why are kernels so useful?  What properties should a kernel have to be
    used in an SVM?

23. How does the Apriori algorithm learn an association rule (give the
    algorithm)?  Give two examples of ways to speed up this algorithm.
    Show an example of how the algorithm works.