Sample questions for midterm 2:
1. Briefly define the following terms:
Eager learning
Lazy Learning
Curse of dimensionality
kd Tree
Single Point Crossover
Two Point Crossover
Uniform Crossover
Point Mutation
Inverted Deduction
Unsupervised learning
Clustering Algorithm
Dendrogram
Control Learning
Delayed Reward
Discounted Future Reward
Markov Decision Process
Bayes Theorem
Maximum a posteriori hypothesis
Maximum likelihood hypothesis
Bayes optimal classifier
Gibbs classifier
Bayes network
PAC learning
ε-exhausting a Version Space
Shattering a Set of Instances
Vapnik-Chervonenkis Dimension
Linear programming
Slack variable
Margin (of a SVM decision surface)
Support vector
Domain theory
Bagging
Boosting
Stacking
2. How does a k-Nearest Neighbor learner make predictions about new data points?
How does a distance-weighted k-Nearest Neighbor learner differ from a
standard k-Nearest Neighbor learner? What is locally weighted regression?
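A minimal Python sketch of both prediction rules. It assumes Euclidean distance, a majority vote for the standard learner, and 1/d^2 vote weights for the distance-weighted variant. The function name knn_predict and the toy data are hypothetical:

```python
import math
from collections import defaultdict

def knn_predict(train, query, k=3, weighted=False):
    """Classify `query` using (point, label) pairs in `train`.

    Standard k-NN: majority vote of the k nearest points.
    Distance-weighted k-NN: each neighbor's vote counts 1 / d^2.
    """
    # Rank training examples by Euclidean distance to the query.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        d = math.dist(point, query)
        if not weighted:
            votes[label] += 1.0
        elif d == 0:
            return label  # exact match: return its label directly
        else:
            votes[label] += 1.0 / d ** 2
    return max(votes, key=votes.get)
```

With train = [((0, 0), "a"), ((4, 0), "b"), ((5, 0), "b")] and query (1, 0), the plain 3-NN vote returns "b" (two of three neighbors), while the weighted vote returns "a" because the one close neighbor outweighs the two distant ones.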
3. How does a Radial Basis Function network work? How does a kernel function
work?
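A sketch of an RBF network's output, assuming Gaussian kernels with one shared width sigma and the form f(x) = w0 + sum_u w_u K_u(d(x_u, x)). In practice the centers and weights are learned (for example, centers chosen by clustering and the output weights fit by a linear method); here they are simply passed in, and all names are illustrative:

```python
import math

def gaussian_kernel(x, center, sigma):
    # K(d) = exp(-d^2 / (2 sigma^2)): 1 at the center, decaying with distance.
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2 * sigma ** 2))

def rbf_output(x, centers, weights, w0, sigma=1.0):
    # f(x) = w0 + sum_u w_u * K_u(d(x_u, x)), one hidden unit per center.
    return w0 + sum(w * gaussian_kernel(x, c, sigma)
                    for c, w in zip(centers, weights))
```

Each hidden unit responds only to inputs near its center, so the network's output is a weighted sum of local "bumps"; the kernel function is what turns a distance into that localized response.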
4. How are concepts represented in a genetic algorithm? Give an example of
a concept represented in a GA.
5. What operators are used in a genetic algorithm to produce new concepts?
Give an example of a mechanism that can be used to judge a GA concept.
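The four bit-string operators named in question 1 can be sketched as follows, assuming hypotheses are encoded as strings of "0"/"1". Crossover points, the mask, and the mutated position are passed in explicitly so the results are reproducible; a real GA would choose them at random:

```python
def single_point_crossover(p1, p2, point):
    # Swap everything from `point` onward.
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2, i, j):
    # Swap the middle segment [i, j).
    return (p1[:i] + p2[i:j] + p1[j:],
            p2[:i] + p1[i:j] + p2[j:])

def uniform_crossover(p1, p2, mask):
    # mask bit "1": child 1 takes this bit from p1 and child 2 from p2;
    # mask bit "0": the sources are swapped.
    c1 = "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))
    c2 = "".join(b if m == "1" else a for a, b, m in zip(p1, p2, mask))
    return c1, c2

def point_mutation(s, k):
    # Flip the single bit at position k.
    return s[:k] + ("1" if s[k] == "0" else "0") + s[k + 1:]
```

For example, single_point_crossover("11101001000", "00001010101", 5) returns ("11101010101", "00001001000").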
6. Give pseudo-code for a general genetic algorithm. Make sure to outline
the way concepts are represented, the operators used to create new
concepts, how concepts are chosen to reproduce, and how concepts are
evaluated.
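One way to turn such pseudo-code into runnable Python is sketched below. It assumes fixed-length bit-string hypotheses, a hypothetical toy fitness that counts 1 bits (standing in for, say, accuracy on training data), roulette-wheel selection, single-point crossover, bitwise mutation, and elitism. These choices and all parameter values are illustrative, not a reference algorithm from the course:

```python
import random

def fitness(h):
    # Hypothetical toy fitness ("OneMax"): the number of 1 bits.
    return h.count("1")

def select(pop, rng):
    # Roulette-wheel selection: probability proportional to fitness.
    weights = [fitness(h) + 1e-9 for h in pop]  # avoid an all-zero wheel
    return rng.choices(pop, weights=weights, k=1)[0]

def crossover(p1, p2, rng):
    # Single-point crossover at a random interior point.
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(h, rate, rng):
    # Flip each bit independently with probability `rate`.
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b
                   for b in h)

def genetic_algorithm(n_bits=20, pop_size=30, generations=50,
                      crossover_rate=0.6, mutation_rate=0.01,
                      fitness_threshold=None, seed=0):
    rng = random.Random(seed)
    # Representation: each hypothesis is a fixed-length bit string.
    pop = ["".join(rng.choice("01") for _ in range(n_bits))
           for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        # Evaluation: stop early once a hypothesis is good enough.
        if fitness_threshold is not None and fitness(best) >= fitness_threshold:
            break
        # Elitism (a design choice): keep the current best unchanged.
        new_pop = [best]
        while len(new_pop) < pop_size:
            p1, p2 = select(pop, rng), select(pop, rng)
            if rng.random() < crossover_rate:
                p1, p2 = crossover(p1, p2, rng)
            new_pop += [mutate(p1, mutation_rate, rng),
                        mutate(p2, mutation_rate, rng)]
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)
```

Because the best member is always carried forward, the best fitness never decreases from one generation to the next.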
7. Give two different mechanisms for selecting which members of a GA
population should reproduce. What are the advantages and disadvantages of
your mechanisms?
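Two common selection mechanisms, sketched under the assumptions that fitness values are non-negative (for roulette-wheel selection) and that each tournament is between two members, with the win probability p as a tunable parameter:

```python
import random

def roulette_select(pop, fitness, rng):
    # Fitness-proportionate: P(h) = fitness(h) / total fitness.
    # Assumes non-negative fitness values with a positive total.
    r = rng.random() * sum(fitness(h) for h in pop)
    acc = 0.0
    for h in pop:
        acc += fitness(h)
        if acc > r:
            return h
    return pop[-1]  # guard against floating-point round-off

def tournament_select(pop, fitness, rng, p=0.75):
    # Draw two distinct members; the fitter one wins with probability p.
    a, b = rng.sample(pop, 2)
    better, worse = (a, b) if fitness(a) >= fitness(b) else (b, a)
    return better if rng.random() < p else worse
```

Roulette-wheel selection is simple, but one very fit individual can quickly take over the population (crowding). Tournament selection depends only on which of two members is fitter, so it is insensitive to the scale of the fitness values, and its selection pressure can be tuned through p.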
8. How does genetic programming work? How is a genetic program defined?
What genetic operators can be applied to a genetic program?
9. How does the sequential covering algorithm work to generate a set of
rules for a concept?
10. How does FOIL work to generate first-order logic rules for a concept?
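FOIL's inner loop specializes a rule by adding, at each step, the candidate literal with the highest gain. A sketch of that gain measure, Foil_Gain(L, R) = t (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where p0/n0 count the positive/negative bindings of the current rule R, p1/n1 those of R with literal L added, and t the positive bindings of R still covered after adding L:

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """Gain of adding literal L to rule R (FOIL's selection measure).

    p0, n0: positive / negative bindings of R
    p1, n1: positive / negative bindings of R with L added
    t:      positive bindings of R still covered after adding L
    """
    if p1 == 0:
        return 0.0  # no positives covered, so t = 0 and the gain is 0
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
```

For example, a literal that takes a rule from 4 positive and 4 negative bindings to 2 positive and 0 negative (with t = 2) has gain 2 * (log2(1) - log2(0.5)) = 2.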
11. What does it mean to view induction as inverted deduction? Give a
deduction rule and explain how that rule can be inverted to induce new
rules.
12. What are the two main approaches for generating clusters? Explain in
general terms how these approaches work.
13. List two methods that could be used to estimate the validity of the
results of a clustering algorithm. Explain how these methods work.
14. Explain how the following clustering methods work:
Agglomerative Single Link
Agglomerative Complete Link
K-Means
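A compact sketch of the three methods on points given as numeric tuples, assuming Euclidean distance. The agglomerative version is the naive O(n^3) loop, chosen for readability, and k-means starts from caller-supplied centers rather than random ones:

```python
import math

def agglomerative(points, k, linkage="single"):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    single link:   cluster distance = closest pair of members
    complete link: cluster distance = farthest pair of members
    """
    clusters = [[p] for p in points]
    combine = min if linkage == "single" else max
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = combine(math.dist(a, b)
                            for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

def k_means(points, centers, max_iter=100):
    """k-means from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g
                       else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break  # assignments no longer change the centers
        centers = new_centers
    return centers
```

Recording each merge and its distance in the agglomerative loop is exactly the information a dendrogram displays.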
15. A distance measure is important both in memory-based reasoning methods
such as the k-nearest neighbor method and in clustering. Why is it so
critical in these methods? In which is it possible to "learn" to do
a better job of measuring the distance between points? Why?
16. Give pseudo-code for the learning cycle of a Q learner. What is the
update rule for a deterministic world? How about a non-deterministic
world?
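A sketch of the learning cycle and both update rules on a hypothetical four-state chain world (actions L and R; reward 100 on reaching state 3, 0 otherwise). The discount factor 0.9, epsilon-greedy exploration, and the decaying learning rate alpha = 1 / (1 + visits(s, a)) are assumptions chosen for illustration:

```python
import random
from collections import defaultdict

GAMMA = 0.9  # discount factor (assumed value)
ACTIONS = ("L", "R")

def q_update_deterministic(Q, s, a, r, s_next):
    # Deterministic world: Q(s,a) <- r + gamma * max_a' Q(s',a')
    Q[(s, a)] = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)

def q_update_nondeterministic(Q, visits, s, a, r, s_next):
    # Nondeterministic world: blend the old estimate with the new sample,
    # using a learning rate that decays with visits to (s, a).
    visits[(s, a)] += 1
    alpha = 1.0 / (1 + visits[(s, a)])
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target

def chain_step(s, a):
    # Hypothetical world: states 0..3 in a row; state 3 is the absorbing goal.
    s_next = min(s + 1, 3) if a == "R" else max(s - 1, 0)
    return s_next, (100 if s_next == 3 else 0)

def run_episode(Q, rng, epsilon=0.2):
    # Learning cycle: observe s, choose a, act, receive r and s', update Q.
    s = 0
    while s != 3:
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)  # explore
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])  # exploit
        s_next, r = chain_step(s, a)
        q_update_deterministic(Q, s, a, r, s_next)
        s = s_next
```

Starting from Q = defaultdict(float), three or more episodes give Q(2, R) = 100, Q(1, R) = 90, and Q(0, R) = 81: the goal reward discounted once per extra step.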
17. How are the V(s) and Q(s,a) functions related in Q learning? What are
the advantages of using the Q function over the V function?
18. What is Bayes theorem? Discuss two examples showing how Bayes theorem
can be used to justify approaches to learning. Also, discuss an example
of a learning method based on Bayes theorem.
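A small sketch contrasting the MAP and maximum likelihood hypotheses through Bayes theorem. The numbers are hypothetical: a rare condition (prior 0.008) and an imperfect test (true-positive rate 0.98, false-positive rate 0.03), after observing one positive result:

```python
def posteriors(priors, likelihoods):
    # Bayes theorem: P(h|D) = P(D|h) P(h) / P(D), with P(D) = sum_h P(D|h) P(h).
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}

def map_hypothesis(priors, likelihoods):
    # argmax_h P(D|h) P(h); P(D) is the same for every h, so it is dropped.
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

def ml_hypothesis(likelihoods):
    # argmax_h P(D|h): equivalent to MAP when all priors are equal.
    return max(likelihoods, key=likelihoods.get)

# Hypothetical numbers: a rare condition and one positive test result.
priors = {"disease": 0.008, "healthy": 0.992}
likelihoods = {"disease": 0.98, "healthy": 0.03}  # P(positive | h)
```

Here the ML hypothesis is "disease" (0.98 > 0.03), but the MAP hypothesis is "healthy" because the prior dominates: P(disease | positive) is only about 0.21.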
19. What is a Bayesian Belief network? Give an example of such a network.
What are the advantages of a Bayesian network over a naive Bayes learner?
20. What is the fundamental difference between the Bagging and AdaBoost
ensemble learning methods? How do these notions relate to the concept of
generating a good ensemble? What are the advantages and disadvantages of
each method?
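The core difference can be sketched in two small functions: bagging trains each member on an independent bootstrap replicate of the data, while AdaBoost reweights the examples after each round so the next learner concentrates on earlier mistakes. This assumes the two-class AdaBoost variant and a weak learner whose weighted error is strictly between 0 and 0.5:

```python
import math
import random

def bootstrap_sample(data, rng):
    # Bagging: n draws *with replacement* from the n training examples.
    return [rng.choice(data) for _ in data]

def adaboost_reweight(weights, correct):
    """One AdaBoost round.

    weights: current example weights
    correct: booleans, True where the weak learner classified example i correctly
    Returns (alpha, new_weights); alpha is this learner's vote in the ensemble.
    """
    err = sum(w for w, c in zip(weights, correct) if not c) / sum(weights)
    alpha = 0.5 * math.log((1 - err) / err)  # assumes 0 < err < 0.5
    new = [w * math.exp(-alpha if c else alpha)
           for w, c in zip(weights, correct)]
    z = sum(new)  # normalize so the weights sum to 1
    return alpha, [w / z for w in new]
```

With four equally weighted examples and one mistake (error 0.25), the misclassified example's weight rises from 0.25 to 0.5 and each correct example's weight falls to 1/6. Bagging members can be trained in parallel and mainly reduce variance; boosting members are trained sequentially and can overfit noisy examples that keep gaining weight.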
21. Give an example of a domain theory expressed as predicates. How would
that domain theory be converted into a corresponding neural network
by KBANN? Show the structure and weights of the resulting network.
What is the main advantage of the KBANN approach?
22. What is a hybrid learning algorithm? Give three examples of hybrid learning
methods, explain how each works, and discuss the advantages of each.
23. How is a problem phrased as a linear program in a support vector machine?
Give an example. What are slack variables used for and how are they
represented in the linear program?
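One common linear-programming formulation, assumed here because the course's exact notation is not given, is the 1-norm SVM. It minimizes the sum of |w_j| instead of ||w||^2 and splits w = u - v with u, v >= 0 so the objective stays linear. Each slack variable ξ_i >= 0 measures how far example i falls inside the margin or on the wrong side of it, and C trades margin against total slack:

```latex
\begin{aligned}
\min_{\mathbf{u},\,\mathbf{v},\,b,\,\boldsymbol{\xi}} \quad & \sum_{j} (u_j + v_j) + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & y_i \big( (\mathbf{u} - \mathbf{v}) \cdot \mathbf{x}_i + b \big) \ge 1 - \xi_i, \qquad i = 1, \dots, n \\
& u_j \ge 0, \quad v_j \ge 0, \quad \xi_i \ge 0
\end{aligned}
```

An example with ξ_i = 0 is on the correct side of its margin; 0 < ξ_i <= 1 means it is inside the margin but still correctly classified; ξ_i > 1 means it is misclassified.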
24. Explain the concept of a Kernel function in Support Vector Machines.
Why are kernels so useful? What properties should a kernel have to be
used in an SVM?
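A small check of why kernels are useful, assuming 2-D inputs and the degree-2 polynomial kernel K(x, z) = (x . z)^2. The kernel produces the same value as an inner product in a 3-D feature space without ever constructing that space; the helper names are illustrative:

```python
import math

def phi(x):
    # Explicit degree-2 feature map for 2-D inputs.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    # K(x, z) = (x . z)^2, computed in the original 2-D space.
    return dot(x, z) ** 2
```

For x = (1, 2) and z = (3, 4) both routes give 121. A function can serve as an SVM kernel only if it corresponds to such an inner product, which (by Mercer's condition) means every Gram matrix it produces is symmetric and positive semidefinite.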