The final exam will be cumulative.

For sample questions on the material covered by Midterm 1, see this page. For sample questions on the material covered by Midterm 2, see this page.

Sample questions concerning material covered since Midterm 2:

- BRIEFLY define the following terms and give an example of how each term is used:
  - Decision Tree
  - Entropy of a set of examples
  - Information gain of a set of examples
  - Conditional probability
  - Conditional independence
  - Full joint probability distribution
  - Bayes rule
  - Bayes network
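
As a quick reference for the terms above, the standard textbook forms of the main formulas are sketched below. The notation is an assumption, not necessarily what was used in lecture: S is a set of examples, A is a feature, S_v is the subset of S where A takes value v, and p_c is the fraction of S in class c.

```latex
% Entropy of a set of examples S (in bits)
H(S) = -\sum_{c} p_c \log_2 p_c

% Information gain of feature A on S
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)

% Conditional probability (for P(B) > 0)
P(A \mid B) = \frac{P(A, B)}{P(B)}

% Bayes rule
P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}

% Chain rule
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})

% Conditional independence of A and B given C
P(A, B \mid C) = P(A \mid C) \, P(B \mid C)

% Joint distribution represented by a Bayes network
P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))
```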

- Define the ID3 algorithm for learning a decision tree.
- For a given set of examples with some features, calculate the information gain for each feature (show the entropy values needed). Which of the features would be chosen in learning a decision tree?
- In a decision tree it is sometimes preferable not to include a set of decisions that completely separates a set of points. Why? Give an example that illustrates your point. How do decision tree learning algorithms learn in such cases?
- How are features with more than two values captured as decisions in decision trees? How about continuous features?
- Why is probability often used in reasoning in artificial intelligence? Give an example of a situation where a probabilistic representation would be preferable to a purely logical representation.
- Given some full joint probability distribution, give two examples of conditional probabilities that can be determined from that distribution.
- How does the chain rule work in evaluating probability? Give an example and show how conditional independence affects this rule.
- What are the advantages and disadvantages of using a full joint probability distribution? How do Bayes networks address the disadvantages?
- Given a Bayes network, show how to estimate a probability using that network.
- What is Bayes rule? Give an example demonstrating how that rule can be applied.
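
To practice the calculation questions above, the following is a minimal sketch of the entropy, information gain, and Bayes rule computations. Everything in it is hypothetical and for illustration only: the toy data set `EXAMPLES`, the feature names, the function names, and the probabilities in the Bayes rule example are not taken from the course material.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(examples, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    labels = [ex[target] for ex in examples]
    remainder = 0.0
    for value in {ex[feature] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[feature] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from P(H), P(E | H), and P(E | not H), using Bayes rule."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical toy data: "outlook" separates the classes perfectly,
# while "windy" carries no information about "play".
EXAMPLES = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

if __name__ == "__main__":
    for feature in ("outlook", "windy"):
        gain = information_gain(EXAMPLES, feature, "play")
        print(f"gain({feature}) = {gain:.3f}")
    # Hypothetical numbers: 1% prior, 90% true positive rate, 5% false positive rate.
    print(f"posterior = {bayes_posterior(0.01, 0.9, 0.05):.3f}")
```

A greedy learner such as ID3 would pick the feature with the highest gain at each node, which is "outlook" for this toy data. The Bayes rule example shows how a positive result from a fairly accurate test can still leave a low posterior when the prior is small.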