Some sample midterm questions:

1. Briefly define the following terms:

   Concept Learning

   Continuous-Valued Attribute

   Discrete-Valued Attribute

   Inductive Learning

   The Inductive Learning Hypothesis

   Version Space

   Inductive Bias

   Noise

   N-Fold Cross Validation

   Training, Testing, Validation (or Tuning) Set

   Confusion Matrix

   Confidence Interval

   ROC Curve

   Precision

   Recall

   Decision Tree

   Entropy

   Information Gain

   Gain Ratio (in decision trees)

   Overfitting

   Gradient Descent

   Artificial Neural Network

   Linear Threshold Unit

   Sigmoid Unit

   Perceptron

   Multi-Layer Perceptron

   Batch mode Gradient Descent

   Incremental or Stochastic Gradient Descent

   Input Unit
 
   Hidden Unit

   Output Unit

   Eager learning

   Lazy Learning

   Curse of dimensionality

   kd Tree

   Domain theory
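A quick way to check the entropy and information-gain definitions above is to compute them on a toy data set. The sketch below uses made-up weather-style data (all names are illustrative):

```python
import math

def entropy(labels):
    """Entropy of a list of class labels: -sum p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def information_gain(examples, attr, label):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    n = len(examples)
    total = entropy([e[label] for e in examples])
    remainder = 0.0
    for v in set(e[attr] for e in examples):
        subset = [e[label] for e in examples if e[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

# Toy data: does Wind predict Play?
data = [
    {"Wind": "Weak",   "Play": "Yes"},
    {"Wind": "Weak",   "Play": "Yes"},
    {"Wind": "Strong", "Play": "No"},
    {"Wind": "Strong", "Play": "Yes"},
]
print(entropy([e["Play"] for e in data]))     # entropy of 3 Yes / 1 No: ~0.811
print(information_gain(data, "Wind", "Play")) # 0.811 - (0.5*0 + 0.5*1) = ~0.311
```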

2. Outline the four key questions that must be answered when designing a
   machine learning algorithm.  Give an example of an answer for each question.

3. Define the following algorithms: (a real question would just ask for one
   of these)

   Find-S

   List-Then-Eliminate (Version Space)

   Candidate Elimination (Version Space)

   ID3

   Perceptron Training Algorithm (assuming linear artificial neurons)

   Backpropagation (assuming sigmoidal artificial neurons)

   k-Nearest Neighbor

4. For each of the algorithms above, show how it works on a specific problem
   (examples of these may be found in the book or in the notes).
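For example, Find-S can be traced on a small made-up EnjoySport-style data set. The sketch below is a compact variant that seeds the hypothesis with the first positive example (equivalent to starting from the all-∅ most specific hypothesis) and then generalizes attribute by attribute, ignoring negatives:

```python
def find_s(examples):
    """Find-S: generalize the most specific hypothesis just enough to
    cover every positive training example; negatives are ignored."""
    hypothesis = None
    for attrs, label in examples:
        if label != "Yes":
            continue
        if hypothesis is None:
            hypothesis = list(attrs)              # first positive: copy it
        else:
            hypothesis = [h if h == a else "?"    # mismatch -> generalize
                          for h, a in zip(hypothesis, attrs)]
    return hypothesis

# Toy data: (Sky, AirTemp, Humidity) -> label
train = [
    (("Sunny", "Warm", "Normal"), "Yes"),
    (("Sunny", "Warm", "High"),   "Yes"),
    (("Rainy", "Cold", "High"),   "No"),
]
print(find_s(train))  # ['Sunny', 'Warm', '?']
```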

5. Why is inductive bias important for a machine learning algorithm?  Give
   some examples of ML algorithms and their corresponding inductive biases.

6. How would you represent the following concepts in a decision tree:

   A OR B
   A AND NOT B
   (A AND B) OR (C OR NOT D)
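
One way to check such an answer is to encode the candidate tree and test it on every truth assignment. Below, trees are written as nested dicts, {attribute: {value: subtree-or-leaf}} (an illustrative representation, not C4.5's):

```python
# Tree for "A OR B": test A first; only if A is false do we need B.
a_or_b = {"A": {True: "yes",
                False: {"B": {True: "yes", False: "no"}}}}

# Tree for "A AND NOT B": B only matters on the A=True branch.
a_and_not_b = {"A": {True: {"B": {True: "no", False: "yes"}},
                     False: "no"}}

def classify(tree, example):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))          # the attribute tested at this node
        tree = tree[attr][example[attr]]
    return tree

print(classify(a_or_b, {"A": False, "B": True}))       # yes
print(classify(a_and_not_b, {"A": True, "B": False}))  # yes
```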

7. What problem does reduced-error pruning address?  How do we decide when
   to prune a decision tree?

8. How do you translate a decision tree into a corresponding set of rules?

9. What mechanism was suggested in class for dealing with continuous-valued
   attributes in a decision tree?

10. What mechanism was suggested in class for dealing with missing attribute
   values in a decision tree?

11. What types of concepts can be learned with a perceptron using linear units?
    Give an example of a concept that could not be learned by this type of
    artificial neural network.
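A small experiment makes the limitation concrete: the perceptron training rule converges on AND, which is linearly separable, but can never fit XOR. The data, learning rate, and epoch count below are illustrative choices:

```python
def train_perceptron(data, epochs=20, lr=0.1):
    """Perceptron rule for one linear threshold unit:
    w <- w + lr * (target - output) * x, with the bias folded in as x0 = 1."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x1, x2), t in data:
            out = 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0
            for i, xi in enumerate((1, x1, x2)):
                w[i] += lr * (t - out) * xi
    return w

def predict(w, x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and = train_perceptron(AND)
print([predict(w_and, *x) for x, _ in AND])  # matches the AND targets
w_xor = train_perceptron(XOR)
print([predict(w_xor, *x) for x, _ in XOR])  # some outputs stay wrong: XOR
                                             # is not linearly separable
```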

12. A multi-layer perceptron with sigmoid units can learn (using an
    algorithm like backpropagation) concepts that cannot be learned by 
    artificial neural networks that lack hidden units or sigmoid activation
    functions.  Give an example of a concept that could be learned by such
    a network, and show what the weights of a learned representation of this
    concept might be.
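As one possible answer, XOR is learnable by a 2-2-1 network of sigmoid units. The weights below are hand-picked for illustration (a network trained with backpropagation would find different values):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_net(x1, x2):
    """2-2-1 sigmoid network computing XOR with hand-picked weights:
    h1 acts like OR, h2 acts like NAND, and the output ANDs them."""
    h1 = sigmoid(-10 + 20 * x1 + 20 * x2)    # ~ x1 OR x2
    h2 = sigmoid(30 - 20 * x1 - 20 * x2)     # ~ NOT (x1 AND x2)
    return sigmoid(-30 + 20 * h1 + 20 * h2)  # ~ h1 AND h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2)))    # rounds to 0, 1, 1, 0
```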

13. An artificial neural network uses gradient descent to search for a local
    minimum in weight space.  How is a local minimum different from the global
    minimum?  Why isn't gradient descent guaranteed to find the global minimum?
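A one-dimensional illustration of the issue: on f(w) = w^4 - 3w^2 + w (chosen here only because it has two basins, a global minimum near w = -1.30 and a shallower local minimum near w = 1.13), gradient descent converges to whichever minimum's basin the starting weight happens to lie in:

```python
def grad_descent(w, lr=0.01, steps=2000):
    """Gradient descent on f(w) = w**4 - 3*w**2 + w, which has a global
    minimum near w = -1.30 and a local minimum near w = 1.13."""
    for _ in range(steps):
        w -= lr * (4 * w**3 - 6 * w + 1)   # f'(w)
    return w

print(grad_descent(-2.0))  # starts in the global basin: ends near -1.30
print(grad_descent(2.0))   # same rule, but stuck in the local basin near 1.13
```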

14. A concept is represented in C4.5 format with the following files.  The
    .names file is:

    Class1,Class2,Class3.  | Classes

    FeatureA:  continuous
    FeatureB:  BValue1, BValue2, BValue3, BValue4
    FeatureC:  continuous
    FeatureD:  Yes, No

    The data file is as follows:

    2.5,BValue2,100.0,No,Class2
    1.1,BValue4,300.0,Yes,Class1
    2.3,BValue3,150.0,No,Class3
    1.4,BValue1,350.0,No,Class2

    What input and output representation would you use to learn this problem
    using an artificial neural network?  Give the input and output vectors for
    each of the data points shown above.  What are the advantages and 
    disadvantages of your representation?
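One plausible answer, sketched below: scale each continuous feature by its largest observed value, one-hot encode the discrete features, and use a 1-of-3 target vector. The scaling constants are assumptions based only on the four rows shown, and this is just one of several defensible encodings:

```python
B_VALUES = ["BValue1", "BValue2", "BValue3", "BValue4"]
CLASSES = ["Class1", "Class2", "Class3"]

def encode(a, b, c, d, cls):
    """Map one C4.5 record to a 7-element input vector and a 1-of-3 target."""
    inputs = ([a / 2.5] +                                    # FeatureA / observed max
              [1.0 if b == v else 0.0 for v in B_VALUES] +   # FeatureB one-hot
              [c / 350.0] +                                  # FeatureC / observed max
              [1.0 if d == "Yes" else 0.0])                  # FeatureD as one bit
    target = [1.0 if cls == k else 0.0 for k in CLASSES]
    return inputs, target

rows = [
    (2.5, "BValue2", 100.0, "No",  "Class2"),
    (1.1, "BValue4", 300.0, "Yes", "Class1"),
    (2.3, "BValue3", 150.0, "No",  "Class3"),
    (1.4, "BValue1", 350.0, "No",  "Class2"),
]
for row in rows:
    print(encode(*row))
```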

15. How does a k-Nearest Neighbor learner make predictions about new data points?
    How does a distance-weighted k-Nearest Neighbor learner differ from a
    standard k-Nearest Neighbor learner?  What is locally weighted regression?
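The difference between plain and distance-weighted voting shows up when a few very close neighbors are outvoted by more distant ones. The tiny data set below is contrived to make exactly that happen (1/d^2 weighting is one common choice, not the only one):

```python
import math

def knn_predict(points, query, k=3, weighted=False):
    """Majority vote among the k nearest neighbors.  With weighted=True,
    each neighbor's vote counts 1 / d**2, so closer points dominate."""
    nearest = sorted(points, key=lambda p: math.dist(p[0], query))[:k]
    votes = {}
    for x, label in nearest:
        w = 1.0 / (math.dist(x, query) ** 2 + 1e-9) if weighted else 1.0
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

# Two tight "neg" points near the query, three "pos" points farther away.
points = [((0.0, 0.0), "neg"), ((0.2, 0.1), "neg"), ((1.0, 1.0), "pos"),
          ((1.1, 0.9), "pos"), ((0.9, 1.1), "pos")]
query = (0.1, 0.1)
print(knn_predict(points, query, k=5))                 # plain vote: "pos" (3 of 5)
print(knn_predict(points, query, k=5, weighted=True))  # weighted: "neg" (closest win)
```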

16. How does a Radial Basis Function network work?  How does a kernel function
    work?
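A minimal sketch of both ideas: a Gaussian kernel responds strongly only near its center, and an RBF network outputs a weighted sum of kernel activations. The centers and weights below are hand-picked, not trained, and happen to compute XOR:

```python
import math

def gaussian_kernel(x, center, sigma=0.5):
    """Gaussian kernel: near 1 at its center, falling off with distance."""
    return math.exp(-math.dist(x, center) ** 2 / (2 * sigma ** 2))

def rbf_net(x, centers, weights, bias=0.0):
    """RBF network output: bias plus a weighted sum of kernel activations."""
    return bias + sum(w * gaussian_kernel(x, c) for w, c in zip(weights, centers))

# One kernel on each positive XOR point; unit weights (hand-picked).
centers = [(0.0, 1.0), (1.0, 0.0)]
weights = [1.0, 1.0]
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, rbf_net(x, centers, weights) > 0.5)  # True only near a center
```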

17. Give an example of a domain theory expressed as predicates.  How would
    that domain theory be converted into a corresponding neural network
    by KBANN?  Show the structure and weights of the resulting network.
    What is the main advantage of the KBANN approach?

18. What is a hybrid learning algorithm?  Give three examples of hybrid learning
    methods, explain how each works, and discuss the advantages of each.