CS 8751 Take Home Final

Due December 17th by 5:55 p.m. -- NO LATE EXAMS ALLOWED


You must submit a signed copy of the following page with your exam.

YOU MUST SHOW YOUR WORK TO RECEIVE FULL CREDIT!!!

  1. Show the initial G and S sets and the G and S sets after each of the data points below is presented to the Version Space (Candidate Elimination) algorithm. Each row lists the values of five features followed by the class label; a sketch of the S-set update step follows the data. [15 points]
    	1	2	1	3	1	+
    	2	1	2	2	2	-
    	1	2	1	1	2	-
    	1	1	2	3	1	-
    	1	2	2	3	2	+
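
    For reference, a minimal sketch (Python; the names are illustrative) of the S-set generalization step for conjunctive hypotheses over these five features, ignoring the interaction with the G set:

       def generalize_s(s, example):
           # Minimally generalize hypothesis s so that it covers a positive
           # example: keep values that match, replace mismatches with '?'.
           return tuple(h if h == x else '?' for h, x in zip(s, example))

       s = (1, 2, 1, 3, 1)                    # S after the first positive point
       s = generalize_s(s, (1, 2, 2, 3, 2))   # S after the last positive point
       print(s)                               # -> (1, 2, '?', 3, '?')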
        
  2. For a dataset with 5 features, A, B, C, D, and E, where A, B, and D take the values true and false, and C and E are continuously valued, consider the following examples (a sketch of one possible computation over this data follows the table):
         A		B	C	D	E	Class
         -----------------------------------------------------
         false	true	15	false	20	+ positive
         true	false	1	true	5	- negative
         false	false	10	true	10	- negative
         false	false	8	false	15	+ positive
         true	true	13	true	16	+ positive
         false	true	9	false	8	- negative
         false	false	1	true	5	- negative
         true	false	12	false	13	- negative
         true	true	15	false	6	+ positive
         true	true	15	true	10	+ positive
         false	true	13	true	7	- negative
         false	true	3	false	5	+ positive
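
    A minimal sketch, assuming the task involves a decision-tree-style analysis of this data, of how information gain could be computed for a threshold split on the continuous feature C (the threshold 9.5 is illustrative):

       import math

       def entropy(labels):
           # Shannon entropy of a collection of class labels.
           n = len(labels)
           return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                       for c in set(labels))

       # Feature C and the class labels, read from the table above.
       C      = [15, 1, 10, 8, 13, 9, 1, 12, 15, 15, 13, 3]
       labels = ['+', '-', '-', '+', '+', '-', '-', '-', '+', '+', '-', '+']

       def info_gain(threshold):
           below = [l for c, l in zip(C, labels) if c <= threshold]
           above = [l for c, l in zip(C, labels) if c > threshold]
           n = len(labels)
           return (entropy(labels) - len(below) / n * entropy(below)
                   - len(above) / n * entropy(above))

       print(info_gain(9.5))   # gain of the illustrative split C <= 9.5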
    
         NOTE:
       
  3. Consider the use of ensembles in machine learning. [30 points]
    1. Explain how ensembles help avoid overfitting.
    2. What is one strength of bagging compared to boosting? Justify your answer. (A sketch of bagging follows this question for reference.)
    3. Give a brief argument for using an ensemble consisting of one decision tree, one support vector machine (SVM), and one neural network rather than an ensemble of three models produced by the same learning algorithm.
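
    As a reference point for part 2, a minimal sketch of bagging (bootstrap aggregation) with majority voting; base_learner stands in for any training procedure and is a placeholder:

       import random

       def bagging_predict(train, x, base_learner, n_models=3):
           # Train each model on a bootstrap sample of the training data
           # (drawn with replacement) and combine predictions by majority vote.
           votes = []
           for _ in range(n_models):
               sample = [random.choice(train) for _ in train]
               model = base_learner(sample)       # returns a predict function
               votes.append(model(x))
           return max(set(votes), key=votes.count)
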
  4. What neural network would be generated by KBANN from the following rules, assuming the output predicate is J and the input predicates are A, B, C, and D? For each unit generated, connect it with a small-weight link to any input unit to which it is not already directly connected. A sketch of the standard rule-to-unit weight mapping follows the rules. [20 points]
       A, C -> E
       B, not C, D -> E
       E, C -> F
       not A, D -> F
       E, F -> G
       B, not E -> G
       E, not F -> H
       E, G, H -> J
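
    A minimal sketch of the usual KBANN rule-to-unit translation (in the style of Towell and Shavlik); the weight magnitude W = 4.0 is illustrative:

       W = 4.0   # large weight assigned to antecedents named in a rule

       def rule_to_unit(positive, negated):
           # Map one conjunctive rule to a sigmoid unit: positive antecedents
           # get weight +W, negated antecedents get -W, and the bias is set
           # so the unit fires only when the whole conjunction is satisfied.
           weights = {a: W for a in positive}
           weights.update({a: -W for a in negated})
           bias = -(len(positive) - 0.5) * W
           return weights, bias

       print(rule_to_unit(['A', 'C'], []))       # unit for rule: A, C -> E
       print(rule_to_unit(['B', 'D'], ['C']))    # unit for rule: B, not C, D -> E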
       
  5. Given a neural network with 3 input units (A, B, C), two hidden units (D, E), one output unit (F), and one bias unit whose activation is always 1 (ONE), with the following connection weights:
       ONE->D: 0.0
       A->D: 0.5
       B->D: 0.0
       C->D: -1.0
       ONE->E: 0.5
       A->E: 0.0
       B->E: 0.5
       C->E: 0.5
       ONE->F: 0.0
       D->F: -0.5
       E->F: 0.5
       

    What would the weights be after each of the following points is presented (in the sequence shown), assuming a learning rate of 0.25 and a momentum term of 0.9? Assume the hidden and output units use a sigmoidal activation function and that the weights are updated using backpropagation. A sketch of one such update step follows the points. [20 points]

                  A B C   F
       Point 1:   1 0 1   1 
       Point 2:   0 1 1   0
       Point 3:   1 1 1   1
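
    A minimal sketch of a single backpropagation update with momentum under squared-error loss, mirroring the setup above (it shows the mechanics, not the required hand calculation):

       import math

       def sigmoid(z):
           return 1.0 / (1.0 + math.exp(-z))

       # Weights from the question, keyed by (source, destination).
       w = {('ONE','D'): 0.0, ('A','D'): 0.5, ('B','D'): 0.0, ('C','D'): -1.0,
            ('ONE','E'): 0.5, ('A','E'): 0.0, ('B','E'): 0.5, ('C','E'): 0.5,
            ('ONE','F'): 0.0, ('D','F'): -0.5, ('E','F'): 0.5}

       eta, alpha = 0.25, 0.9          # learning rate and momentum
       prev = {k: 0.0 for k in w}      # previous changes, for the momentum term

       def backprop_step(inputs, target):
           # Forward pass through the two hidden units and the output unit.
           act = dict(inputs, ONE=1.0)
           act['D'] = sigmoid(sum(w[(s,'D')] * act[s] for s in ('ONE','A','B','C')))
           act['E'] = sigmoid(sum(w[(s,'E')] * act[s] for s in ('ONE','A','B','C')))
           act['F'] = sigmoid(sum(w[(s,'F')] * act[s] for s in ('ONE','D','E')))
           # Backward pass: sigmoid deltas for the output and hidden units.
           delta = {'F': act['F'] * (1 - act['F']) * (target - act['F'])}
           for h in ('D', 'E'):
               delta[h] = act[h] * (1 - act[h]) * w[(h,'F')] * delta['F']
           # Weight updates with momentum.
           for (s, d) in w:
               change = eta * delta[d] * act[s] + alpha * prev[(s, d)]
               w[(s, d)] += change
               prev[(s, d)] = change

       backprop_step({'A': 1.0, 'B': 0.0, 'C': 1.0}, target=1.0)   # Point 1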
       
  6. A key concern in supervised learning is avoiding overfitting. Define overfitting and explain why it matters. [20 points]

    Discuss one key technique for addressing overfitting in (i) decision trees and one in (ii) neural networks (two techniques in total). A sketch of one such technique follows.
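
    For instance, a minimal sketch of validation-based early stopping, one standard overfitting control for neural networks; the model interface (train_one_epoch, error, weights) is hypothetical:

       def train_with_early_stopping(model, train, val, max_epochs=100, patience=5):
           # Stop training when validation error has not improved for
           # `patience` consecutive epochs, then restore the best weights.
           best_err, best_weights, stale = float('inf'), model.weights, 0
           for _ in range(max_epochs):
               model.train_one_epoch(train)
               err = model.error(val)
               if err < best_err:
                   best_err, best_weights, stale = err, model.weights, 0
               else:
                   stale += 1
                   if stale >= patience:
                       break
           model.weights = best_weights
           return model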

  7. For the maze world shown in the topmost of the three diagrams below, with actions and rewards as shown in the diagram, calculate the corresponding V*(s) and Q(s,a) values assuming a discount factor of 0.8. Assume the agent stops moving when it reaches the upper-right state. A value-iteration sketch follows for reference. [20 points]
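
    A minimal sketch of deterministic value iteration for computing V*(s) and Q(s,a); the states, actions, reward, and next_state arguments stand in for the maze diagram and are placeholders:

       def value_iteration(states, actions, reward, next_state, gamma=0.8, sweeps=50):
           # V*(s) = max_a [ r(s,a) + gamma * V*(s') ] for deterministic moves;
           # the terminal state has no actions and keeps V* = 0.
           V = {s: 0.0 for s in states}
           for _ in range(sweeps):
               for s in states:
                   if actions(s):
                       V[s] = max(reward(s, a) + gamma * V[next_state(s, a)]
                                  for a in actions(s))
           Q = {(s, a): reward(s, a) + gamma * V[next_state(s, a)]
                for s in states for a in actions(s)}
           return V, Q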

  8. Briefly define the following terms and explain how each is used in support vector machines: (i) margin, (ii) kernel, and (iii) slack variables. [15 points]
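
    For reference, the standard soft-margin formulation these terms come from, sketched in LaTeX notation:

       \min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i
       \quad \text{s.t.} \quad y_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0

       % The geometric margin is 2 / \|w\|; the kernel K(x, x') = \phi(x) \cdot \phi(x')
       % replaces inner products in feature space; the slack variables \xi_i allow
       % points to violate the margin at cost C.
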
  9. For the following points:
    	A	B	C	D	class
    	-------------------------------------
    	-1	1	-1	-1	-1
    	1	1	1	1	1
    	-1	1	1	1	1
    	-1	-1	1	1	-1
    	1	-1	-1	-1	-1
         

    Assuming a linear kernel and the use of slack variables, give the set of constraint equations generated for these points. (The general constraint form is sketched below.) [20 points]
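
    A sketch of the soft-margin constraint template, instantiated for the first point above (A = -1, B = 1, C = -1, D = -1, class = -1):

       y_i \left( w \cdot x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0

       % For point 1:
       -1 \left( -w_A + w_B - w_C - w_D + b \right) \ge 1 - \xi_1, \qquad \xi_1 \ge 0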

  10. For the Bayes network and CPTs shown below, calculate the following (an enumeration sketch follows these parts): [20 points]
    1. p(e=true|a=true,b=true,c=true)
    2. p(d=true|e=true,b=false)
    3. p(e=false|a=true,c=false)
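
    A minimal sketch of answering such queries by enumeration, assuming the full joint has been assembled by multiplying the network's CPT entries (the representation is illustrative):

       def conditional(joint, var, val, evidence):
           # joint maps complete assignments (frozensets of (name, value)
           # pairs) to probabilities; computes p(var=val | evidence) by
           # summing the matching worlds and normalizing over the evidence.
           num = sum(p for world, p in joint.items()
                     if (var, val) in world and evidence <= world)
           den = sum(p for world, p in joint.items() if evidence <= world)
           return num / den

       # e.g. part 1 would be:
       # conditional(joint, 'e', True, {('a', True), ('b', True), ('c', True)})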

  11. Define the term association rule. Give the Apriori algorithm for learning association rules and show an example of how it works. Give two examples of ways to speed up this algorithm. (A sketch of the frequent-itemset phase follows.) [20 points]
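
    A minimal sketch of Apriori's frequent-itemset phase (the full candidate-pruning step is omitted for brevity; the transactions are illustrative):

       def apriori(transactions, min_support):
           # Level-wise frequent-itemset mining: size-k candidates are formed
           # from frequent size-(k-1) itemsets, then pruned by minimum support.
           def support(itemset):
               return sum(1 for t in transactions if itemset <= t) / len(transactions)

           items = {frozenset([i]) for t in transactions for i in t}
           level = {s for s in items if support(s) >= min_support}
           frequent, k = [], 1
           while level:
               frequent.extend(level)
               k += 1
               candidates = {a | b for a in level for b in level if len(a | b) == k}
               level = {c for c in candidates if support(c) >= min_support}
           return frequent

       # Illustrative usage:
       # apriori([{'milk', 'bread'}, {'milk', 'eggs'}, {'milk', 'bread', 'eggs'}], 2/3)
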
  12. How would you represent the solution to a regression problem in a genetic algorithm? Give an example to demonstrate your representation. Discuss what kind of fitness function you would use and how candidate solutions would be selected for reproduction. (One possible encoding is sketched below.) [20 points]
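
    One possible encoding, sketched minimally: a chromosome is the coefficient vector of a linear model, fitness is negated squared error, and parents are picked by fitness-proportionate selection (all names are illustrative):

       import random

       def fitness(w, data):
           # Higher is better: negate the sum of squared errors of the
           # linear model y = w[0] + w[1]*x encoded by the chromosome w.
           return -sum((w[0] + w[1] * x - y) ** 2 for x, y in data)

       def select_parent(population, data):
           # Fitness-proportionate (roulette-wheel) selection; fitnesses are
           # shifted to be positive before being used as weights.
           fits = [fitness(w, data) for w in population]
           shift = min(fits)
           weights = [f - shift + 1e-9 for f in fits]
           return random.choices(population, weights=weights, k=1)[0]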