### Introduction

Decision trees are normally built in a greedy manner: at each node the algorithm picks the single feature that looks best on its own. One problem with this approach is that the best feature to split on at any point might not be apparent without considering its interactions with each of the other remaining features. In this assignment you will implement a one-step lookahead feature in the ID3 algorithm that is included in the Weka code.
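As a reminder of the quantity being optimized (these are the standard definitions, not anything specific to the Weka code): for a set of examples $S$ with class proportions $p_c$, and a feature $F$ that partitions $S$ into subsets $S_v$,

$$
H(S) = -\sum_{c} p_c \log_2 p_c, \qquad \mathrm{Gain}(S, F) = H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v).
$$

A small hypothetical case shows how greedy selection can be misled. Take two binary features $A$ and $B$, with the class defined as $A \oplus B$ over the four possible examples. Then $H(S) = 1$, and each half produced by splitting on $A$ alone is still evenly mixed, so

$$
\mathrm{Gain}(S, A) = 1 - \left(\tfrac{2}{4}\cdot 1 + \tfrac{2}{4}\cdot 1\right) = 0,
$$

and likewise for $B$, even though splitting on both features classifies every example correctly.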

### To Do

You will need to create a new decision tree method based on the given ID3 code. Extract the Java source for ID3 from the weka-src.jar file (which should have been included when you downloaded the code). You can implement your lookahead version of ID3 either by copying the ID3 code into a new decision tree file and renaming the class, or by extending the existing ID3 class.

Your lookahead should work by simply changing how information gain is used to select the best feature:

```
BestFeature(AvailableFeatures, RemainingExamples)
  BestGain1 = 0
  BestGainFeature = ?
  NumExamples = | RemainingExamples |
  For each feature F1 from AvailableFeatures
    ThisInfoGain = Entropy(RemainingExamples)
    For each feature value V1 of F1
      ExamplesForThisValue = RemainingExamples with value V1 of F1
      If (ExamplesForThisValue need not or cannot be further subdivided)
        ThisInfoGain = ThisInfoGain - | ExamplesForThisValue | / NumExamples * Entropy(ExamplesForThisValue)
      Else
        BestSubEntropy = infinity
        For each feature F2 from (AvailableFeatures - F1)
          SubEntropy = 0
          For each feature value V2 of F2
            SubExamplesForThisValue = ExamplesForThisValue with value V2 of F2
            SubEntropy = SubEntropy + | SubExamplesForThisValue | / NumExamples * Entropy(SubExamplesForThisValue)
          EndFor
          If (SubEntropy < BestSubEntropy)
            BestSubEntropy = SubEntropy
          EndIf
        EndFor
        ThisInfoGain = ThisInfoGain - BestSubEntropy
      EndIf
    EndFor
    If (ThisInfoGain > BestGain1)
      BestGain1 = ThisInfoGain
      BestGainFeature = F1
    EndIf
  EndFor
  Return BestGainFeature
End
```
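The selection step above can be sketched in plain Java, independent of Weka. Everything here is an assumption made for illustration: the class name `LookaheadSketch`, the encoding of examples as `int[]` rows with the class label in the last position, and the toy data in `demoData()`. For each value of F1 the sketch keeps the lowest weighted entropy reachable with one more split, which is this sketch's reading of the lookahead step. It is not the Weka ID3 code and would need adapting to Weka's own data classes.

```java
import java.util.*;

class LookaheadSketch {
    // Entropy (base 2) of the class labels; the label is the last entry of each row.
    static double entropy(List<int[]> rows) {
        if (rows.isEmpty()) return 0.0;
        Map<Integer, Integer> counts = new HashMap<>();
        for (int[] r : rows) counts.merge(r[r.length - 1], 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / rows.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Groups rows by their value for the given feature index.
    static Map<Integer, List<int[]>> split(List<int[]> rows, int feature) {
        Map<Integer, List<int[]>> parts = new TreeMap<>();
        for (int[] r : rows) parts.computeIfAbsent(r[feature], k -> new ArrayList<>()).add(r);
        return parts;
    }

    // Entropy left after splitting rows on feature, each part weighted by size / total.
    static double weightedEntropy(List<int[]> rows, int feature, int total) {
        double sum = 0.0;
        for (List<int[]> part : split(rows, feature).values())
            sum += (double) part.size() / total * entropy(part);
        return sum;
    }

    // Gain of f1 when every impure branch may be split once more on its best F2.
    static double lookaheadGain(List<int[]> rows, int f1, Set<Integer> available) {
        int n = rows.size();
        double gain = entropy(rows);
        for (List<int[]> part : split(rows, f1).values()) {
            // Start from "no further split"; this covers pure branches and the
            // case where no other feature is left.
            double best = (double) part.size() / n * entropy(part);
            if (best > 0.0) {
                for (int f2 : available)
                    if (f2 != f1) best = Math.min(best, weightedEntropy(part, f2, n));
            }
            gain -= best;
        }
        return gain;
    }

    // Returns the index of the chosen feature, or -1 if no feature has positive gain.
    static int bestFeature(List<int[]> rows, Set<Integer> available, boolean lookahead) {
        double bestGain = 0.0;
        int best = -1;
        for (int f : available) {
            double g = lookahead
                ? lookaheadGain(rows, f, available)
                : entropy(rows) - weightedEntropy(rows, f, rows.size());
            if (g > bestGain + 1e-12) { bestGain = g; best = f; }
        }
        return best;
    }

    // Hypothetical data: columns are A, B, C, class; class = A XOR B, C is a weak hint.
    static List<int[]> demoData() {
        return Arrays.asList(
            new int[]{0, 0, 0, 0}, new int[]{0, 0, 0, 0},
            new int[]{0, 1, 1, 1}, new int[]{0, 1, 1, 1},
            new int[]{1, 0, 1, 1}, new int[]{1, 0, 0, 1},
            new int[]{1, 1, 1, 0}, new int[]{1, 1, 0, 0});
    }

    public static void main(String[] args) {
        Set<Integer> features = new TreeSet<>(Arrays.asList(0, 1, 2));
        System.out.println("plain:     " + bestFeature(demoData(), features, false));
        System.out.println("lookahead: " + bestFeature(demoData(), features, true));
    }
}
```

On this toy data the plain information-gain choice is feature 2 (C), because A and B each have zero gain on their own, while the lookahead choice is feature 0 (A), since one further split on B makes every branch pure.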

Once you get your code working you should test it on the datasets you used in program 1. Compare these results to the trees that would be produced with ID3. Note that if you included features with unknown values you should replace the unknowns with some feature value.
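One simple way to replace unknowns is to substitute the most frequent known value of each feature. The sketch below assumes the data is held as a `String[][]` with unknowns written as `?` (the marker ARFF files use for missing values); the class and method names are made up for illustration. Weka also ships a `ReplaceMissingValues` filter that may be more convenient if you work with its data classes directly.

```java
import java.util.*;

class ModeImputationSketch {
    // Returns a copy of rows in which every "?" is replaced by the most frequent
    // known value in the same column. Ties go to the alphabetically first value.
    static String[][] fillUnknowns(String[][] rows) {
        String[][] out = new String[rows.length][];
        for (int i = 0; i < rows.length; i++) out[i] = rows[i].clone();
        if (rows.length == 0) return out;

        for (int c = 0; c < rows[0].length; c++) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String[] r : rows)
                if (!r[c].equals("?")) counts.merge(r[c], 1, Integer::sum);

            String mode = null;
            int best = 0;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() > best) { best = e.getValue(); mode = e.getKey(); }
            }
            if (mode == null) continue; // column has no known values; leave it alone

            for (String[] r : out)
                if (r[c].equals("?")) r[c] = mode;
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] data = {
            {"sunny", "hot"}, {"?", "hot"}, {"sunny", "?"}, {"rain", "mild"}
        };
        for (String[] r : fillUnknowns(data)) System.out.println(String.join(",", r));
    }
}
```

Whichever replacement you choose, apply the same preprocessing before running both ID3 and the lookahead version so that the comparison between the two trees is fair.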

Also, design a dataset that is guaranteed to produce different results for ID3 and your ID3WithLookAhead method. Hint: you may want to look at problems that are not linearly separable for inspiration.

### To Hand In

1. Comment your code and hand in a copy of the code.
2. Show the results of ID3 and your lookahead version of ID3 on the three datasets mentioned in the previous section.
3. Create a tar or zip archive of your code and email it to me.