While conducting my experiments, I noticed that including the HS data set gave anomalous results: Naive Bayes performed far better than every other algorithm on that data set. This contradicts the findings of the original paper, so I have included two sets of results.

Normalized scores for each learning algorithm by metric (averaged over six problems, excluding the HS data set)



Model           ACC     F-score  ROC     APR     RMS     1-RMS   Mean
ANN             0.7958  0.7672   0.7827  0.8088  0.3160  0.6840  0.7677
Log reg         0.8382  0.7873   0.7260  0.8413  0.3760  0.6240  0.7634
Random forest   0.7193  0.7050   0.8605  0.7185  0.2382  0.7618  0.7530
Boosted DT      0.7540  0.7677   0.7570  0.7763  0.3442  0.6558  0.7421
Bagged DT       0.7162  0.6613   0.8360  0.7143  0.2254  0.7746  0.7405
Boosted stumps  0.6908  0.6795   0.8228  0.6908  0.2165  0.7835  0.7335
SVM             0.8420  0.7823   0.6152  0.8453  0.4335  0.5665  0.7303
DT              0.7054  0.6933   0.7658  0.7050  0.2275  0.7726  0.7284
Naive Bayes     0.6356  0.6730   0.8663  0.6392  0.3188  0.6812  0.6991
KNN             0.3558  0.3707   0.6920  0.3557  0.2297  0.7703  0.5089


As can be seen from the table above, my results generally agree with the authors' results. ANNs, random forests, and boosted trees performed well, while KNN, Naive Bayes, and decision trees did not.
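
The text does not say how the Mean column is computed. The numbers are consistent with a plain average of ACC, F-score, ROC, APR, and 1-RMS, leaving out the raw RMS error. The sketch below checks that reading against the table. The averaging rule is an assumption inferred from the values, and the names `rows` and `mean_score` are only illustrative.

```python
# Rows copied from the table: (ACC, F-score, ROC, APR, RMS, reported Mean).
rows = {
    "ANN":            (0.7958, 0.7672, 0.7827, 0.8088, 0.3160, 0.7677),
    "Log reg":        (0.8382, 0.7873, 0.7260, 0.8413, 0.3760, 0.7634),
    "Random forest":  (0.7193, 0.7050, 0.8605, 0.7185, 0.2382, 0.7530),
    "Boosted DT":     (0.7540, 0.7677, 0.7570, 0.7763, 0.3442, 0.7421),
    "Bagged DT":      (0.7162, 0.6613, 0.8360, 0.7143, 0.2254, 0.7405),
    "Boosted stumps": (0.6908, 0.6795, 0.8228, 0.6908, 0.2165, 0.7335),
    "SVM":            (0.8420, 0.7823, 0.6152, 0.8453, 0.4335, 0.7303),
    "DT":             (0.7054, 0.6933, 0.7658, 0.7050, 0.2275, 0.7284),
    "Naive Bayes":    (0.6356, 0.6730, 0.8663, 0.6392, 0.3188, 0.6991),
    "KNN":            (0.3558, 0.3707, 0.6920, 0.3557, 0.2297, 0.5089),
}


def mean_score(acc, f_score, roc, apr, rms):
    # Assumption: RMS is an error (lower is better), so it enters the
    # average as 1 - RMS; the other four metrics are used as reported.
    return (acc + f_score + roc + apr + (1.0 - rms)) / 5.0


# Largest gap between the recomputed and the reported Mean over all rows.
max_gap = 0.0
for name, (*metrics, reported) in rows.items():
    recomputed = mean_score(*metrics)
    max_gap = max(max_gap, abs(recomputed - reported))
    print(f"{name:15s} recomputed={recomputed:.4f} reported={reported:.4f}")

# Algorithms ordered by the recomputed mean, best first.
ranking = sorted(rows, key=lambda n: mean_score(*rows[n][:5]), reverse=True)
print(ranking)
```

Any gap between the recomputed and reported values is at the level of rounding in the fourth decimal place, which is what one would expect if the reported means were averaged from unrounded scores.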
