The RVErS implementation was applied to two learning algorithms: C4.5 (release 8) and naive Bayes. For C4.5, tree pruning was disabled, since pruning is itself a form of variable selection and could obscure the performance of RVErS.
The cost metric is an important factor in the RVErS algorithm. Key parameters, such as the cost function and the value of 'k' to be selected, depend on it, which makes it the key component behind the performance of the variable selection decision. The metric varies from one learning algorithm to another; the cost metrics for C4.5 and naive Bayes are given below.
The cost metric for C4.5 is based on the number of calls to the gain-ratio data purity criterion. The cost of inducing a tree is therefore roughly quadratic in the number of variables: one call per variable, per decision node, assuming a linear number of nodes in the tree.
For C4.5, M(L, n) = num_nodes * n * total_instances. After removing 'k' variables from the original set of 'n' variables, the cost becomes M(L, n-k) = num_nodes * (n-k) * total_instances.
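As a rough illustration, the C4.5 formula can be written as a small Python function. The function name c45_cost and the sample values are hypothetical and chosen only for illustration; num_nodes is held fixed when variables are removed, exactly as in the formula. This is a sketch of the arithmetic, not the RVErS implementation itself.

```python
def c45_cost(num_nodes: int, n: int, total_instances: int) -> int:
    """Cost metric M(L, n) for C4.5, following the formula in the text."""
    return num_nodes * n * total_instances


# Hypothetical example values (not taken from any experiment).
num_nodes, n, total_instances, k = 10, 20, 100, 5

cost_before = c45_cost(num_nodes, n, total_instances)     # M(L, n)
cost_after = c45_cost(num_nodes, n - k, total_instances)  # M(L, n-k)
print(cost_before, cost_after)
```

With the node count held fixed, the cost shrinks in proportion to the number of remaining variables. In practice the tree size may also change once variables are removed, which this sketch does not model.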
The cost metric for naive Bayes is based on the number of operations required to build the conditional probability table. Hence, the cost metric is linear in the number of inputs.
Cost metric for naive Bayes = cost of building the probability table for the classes + cost of building the two-dimensional probability table of attributes by classes
= num_classes + (num_attributes * num_classes)
= num_classes * (num_attributes + 1)
So, for naive Bayes, M(L, n) = num_instances * [num_classes * (n + 1)], where n is the number of variables (attributes).
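The naive Bayes formula can be sketched the same way in Python. The function name naive_bayes_cost and the sample values are hypothetical and used only to show the arithmetic; this is not the RVErS implementation itself.

```python
def naive_bayes_cost(num_instances: int, num_classes: int, n: int) -> int:
    """Cost metric M(L, n) for naive Bayes, following the formula in the text."""
    # One class-probability table (num_classes entries) plus one
    # attribute-by-class table (n * num_classes entries), per instance.
    table_entries = num_classes * (n + 1)
    return num_instances * table_entries


# Hypothetical example values (not taken from any experiment).
num_instances, num_classes, n, k = 100, 3, 20, 5

nb_cost_before = naive_bayes_cost(num_instances, num_classes, n)     # M(L, n)
nb_cost_after = naive_bayes_cost(num_instances, num_classes, n - k)  # M(L, n-k)
print(nb_cost_before, nb_cost_after)
```

Because the metric is linear in n, each removed variable reduces the cost by the same amount, num_instances * num_classes.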