The probability of successfully selecting âkâ irrelevant variables at random is given by:
where,
n ... remaining variables
r ... relevant variables
The expected number of consecutive failures before a success at selecting k irrelevant variables is given by
The above expression yields the expected number of consecutive trials in which at least one of the r relevant variables will be randomly selected along with the irrelevant variables prior to success.
The expected cost of successfully removing k variables from n remaining given r relevant variables is given by

where, M(L, n) represents an upper bound on the cost of running algorithm âLâ on n inputs. For example, in case of perceptron unit is related to the upper bound on number of weight updates performed by n-input perceptron.
If the computational cost for running the alogorithm is less than M(L, n) should include this difference.
The optimal cost of removing irrelevant variables from n remaining and r relevant is given by
Optimal value of 'k'
The optimal value of 'k' for r relevant variables and n remaining variables is the value of 'k' for which we get optimal cost. Hence we can write the function of optimal 'k' as
