Bagging is commonly used to improve the performance of a classification algorithm: bootstrap sampling is applied to the given data set to train a number of classifiers, and their outputs are then aggregated by majority voting. However, the improvement is limited when the given data set contains missing values and the algorithm used to train the classifiers is sensitive to them. We propose an extension of bagging that considers not only the weights of the classifiers in the voting process but also the incompleteness of the bootstrapped data sets used to train them. The proposed extension assigns each classifier a weight according to its classification performance and then adjusts that weight according to the ratio of missing values in the data set on which the classifier was trained. In experiments, we use two classification algorithms, two measures for weight assignment, and two functions for weight adjustment. The results reveal the potential of the proposed extension for classification algorithms that are sensitive to missing values when they are applied to data sets that have small numbers of instances but contain relatively large numbers of missing values.
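The weighting scheme described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the abstract does not name the measures or adjustment functions used, so this sketch assumes accuracy on the original data set as the weight and a linear adjustment `weight * (1 - ratio)`. All names (`fit_ensemble`, `linear_adjust`, `train_majority`, and so on) are hypothetical, missing values are represented as `None`, and the base learner is a trivial placeholder.

```python
import random
from collections import Counter, defaultdict


def bootstrap(rows, rng):
    """Draw len(rows) instances with replacement."""
    return [rng.choice(rows) for _ in rows]


def missing_ratio(rows):
    """Fraction of feature cells that are None in (features, label) rows."""
    cells = [v for features, _ in rows for v in features]
    return sum(v is None for v in cells) / len(cells) if cells else 0.0


def train_majority(rows):
    """Placeholder base learner: always predicts the most common label."""
    label = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda features: label


def accuracy(clf, rows):
    """Assumed performance measure used as the initial weight."""
    return sum(clf(x) == y for x, y in rows) / len(rows)


def linear_adjust(weight, ratio):
    """Assumed adjustment function: shrink the weight as missingness grows."""
    return weight * (1.0 - ratio)


def fit_ensemble(rows, n_classifiers=5, train=train_majority,
                 adjust=linear_adjust, seed=0):
    """Train one classifier per bootstrap sample and attach an adjusted weight."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_classifiers):
        sample = bootstrap(rows, rng)
        clf = train(sample)
        weight = adjust(accuracy(clf, rows), missing_ratio(sample))
        ensemble.append((clf, weight))
    return ensemble


def predict(ensemble, features):
    """Weighted vote: the label with the largest total weight wins."""
    votes = defaultdict(float)
    for clf, weight in ensemble:
        votes[clf(features)] += weight
    return max(votes, key=votes.get)
```

With plain majority voting, two weak classifiers can outvote one strong classifier. In this sketch, a classifier trained on a more complete bootstrap sample keeps more of its weight and can therefore prevail in the vote.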
International Journal of Information and Education Technology, 3(5), 560-566