Comparison of Data Selection Strategies for Online Support Vector Machine Classification
In Proceedings of the International Congress on Neurotechnology, Electronics and Informatics (http://www.neurotechnix.org/), (NEUROTECHNIX-2015), 16-17 November 2015, Lisbon, SciTePress, pages 59-67, November 2015.
Practical applications of support vector machines (SVMs) often require online learning under limited computational resources. SVMs can be enabled for online learning through several strategies; one group of these manipulates the training data and limits its size. We summarize these existing approaches and compare them, first, on several synthetic datasets with different shifts and, second, on electroencephalographic (EEG) data. During this manipulation, class imbalance can arise in the training data, and all samples of one class might even be removed. To deal with this potential issue, we suggest and compare three balancing criteria. Results show that there is a complex interaction between the different groups of selection criteria, which can be combined arbitrarily. Different data shifts call for different criteria. Adding all samples to the pool of considered samples usually performs significantly worse than the other criteria. Balancing the data is helpful for EEG data. For the synthetic data, balancing criteria were mostly relevant when the other criteria were not well chosen.
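The idea of limiting the training set while guarding against a class being emptied can be illustrated with a minimal sketch. It assumes one simple combination of criteria chosen purely for illustration, not the paper's exact criteria: every new sample is added, and when the pool exceeds a fixed size, the oldest sample of the currently largest class is removed. The class name `BalancedWindow` and its methods are hypothetical. With a plain oldest-first window, an early minority-class sample would eventually be pushed out by a long run of majority-class samples; the balancing rule keeps it. Retraining the SVM itself on the returned training set is left out.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical sample type: (feature vector, class label).
Sample = Tuple[List[float], int]


class BalancedWindow:
    """Hypothetical fixed-size training pool for online SVM retraining.

    Illustrative assumptions: every incoming sample is added, and when the
    pool exceeds `max_size`, the oldest sample of the currently largest
    class is removed. The minority class is therefore protected from
    removal as long as another class is larger.
    """

    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        # One FIFO queue per class label, oldest sample on the left.
        self.pools: Dict[int, Deque[List[float]]] = {}
        self._size = 0

    def add(self, x: List[float], y: int) -> None:
        """Add a sample, then shrink the pool back to max_size if needed."""
        self.pools.setdefault(y, deque()).append(x)
        self._size += 1
        if self._size > self.max_size:
            # Balancing rule: remove from the class with the most samples
            # (ties resolve to the label that was seen first).
            largest = max(self.pools, key=lambda c: len(self.pools[c]))
            self.pools[largest].popleft()  # drop that class's oldest sample
            self._size -= 1

    def training_set(self) -> List[Sample]:
        """Current pool as (features, label) pairs for retraining."""
        return [(x, y) for y, q in self.pools.items() for x in q]

    def counts(self) -> Dict[int, int]:
        """Number of stored samples per class label."""
        return {y: len(q) for y, q in self.pools.items()}


if __name__ == "__main__":
    window = BalancedWindow(max_size=4)
    window.add([0.0], 1)          # a single early sample of class 1
    for i in range(10):           # followed by a long run of class 0
        window.add([float(i)], 0)
    print(window.counts())        # class 1 survives alongside class 0
```

In this run the pool ends with one class-1 sample and the three newest class-0 samples, whereas an oldest-first window of the same size would contain only class-0 samples. The resulting `training_set()` could then be passed to any SVM implementation for retraining.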
Support Vector Machine, Online Learning, Brain Computer Interface, Electroencephalogram, Incremental/Decremental Learning