Feature selection is a core concept in machine learning. It aims to select a subset of relevant features for use in model construction, and it is a crucial step that can dramatically impact model performance. One of the most commonly used methods for improving the stability of feature selectors is to integrate the results of multiple feature selectors, an approach known as ensemble feature selection. The drawback is that this approach is time-consuming, and until now there has been limited research on how to reduce the computational cost of estimating its stability.
To address these issues, a research team from Tokyo University and Preferred Networks has proposed a fast, simulation-based method for estimating the stability of ensemble feature selectors.
The idea behind the proposed method is to build a feature selector simulator that mimics the behaviour of the base selector, and then to use a simulated ensemble feature selector to estimate the stability.
The proposed algorithm constructs a set of simulated selectors that model the base selector as well as the dataset. It can then quickly calculate stability by creating simulated ensemble feature selectors governed by two parameters: the number of useful features for the task (n), and a probability that reflects the uncertainty derived from both the feature selectors and the dataset (p). These two parameters are obtained by running the real selector, and in this study the researchers assume that they have already been estimated.
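The simulation idea can be illustrated with a toy sketch. The article does not spell out the paper's actual simulator, so everything below is an assumption made for illustration: the function names, the rule that each of the n useful features is picked with probability p, the random filling of the remaining slots, and the vote-based aggregation are all hypothetical stand-ins rather than the authors' method.

```python
import random
from collections import Counter
from itertools import combinations


def simulated_selector(d, n, p, k, rng):
    """Toy stand-in for a base selector (an assumption, not the paper's model).

    Features 0..n-1 are treated as useful; each is picked with probability p.
    Remaining slots are filled uniformly at random from the other d-n features.
    """
    if not (0 < k <= d - n):
        raise ValueError("this sketch needs 0 < k <= d - n")
    useful = [f for f in range(n) if rng.random() < p]
    rng.shuffle(useful)
    chosen = useful[:k]
    chosen += rng.sample(list(range(n, d)), k - len(chosen))
    return set(chosen)


def simulated_ensemble(d, n, p, k, m, rng):
    """Aggregate m simulated selectors by vote count and keep the top k.

    Ties are broken by feature index, which is an arbitrary choice here.
    """
    votes = Counter()
    for _ in range(m):
        votes.update(simulated_selector(d, n, p, k, rng))
    ranked = sorted(votes, key=lambda f: (-votes[f], f))
    return set(ranked[:k])


def estimate_stability(d, n, p, k, m, repeats, seed=0):
    """Mean pairwise Jaccard similarity over repeated simulated ensembles."""
    if repeats < 2:
        raise ValueError("need at least two repeats to compare")
    rng = random.Random(seed)
    subsets = [simulated_ensemble(d, n, p, k, m, rng) for _ in range(repeats)]
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Hypothetical settings: 200 features, 10 useful, each found with p = 0.7.
single = estimate_stability(d=200, n=10, p=0.7, k=10, m=1, repeats=20)
ensemble = estimate_stability(d=200, n=10, p=0.7, k=10, m=20, repeats=20)
print(f"single selector: {single:.3f}, ensemble of 20: {ensemble:.3f}")
```

Because no real selector is executed inside the loop, a sketch like this runs in a fraction of a second, which is the intuition behind estimating ensemble stability by simulation once n and p are known.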
In the proposed algorithm’s workflow, computational cost is dependent on the trial runs of the real selectors, and so the overall computational complexity is relatively low, enabling a faster stability computation process.
To demonstrate the applicability of their proposed method, the team conducted experiments on three microarray gene expression datasets: Colon (Ding and Peng, 2005), Lymphoma (Ding and Peng, 2005), and Prostate (Nie et al., 2010). For their base feature selector, they used a trained random forest that assigns an importance score to each feature and another random forest as a predictor for evaluating the performance of the selected features. They employed the pair-wise Jaccard similarity as their stability index.
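For readers unfamiliar with the stability index mentioned above, here is a minimal sketch of pairwise Jaccard similarity: the average of |A ∩ B| / |A ∪ B| over every pair of selected feature subsets. The function names and the example subsets are illustrative only and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty subsets are identical
    return len(a & b) / len(union)


def pairwise_jaccard_stability(subsets):
    """Average Jaccard similarity over all unordered pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Illustrative feature subsets from three hypothetical selector runs.
runs = [{0, 1, 2, 3}, {0, 1, 2, 4}, {0, 1, 5, 6}]
stability = pairwise_jaccard_stability(runs)
print(round(stability, 3))
```

A value of 1 means every run selected exactly the same features, while a value near 0 means the runs barely overlap.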
The results show that the proposed method can accurately estimate the stability of ensemble feature selectors while maintaining a low computational cost. The team believes their simulation method can aid in evaluating the stability of ensemble feature selection algorithms while saving time by reducing the required number of executions of the real feature selectors.
The paper Fast Estimation Method for the Stability of Ensemble Feature Selectors is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang