DigitalUM :: Browsing by Subject "Feature selection"



Open Access
A methodology for energy multivariate time series forecasting in smart buildings based on feature selection
(Elsevier, 2019-05-10) González Vidal, Aurora; Jiménez Barrionuevo, Fernando; Skarmeta Gómez, Antonio; Ingeniería de la Información y las Comunicaciones; Facultades de la UMU::Facultad de Informática
The massive collection of data via emerging technologies like the Internet of Things (IoT) requires finding optimal ways to reduce the created features that have a potential impact on the information that can be extracted through the machine learning process. The mining of knowledge related to a concept is done on the basis of the features of data. The process of finding the best combination of features is called feature selection. In this paper we deal with multivariate time-dependent series of data points for energy forecasting in smart buildings. We propose a methodology to transform the time-dependent database into a structure that standard machine learning algorithms can process, and then apply different types of feature selection methods for regression tasks. We used Weka for the tasks of database transformation, feature selection, regression, statistical testing and forecasting. The proposed methodology improves MAE by 59.97% and RMSE by 40.75%, evaluated on training data, and it improves MAE by 42.28% and RMSE by 36.62%, evaluated on test data, on average for 1-step-ahead, 2-step-ahead and 3-step-ahead forecasts when compared to not applying any feature selection methodology.
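A common way to turn a multivariate time series into a table that standard regressors accept is a sliding-window (lagged-feature) transform. The sketch below is a minimal illustration under assumptions: a window of `n_lags` past values per variable and a target `horizon` steps ahead; the function name `make_lagged_table` and its parameters are hypothetical, and the paper's Weka-based transformation may differ in detail.

```python
from typing import List, Sequence, Tuple


def make_lagged_table(series: Sequence[Sequence[float]], target_col: int,
                      n_lags: int, horizon: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a multivariate series into (X, y) rows for a standard regressor.

    series[t][j] is variable j at time t (hypothetical layout). Each row of X holds
    the last n_lags observations of every variable, oldest first, and y is the
    target variable `horizon` steps after the most recent observation in the window.
    """
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags - 1, len(series) - horizon):
        row: List[float] = []
        for lag in range(n_lags - 1, -1, -1):  # oldest lag first
            row.extend(series[t - lag])
        X.append(row)
        y.append(series[t + horizon][target_col])
    return X, y
```

For example, with `series = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]`, `n_lags=2` and `horizon=1`, the first row of `X` is `[1, 10, 2, 20]` and `y` is `[3, 4, 5]`. Each column of `X` (one variable at one lag) then becomes a candidate for feature selection.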
Open Access
Effect of the Synergetic Use of Sentinel-1, Sentinel-2, LiDAR and Derived Data in Land Cover Classification of a Semiarid Mediterranean Area Using Machine Learning Algorithms
(Multidisciplinary Digital Publishing Institute, 2023-01-05) Valdivieso Ros, Carmen; Alonso Sarria, Francisco; Gomariz Castillo, Francisco; Geografía
Land cover classification in semiarid areas is a difficult task that has been tackled using different strategies, such as the use of normalized indices, texture metrics, and the combination of images from different dates or different sensors. In this paper we present the results of an experiment using three sensors (Sentinel-1 SAR, Sentinel-2 MSI and LiDAR), four dates and different normalized indices and texture metrics to classify a semiarid area. Three machine learning algorithms were used: Random Forest, Support Vector Machines and Multilayer Perceptron; Maximum Likelihood was used as a baseline classifier. The synergetic use of all these sources resulted in a significant increase in accuracy, with Random Forest reaching the highest accuracy. However, the large number of features (126) makes feature selection advisable to reduce this figure. After applying Variance Inflation Factor and Random Forest feature importance, the number of features was reduced to 62. The final overall accuracy obtained was 0.91 ± 0.005 (alpha = 0.05) and the kappa index was 0.898 ± 0.006 (alpha = 0.05). Most of the observed confusions are easily explicable and do not represent a significant difference in agronomic terms.
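The Variance Inflation Factor step mentioned above can be sketched as follows: for each feature j, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on all the other features, and the feature with the highest VIF is dropped repeatedly until every VIF is below a threshold. This is a minimal sketch; the threshold of 10 is a common rule of thumb assumed here (the abstract does not state the value used), the helper names are hypothetical, and the Random Forest importance step is not shown.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the other columns plus an intercept."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        if ss_tot == 0.0:  # constant column: treat as perfectly redundant
            vifs[j] = np.inf
            continue
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def drop_collinear(X: np.ndarray, names: list, threshold: float = 10.0) -> list:
    """Iteratively remove the feature with the highest VIF until all VIFs <= threshold."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = variance_inflation_factors(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        del keep[worst]
    return [names[i] for i in keep]
```

Removing one feature at a time, rather than every feature above the threshold at once, matters because dropping a single member of a collinear group usually brings the VIFs of the remaining members back down.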
Open Access
Multi-objective evolutionary feature selection for ensemble learning with random forests in time series forecasting
(Elsevier, 2025-11-10) Espinosa, Raquel; Sánchez Carpena, Gracia; Palma Méndez, José Tomás; Jiménez Barrionuevo, Fernando; Ingeniería de la Información y las Comunicaciones; Facultades de la UMU::Facultad de Informática
Time series forecasting is fundamental in numerous domains, including finance, healthcare, energy, and environmental monitoring. However, the high dimensionality of feature spaces can lead to overfitting and reduced interpretability, making feature selection a crucial preprocessing step. This paper proposes a multiobjective evolutionary algorithm for feature selection in time series forecasting, designed to enhance predictive accuracy while improving generalization. The method partitions the dataset, associating each partition with an objective function in the optimization process. By independently selecting relevant feature subsets, it generates a Pareto front of Random Forest models, each trained on a distinct subset of features. These models are then aggregated into a stacking-based ensemble framework, effectively balancing feature relevance and diversity. Additionally, we introduce a feature importance measure based on selection frequency in the non-dominated solutions of the optimization process. To validate our approach, we conduct experiments on real-world forecasting tasks, including air quality prediction in southeastern Spain and Italy and oil temperature forecasting in industrial applications. We also evaluate performance on synthetic datasets of increasing complexity, systematically varying instances, features, seasonality, noise, and trends. The proposed method is compared against conventional Random Forest, a wrapper-based feature selection method with a multiobjective evolutionary search strategy, and several state-of-the-art embedded feature selection techniques for time series forecasting. The results demonstrate that our approach significantly improves forecasting accuracy while mitigating overfitting. By integrating multi-objective evolutionary optimization, random forest, ensemble learning, and a novel feature importance measure, our method offers a robust, interpretable, and effective feature selection approach for time series forecasting applications.
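The selection-frequency importance described above can be illustrated with a small sketch: keep only the non-dominated solutions (here, one error value per data partition, all minimised), then count how often each feature appears in their subsets. The exact normalisation or weighting used in the paper may differ; plain frequency over the front is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective (minimisation) and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(objs: List[Sequence[float]]) -> List[int]:
    """Indices of solutions that no other solution dominates (the Pareto front)."""
    return [i for i, a in enumerate(objs)
            if not any(dominates(b, a) for j, b in enumerate(objs) if j != i)]


def selection_frequency(masks: List[Sequence[int]], objs: List[Sequence[float]]) -> List[float]:
    """Fraction of Pareto-front solutions whose 0/1 feature mask selects each feature."""
    front = non_dominated(objs)
    counts = [0] * len(masks[0])
    for i in front:
        for f, bit in enumerate(masks[i]):
            counts[f] += bit
    return [c / len(front) for c in counts]
```

With masks `[[1,1,0,0], [1,0,1,0], [0,0,1,1], [1,1,1,1]]` and per-partition errors `[(0.2,0.5), (0.3,0.3), (0.6,0.1), (0.4,0.6)]`, the last solution is dominated by the first, so the front is solutions 0 to 2 and features 0 and 2 each receive an importance of 2/3.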
Open Access
Multi-objective evolutionary simultaneous feature selection and outlier detection for regression
(Institute of Electrical and Electronics Engineers, 2021-09-27) Jiménez Barrionuevo, Fernando; Lucena Sánchez, Estrella; Sánchez Carpena, Gracia; Sciavicco, Guido; Ingeniería de la Información y las Comunicaciones; Facultades de la UMU::Facultad de Informática
When investigating the causes of contamination in specific contexts, such as in underground water wells, multivariate regression is commonly used to establish possible links between the chemical-physical values of the samples and the levels of contaminant. Two issues often arise from such a statistical analysis: selecting the best predicting variables and detecting the instances that can be suspected to be outliers. In this paper, we propose a comprehensive, integrated, and general optimization model that solves these two problems simultaneously, in such a way that outliers are detected with respect to the specific variables that are selected for the regression, and we implement this optimization model with a well-known evolutionary algorithm. We test our proposal on data extracted from a project whose aim is to establish the causes of the contamination of underground water wells in a very specific area of northeastern Italy. The results show that our variable selection and outlier detection algorithm allows the synthesis of very reliable, interpretable, and clean regression models.
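One way to encode simultaneous variable selection and outlier detection is a single bit string: the first p bits select variables and the last n bits flag instances as outliers, so the regression is fitted only on the selected variables and the non-flagged instances. The sketch below shows such a decoding and evaluation step; the encoding, the three objectives (RMSE, number of variables, number of outliers) and the name `evaluate` are assumptions for illustration, not the paper's exact model, and the evolutionary search itself is omitted.

```python
import numpy as np


def evaluate(chromosome: np.ndarray, X: np.ndarray, y: np.ndarray):
    """Objectives (all minimised) for a 0/1 vector of length p + n.

    chromosome[:p] selects variables; chromosome[p:] flags instances as outliers.
    An ordinary least-squares model with intercept is fitted on the kept data.
    """
    n, p = X.shape
    feat_mask = chromosome[:p].astype(bool)
    outlier_mask = chromosome[p:].astype(bool)
    keep = ~outlier_mask
    A = np.column_stack([np.ones(int(keep.sum())), X[keep][:, feat_mask]])
    coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
    rmse = float(np.sqrt(np.mean((y[keep] - A @ coef) ** 2)))
    return rmse, int(feat_mask.sum()), int(outlier_mask.sum())
```

Minimising the number of flagged outliers as a separate objective (or bounding it with a constraint) is what stops the search from trivially discarding most instances to drive the error down.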
Open Access
Multi-surrogate assisted multi-objective evolutionary algorithms for feature selection in regression and classification problems with time series data
(Elsevier, 2022-12-10) Espinosa, Raquel; Jiménez Barrionuevo, Fernando; Palma Méndez, José Tomás; Ingeniería de la Información y las Comunicaciones; Facultades de la UMU::Facultad de Informática
Feature selection wrapper methods are powerful mechanisms for reducing the complexity of prediction models while preserving and even improving their precision. Meta-heuristic methods, such as multi-objective evolutionary algorithms, are commonly used as search strategies in feature selection wrapper methods since they allow minimizing the cardinality of the attribute subset and simultaneously maximizing the predictive capacity of the model. However, in high-dimensional problems, multi-objective evolutionary algorithms for wrapper-type feature selection may require excessive computational time, sometimes impractical, especially when the learning algorithm has a high computational cost, such as deep learning. To address this drawback, in this paper we propose a multi-surrogate assisted multi-objective evolutionary algorithm for feature selection, specially designed to improve generalization error. The proposed method has been compared with conventional feature selection wrapper methods that use random forest, support vector machine and long short-term memory learning algorithms to evaluate subsets of attributes. The experiments have been carried out with regression and classification problems with time series data for air quality forecasting in the south-east of Spain and for indoor temperature forecasting in a domotic house. The results demonstrate the superiority of the proposed multi-surrogate assisted method over conventional wrapper methods using the same run times.
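The core idea of surrogate assistance can be sketched with a single, deliberately simple surrogate: predict the error of a candidate feature subset from the k most similar subsets (by Hamming distance) that have already been evaluated with the expensive learner, and spend true evaluations only on the candidates the surrogate ranks best. This is an illustration of the principle under assumptions, not the paper's multi-surrogate method; `KnnSurrogate`, `screen_and_evaluate` and the budget parameter are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Mask = Tuple[int, ...]


def hamming(a: Mask, b: Mask) -> int:
    """Number of features on which two subsets disagree."""
    return sum(x != y for x, y in zip(a, b))


class KnnSurrogate:
    """Predicts a subset's error as the mean error of its k nearest evaluated subsets."""

    def __init__(self, k: int = 3):
        self.k = k
        self.archive: Dict[Mask, float] = {}

    def add(self, mask: Mask, error: float) -> None:
        self.archive[mask] = error

    def predict(self, mask: Mask) -> float:
        # Assumes the archive already holds at least one truly evaluated subset.
        nearest = sorted(self.archive.items(), key=lambda kv: hamming(kv[0], mask))[: self.k]
        return sum(err for _, err in nearest) / len(nearest)


def screen_and_evaluate(candidates: List[Mask], surrogate: KnnSurrogate,
                        true_error: Callable[[Mask], float],
                        budget: int) -> List[Tuple[Mask, float]]:
    """Rank candidates by predicted error; truly evaluate only the best `budget` of them."""
    ranked = sorted(candidates, key=surrogate.predict)
    results = []
    for mask in ranked[:budget]:
        err = true_error(mask)  # expensive step, e.g. training a model on this subset
        surrogate.add(mask, err)  # grow the archive so later predictions improve
        results.append((mask, err))
    return results
```

The design trade-off is that the surrogate's predictions are cheap but approximate, so the archive must be refreshed with real evaluations as the search moves into new regions of the subset space.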
Open Access
Multiobjective evolutionary feature selection for fuzzy classification
(Institute of Electrical and Electronics Engineers, 2019-05) Jiménez Barrionuevo, Fernando; Martínez, Carlos; Marzano, Enrico; Palma Méndez, José Tomás; Sánchez Carpena, Gracia; Sciavicco, Guido; Ingeniería de la Información y las Comunicaciones
The interpretability of classification systems refers to their ability to express their behavior in a way that is easily understandable by a user. Interpretable classification models allow for external validation by an expert and, in certain disciplines, such as medicine or business, providing information about decision making is essential for ethical and human reasons. Fuzzy rule-based classification systems are well-established, powerful classification tools based on fuzzy logic and designed to produce interpretable models; however, in the presence of a large number of attributes, even rule-based models tend to be too complex to be easily interpreted. In this paper, we propose a novel multivariate feature selection method in which both search strategy and classifier are based on multiobjective evolutionary computation. We designed a set of experiments to establish an acceptable setting with respect to the number of evaluations required by the search strategy and by the classifier. We tested our strategy on a real-life dataset and compared the results against a wide range of feature selection methods that includes filter, wrapper, multivariate, and univariate methods, with deterministic and probabilistic search strategies, and with evaluators of diverse nature. Finally, the fuzzy rule-based classification model obtained with the proposed method has been evaluated with standard performance metrics and compared with other well-known fuzzy rule-based classifiers. We have used two real-life datasets extracted from a contact center; in one case, with the proposed method, we obtained an accuracy of 0.7857 with eight rules, while the best competing fuzzy classifier obtained 0.7679 with eight rules, and in the second case, we obtained an accuracy of 0.7403 with five rules, while the best competing fuzzy classifier obtained 0.6364 with four rules.
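To show why fewer attributes make a fuzzy rule base easier to read, the sketch below infers a class from a small set of rules: each rule's antecedent lists triangular fuzzy sets on some attributes, its firing strength is the minimum membership (min t-norm), and the rule with the highest strength decides (single-winner reasoning). The membership shapes, t-norm and rules are hypothetical choices for illustration; the paper's classifier is itself evolved and may use a different inference scheme.

```python
from typing import Dict, List, Optional, Tuple

# A rule maps attribute index -> triangular set (a, b, c), plus a class label.
Rule = Tuple[Dict[int, Tuple[float, float, float]], str]


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet a < b < c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def classify(x: List[float], rules: List[Rule]) -> Optional[str]:
    """Single-winner inference: the rule with the largest min-membership gives the class."""
    best_class, best_strength = None, -1.0
    for antecedent, label in rules:
        strength = min(triangular(x[i], *abc) for i, abc in antecedent.items())
        if strength > best_strength:
            best_class, best_strength = label, strength
    return best_class
```

Because each antecedent mentions only the attributes it uses, a rule base built on a selected subset of attributes yields shorter rules that an expert can validate directly.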
Open Access
Sensitivity-constrained evolutionary feature selection for imbalanced medical classification: a case study on rotator cuff tear surgery prediction
(MDPI, 2025-12-08) Belmonte, José María; Jiménez Barrionuevo, Fernando; Sánchez Carpena, Gracia; Gabardo, Santiago; Martínez Catalán, Natalia; Calvo, Emilio; Bernabé García, Gregorio; García Carrasco, José Manuel; Ingeniería de la Información y las Comunicaciones; Facultades de la UMU::Facultad de Informática
While most patients with degenerative rotator cuff tears respond to conservative treatment, a minority progress to surgery. To anticipate these cases under class imbalance, we propose a sensitivity-constrained evolutionary feature selection framework prioritizing surgical-class recall, benchmarked against traditional methods. Two variants are proposed: (i) a single-objective search maximizing balanced accuracy and (ii) a multi-objective search also minimizing the number of selected features. Both enforce a minimum-sensitivity constraint on the minority class to limit false negatives. The dataset includes 347 patients (66 surgical, 19%) described by 28 clinical, imaging, symptom, and functional variables. We compare against 62 widely adopted pipelines, including oversampling, undersampling, hybrid resampling, cost-sensitive classifiers, and imbalance-aware ensembles. The main metric is balanced accuracy, with surgical-class F1-score as secondary. Pairwise Wilcoxon tests with a win–loss ranking assessed statistical significance. Evolutionary models rank among the top; the multi-objective variant with a Balanced Bagging Classifier performs best, achieving a mean balanced accuracy of 0.741. Selected subsets recurrently include age, tear location/severity, comorbidities, and pain/functional scores, matching clinical expectations. The constraint preserved minority-class recall without discarding or synthesizing data. Sensitivity-constrained evolutionary feature selection thus offers a data-preserving, interpretable solution for pre-surgical decision support, improving balanced performance and supporting safer triage decisions.
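A fitness function for the single-objective variant described above might look like the following sketch: it returns balanced accuracy when the minority-class sensitivity meets a floor, and otherwise a negative value that ranks every infeasible subset below every feasible one, with larger violations ranked lower. The floor of 0.7, the penalty form and the function names are assumptions (the abstract does not give the threshold); class 1 stands for the surgical (minority) class, and both classes are assumed to be present in `y_true`.

```python
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """(TP, FN, TN, FP) with class 1 as the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp, fn, tn, fp


def constrained_fitness(y_true: Sequence[int], y_pred: Sequence[int],
                        min_sensitivity: float = 0.7) -> float:
    """Balanced accuracy if sensitivity >= min_sensitivity, else a negative penalty."""
    tp, fn, tn, fp = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if sensitivity < min_sensitivity:
        # Always < 0, hence below any feasible balanced accuracy (which is >= 0).
        return sensitivity - min_sensitivity - 1.0
    return (sensitivity + specificity) / 2
```

In the multi-objective variant, the number of selected features would simply be added as a second objective to minimise, with the same sensitivity constraint applied to both.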
Open Access
Surrogate-assisted and filter-based multi-objective evolutionary feature selection for deep learning
(Institute of Electrical and Electronics Engineers, 2023-01-12) Espinosa Fernández, Raquel; Jiménez Barrionuevo, Fernando; Palma Méndez, José Tomás; Ingeniería de la Información y las Comunicaciones
Feature selection for deep learning prediction models is a difficult topic for researchers to tackle. Most of the approaches proposed in the literature consist of embedded methods through the use of hidden layers added to the neural network architecture that modify the weights of the units associated with each input attribute so that the worst attributes have less weight in the learning process. Other approaches used for deep learning are filter methods, which are independent of the learning algorithm, a property that can limit the precision of the prediction model. Wrapper methods are impractical with deep learning due to their high computational cost. In this paper, we propose new attribute subset evaluation feature selection methods for deep learning of the wrapper, filter and wrapper-filter hybrid types, where multi-objective and many-objective evolutionary algorithms are used as search strategies. A novel surrogate-assisted approach is used to reduce the high computational cost of the wrapper-type objective function, while the filter-type objective functions are based on correlation and an adaptation of the ReliefF algorithm. The proposed techniques have been applied in a time series forecasting problem of air quality in the Spanish south-east and an indoor temperature forecasting problem in a domotic house, with promising results compared to other feature se
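As an example of a correlation-based filter objective for subset evaluation, the sketch below computes Hall's CFS merit, k·r̄_cf / sqrt(k + k(k-1)·r̄_ff), which rewards features correlated with the target and penalises features correlated with each other. The abstract does not specify which correlation measure the authors use, so CFS is a stand-in chosen for illustration, not the paper's objective; `cfs_merit` is a hypothetical name.

```python
import numpy as np


def cfs_merit(X: np.ndarray, y: np.ndarray, subset: list) -> float:
    """CFS merit of a feature subset using absolute Pearson correlations.

    r_cf: mean |corr(feature, target)| over the subset.
    r_ff: mean |corr(feature_a, feature_b)| over all pairs in the subset.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))
```

Because it needs only correlations, such an objective costs far less than training a deep network per candidate subset, which is why filter objectives pair well with an expensive wrapper objective in a hybrid search.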
