TY - JOUR
T1 - Stacking ensemble approach to diagnosing the disease of diabetes
AU - Daza, Alfredo
AU - Ponce Sánchez, Carlos Fidel
AU - Apaza-Perez, Gonzalo
AU - Pinto, Juan
AU - Zavaleta Ramos, Karoline
N1 - Publisher Copyright: © 2023 The Authors
PY - 2024/1
Y1 - 2024/1
AB - Background: Diabetes is a very common disease today and has become a worrying focus of public health globally; it is estimated that the number of people with diabetes worldwide has reached 415 million. Objective: To propose a method and four combined models based on a Stacking ensemble to diagnose diabetes, and to develop a web interface with the best model proposed in this study. Methods: The Diabetes Dataset, composed of 768 patient records, was used. The data were pre-processed using the Python programming language. To balance the data, they were divided into 4 values and an oversampling method was applied to distribute the data proportionally. The balanced data were then split using cross-validation for training, and the models were calibrated. Seven independent base algorithms were used, four combined algorithms based on Stacking were proposed, and each model was evaluated with its respective metrics. Results: Stacking 1A (Logistic regression) with oversampling achieved the best values: accuracy = 91.5 %, sensitivity = 91.6 %, F1-score = 91.49 % and precision = 91.5 %. For the ROC curve metric, Stacking 1A (Logistic regression) with oversampling, Stacking 2A (Random Forest) with oversampling, and Random Forest (independent) reached the best value, 97 %. Conclusions: Implementing four stacking models with the oversampling method helps to make an adequate diagnosis of diabetes. The combined method improved diabetes prediction, surpassing the performance of the independent algorithms used.
KW - Diabetes
KW - Hyperparameters
KW - Machine learning
KW - Oversampling
KW - Prediction
KW - Stacking
UR - http://www.scopus.com/inward/record.url?scp=85180596847&partnerID=8YFLogxK
U2 - 10.1016/j.imu.2023.101427
DO - 10.1016/j.imu.2023.101427
M3 - Article
AN - SCOPUS:85180596847
SN - 2352-9148
VL - 44
JO - Informatics in Medicine Unlocked
JF - Informatics in Medicine Unlocked
M1 - 101427
ER -