A Comprehensive Review of Data Mining Techniques for Diabetes Diagnosis Using the Pima Indian Diabetes Dataset

Hadeel M Saleh

Abstract

Diabetes is a major global health concern, and early diagnosis is crucial for effective management and prevention of complications. This paper presents a comprehensive review of data mining techniques applied to the diagnosis of diabetes using the Pima Indian Diabetes dataset. The dataset, a widely used benchmark in diabetes research, records health-related features such as age, body mass index, insulin level, and glucose concentration, which serve as predictors of diabetes onset. The review covers a range of classification algorithms, including decision trees, support vector machines (SVMs), logistic regression, k-nearest neighbors (KNN), and artificial neural networks (ANNs), and discusses their performance, strengths, and limitations in predicting diabetes.
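
To make one of the listed classifiers concrete, the following is a minimal sketch of k-nearest neighbors in plain Python. It is not taken from the reviewed studies: the function names (`fit_min_max`, `apply_min_max`, `knn_predict`), the choice of three features, and every numeric value are assumptions invented for illustration. Because KNN relies on distances, the sketch rescales each feature to [0, 1] using ranges learned from the training rows only.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Learn the per-column minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(row, lows, highs):
    """Rescale one row to [0, 1] per column; constant columns map to 0.0."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(row, lows, highs)
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to query (Euclidean)."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy rows in the order (glucose, BMI, age); NOT real dataset records.
train_features = [
    [85, 22.0, 25],
    [90, 24.5, 30],
    [95, 26.0, 28],
    [150, 33.0, 45],
    [165, 35.5, 50],
    [180, 38.0, 55],
]
train_labels = [0, 0, 0, 1, 1, 1]  # 1 = diabetic, 0 = non-diabetic (assumed coding)

lows, highs = fit_min_max(train_features)
train_scaled = [apply_min_max(r, lows, highs) for r in train_features]

# Two hypothetical patients to classify.
pred_high = knn_predict(train_scaled, train_labels,
                        apply_min_max([160, 34.0, 48], lows, highs))
pred_low = knn_predict(train_scaled, train_labels,
                       apply_min_max([88, 23.0, 27], lows, highs))
print(pred_high, pred_low)
```

Fitting the scaling ranges on the training rows alone, and then applying them to each query, avoids leaking information from unseen records into the model.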


To improve model accuracy and efficiency, the paper also emphasizes the importance of preprocessing steps such as feature selection, data cleaning, and normalization. It also compares the evaluation metrics used to assess these models: accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). This review aims to identify the most suitable data mining methods for diagnosing diabetes, with a focus on scalability, interpretability, and model optimization.
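
As an illustration of how the evaluation metrics named above relate to one another, the sketch below computes them from first principles in plain Python. The labels and scores are invented toy values, not results from any reviewed study, and the helper names (`classification_metrics`, `roc_auc`) and the 0.5 decision threshold are assumptions made for this example.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented toy labels and model scores; NOT results from the reviewed papers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 3))
```

Accuracy, precision, recall, and F1 all depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason studies often report both kinds of measure.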


The findings of this review suggest that while traditional techniques like decision trees and logistic regression are effective, more complex models such as support vector machines and neural networks tend to yield higher prediction accuracy. However, the trade-off between model complexity and interpretability remains a key challenge in the deployment of these techniques for clinical decision-making. The paper concludes by suggesting future directions for improving diabetes diagnosis through the integration of advanced machine learning methods and big data analytics.

How to Cite
Saleh, H. M. (2024). A Comprehensive Review of Data Mining Techniques for Diabetes Diagnosis Using the Pima Indian Diabetes Dataset. EDRAAK, 2024, 39-42. https://doi.org/10.70470/EDRAAK/2024/006