A Comparative Analysis of Dataset Performance in Disease Prediction via Machine Learning Algorithm

Mustafa Cosar

doi:10.55549/epstem.1803070

Research Article

Year 2025, Volume: 35, 29 - 37

Abstract

This study investigates how dataset characteristics influence the predictive performance of machine learning (ML) algorithms in disease diagnosis. Whereas the existing literature often evaluates various models on a single dataset, this study instead compares a single algorithm across several datasets. The UCI Heart Disease, Heart Failure, and Cleveland datasets were pre-processed using various techniques to ensure structural comparability and then analyzed using models developed with the CatBoost algorithm. The study assesses the performance of these models on each dataset and explores the influence of different parameters. The model demonstrated strong predictive capability across all three datasets. On the UCI Heart Disease dataset, it distinguished between classes effectively, with an accuracy of 84.24% supported by other performance metrics. On the Heart Failure dataset, it performed best, with an accuracy of 88.59%. The Cleveland dataset also yielded favorable results, with an accuracy of 85.25%. These results underscore the practical value of ML-based classifiers in the early prediction of heart-related medical conditions. By comparing model success across different datasets, the study highlights the applicability and effectiveness of these techniques and provides direction for future research involving larger datasets and alternative algorithms.
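
The comparison described in the abstract, one model type evaluated separately on several datasets, can be sketched roughly as follows. This is an illustrative outline only, not the study's code: the function names (`accuracy`, `evaluate_across_datasets`, `majority_trainer`), the ordered train/test split, and the 20% test fraction are assumptions, and the majority-class trainer is a placeholder standing in for a real classifier such as CatBoost.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
Dataset = Tuple[List[Row], List[int]]  # (feature rows, binary labels)
Predictor = Callable[[Row], int]
Trainer = Callable[[List[Row], List[int]], Predictor]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_across_datasets(
    datasets: Dict[str, Dataset],
    train: Trainer,
    test_fraction: float = 0.2,  # assumed value, not taken from the study
) -> Dict[str, float]:
    """Train the same kind of model on each dataset and report held-out accuracy."""
    scores: Dict[str, float] = {}
    for name, (X, y) in datasets.items():
        n_test = max(1, round(len(X) * test_fraction))
        cut = len(X) - n_test  # ordered split; shuffle the rows first in practice
        model = train(X[:cut], y[:cut])
        scores[name] = accuracy(y[cut:], [model(row) for row in X[cut:]])
    return scores


def majority_trainer(X: List[Row], y: List[int]) -> Predictor:
    """Placeholder trainer (hypothetical): always predicts the most common training label."""
    majority = int(sum(y) * 2 >= len(y))
    return lambda row: majority
```

To reproduce the kind of comparison the abstract reports, `majority_trainer` would be replaced by a function that fits an actual classifier on the training rows and returns its prediction function, and `datasets` would hold the pre-processed heart-disease datasets keyed by name.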

Keywords

Machine learning, Disease prediction, Importance of dataset, Performance analysis

Details

Primary Language	English
Subjects	Electrical Machines and Drives
Journal Section	Articles
Authors	Mustafa Cosar
Early Pub Date	October 20, 2025
Publication Date	October 27, 2025
Submission Date	May 21, 2025
Acceptance Date	June 29, 2025
Published in Issue	Year 2025 Volume: 35

Cite

APA	Cosar, M. (2025). A Comparative Analysis of Dataset Performance in Disease Prediction via Machine Learning Algorithm. The Eurasia Proceedings of Science Technology Engineering and Mathematics, 35, 29-37. https://doi.org/10.55549/epstem.1803070