Analyzing the Most Relevant Predictors for Adult Coronary Heart Disease using Machine Learning

Muhammad Naufaldi
Sunny Jovita
Nunung Nurul Qomariyah

Abstract

Coronary heart disease (CHD) has been the leading cause of death worldwide for decades. The healthcare industry generates vast amounts of clinical data, driven by patient medical records, regulatory requirements, and the results of medical examinations. To identify the most relevant features for coronary heart disease, this study conducts an experimental evaluation of data-driven diagnosis of coronary heart disease using classification algorithms. A statistical test (Chi-square) is used to find the most valuable features and risk factors associated with coronary heart disease. The purpose of this univariate feature selection method is to measure how far the observed results deviate from the expected results. Furthermore, CHD is predicted using several machine learning classification algorithms, including Logistic Regression, Complement Naïve Bayes, and Support Vector Machine (SVM). This study also evaluates ensemble machine learning algorithms, namely Random Forest, Extreme Gradient Boosting (XGBoost), and Gradient Boosting, to find the best-performing classification algorithm and to select essential features from the dataset. Holdout and cross-validation methods are used to separate the dataset into two sets, the training set and the testing set. The performance of the algorithms is assessed in terms of specificity and sensitivity. The results show that the Gradient Boosting model exhibits the best performance, with a sensitivity of 0.839.
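
As a rough illustration of two quantities named in the abstract, the sketch below computes a Pearson chi-square statistic from a contingency table of observed counts, and sensitivity and specificity from binary labels. It is a minimal, standard-library sketch and not the authors' implementation: the function names, the example counts, and the label vectors are all hypothetical, and it assumes every expected cell count is non-zero and that both classes occur in the labels.

```python
from typing import Sequence, Tuple


def chi_square_statistic(observed: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a contingency table.

    Rows are feature categories, columns are outcome classes.
    Assumes every expected count is non-zero (illustrative sketch only).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the feature and the outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    return statistic


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: share of true cases that are detected.
    # Specificity: share of non-cases that are correctly ruled out.
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts: rows = risk factor present/absent, cols = CHD yes/no.
example_table = [[10, 20], [20, 10]]
chi2_value = chi_square_statistic(example_table)

# Hypothetical labels and predictions, made up for this example.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(labels, predictions)
```

A larger chi-square value indicates a stronger departure from independence between a feature and the outcome, which is why such a statistic can be used to rank candidate features. Sensitivity is the natural headline metric when missed cases are the costlier error.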

Article Details

Section
Articles
Author Biographies

Sunny Jovita, Department of Computer Science, Faculty of Computing and Media, Bina Nusantara University, Jakarta, Indonesia

Nunung Nurul Qomariyah, Department of Computer Science, Faculty of Computing and Media, Bina Nusantara University, Jakarta, Indonesia
