Examining the Relationship of Breast Cancer Data With Survival Chance and Comparison of Algorithms on Breast Cancer Prediction

Authors

A. M. Tiryaki, A. C. Mermer, B. Ugurlu

DOI:

https://doi.org/10.58190/ijamec.2025.117

Keywords:

Machine Learning, Breast Cancer, XGBoost

Abstract

This article compares the performance of machine learning algorithms on breast cancer data. The aim is to predict the survival status of breast cancer patients and to contribute to the development of clinical decision support systems. XGBoost, Random Forest, Support Vector Machine (SVM), and Logistic Regression models were compared on a dataset obtained from the National Cancer Institute. After data preprocessing and correlation analysis, XGBoost with hyperparameter optimization showed the best performance, reaching an overall accuracy of 92%. The optimized model performs well on class 0 (precision 92%, recall 98%), but recall for class 1 remains at 54%. The article discusses the effect of data imbalance on these results and offers suggestions for future studies.
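
The gap between overall accuracy and class 1 recall reported above is typical of imbalanced data. As a minimal sketch, the following Python snippet computes accuracy and per-class precision and recall from a binary confusion matrix. The counts are hypothetical, chosen only to illustrate the effect; they are not taken from the article, and the helper name `per_class_metrics` is an assumption introduced here.

```python
def per_class_metrics(tn, fp, fn, tp):
    """Compute accuracy and per-class precision/recall for a binary task.

    tn: class 0 samples predicted as class 0
    fp: class 0 samples predicted as class 1
    fn: class 1 samples predicted as class 0
    tp: class 1 samples predicted as class 1
    """
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        # Class 0: of everything predicted 0, how much was truly 0
        "precision_0": tn / (tn + fn),
        # Class 0: of everything truly 0, how much was found
        "recall_0": tn / (tn + fp),
        "precision_1": tp / (tp + fp),
        "recall_1": tp / (tp + fn),
    }


# Hypothetical counts (NOT the article's data): 850 class 0 samples and
# 150 class 1 samples, i.e. a clearly imbalanced test set.
metrics = per_class_metrics(tn=830, fp=20, fn=70, tp=80)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy stays above 90% even though nearly half of the class 1 samples are missed, because the majority class dominates the total. This is why per-class recall, and not accuracy alone, matters when judging a model trained on imbalanced data.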

Published

31-03-2025

Issue

Vol. 13, No. 1 (2025)

Section

Research Articles

How to Cite

A. M. Tiryaki, A. C. Mermer, and B. Ugurlu, "Examining the Relationship of Breast Cancer Data With Survival Chance and Comparison of Algorithms on Breast Cancer Prediction," J. Appl. Methods Electron. Comput., vol. 13, no. 1, pp. 19–26, Mar. 2025, doi: 10.58190/ijamec.2025.117.