Enhancing Tumor Classification Through Machine Learning Algorithms for Breast Cancer Diagnosis

Lawrence Agbota; Edmund Agyemang; Priscilla Kissi-Appiah; Lateef Moshood; Akua Osei-Nkwantabisa; Vincent Agbenyeavu; Abraham Nsiah; Augustina Adjei

Abstract

In cancer diagnosis, machine learning can improve detection by giving doctors a second perspective and supporting faster, more accurate decisions. Numerous studies have applied both classical machine learning and deep learning to cancer classification. In this study, we examine the efficacy of five commonly used algorithms, spanning traditional and deep learning models: Logistic Regression, Support Vector Machines (SVM), Random Forest (RF), Decision Tree, and Deep Neural Networks (DNN). We analyze their ability to correctly classify tumors as Benign or Malignant using the Wisconsin Breast Cancer Dataset (WBCD). A Random Forest classifier was used to reduce model complexity, narrowing the feature set to 17 features through cross-validation and achieving a validation score of 96.84%. A grid search was then used to determine the maximum tree depth, which was found to be five. The Synthetic Minority Oversampling Technique (SMOTE) was employed as a resampling tool to balance the Benign and Malignant categories, addressing the class imbalance commonly encountered in classification problems. On the unbalanced data, Random Forest emerged as the best classification model with an accuracy of 98.20%, followed by Logistic Regression with an accuracy of 97.40%. After applying SMOTE, Random Forest and Logistic Regression were jointly the best models, each with an accuracy of 94.70%. Both models performed strongly, with area under the curve (AUC) values of 0.997 for Random Forest and 0.994 for Logistic Regression.

Keywords: Breast Cancer, Random Forest, Logistic Regression, Support Vector Machines, Deep Neural Networks, Synthetic Minority Oversampling Technique.
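
The SMOTE resampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, and the function names (`smote`, `balance`), the neighbour count `k=5`, the fixed seed, and the toy two-feature data are all assumptions made for illustration. The idea is that each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size
    (binary labels assumed)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Hypothetical toy data: six "B" (benign) and three "M" (malignant) samples.
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.0, 1.2],
     [5.0, 5.0], [5.5, 4.8], [4.7, 5.3]]
y = ["B"] * 6 + ["M"] * 3

X_bal, y_bal = balance(X, y, minority_label="M")
print(y_bal.count("B"), y_bal.count("M"))  # prints: 6 6
```

In practice a maintained implementation, such as the one in the imbalanced-learn package, would normally be preferred over a hand-written version. A common precaution is to resample only the training split so that synthetic points do not leak into the evaluation data.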

DOI: 10.7176/CEIS/15-1-08

Publication date: June 30, 2024

ISSN (Paper): 2222-1727; ISSN (Online): 2222-2863

This journal follows the ISO 9001 management standard and is licensed under a Creative Commons Attribution 3.0 License.

Computer Engineering and Intelligent Systems