Comparing Prediction Accuracy for Machine Learning and Other Classical Approaches in Gene Expression Data
Abstract
Microarray based gene expression profiling has been emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. The classification of different tumor types is of great significance in cancer diagnosis and drug innovation. Using a large number of genes to classify samples based on a small number of microarrays remains a difficult problem. Feature selection techniques can be used to extract the marker genes which influence the classification accuracy effectively by eliminating the unwanted noisy and redundant genes. Quite a number of methods have been proposed in recent years with promising results. But there are still a lot of issues which need to be addressed and understood. Diagonal discriminant analysis, regularized discriminant analysis, support vector machines and k-nearest neighbor have been suggested as among the best methods for small sample size situations. In this paper, we have compared the performance of different discrimination methods for the classification of tumors based on gene expression data. The methods are applied to datasets from four recently published cancer gene expression studies. The performance of the classification technique has been evaluated for varying number of selected features in terms of misclassification rate using hold-out cross validation. Our study shows that KNN, RDA and SVM with linear kernel methods have lower misclassification rate than the other algorithms.
Keywords: microarray, gene expression, KNN, DLDA, RDA, SVM
To list your conference here. Please contact the administrator of this platform.
Paper submission email: CEIS@iiste.org
ISSN (Paper)2222-1727 ISSN (Online)2222-2863
Please add our address "contact@iiste.org" into your email contact list.
This journal follows ISO 9001 management standard and licensed under a Creative Commons Attribution 3.0 License.
Copyright © www.iiste.org