Computational Efficiency Analysis of Customer Churn Prediction Using Spark and Caret Random Forest Classifier

Olayemi Olasehinde, Olanrewaju Victor Johnson, Johnson Tunde Fakoya


Today's businesses are adopting technological advances to improve productivity, maximize profit and deliver better service. Technology has also caused data to arrive at an alarming rate, so businesses must rethink how they handle data if they are to keep turning it into value. Traditional data mining techniques have proved that data can be harnessed and turned into value for business growth, but the era of large-scale data poses a challenge of computational efficiency for this traditional approach. This paper addresses the issue by comparing a big data analytics tool, Spark, with a traditional data mining tool, Caret. A telecom churn dataset was used to analyse both the computational and the predictive performance of the two approaches using their Random Forest (RF) classifiers. Both classifiers were trained with the same training-set partitioning and tuning parameters. The results show that Spark-RF is computationally more efficient, with an execution time of 50.25 seconds compared with 847.20 seconds for Caret-RF. The variable importance plot and churn-rate counts indicate that customer churn could be reduced if management attention and policy focus on tenure (ShortTenure), Contract, InternetService and PaymentMethod. Classifier accuracy was approximately 80% for both implementations.
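To put the two reported execution times in proportion, the short sketch below computes the speedup ratio between them. It is only an illustration: the helper names `timed` and `speedup` are hypothetical and not taken from the paper, and `timed` simply shows one common way wall-clock time for a model fit could be measured, not necessarily how the authors measured it.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds).

    Hypothetical helper: a generic way to time a training call.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(baseline_secs, candidate_secs):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_secs / candidate_secs


# Execution times reported in the abstract, in seconds.
CARET_RF_SECS = 847.20
SPARK_RF_SECS = 50.25

ratio = speedup(CARET_RF_SECS, SPARK_RF_SECS)
print(f"Spark-RF ran about {ratio:.1f}x faster than Caret-RF")
```

Under these figures the Spark run is roughly seventeen times faster, while the reported accuracy is about the same for both implementations.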

Keywords: Spark, Caret, Random Forest, Churn, accuracy


ISSN (Paper) 2224-5758, ISSN (Online) 2224-896X


This journal follows the ISO 9001 management standard and is licensed under a Creative Commons Attribution 3.0 License.
