Performance Evaluation of User-Behaviour Techniques of Web Spam Detection Models
Abstract
Web spam detection is a critical issue given the rapid growth in the use of the Internet and the World Wide Web. The upsurge of web spam has significantly degraded the Quality of Service (QoS) of the World Wide Web, and the resulting decline in the quality of search engine results has prompted research on detecting spam pages efficiently and accurately. Existing user-behaviour-oriented web spam detection models employ content-based, link-based and other features of webpages to classify web spam. These user-behaviour techniques, whether implemented singly or in combination, have achieved good detection performance. However, the effectiveness of these features in correctly identifying web spam still needs to be determined. In this study, predictive web spam detection models that employ all related user-behaviour features of webpages were developed and evaluated. The content-based, link-based and obvious-based feature datasets were collected from an online repository. Relevant features were extracted using an improved filter-based method. Six user-behaviour-related features extracted from the datasets were used to combine the datasets and generate all possible subsets of the required feature space, yielding 7 new datasets for the study. A Multi-Layer Perceptron (MLP) was adopted as the classifier for each of the identified features. The models were simulated using the Python Machine Learning Library with training/testing percentage splits of 60/40, 70/30 and 80/20, and performance was evaluated using accuracy, True Positive (TP) rate, False Positive (FP) rate and precision as metrics. The results showed that, for the majority of the datasets, the formulated models improved in efficiency after feature selection. The MLP classifier achieved its best result, 66.0% accuracy, when the link-based dataset was used with feature selection. The study concluded that the link-based features of a user are sufficient and effective for the detection of web spam.
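The evaluation set-up described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the 7 datasets correspond to the non-empty combinations of the three feature groups (content, link, obvious), and that spam is the positive class encoded as 1; the function names and the toy labels are hypothetical. It only enumerates the dataset combinations, computes split sizes for the three percentage splits, and derives the four reported metrics from predicted labels.

```python
from itertools import combinations

# Feature groups named in the abstract; the identifiers are illustrative.
FEATURE_GROUPS = ["content", "link", "obvious"]

# Training percentages for the 60/40, 70/30 and 80/20 splits.
TRAIN_PERCENTAGES = [60, 70, 80]


def nonempty_subsets(groups):
    """Return every non-empty combination of the given feature groups."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


def split_sizes(n_samples, train_percent):
    """Return (train, test) sample counts for a percentage split."""
    n_train = n_samples * train_percent // 100
    return n_train, n_samples - n_train


def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, TP rate, FP rate and precision for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": safe_ratio(tp + tn, len(pairs)),
        "tp_rate": safe_ratio(tp, tp + fn),    # spam pages correctly flagged
        "fp_rate": safe_ratio(fp, fp + tn),    # normal pages wrongly flagged
        "precision": safe_ratio(tp, tp + fp),  # flagged pages that are spam
    }


if __name__ == "__main__":
    for combo in nonempty_subsets(FEATURE_GROUPS):
        print("+".join(combo))
    for percent in TRAIN_PERCENTAGES:
        print(percent, split_sizes(1000, percent))
    # Toy labels for illustration only, not data from the study.
    print(evaluate([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 1]))
```

In a full experiment, each combination would be used to build a dataset, a classifier would be trained on the training portion of each split, and `evaluate` would be applied to its predictions on the held-out portion.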
Keywords: Webspam, Content-based, Link-based, features, user-behaviour, evaluation
DOI: 10.7176/NCS/10-07
Publication date: December 31st 2019
ISSN (Paper): 2224-610X; ISSN (Online): 2225-0603
This journal follows the ISO 9001 management standard and is licensed under a Creative Commons Attribution 3.0 License.
Copyright © www.iiste.org