Social Media Fake Account Detection for Afan Oromo Language using Machine Learning

A social networking service is a platform for building social networks or social relations among people who share interests, activities, backgrounds, or real-life connections. Such a service is generally offered to participants who register on the site with a unique representation (often a profile) and their social links. Most social network services are web-based and provide means for users to interact over the Internet (M. Smruthi, February 2019). Online social networking sites have become an important part of daily life. Millions of users register and share personal information with others. Because of the fast expansion of social networks, people may exploit them for unethical and illegal activities. As a result, privacy threats and the disclosure of personal information have become the most important issues for users of social networking sites. Fake profiles created with malicious intent have adverse effects, and such identities and malicious content are difficult to detect without appropriate research. Existing research on detecting malicious content has primarily considered the characteristics of the user profile, and most existing techniques lack comprehensive evaluation. In this work we propose a new model using machine learning and NLP (Natural Language Processing) techniques to improve the accuracy of detecting fake identities in online social networks. We would like to apply this approach to Facebook by extracting features such as time, date of publication, language, and geo-position. (Srinivas


Introduction
1. Background
Oromo is a Cushitic language spoken by about 50 million people in Ethiopia, Kenya, Somalia and Egypt, and it is the third largest language in Africa. The Oromo people are the largest ethnic group in Ethiopia and account for more than 40% of the population. They can be found all over Ethiopia, particularly in Wollega, Shoa, Illubabour, Jimma, Arsi, Bale, Hararghe, Wollo, Borana and the southwestern part of Gojjam. Speakers of the Oromo language, also known as Afaan Oromo, use it to communicate through social media. Because social media can disseminate information quickly and widely, it serves not only as a means of friendship and information sharing but also as a means of trading, dissemination of government policies, political campaigning and religious preaching (Guta, 11 June 2019).
Online social networks (OSNs), such as Facebook, Twitter, RenRen, LinkedIn, Google+, and Tuenti, have become increasingly popular over the last few years. People use OSNs to keep in touch with each other, share news, organize events, and even run their own e-businesses (Sarah Khaled, 2018). Internet use has increased, and with it social media networks have become popular. Nearly everyone who uses the Internet is familiar with social media networks. A social media network is a collection of social networking websites, and social networking is a platform where users can express their point of view on anything (Sachin Ingle1, 2019). Online social networks are among the most popular channels through which information is exchanged around the world. They are the center of attraction for many applications and bring a range of new information and communication tools to the user community. A social network is best viewed as a graph whose nodes and edges depict the users and their interactions, respectively.
The nodes and edges in a social network graph can be labeled or unlabeled depending on the structure of the network. Because of their great popularity, social networking sites such as Facebook, YouTube, Twitter, LinkedIn, Pinterest, Google+, Tumblr and Instagram have become the preferred communication and information-sharing tools for a diverse set of users, including individuals and companies. Users play a vital role in social networks and are fully responsible for the content exchanged in them. They share information through interesting websites, videos and files. People share confidential data in an atmosphere of trust, and others place the same trust in the data shared. The surge in the popularity of online social networks and the availability of huge amounts of data make them an easy target for adversaries, whose objectives mainly include stealing individual users' details without permission. One of the main problems in social media is spammers, who can use their accounts for different purposes. One of these is spreading rumors, which may affect a particular business or even society at large. Given the importance of social media's effect on society, (Buket Ersahin1, 2017) aim to detect fake profile accounts on the Twitter online social network in order to prevent the spread of fake news, advertisements and fake followers.
Encroaching on a legitimate user's profile through fake identities is considered the most commonly practiced technique. As online social networking sites have strengthened their security, it has become very hard to break into them directly. As a result, adversaries create false identities to gain access to other profiles (Srinivas Rao Pulluri1, A Comprehensive Model for Detecting Fake Profiles in Online Social Networks, 2017). In 2019, Facebook took down on average close to 2 billion fake accounts per quarter. Fraudsters use these fake accounts to spread spam, phishing links, or malware. It is a lucrative business that can be devastating for any innocent users it snares. Facebook is now releasing details about the machine-learning system it uses to tackle this challenge. The company distinguishes between two types of fake accounts. First, there are "user-misclassified accounts," personal profiles for businesses or pets that are meant to be Pages. These are relatively straightforward to deal with: they just get converted to Pages. "Violating accounts," on the other hand, are more serious. These are personal profiles that engage in scamming and spamming or otherwise violate the platform's terms of service. Violating accounts need to be removed as quickly as possible without casting too wide a net and snagging real accounts as well (Hao, 2020). The main objective of any social networking site is to target different user segments. The best thing about Facebook is the ability to find old friends, whereas YouTube provides a platform for people to connect, inform, and inspire others across the world through video sharing.

Principal Component Analysis
Principal Component Analysis (PCA) is applied to reduce the dimensionality of the dataset. In the proposed work, PCA plays an important role by supporting decisions about which profile features to use. PCA is one of the simplest and most robust dimensionality reduction techniques. In this paper we use the variance maximization formulation to derive the PCA results. According to this model, the first principal component is the direction in feature space along which the projected data has the highest variance, and the second component is the direction with the highest projection variance among all directions orthogonal to the first. Scores on the profile features are calculated for both fake and real accounts (Srinivas Rao Pulluri1, A Comprehensive Model for Detecting Fake Profiles in Online Social Networks, 2017).
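The variance maximization view of PCA described above can be illustrated with a minimal sketch. It is not the implementation used in the cited work: the three feature columns (followers, friends, posts) and the synthetic data are hypothetical, and power iteration with deflation is just one simple way to find the directions of highest variance. In practice a library routine would normally be used.

```python
import math
import random

def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_direction(cov, iters=500):
    """Power iteration: unit vector v maximizing v^T C v, plus that variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

def deflate(cov, v, variance):
    """Remove the variance already explained along direction v."""
    d = len(cov)
    return [[cov[i][j] - variance * v[i] * v[j] for j in range(d)]
            for i in range(d)]

# Hypothetical profile features: followers and friends are strongly
# correlated, posts varies independently with a smaller spread.
random.seed(0)
rows = []
for _ in range(200):
    followers = random.gauss(0, 5)
    friends = followers + random.gauss(0, 1)
    posts = random.gauss(0, 2)
    rows.append([followers, friends, posts])

cov = covariance(center(rows))
pc1, var1 = top_direction(cov)                      # first principal component
pc2, var2 = top_direction(deflate(cov, pc1, var1))  # second, orthogonal to pc1
```

Here `pc1` captures the shared followers/friends direction, and `pc2` is found only after the variance along `pc1` has been removed, which is what forces it to be orthogonal to the first component.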

Related Work
Various studies have proposed different approaches to detecting fake accounts. (Buket Ersahin1, 2017) presented a classification method for detecting fake accounts on Twitter. They preprocessed the dataset using a supervised discretization technique named Entropy Minimization Discretization (EMD) on numerical features and analyzed the results of the Naïve Bayes algorithm. Inspired by the importance of detecting fake accounts, researchers have recently started to investigate efficient fake account detection mechanisms. Most detection mechanisms attempt to classify user accounts as real or fake (malicious, Sybil) by analyzing user-level activities or graph-level structures, and several data mining methodologies and approaches help detect fake accounts (Sachin Ingle1, 2019). In this section, we describe some of the work that has been presented in this area. (M. Smruthi, February 2019) reached an accuracy of 80%; performance was evaluated using supervised machine learning algorithms, and the maximum percentage of skin exposed was calculated from images collected from fake accounts. (Kunal Goswami, 2017) used a neural network algorithm to evaluate a proposed feature set and compare it against state-of-the-art feature sets for detecting fraud. The feature set considers the user's social interaction on the Yelp platform to determine whether the user is committing fraud, and the neural network helps in comparing this feature set with other feature sets used to detect fraud. Any attempt to find the characteristics that lead to fraud must first be good enough to detect fraud. (Pregueiro) notes that OSNs suffer from abuse in the form of the creation of fake accounts, which do not correspond to real humans.
Fakes can introduce spam, manipulate online ratings, or exploit knowledge extracted from the network. OSN operators currently expend significant resources to detect, manually verify, and shut down fake accounts.
Information spreads across social networks quickly, but at the same time social media networks become susceptible to different types of unwanted spammer actions. (K Subba Reddy, 2017) propose a mechanism to detect spammers on the Facebook social network, based on a number of features at the content level and the user level. (S. P. Maniraj, 2019) use classification algorithms from machine learning to detect fake accounts. The process of finding a fake account depends mainly on factors such as engagement rate and artificial activity. Decision trees are built according to the success rate, which in their case means taking the value that contains more fake accounts. Following Table show

Proposed Algorithm
This section presents the proposed methods for predicting fake Twitter accounts. The methods are divided into two main parts, feature reduction and data classification, with the aim of developing a new technique that achieves high classification accuracy in a reasonable time (Sachin Ingle1, 2019).
1.4.1. Data Pre-Processing
The "MIB" dataset feature vectors are of two types:
- Categorical features, e.g. language, profile sidebar color, tweets.
- Numerical features, e.g. friends count, followers count, default profile, profile-use-background image.
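To make the distinction between categorical and numerical features concrete, the following sketch turns a few account records into numeric feature vectors. The source does not state how the features were encoded, so one-hot encoding and min-max scaling are assumptions here, and the field names (`lang`, `friends_count`, `followers_count`) and sample records are hypothetical.

```python
def one_hot(values):
    """Map each categorical value to a 0/1 vector over the sorted categories."""
    categories = sorted(set(values))
    vectors = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, vectors

def min_max(values):
    """Rescale numerical values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

# Hypothetical account records, not taken from the MIB dataset.
accounts = [
    {"lang": "om", "friends_count": 120, "followers_count": 80},
    {"lang": "en", "friends_count": 2000, "followers_count": 3},
    {"lang": "om", "friends_count": 45, "followers_count": 60},
]

categories, lang_vectors = one_hot([a["lang"] for a in accounts])
friends = min_max([a["friends_count"] for a in accounts])
followers = min_max([a["followers_count"] for a in accounts])

# One row per account: one-hot language columns, then the scaled counts.
feature_vectors = [lv + [fr, fo]
                   for lv, fr, fo in zip(lang_vectors, friends, followers)]
```

Scaling the counts keeps a feature with a large raw range, such as the number of friends, from dominating distance-based or gradient-based classifiers.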
Feature Reduction
In the feature reduction phase, four data reduction techniques were applied to guide the choice of the most promising feature patterns to be used in the mining process (Sachin Ingle1,

Performance and Evaluation
In this section the results and findings of this work are explained and evaluated. Initially, three different classification algorithms were trained and tested using four different feature sets. The neural network and SVM classification algorithms have been used as the principal mining techniques in much social network research, so they were applied to the feature sets mentioned in Feature Reduction and compared with the proposed SVM-NN algorithm (Sachin Ingle1, 2019).
1.5.1. Neural Networks
Many neural network algorithms are currently used to train models and predict results based on the previously trained models. The feed-forward back propagation algorithm was selected as the base algorithm. The predicted results were compared with the actual values (i.e. whether the account is real or fake), and the prediction accuracy was calculated from this comparison. The feature subsets with the highest accuracy were as follows: for Spearman's rank-order correlation the best pattern was (1000001000110110), for multiple linear regression the best pattern was (0110110111001111), and for Wrapper-SVM the best pattern was (110111111011111) (Sachin Ingle1, 2019).
Most of the existing techniques for detecting malicious content on Facebook lack comprehensive evaluation. The main objective of the research in (Srinivas Rao Pulluri1, A Comprehensive Model for Detecting Fake Profiles in Online Social Networks, 2017) is to increase the accuracy of identifying fake profiles and malicious content in online social networking sites compared with existing research. We would like to apply the proposed approach to Facebook.
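The comparison of predicted and actual labels and the binary feature patterns reported in the Neural Networks subsection can be sketched as follows. The source does not reproduce its accuracy formula, so the standard definition (correct predictions divided by total predictions) is assumed. Reading each pattern as a mask in which a 1 keeps the feature at that position is also an assumption, and both function names are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def apply_pattern(pattern, feature_vector):
    """Keep the features whose position in the pattern is marked '1'."""
    if len(pattern) != len(feature_vector):
        raise ValueError("pattern and feature vector must have the same length")
    return [x for bit, x in zip(pattern, feature_vector) if bit == "1"]

# Hypothetical labels for four accounts: three of four predictions match.
predicted = ["fake", "real", "fake", "real"]
actual = ["fake", "real", "real", "real"]
score = accuracy(predicted, actual)

# Applying the reported Spearman pattern to a placeholder 16-feature vector
# (the values are just the feature positions 0..15).
selected = apply_pattern("1000001000110110", list(range(16)))
```

With this reading, the Spearman pattern keeps six of the sixteen features, and each candidate pattern would be scored by training a classifier on the reduced vectors and computing `accuracy` on held-out accounts.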

Application Result
User activities related to likes, comments, and to some extent shares on Facebook contribute the most to the detection of fake accounts. This work therefore represents a significant step towards profile-feature-based detection of fake accounts on Facebook. Many fake users were classified as real, possibly because fake accounts mimic real user behavior to elude detection mechanisms.
Detecting and blocking fake accounts is important for online communities, both to maintain a safe environment for their real users and as a responsibility given their impact on society. A fake account detection system will help reduce the time, fraud and human effort involved in identifying privacy attacks on social media. The system will help filter out fake users who draw members of the local population, directly or indirectly, into violent activities across different regions of the country.

Conclusion
Fake accounts are continuously evolving in online social media, so it is essential to develop new methods to detect fake profiles. A real-time Facebook dataset was required to detect the fake accounts and vulgar images on Facebook. For the detection of fake accounts, user timeline information such as post count and comment count was used, and for vulgar image detection, images from the user's timeline and display picture were extracted. Performance was evaluated using supervised machine learning algorithms; the highest accuracy obtained was 80%, and the maximum percentage of skin exposed was calculated from the images collected from the fake accounts. As future work, a more complex skin detection algorithm can be implemented, and natural language processing techniques can be applied to detect fake accounts more accurately. Facebook will certainly introduce new features, and these can also be included when analyzing fake accounts (M. Smruthi, February 2019).
New Media and Mass Communication, www.iiste.org, ISSN 2224-3267 (Paper), ISSN 2224-3275 (Online), Vol.90, 2020