Text Document Categorization using Enhanced Sentence Vector Space Model and Bi-Gram Text Representation Model Based on Novel Fusion Techniques

Abdisa Demissie Amensisa

Abstract


The text document classification tasks passes under the Automatic Classification (also known as pattern Recognition) problem in Machine Learning and Text Mining. It is necessary to classify large text documents into specific classes, to make clear and search simply. Classified data are easy for users to browse. The important issue in usual text document classification is representing the features for classification of an unknown document into predefined categories. The Combination of classifiers is fused together to increase the accuracy classification result in a single text document. This paper states a novel fusion approach to classify text documents by considering ES-VSM and Bigram representation models for text documents. ES-VSM: Enhanced Sentence –Vector Space Model is an advanced feature of the sentence based vector space model and extension to simple VSM will be considered for the constructive representation of text documents. The main objective of the study is to boost the accuracy of text classification by accounting for the features extracted from the text document. The proposed system concatenates two different representation models of the text documents for designing two different classifiers and feeds them as one input to the classifier. An enhanced S-VSM and interval-valued representation model are considered for the effective representation of text documents. A word level neural network Bigram representation of text documents is proposed for effective capturing of semantic information present in the text data. A Proposed approach improves the overall accuracy of text document classification to a significant extent.

Keywords: ES-VSM; Fusion, Text Document Classification, Neural Network, Text Representation, Machine learning.

DOI: 10.7176/NMMC/93-03

Publication date:September 30th 2020

 


Full Text: PDF
Download the IISTE publication guideline!

To list your conference here. Please contact the administrator of this platform.

Paper submission email: NMMC@iiste.org

ISSN (Paper)2224-3267 ISSN (Online)2224-3275

Please add our address "contact@iiste.org" into your email contact list.

This journal follows ISO 9001 management standard and licensed under a Creative Commons Attribution 3.0 License.

Copyright © www.iiste.org