Sentence Level N-Gram Context Feature in Real-Word Spelling Error Detection and Correction: Unsupervised Corpus Based Approach

Tsegay Mullu Kassa


Spell checking is the process of finding misspelled words and possibly correcting them. Most of the modern commercial spell checkers use a straightforward approach to finding misspellings, which considered a word is erroneous when it is not found in the dictionary. However, this approach is not able to check the correctness of words in their context and this is called real-word spelling error. To solve this issue, in the state-of-the-art researchers use context feature at fixed size n-gram (i.e. tri-gram) and this reduces the effectiveness of model due to limited feature. In this paper, we address the problem of this issue by adopting sentence level n-gram feature for real-word spelling error detection and correction. In this technique, all possible word n-grams are used to learn proposed model about properties of target language and this enhance its effectiveness. In this investigation, the only corpus required to training proposed model is unsupervised corpus (or raw text) and this enables the model flexible to be adoptable for any natural languages. But, for demonstration purpose we adopt under-resourced languages such as Amharic, Afaan Oromo and Tigrigna. The model has been evaluated in terms of Recall, Precision, F-measure and a comparison with literature was made (i.e. fixed n-gram context feature) to assess if the technique used performs as good.  The experimental result indicates proposed model with sentence level n-gram context feature achieves a better result: for real-word error detection and correction achieves an average F-measure of 90.03%, 85.95%, and 84.24% for Amharic, Afaan Oromo and Tigrigna respectively.

Keywords: Sentence level n-gram, real-word spelling error, spell checker, unsupervised corpus based spell checker

DOI: 10.7176/JIEA/10-4-02

Publication date:September 30th 2020

Full Text: PDF
Download the IISTE publication guideline!

To list your conference here. Please contact the administrator of this platform.

Paper submission email:
ISSN (Paper)2224-5782 ISSN (Online)2225-0506
Please add our address "" into your email contact list.
This journal follows ISO 9001 management standard and licensed under a Creative Commons Attribution 3.0 License.
Copyright ©