Finding Similarities between Structured Documents as a Crucial Stage for Generic Structured Document Classifier

Hamam Mokayed, Azlinah Hj. Mohamed

Abstract


One of the addressed problems of classifying structured documents is the definition of a similarity measure that is applicable in real situations, where query documents are allowed to differ from the database templates. Furthermore, this approach might have rotated [1], noise corrupted [2], or manually edited form and documents as test sets using different schemes, making direct comparison crucial issue [3]. Another problem is huge amount of forms could be written in different languages, for example here in Malaysia forms could be written in Malay, Chinese, English, etc languages. In that case text recognition (like OCR) could not be applied in order to classify the requested documents taking into consideration that OCR is considered more easier and accurate rather than the layout  detection.

Keywords: Feature Extraction, Document processing, Document Classification.


Full Text: PDF
Download the IISTE publication guideline!

To list your conference here. Please contact the administrator of this platform.

Paper submission email: CEIS@iiste.org

ISSN (Paper)2222-1727 ISSN (Online)2222-2863

Please add our address "contact@iiste.org" into your email contact list.

This journal follows ISO 9001 management standard and licensed under a Creative Commons Attribution 3.0 License.

Copyright © www.iiste.org