Probabilistic Reference to Suspect or Victim in Nationality Extraction from Unstructured Crime News Documents

Mohammad Darwich, Masnizah Mohd

Abstract


There is valuable information in unstructured crime news documents which crime analysts must manually search for. To solve this issue, several information extraction models have been implemented, all of which are capable of being enhanced. This gap has created the motivation to propose an enhanced information extraction model that uses named entity recognition to extract the nationality from crime news documents and coreference resolution to associate the nationality to either the suspect or the victim. After the proposed model extracts the nationality, it references it to the suspect or victim by looking up all of the victim related keywords and the suspect related keywords within the text, and their corresponding distances from the position of the nationality keyword. Based on their total distances, a probability score algorithm decides whether the nationality is more likely to belong to either the victim or the suspect. Two experiments were conducted to evaluate the nationality extractor component and the reference identification component used by the model. The former experiment had achieved 90%, 94%, and 91% for precision, recall, and F-measure values respectively. The latter experiment had achieved 65%, 68%, and 66% for precision, recall, and F-measure respectively. The model had achieved promising results after evaluation.

Keywords: information extraction, named entity recognition, coreference resolution, crime domain


Full Text: PDF
Download the IISTE publication guideline!

To list your conference here. Please contact the administrator of this platform.

Paper submission email: IKM@iiste.org

ISSN (Paper)2224-5758 ISSN (Online)2224-896X

Please add our address "contact@iiste.org" into your email contact list.

This journal follows ISO 9001 management standard and licensed under a Creative Commons Attribution 3.0 License.

Copyright © www.iiste.org