Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476300
Title: Named Entity Recognition for Arabic crime documents using decision tree and naive bayes classifiers sequential combination
Authors: Suhad Abdulzahra Hachim Alshoukry (P63379)
Supervisor: Nazlia Omar, Prof. Dr.
Keywords: Named entity recognition
Multiple classifiers
Arabic named entities
Dissertations, Academic -- Malaysia
Issue Date: 16-Jun-2015
Description: Named Entity Recognition (NER) has emerged as a significant task in Natural Language Processing (NLP). It plays an essential role in multiple domains such as Information Extraction (IE), Sentiment Analysis (SA) and Question Answering (QA). It aims to extract names of people, locations, organizations, currencies and dates. There have been various research efforts on Arabic NER. However, identifying Arabic named entities is a challenging task owing to the complexity of the Arabic language. One source of this complexity is the absence of capitalization, a feature that otherwise facilitates NER. Furthermore, there is a lack of lexical corpora that cover all Arabic NEs. Moreover, most of the approaches proposed for Arabic NER have been based on handcrafted rules, which are laborious and time-consuming to build. Therefore, this study proposes an Arabic NER approach based on a combination of classifiers and feature extraction for a crime dataset. The dataset was collected from online resources and underwent multiple pre-processing tasks, including normalization, tokenization and stemming. The feature extraction task, which covers POS tagging, keyword triggers, definite articles and affixes, was then performed. Three classifiers, namely Naive Bayes (NB), Decision Tree (DT) and a sequential combination of the two, use these features as a training set in order to classify the named entities. The experimental results show that the combination of DT and NB with feature extraction outperforms the individual classifiers, achieving an F-measure of 94.19%. This shows that combining multiple classifiers has a significant impact on the effectiveness of Arabic NER. Master of Information Technology
Pages: 91
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
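
The pre-processing and feature extraction steps named in the Description (normalization, tokenization, keyword triggers, definite articles and affixes) can be illustrated with a minimal Python sketch. This is not the thesis implementation, which is not available in this record. The trigger word list, the function names, the two-character affix length and the specific normalization rules (stripping diacritics and tatweel, unifying alef variants) are all assumptions chosen for illustration. POS tagging and stemming are omitted because they would require an external tool.

```python
import re

# Hypothetical trigger keywords (NOT taken from the thesis): words that
# often precede a person, location or organization name in Arabic text.
TRIGGERS = {
    "السيد": "PERSON",        # "Mr."
    "مدينة": "LOCATION",      # "city"
    "شركة": "ORGANIZATION",   # "company"
}

# Assumed normalization rules: drop short-vowel diacritics (U+064B..U+0652)
# and tatweel (U+0640), and map alef variants to bare alef (U+0627).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def normalize(text):
    """Strip diacritics/tatweel and unify alef forms."""
    text = DIACRITICS.sub("", text)
    return ALEF_VARIANTS.sub("\u0627", text)


def tokenize(text):
    """Normalize, then split into word tokens on non-word characters."""
    return re.findall(r"\w+", normalize(text))


def token_features(tokens, i):
    """Build a feature dict for tokens[i] (illustrative feature set)."""
    tok = tokens[i]
    prev = tokens[i - 1] if i > 0 else ""
    return {
        # Definite article "al-" (alef + lam) attached to a longer word.
        "has_definite_article": tok.startswith("\u0627\u0644") and len(tok) > 2,
        # Two-character affixes; the length 2 is an assumption.
        "prefix2": tok[:2],
        "suffix2": tok[-2:],
        # Entity type suggested by the preceding trigger word, if any.
        "prev_trigger": TRIGGERS.get(prev, "NONE"),
    }


if __name__ == "__main__":
    tokens = tokenize("وصل السيد محمد الى مدينة بغداد")
    for i, tok in enumerate(tokens):
        print(tok, token_features(tokens, i))
```

Feature dictionaries of this kind could then be passed to any classifier, for example a decision tree followed by a Naive Bayes stage. How the thesis chains the two classifiers is not stated in this record, so that step is left out of the sketch.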

Files in This Item:
File: ukmvital_81665+SOURCE1+SOURCE1.0.PDF
Access: Restricted Access
Size: 251.22 kB
Format: Adobe PDF