Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/476283
Title: | A rule-based named entity recognition for drug-related crime news documents |
Authors: | Khmael Rakm Rahem (P72249) |
Supervisor: | Nazlia Omar, Dr. |
Keywords: | Computational linguistics; Crime entities; Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia |
Issue Date: | 15-Jun-2015 |
Description: | Drug abuse is the consumption of a substance that may have adverse effects on the user. Drug use has become increasingly prominent every year, and drug trafficking has become an important topic in international security studies. Drug-related crime is therefore a significant challenge for any community. Researchers have applied several techniques to investigations in the crime domain, but most focus on extracting general crime entities; relatively few studies address the drug crime domain. The main aim of this work is to design a rule-based named entity recognition (NER) model for drug-related crime news documents. A set of heuristic and grammatical rules is used to extract named entities such as drug types, drug amounts, drug prices, drug hiding methods, and the nationality of the suspect. The heuristic rules are based on two kinds of purpose-built lists: indicator word lists, which contain words that introduce drug-related crime information, and gazetteer lists, which are dictionaries of drug and nationality names. The grammatical rules are based on part-of-speech (POS) information and the indicator word lists. The corpus used in this research was obtained from BERNAMA. Several experiments were conducted to evaluate the system. The combined heuristic and grammatical rules achieve an overall precision of 86%, recall of 87%, and F1-measure of 87%. The results show that combining both rule sets improves extraction effectiveness in terms of macro-F1 for all entities. Master of Information Technology |
Pages: | 103 |
Call Number: | P98.R334 2015 3 tesis |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
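The Description above outlines heuristic rules driven by gazetteers and indicator words. The following is a minimal sketch of that general idea, not the thesis's actual rule set: the gazetteer entries, the indicator words ("worth", "valued at"), the unit list, and the function name `extract_entities` are all hypothetical, and the POS-based grammatical rules and hiding-method rules are omitted.

```python
import re

# Hypothetical gazetteers; the thesis's real lists are not reproduced in this record.
DRUG_GAZETTEER = {"heroin", "cocaine", "cannabis", "methamphetamine"}
NATIONALITY_GAZETTEER = {"malaysian", "iranian", "nigerian", "thai"}

# Assumed unit words for drug amounts (longer alternatives listed before "g").
UNITS = r"(?:kg|kilograms?|grams?|g|pills)"
AMOUNT_RE = re.compile(rf"\b\d+(?:\.\d+)?\s*{UNITS}\b", re.IGNORECASE)

# Assumed indicator words that introduce a price, followed by a ringgit amount.
PRICE_RE = re.compile(
    r"\b(?:worth|valued at)\s+(RM\s?[\d,]+(?:\.\d+)?(?:\s(?:million|billion))?)",
    re.IGNORECASE,
)


def extract_entities(sentence):
    """Return (label, text) pairs found by gazetteer lookup and indicator patterns."""
    entities = []
    # Gazetteer rules: a token is an entity if it appears in a dictionary.
    for token in re.findall(r"[A-Za-z]+", sentence):
        lower = token.lower()
        if lower in DRUG_GAZETTEER:
            entities.append(("DRUG_TYPE", token))
        elif lower in NATIONALITY_GAZETTEER:
            entities.append(("NATIONALITY", token))
    # Heuristic rule: a number followed by a unit word is a drug amount.
    for match in AMOUNT_RE.finditer(sentence):
        entities.append(("DRUG_AMOUNT", match.group(0)))
    # Heuristic rule: an indicator word followed by a currency amount is a price.
    for match in PRICE_RE.finditer(sentence):
        entities.append(("DRUG_PRICE", match.group(1)))
    return entities


if __name__ == "__main__":
    text = "Police arrested a Nigerian man with 2.5 kg of heroin worth RM 150,000."
    for label, value in extract_entities(text):
        print(label, value)
```

Gazetteer lookup handles closed vocabularies such as drug and nationality names, while the indicator patterns cover open-ended values such as amounts and prices; a fuller system would add POS-based rules on top of these.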
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ukmvital_80597+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 350.16 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.