Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/476352
Title: | Arabic nested noun compounds extraction based on named entity pattern and association measures |
Authors: | Maryam Yaseen Abdullah Al-Mashhadani (P72226) |
Supervisor: | Nazlia Omar, Dr. |
Keywords: | Nested noun compound; Named entity pattern; Association measures; Arabic noun; Dissertations, Academic -- Malaysia |
Issue Date: | 22-Jun-2015 |
Description: | Arabic noun compounds are phrases that consist of two or more nouns. Their key characteristic is that they occur frequently in text. Extracting them is therefore essential for several research domains, such as Information Retrieval, Sentiment Analysis and Question Answering. Many methods have been proposed for extracting Arabic noun compounds, using linguistic approaches, statistical measures or a combination of both. Most existing methods concentrate on the extraction of bigram or trigram noun compounds. However, extracting 4-gram or 5-gram noun compounds, so-called nested noun compounds, is challenging because it is difficult to select an appropriate method that yields effective results. Multiple features, such as contextual information, unit-hood and term-hood, have a significant impact on the effectiveness of nested noun compound extraction. Thus, there is still room for improvement in the effectiveness of nested noun compound extraction. This study therefore proposes a combination of a named entity pattern and association measures to enhance the extraction of nested noun compounds. Several preprocessing steps are involved, including transformation, normalization, tokenization and stemming. In addition, a new linguistic pattern for named entities, based on a list of Arabic named entities, is used to enhance the linguistic approach to nested noun compound recognition. The proposed association measures consist of NC-value, NTC-value and NLC-value. The experimental results demonstrate that NLC-value outperforms NTC-value and NC-value in nested noun compound extraction, achieving 83%, 76%, 72% and 65% for bigrams, trigrams, 4-grams and 5-grams respectively. "Certification of Master's/Doctoral Thesis" is not available. |
Pages: | 97 |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ukmvital_82215+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 518.19 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.