Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/475656
Title: | Word sense disambiguation based on Yarowsky algorithm for English Quranic information retrieval system |
Authors: | Omar Jamal Mohamed (P74473) |
Supervisor: | Sabrina Tiun, Dr. |
Keywords: | Word sense disambiguation; Yarowsky algorithm; Quran; English language; Information retrieval system; Dissertations, Academic -- Malaysia |
Issue Date: | 9-Nov-2012 |
Description: | Word sense disambiguation (WSD) is the process of resolving the ambiguity of a word by identifying its exact sense in a given context. In natural languages, many words carry multiple meanings depending on the context, and WSD aims to identify the most accurate sense in such cases. In particular, translating from one language to another can introduce ambiguity among the translated words. The Quran, the holy book of approximately a billion Muslims, was originally written in Arabic. Researchers have identified several semantic issues that arise when the Quran is translated into English. These issues stem from the ambiguity of expressions such as ‘ليلا ونهارا’ and ‘يوم الحساب’, which are translated as ‘day and night’ and ‘judgment day’. Such ambiguity has to be eliminated by determining the exact sense of the translated word. Several research efforts have sought to disambiguate word senses in the translated Quran. However, identifying an appropriate WSD method for the translated Quran remains a challenging task, owing to the complexity of Arabic morphology. Hence, this study proposes an adaptation of the Yarowsky algorithm as a WSD method for Quranic translation. In addition, it develops an information retrieval (IR) prototype based on the proposed adaptation in order to evaluate the method in terms of retrieval effectiveness. The dataset used in this study is a collection of Quranic content. Several pre-processing tasks were performed to remove irrelevant data such as stop-words, numbers and punctuation. Subsequently, two lists of senses, together with their contexts, were created for each ambiguous word so that the Yarowsky algorithm could be trained on this example set. A decision list was then constructed using the Yarowsky algorithm, which gives the sense label for each word.
The method was evaluated using three IR evaluation metrics: Precision, Recall and F-measure. The experimental results show an F-measure of 77%. This result appears weak compared with the results reported for the Yarowsky algorithm in open-domain settings, owing to the lack of examples that could be extracted from the Quran for both senses. Nevertheless, the result appears competitive for WSD of Quranic translation. Finally, it can be concluded that WSD has a significant impact on the IR system, based on the promising outcome obtained with the simple prototype IR implementation. "Certification of Master's/Doctoral Thesis" is not available. |
Pages: | 66 |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
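The decision-list step mentioned in the description can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it covers only the ranking of context words by a smoothed log-likelihood ratio between two senses, and omits the iterative bootstrapping usually associated with the Yarowsky algorithm. The function names, the smoothing constant and the toy examples are all hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def build_decision_list(examples, smoothing=0.1):
    """Rank context words by how strongly they indicate one of two senses.

    examples: list of (sense, context_words) pairs (hypothetical input format).
    smoothing: assumed additive constant that avoids division by zero.
    """
    senses = sorted({sense for sense, _ in examples})
    if len(senses) != 2:
        raise ValueError("this sketch handles exactly two senses")
    sense_a, sense_b = senses

    # Count, per sense, the number of examples in which each word occurs.
    counts = {sense_a: Counter(), sense_b: Counter()}
    for sense, context in examples:
        counts[sense].update(set(context))

    rules = []
    for word in set(counts[sense_a]) | set(counts[sense_b]):
        a = counts[sense_a][word] + smoothing
        b = counts[sense_b][word] + smoothing
        score = math.log(a / b)
        if score == 0:
            continue  # the word is equally frequent in both senses
        rules.append((abs(score), word, sense_a if score > 0 else sense_b))

    # Strongest evidence first; ties are broken alphabetically.
    rules.sort(key=lambda rule: (-rule[0], rule[1]))
    return rules


def classify(context, rules, default=None):
    """Return the sense of the first (strongest) rule that matches the context."""
    words = set(context)
    for _, word, sense in rules:
        if word in words:
            return sense
    return default


# Toy, made-up training examples for the ambiguous word "day".
examples = [
    ("judgment", ["day", "judgment", "reckoning"]),
    ("judgment", ["day", "resurrection", "judgment"]),
    ("time", ["night", "day", "alternation"]),
    ("time", ["night", "day", "sun"]),
]
rules = build_decision_list(examples)
print(classify(["night", "and", "day"], rules))
print(classify(["day", "of", "judgment"], rules))
```

In this sketch the word "day" itself produces no rule, because it occurs equally often with both senses; only the surrounding context words decide the label.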
Files in This Item:
There are no files associated with this item.