Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/476319
Title: | A hybrid method for measuring sentence similarity in Arabic |
Authors: | Alisawi Hanan Nuri Abofaied A. (P50125) |
Supervisor: | Nazlia Omar, Dr. |
Keywords: | Dissertations, Academic -- Malaysia; Arabic sentence similarity; Semantic and syntactic information |
Issue Date: | 15-Mar-2012 |
Description: | Short-text sentence similarity plays an important role in text-related research such as information retrieval and text mining. Although text similarity has been studied before, few studies focus on short texts, because of data sparseness and the lack of context, especially in Arabic. Moreover, Arabic is one of the most morphologically complex languages. This research proposes a hybrid method for measuring Arabic sentence similarity, applied to an answer grading system. The proposed method combines semantic and syntactic similarity information. The Bilingual Evaluation Understudy (BLEU) algorithm is adapted for Arabic sentence similarity, exploiting its advantages and overcoming its limitations by hybridizing it with other semantic and syntactic similarity measures. A version of BLEU is applied to measure the semantic similarity of Arabic sentences, and suitable parameter settings are investigated for its n-gram components. In addition, the algorithm is enhanced with semantic knowledge from sources such as WordNet. The resulting score is combined with forward and backward syntactic matching algorithms to obtain the overall sentence similarity score. The scores assigned by the system are compared with scores awarded by a human. To test the effectiveness of the method, 20 Arabic questions are used as a test dataset. The experiments are performed separately on root, stemmed and exact words. The experimental results show that the best performance is obtained on root words, using the combination of different n-gram sizes (i.e. uni-gram, bi-gram and tri-gram modified precision) and the uni-gram modified precision.
For semantic similarity, the BLEU-based method with the uni-gram modified precision is the best-performing variation, yielding Pearson's and Spearman's correlations of 0.70 and 0.65, respectively, when applied to the rooted texts. The results also show that the system's scores are fairly close to the human-derived scores. (Master's thesis) |
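The modified n-gram precision that the description refers to can be illustrated with a minimal Python sketch. It follows the standard BLEU definition (candidate n-gram counts clipped by their counts in the reference) and is not the thesis's own implementation. The function names `ngrams` and `modified_precision` are hypothetical, a single reference sentence is assumed, and Arabic preprocessing (root extraction, stemming) and the WordNet and syntactic components are out of scope here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of `candidate` against one `reference`.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference, so repeating a matching word does not
    inflate the score.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        # Candidate is shorter than n tokens: no n-grams to score.
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


# Illustrative tokens only; in the setting described above the tokens
# would be roots, stems or exact word forms of Arabic answers.
student_answer = "the the the the the the the".split()
model_answer = "the cat is on the mat".split()
unigram_score = modified_precision(student_answer, model_answer, 1)
bigram_score = modified_precision(student_answer, model_answer, 2)
```

In this example the candidate contains "the" seven times but the reference only twice, so the uni-gram score is clipped to 2/7, while no candidate bi-gram appears in the reference.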
Pages: | 73 |
Call Number: | QA76.9.N38 A566 2012 |
Publisher: | UKM, Bangi |
URI: | https://ptsldigital.ukm.my/jspui/handle/123456789/476319 |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ukmvital_81879+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 1.95 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.