Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476319
Title: A hybrid method for measuring sentence similarity in Arabic
Authors: Alisawi Hanan Nuri Abofaied A. (P50125)
Supervisor: Nazlia Omar, Dr.
Keywords: Dissertations, Academic -- Malaysia; Arabic sentence similarity; Semantic and syntactic information
Issue Date: 15-Mar-2012
Description: Short-text sentence similarity plays an important role in text-related research such as information retrieval and text mining. Although there are related studies on determining text similarity, few focus on short texts, owing to data sparseness and the lack of context, especially in Arabic. Moreover, Arabic is one of the most complex languages in terms of morphology. This research proposes a hybrid method for measuring Arabic sentence similarity for use in an answer grading system. The proposed method combines semantic and syntactic similarity information. The Bilingual Evaluation Understudy (BLEU) algorithm is adapted for Arabic sentence similarity, utilizing its advantages and overcoming its limitations by hybridizing it with other semantic and syntactic similarity measures. A version of BLEU is applied to measure the semantic similarity of Arabic sentences, and suitable parameter settings are investigated for its n-gram components. In addition, the algorithm is enhanced with extensive semantic knowledge from sources such as WordNet. The resulting score is combined with forward and backward syntactic matching algorithms to obtain the overall sentence similarity score. Scores are assigned and compared with scores awarded by a human. To test the effectiveness of the method, 20 Arabic questions are used as a test dataset. The experiments are performed separately on root, stemmed and exact words. The experimental results show that the best performance is obtained on roots, with the combination of different n-gram sizes (i.e. uni-gram, bi-gram and tri-gram modified precision) and with the uni-gram modified precision. For measuring semantic similarity, the BLEU-based variant with uni-gram modified precision performs best, yielding Pearson's and Spearman's correlations of 0.70 and 0.65, respectively, when applied to the rooted texts. The results also show that the system provides scores somewhat close to the human-derived scores.
Degree: Master
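The modified n-gram precision that the description refers to can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `ngrams` and `modified_precision` are hypothetical, a single reference answer is assumed, tokens are assumed to be already reduced to root, stemmed or exact forms, and the WordNet enhancement and the syntactic matching step are not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """BLEU-style modified n-gram precision against one reference.

    Each candidate n-gram count is clipped to the number of times that
    n-gram occurs in the reference, so repeating a matching word cannot
    inflate the score. (Assumption: one reference; standard BLEU takes
    the maximum count over several references.)
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        # Candidate is shorter than n tokens: no n-grams to score.
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


# Illustrative tokens only; real input would be preprocessed Arabic words.
student = "the cat sat on the mat".split()
model = "the cat is on the mat".split()
uni = modified_precision(student, model, 1)
bi = modified_precision(student, model, 2)
```

Under these assumptions, a student answer would be scored against the model answer once per n-gram size, and the per-size precisions could then be combined or used individually, as the experiments in the description compare.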
Pages: 73
Call Number: QA76.9.N38 A566 2012
Publisher: UKM, Bangi
URI: https://ptsldigital.ukm.my/jspui/handle/123456789/476319
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_81879+SOURCE1+SOURCE1.0.PDF
Description: Restricted Access
Size: 1.95 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.