Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476497
Title: Hybrid approach for stemming Arabic words
Authors: Yasir Said Hamood Al-Hanini (P52635)
Supervisor: Mohd Juzaiddin Ab Aziz, Dr.
Keywords: Natural language processing (Computer science)
Automatic indexing
Text processing (Computer science)
Information storage and retrieval systems -- Language
Arabic language -- Data processing
Universiti Kebangsaan Malaysia -- Dissertations
Dissertations, Academic -- Malaysia
Issue Date: 16-Jun-2011
Description: Stemming Arabic words has long been a challenge for natural language processing because of the richness and complexity of the language. Word stemming is one of the most important factors affecting the performance of natural language processing applications such as machine translation, information retrieval, question answering, word sense disambiguation, and text summarization. Stemming, as performed by a morphological analyzer, is the process of reducing words to their roots or stems. Most current stemmers do not handle multi-word expressions or Arabic names. The main goal of this thesis is to enhance word stemming for extracting the root and stem of Arabic words. The hybrid approach combines stemming with dictionary-based information and consists of three processes: pre-processing, light stemming, and dictionary-based stemming. The pre-processing step normalizes the Arabic word, for example by removing diacritics, resolving orthographic variation in Arabic writing, and handling the Shadda. The light stemming step then extracts the possible roots of a word without using any dictionary. It segments the word into its prefixes, suffixes, and infixes to produce the stem or root, and it verifies the extracted root against Arabic patterns. The last step, dictionary-based stemming, is applied to extracted roots that cannot be matched with any Arabic pattern. Finally, the hybrid Arabic stemmer is evaluated by applying it to a corpus of Arabic documents. In the experiments, the average accuracy of the hybrid stemmer on the corpus is 96.29%. The accuracy of the hybrid stemmer is higher on every document in the corpus than that of the light stemmer (85.5%) and the dictionary-based stemmer (88.63%). This improvement results from solving the problems of the light stemmer and the dictionary-based stemmer.
This thesis does not include a "Master's/Doctoral Thesis Declaration" (Perakuan Tesis Sarjana/Doktor Falsafah).
Master of Information Technology
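The three processes named in the description (pre-processing, light stemming with pattern verification, and a dictionary fallback) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the affix lists, the three patterns, and the one-entry dictionary are hypothetical and are not taken from the thesis, whose actual rules and resources are not given in this record.

```python
import re

# Assumption: diacritics are the code points U+064B..U+0652 (this range
# includes the Shadda, U+0651); the tatweel (U+0640) is also dropped.
DIACRITICS = re.compile("[\u064B-\u0652]")
TATWEEL = "\u0640"
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda/hamza

# Hypothetical, deliberately tiny resources, listed longest affix first.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ة", "ه", "ي"]
# In a pattern, the letters ف ع ل stand for the three root radicals.
PATTERNS = ["مفعول", "فاعل", "فعال"]
ROOT_DICTIONARY = {"مدارس": "درس"}  # e.g. a broken plural


def normalize(word):
    """Pre-processing: remove diacritics and unify alef spellings."""
    word = DIACRITICS.sub("", word)
    word = word.replace(TATWEEL, "")
    return ALEF_VARIANTS.sub("\u0627", word)


def light_stem(word):
    """Strip at most one prefix and one suffix, keeping >= 3 letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def match_pattern(stem, pattern):
    """Return the radicals if the stem fits the pattern, else None."""
    if len(stem) != len(pattern):
        return None
    radicals = []
    for stem_char, pattern_char in zip(stem, pattern):
        if pattern_char in "فعل":
            radicals.append(stem_char)   # placeholder: collect a radical
        elif stem_char != pattern_char:
            return None                  # fixed pattern letter mismatch
    return "".join(radicals)


def hybrid_stem(word, dictionary=ROOT_DICTIONARY):
    """Normalize, light-stem, verify by pattern, then fall back."""
    stem = light_stem(normalize(word))
    for pattern in PATTERNS:
        root = match_pattern(stem, pattern)
        if root is not None:
            return root
    # No pattern matched: consult the dictionary, else keep the stem.
    return dictionary.get(stem, stem)
```

Under these assumptions, `hybrid_stem("الكاتب")` strips the prefix, matches the pattern فاعل, and returns `كتب`, while `hybrid_stem("المدارس")` matches no pattern and is resolved through the dictionary entry to `درس`. A real system would need far larger affix, pattern, and dictionary resources than this toy sketch.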
Pages: 107
Call Number: QA76.9.N38H338 2011 3 tesis
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_114628+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 11.01 MB
Format: Adobe PDF

