Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476571
Title: Stemming algorithm for different tenses to improve Persian dictionary
Authors: Arash Ghazvini (P56178)
Supervisor: Tengku Mohd Tengku Sembok, Prof. Dr.
Keywords: Universiti Kebangsaan Malaysia -- Dissertations
Dissertations, Academic -- Malaysia
Computational linguistics
Computer algorithms
Persian language -- Data processing
Text processing (Computer science)
Information storage and retrieval systems
Issue Date: 6-Aug-2012
Description: Persian is an Indo-European language known for the complexity of its morphological structure. It has a variety of tenses; this research project focuses on the past subjunctive, past perfect, continuous past, present perfect and simple past. A verb may appear in different forms in a sentence depending on person, number, tense, mood and the occurrence of certain roots. This research project presents the development of a Persian stemming algorithm for these tenses and its impact on improving a Persian dictionary. The main problem is to design a Persian stemming algorithm that removes affixes from a verb in order to extract its root and convert it to the infinitive. The objectives are to design a Persian stemming algorithm for verbs in the mentioned tenses and to implement the proposed algorithm so that a Persian dictionary returns the meaning of the infinitive. The dictionary is then tested to demonstrate higher accuracy than conventional Persian dictionaries, which show no results for affixed verbs because all verbs are stored in the database as infinitives. If a search returns a null result because the word is an affixed verb, the system processes the word with a stemming procedure based on Finite State Automata to remove the affixes and find the root of the verb; the root is then converted into an infinitive and the database is searched again. According to the findings, the dictionary based on the implemented Persian stemming algorithm is fully accurate for regular verbs in the mentioned tenses, which are formed from their infinitive according to general grammatical rules. "Certification of Master's/Doctoral Thesis" is not available.
Pages: 107
Call Number: QA76.9.A43G483 2012 tesis
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
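
The lookup flow outlined in the Description (search, null result, affix removal, conversion to infinitive, second search) can be illustrated with a minimal sketch. This is not the thesis's implementation. It substitutes plain rule-based prefix and suffix stripping for the Finite State Automata the abstract mentions, and it covers only the simple past and continuous past. The Latin transliteration, the three-entry toy dictionary, and the names `DICTIONARY`, `candidate_infinitives` and `lookup` are all assumptions made for illustration.

```python
from typing import Iterator, Optional, Tuple

# Toy dictionary keyed by transliterated infinitive (hypothetical data).
DICTIONARY = {
    "khordan": "to eat",
    "didan": "to see",
    "raftan": "to go",
}

# Personal endings of the past tenses, longest first; the empty string
# covers the third person singular, which takes no ending.
PAST_ENDINGS = ("and", "im", "id", "am", "i", "")
CONTINUOUS_PREFIX = "mi"   # marks the continuous past
INFINITIVE_SUFFIX = "an"   # infinitive = past stem + "an"


def candidate_infinitives(word: str) -> Iterator[str]:
    """Yield possible infinitives for an affixed past-tense verb form."""
    bases = []
    if word.startswith(CONTINUOUS_PREFIX):
        bases.append(word[len(CONTINUOUS_PREFIX):])
    bases.append(word)  # also try the word with its prefix left in place
    for base in bases:
        for ending in PAST_ENDINGS:
            if base.endswith(ending):
                stem = base[: len(base) - len(ending)]
                if stem:
                    yield stem + INFINITIVE_SUFFIX


def lookup(word: str) -> Optional[Tuple[str, str]]:
    """Return (infinitive, meaning), stemming only after a direct miss."""
    if word in DICTIONARY:
        return word, DICTIONARY[word]
    for infinitive in candidate_infinitives(word):
        if infinitive in DICTIONARY:
            return infinitive, DICTIONARY[infinitive]
    return None
```

For example, `lookup("mikhordand")` misses on the direct search, strips the prefix `mi` and the ending `and`, and then finds `khordan` on the second search. Checking every candidate against the dictionary avoids committing to a wrong split when an ending is ambiguous, as with `did`, where stripping `id` leaves a stem that matches no entry.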

Files in This Item:
File: ukmvital_120699+SOURCE1+SOURCE1.0.PDF
Description: Restricted Access
Size: 1.21 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.