Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476354
Title: Arabic keyphrases extraction using a hybrid of statistical and machine learning method
Authors: Nidaa Ghalib Ali (P72250)
Supervisor: Nazlia Omar, Assoc. Prof. Dr.
Keywords: Keyphrase extraction
Arabic keyphrase
Noun phrase
Dissertations, Academic -- Malaysia
Issue Date: 8-Jun-2015
Description: A keyphrase is a short phrase of one to five words that represents an important concept in an article. Keyphrases are useful for a variety of tasks, such as text summarization, automatic indexing, classification and text mining. Because of the huge number of documents available electronically, accurate and effective keyphrase extraction methods are in great demand. Numerous methods have been developed for keyphrase extraction. Some of them, such as statistical or machine learning approaches, need a large corpus in order to extract keyphrases from a single document. Other methods rank a word or phrase using only statistical information obtained from the single document itself, and use that ranking to decide whether it is a keyphrase. However, both kinds of method suffer from low accuracy, especially when extracting keyphrases from a single, small document. Much work has been done on automatically extracting keyphrases from English documents and documents in other languages; in contrast, little effort has been made for Arabic documents. This study proposes a hybrid method that combines statistical and machine learning methods for Arabic keyphrase extraction. The statistical methods are Term Frequency (TF), First Occurrence in Text (FO), Sentence Count (SC), C-Value and TF-IDF. The machine learning methods are Linear Logistic Regression (LLR), Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). To perform the task, noun phrases are first extracted using part-of-speech (POS) patterns. The statistical methods are then applied, and their results are used as features for classification. The classifier decides whether each selected noun phrase is a keyphrase or not. The test dataset contains 174 text files selected from multiple domains, such as education, health and sports. The SVM-based hybrid model achieves the best result, with 93.9% accuracy.
The experiments demonstrate that the proposed model is viable for Arabic keyphrase extraction.
Certification of Master's/Doctoral Thesis is not available.
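The description outlines a pipeline: extract noun-phrase candidates, score each one with five statistical measures, then classify. The Python sketch below illustrates only the feature step, assuming the candidates have already been produced by POS patterns (no tagger is shown) and using common textbook definitions of each score: raw counts for TF, first-match position divided by document length for FO, sentences containing the phrase for SC, the usual C-Value discount for phrases nested inside longer candidates, and natural-log IDF for TF-IDF. These definitions and every function name (`split_sentences`, `occurrences`, `phrase_features`, `to_vector`) are illustrative assumptions, not the thesis's implementation.

```python
import math

# Sketch only: feature definitions follow common textbook forms and are
# assumptions, not the thesis's exact formulas. All names are hypothetical.

SENTENCE_END = (".", "!", "?", "\u061f")  # "\u061f" is the Arabic question mark
FEATURE_ORDER = ("tf", "fo", "sc", "cvalue", "tfidf")


def split_sentences(text):
    """Split whitespace-tokenized text into sentences (lists of tokens)."""
    sentences, current = [], []
    for token in text.split():
        word = token.rstrip("".join(SENTENCE_END))
        if word:
            current.append(word)
        if token.endswith(SENTENCE_END) and current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def occurrences(tokens, phrase):
    """Start indices at which the token tuple `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == phrase]


def phrase_features(doc, candidates, corpus):
    """Compute TF, FO, SC, C-Value and TF-IDF for candidate phrases of `doc`.

    `candidates` are token tuples (e.g. noun phrases already matched by POS
    patterns); `corpus` is the document collection used for IDF.
    """
    sentences = split_sentences(doc)
    # Flattened tokens; in this simple sketch a match may cross a sentence boundary.
    tokens = [t for s in sentences for t in s]
    corpus_tokens = [[t for s in split_sentences(d) for t in s] for d in corpus]
    freq = {c: len(occurrences(tokens, c)) for c in candidates}

    features = {}
    for c in candidates:
        positions = occurrences(tokens, c)
        tf = freq[c]
        # First occurrence: relative position of the first match (1.0 if absent).
        fo = positions[0] / len(tokens) if positions else 1.0
        # Sentence count: number of sentences containing the phrase.
        sc = sum(1 for s in sentences if occurrences(s, c))
        # TF-IDF with raw counts and natural-log IDF (df floored at 1).
        df = sum(1 for ct in corpus_tokens if occurrences(ct, c))
        tfidf = tf * math.log(len(corpus) / max(df, 1))
        # C-Value: subtract the mean frequency of longer candidates containing c.
        longer = [b for b in candidates if len(b) > len(c) and occurrences(b, c)]
        nested = tf - sum(freq[b] for b in longer) / len(longer) if longer else tf
        cvalue = math.log2(len(c)) * nested
        features[c] = {"tf": tf, "fo": fo, "sc": sc,
                       "cvalue": cvalue, "tfidf": tfidf}
    return features


def to_vector(feats):
    """Order one phrase's features as a fixed-length classifier input."""
    return [feats[name] for name in FEATURE_ORDER]
```

Each `to_vector` output could then be paired with a keyphrase/non-keyphrase label and passed to a classifier such as scikit-learn's `SVC`, or to logistic regression or LDA, which the study also compares. Note that under this `log2(length)` form of C-Value every single-word candidate scores 0; some variants add a constant to avoid that.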
Pages: 89
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_82994+SOURCE1+SOURCE1.0.PDF
Size: 438.94 kB
Format: Adobe PDF
Access: Restricted


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.