Arabic keyphrases extraction using a hybrid of statistical and machine learning method

Nidaa Ghalib Ali

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476354

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Nazlia Omar, Assoc. Prof. Dr.
dc.contributor.author	Nidaa Ghalib Ali
dc.contributor.other	P72250	-
dc.date.accessioned	2023-10-06T09:16:56Z	-
dc.date.available	2023-10-06T09:16:56Z	-
dc.date.issued	2015-06-08
dc.identifier.other	ukmvital:82994
dc.identifier.other	P72250	-
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/476354	-
dc.description	A keyphrase is a short phrase of one to five words that represents an important concept in an article. Keyphrases are useful for a variety of tasks such as text summarization, automatic indexing, classification and text mining. Accurate and effective methods for keyphrase extraction are in great demand because of the huge number of documents available electronically. Numerous methods have been developed for keyphrase extraction. Some of these methods, such as statistical or machine learning approaches, need a large corpus in order to extract keyphrases from a single document. Other methods use only statistical information about a word or phrase, obtained from a single document, to rank it and to decide whether it is a keyphrase. However, both kinds of method suffer from low accuracy, especially when extracting keyphrases from a single, small document. Much work has been done on automatically extracting keyphrases from documents in English and other languages. In contrast, little work has been done for Arabic documents. This study proposes a hybrid method which combines statistical and machine learning methods for Arabic keyphrase extraction. The statistical methods are Term Frequency (TF), First Occurrence in Text (FO), Sentence Count (SC), C-Value and TF-IDF. The machine learning methods are Linear Logistic Regression (LLR), Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). To perform the task, noun phrases are first extracted using part-of-speech (POS) patterns. Then, the statistical methods are applied and their results are used as features for classification. The purpose of the classification is to decide whether a selected noun phrase is a keyphrase or not. The test dataset contains 174 text files which were selected from multiple domains such as education, health and sports. The hybrid model based on SVM achieves the best result, with 93.9% accuracy.
The experiments show that the proposed model is viable for Arabic keyphrase extraction. Note: "Certification of Master's/Doctoral Thesis" is not available.
dc.language.iso	eng
dc.publisher	UKM, Bangi
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.subject	Keyphrase extraction
dc.subject	Arabic keyphrase
dc.subject	Noun phrase
dc.subject	Dissertations, Academic -- Malaysia
dc.title	Arabic keyphrases extraction using a hybrid of statistical and machine learning method
dc.type	theses
dc.rights.holder	UKM	-
dc.format.pages	89
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
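
The feature step described in the abstract (statistical scores computed per candidate phrase, then used as classifier inputs) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the exact definitions of TF, FO and SC, the regex tokenizer, and the toy English text are all assumptions made for illustration. C-Value, TF-IDF, POS-based noun phrase extraction and the classifier itself are omitted, and real Arabic text would need proper normalization and tokenization.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word tokens (simplistic; an assumption)."""
    return re.findall(r"\w+", text.lower())


def split_sentences(text: str) -> List[str]:
    """Naive split on '.', '!', '?' and the Arabic question mark."""
    parts = re.split(r"[.!?\u061F]+", text)
    return [p.strip() for p in parts if p.strip()]


def find_positions(tokens: List[str], phrase: List[str]) -> List[int]:
    """Return token indices where the phrase starts."""
    n = len(phrase)
    if n == 0:
        return []
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def phrase_features(text: str, candidates: List[str]) -> Dict[str, Dict[str, float]]:
    """Compute three per-phrase features under assumed definitions:

    tf: occurrences of the phrase divided by the total token count
    fo: token index of the first occurrence divided by the total token
        count (0.0 = start of text; 1.0 if the phrase never occurs)
    sc: number of sentences that contain the phrase
    """
    tokens = tokenize(text)
    sentences = [tokenize(s) for s in split_sentences(text)]
    total = len(tokens) or 1  # avoid division by zero on empty text
    features = {}
    for candidate in candidates:
        phrase = tokenize(candidate)
        positions = find_positions(tokens, phrase)
        features[candidate] = {
            "tf": len(positions) / total,
            "fo": positions[0] / total if positions else 1.0,
            "sc": sum(1 for s in sentences if find_positions(s, phrase)),
        }
    return features


if __name__ == "__main__":
    sample = (
        "Keyphrase extraction is useful. "
        "Arabic keyphrase extraction is hard. We study it."
    )
    for name, feats in phrase_features(sample, ["keyphrase extraction", "study"]).items():
        print(name, feats)
```

In a hybrid setup of the kind the abstract describes, each candidate's feature values would form one row of the input to a classifier such as an SVM, with a keyphrase / not-keyphrase label as the target.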

Files in This Item:

File	Description	Size	Format
ukmvital_82994+SOURCE1+SOURCE1.0.PDF (Restricted Access)		438.94 kB	Adobe PDF
