Automatic Arabic text summarization using clustering and keyphrase extraction

Hamzah Noori Fejer

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476356

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Nazlia Omar, Assoc. Prof. Dr.
dc.contributor.author	Hamzah Noori Fejer
dc.contributor.other	P72243	-
dc.date.accessioned	2023-10-06T09:16:57Z	-
dc.date.available	2023-10-06T09:16:57Z	-
dc.date.issued	2015-02-20
dc.identifier.other	ukmvital:82996
dc.identifier.other	P72243	-
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/476356	-
dc.description	Automatic text summarization has become important due to the rapid growth of textual information, since it is very difficult for human beings to manually summarize large documents. A full understanding of a document is essential to form an ideal summary; however, achieving full understanding is either difficult or impossible for computers. Therefore, selecting important sentences from the original text and presenting them as a summary is the most common technique in automated text summarization. Arabic natural language processing lacks the tools and resources that are essential to advance research in Arabic text summarization. In addition to these limited resources, little attention and research have been devoted to this field. Arabic text summarization systems still suffer from low accuracy because they use simple summarization techniques, such as a single-level summarization model. The aim of this research is to improve Arabic text summarization by using clustering and keyphrase extraction. This study proposes a combined clustering method (partitioning and hierarchical) to group Arabic documents into several clusters. A keyphrase extraction module is applied to extract important keyphrases from each cluster, which helps to identify the most important sentences and to find similar sentences based on several similarity algorithms. These algorithms are applied to extract one sentence from a group of similar sentences while ignoring the other similar sentences (i.e., sentences whose similarity exceeds a predefined threshold). This model is designed to improve the quality of Arabic text summarization. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics were used for evaluation. Two corpora were used as summarization datasets: the Essex Arabic Summaries Corpus (EASC) for single-document summarization and DUC2002 for multi-document summarization.
This model achieved an accuracy of 63.3% for single-document and 43.4% for multi-document summarization. The experiments showed that the proposed model performs better than the other systems it was compared with. "Certification of Master's/Doctoral Thesis" is not available.
dc.language.iso	eng
dc.publisher	UKM, Bangi
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.subject	Text summarization
dc.subject	Keyphrase extraction
dc.subject	Arabic text
dc.subject	Clustering method
dc.subject	Dissertations, Academic -- Malaysia
dc.title	Automatic Arabic text summarization using clustering and keyphrase extraction
dc.type	theses
dc.rights.holder	UKM	-
dc.format.pages	99
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
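The sketch below is a minimal illustration of the sentence-selection step described in the abstract. Keyphrases rank the sentences. A sentence is then skipped when its similarity to an already selected sentence exceeds a threshold, so only one sentence from each group of similar sentences is kept.

It rests on several assumptions that are not taken from the thesis:
- The clustering stage is omitted.
- Keyphrases are single words that have already been extracted.
- Plain bag-of-words cosine similarity stands in for the thesis's "several similarity algorithms".
- The function names, the keyphrase-count scoring rule, and the 0.5 threshold are all hypothetical.
- English example sentences are used for readability. Python's Unicode-aware `\w` also matches Arabic letters, but real Arabic text would need normalization and stemming.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase word tokens only. A real Arabic
    # pipeline would also normalize letters and stem (not shown here).
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token lists as term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def keyphrase_score(sentence_tokens, keyphrases):
    """Hypothetical scoring rule: number of (single-word) keyphrases present."""
    present = set(sentence_tokens)
    return sum(1 for k in keyphrases if k in present)


def summarize(sentences, keyphrases, max_sentences=3, threshold=0.5):
    """Pick high-scoring sentences, skipping near-duplicates above `threshold`."""
    tokens = [tokenize(s) for s in sentences]
    # Highest keyphrase score first; sorted() is stable, so ties keep
    # their original document order even with reverse=True.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: keyphrase_score(tokens[i], keyphrases),
        reverse=True,
    )
    selected = []
    for i in ranked:
        if len(selected) == max_sentences:
            break
        # Redundancy check: ignore a sentence that is more similar than the
        # threshold to any sentence already chosen for the summary.
        if any(cosine(tokens[i], tokens[j]) > threshold for j in selected):
            continue
        selected.append(i)
    # Present the summary in the original document order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    doc = [
        "Arabic text summarization selects important sentences.",
        "Summarization of Arabic text selects the important sentences.",
        "Clustering groups similar documents together.",
        "The weather was pleasant yesterday.",
    ]
    keys = ["arabic", "summarization", "clustering", "sentences"]
    for line in summarize(doc, keys, max_sentences=2, threshold=0.5):
        print(line)
```

In the example, the first two sentences tie on keyphrase score. Their cosine similarity (about 0.87) exceeds the threshold, so only the first is kept, and the clustering sentence fills the second slot. Raising the threshold keeps more near-duplicates, and lowering it prunes more aggressively.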

Files in This Item:

File	Description	Size	Format
ukmvital_82996+SOURCE1+SOURCE1.0.PDF Restricted Access		129.63 kB	Adobe PDF
