Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476356
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Nazlia Omar, Assoc. Prof. Dr. |
dc.contributor.author | Hamzah Noori Fejer (P72243) |
dc.date.accessioned | 2023-10-06T09:16:57Z | -
dc.date.available | 2023-10-06T09:16:57Z | -
dc.date.issued | 2015-02-20 |
dc.identifier.other | ukmvital:82996 |
dc.identifier.uri | https://ptsldigital.ukm.my/jspui/handle/123456789/476356 | -
dc.description | Automatic text summarization has become important owing to the rapid growth of textual information, since it is very difficult for human beings to manually summarize large documents. A full understanding of the document is essential to form an ideal summary. However, achieving full understanding is either difficult or impossible for computers. Therefore, selecting important sentences from the original text and presenting these sentences as a summary is the most common technique in automated text summarization. Arabic natural language processing lacks the tools and resources that are essential to advance research in Arabic text summarization. In addition to the limited resources, this field has received little attention and research. Arabic text summarization systems still suffer from low accuracy because they use simple summarization techniques such as a single-level summarization model. The aim of this research is to improve Arabic text summarization by using clustering and keyphrase extraction. This study proposes a combined clustering method (partitioning and hierarchical) to group Arabic documents into several clusters. A keyphrase extraction module is applied to extract important keyphrases from each cluster, which helps to identify the most important sentences and to find similar sentences based on several similarity algorithms. These algorithms are applied to extract one sentence from a group of similar sentences while ignoring the other similar sentences (i.e., sentences that have a greater similarity than the predefined threshold). This model is designed to improve the quality of Arabic text summarization. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics were used for the evaluation. Two corpora were used as summarization datasets. The first, the Essex Arabic Summaries Corpus (EASC), was used for single-document summarization. The second, DUC2002, was used for multi-document summarization. The model achieved an accuracy of 63.3% for single-document and 43.4% for multi-document summarization. The experiments show that the proposed model gives better performance in comparison to other systems. |
dc.description | "Certification of Master's/Doctoral Thesis" is not available |
dc.language.iso | eng |
dc.publisher | UKM, Bangi |
dc.relation | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
dc.rights | UKM |
dc.subject | Text summarization |
dc.subject | Keyphrase extraction |
dc.subject | Arabic text |
dc.subject | Clustering method |
dc.subject | Dissertations, Academic -- Malaysia |
dc.title | Automatic Arabic text summarization using clustering and keyphrase extraction |
dc.type | theses |
dc.format.pages | 99 |
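
The redundancy-removal step mentioned in the abstract (keeping one sentence from a group of similar sentences and ignoring those whose similarity exceeds a predefined threshold) can be illustrated with a minimal Python sketch. It is not taken from the thesis: the whitespace tokenizer, the bag-of-words cosine similarity, the default threshold of 0.5, and the function names `cosine_similarity` and `drop_redundant` are all assumptions chosen for illustration. The thesis itself refers to several similarity algorithms that this record does not name.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two sentences.

    Assumption: tokens are obtained by whitespace splitting only.
    """
    va, vb = Counter(a.split()), Counter(b.split())
    # Dot product over the words the two sentences share.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def drop_redundant(sentences: list[str], threshold: float = 0.5) -> list[str]:
    """Keep a sentence only if no already-kept sentence is more similar
    to it than the threshold (hypothetical default: 0.5)."""
    kept: list[str] = []
    for s in sentences:
        if all(cosine_similarity(s, k) <= threshold for k in kept):
            kept.append(s)
    return kept
```

Calling `drop_redundant` on a list of candidate sentences returns the list in its original order with near-duplicates removed. Real Arabic text would likely need normalization and stemming before comparison, which this sketch leaves out.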
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File | Description | Size | Format
ukmvital_82996+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 129.63 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.