Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/476356
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Nazlia Omar, Assoc. Prof. Dr. | |
dc.contributor.author | Hamzah Noori Fejer (P72243) | |
dc.date.accessioned | 2023-10-06T09:16:57Z | - |
dc.date.available | 2023-10-06T09:16:57Z | - |
dc.date.issued | 2015-02-20 | |
dc.identifier.other | ukmvital:82996 | |
dc.identifier.uri | https://ptsldigital.ukm.my/jspui/handle/123456789/476356 | - |
dc.description | Automatic text summarization has become important due to the rapid growth of textual information, since it is very difficult for human beings to manually summarize large text documents. A full understanding of the document is essential to form an ideal summary. However, achieving full understanding is either difficult or impossible for computers. Therefore, selecting important sentences from the original text and presenting them as a summary is the most common technique in automated text summarization. Arabic natural language processing lacks the tools and resources that are essential to advance research in Arabic text summarization. In addition to the limited resources, the field has received little research attention. Arabic text summarization systems still suffer from low accuracy because they use simple summarization techniques such as a single-level summarization model. The aim of this research is to improve Arabic text summarization by using clustering and keyphrase extraction. This study proposes a combined clustering method (partitioning and hierarchical) to group Arabic documents into several clusters. A keyphrase extraction module is applied to extract important keyphrases from each cluster, which helps to identify the most important sentences and to find similar sentences based on several similarity algorithms. These algorithms are applied to extract one sentence from a group of similar sentences while ignoring the other similar sentences (i.e., sentences whose similarity is greater than the predefined threshold). This model is designed to improve the quality of Arabic text summarization. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics were used for the evaluation. Two corpora were used as summarization datasets. The first, the Essex Arabic Summaries Corpus (EASC), was used for single-document summarization. The second, DUC2002, was used for multi-document summarization. This model achieved an accuracy of 63.3% for single-document and 43.4% for multi-document summarization. The experiments showed that the proposed model gives better performance than other systems. "Certification of Master's/Doctoral Thesis" is not available. | |
dc.language.iso | eng | |
dc.publisher | UKM, Bangi | |
dc.relation | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat | |
dc.rights | UKM | |
dc.subject | Text summarization | |
dc.subject | Keyphrase extraction | |
dc.subject | Arabic text | |
dc.subject | Clustering method | |
dc.subject | Dissertations, Academic -- Malaysia | |
dc.title | Automatic Arabic text summarization using clustering and keyphrase extraction | |
dc.type | theses | |
dc.format.pages | 99 | |
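
The redundancy-removal step described in the abstract (keep one sentence from each group of similar sentences and ignore those whose similarity exceeds a predefined threshold) can be illustrated with a minimal sketch. The record does not name the similarity algorithms or the threshold value used in the thesis, so everything here is an assumption made for illustration: Jaccard overlap of word sets as the similarity measure, a threshold of 0.5, a simple regex tokenizer in place of real Arabic preprocessing, and the hypothetical function names `tokenize`, `jaccard`, and `filter_redundant`.

```python
from __future__ import annotations

import re


def tokenize(sentence: str) -> set[str]:
    """Return the set of lowercase word tokens in a sentence.

    Assumption: a plain regex tokenizer stands in for proper Arabic
    preprocessing (normalization, stemming, stop-word removal).
    """
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_redundant(sentences: list[str], threshold: float = 0.5) -> list[str]:
    """Greedily keep a sentence only if it is not too similar to any kept one.

    Assumption: the input is already ordered by importance, so the first
    sentence of each group of similar sentences is the one retained.
    """
    kept: list[str] = []
    kept_tokens: list[set[str]] = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        # Skip the sentence if its similarity to any kept sentence
        # is greater than the threshold.
        if all(jaccard(tokens, other) <= threshold for other in kept_tokens):
            kept.append(sentence)
            kept_tokens.append(tokens)
    return kept


if __name__ == "__main__":
    candidates = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "stock prices fell sharply today",
    ]
    for sentence in filter_redundant(candidates):
        print(sentence)
```

In this toy input the second sentence shares five of six distinct words with the first, so it is dropped, while the unrelated third sentence is kept. A greedy pass is only one possible design; the thesis may select the representative sentence from each group differently.
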
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ukmvital_82996+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 129.63 kB | Adobe PDF | View/Open |