Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476388
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Mohd. Juzaiddin Ab Aziz, Prof. Dr. |
dc.contributor.author | Amiri Dorna (P53644) |
dc.date.accessioned | 2023-10-06T09:17:35Z | -
dc.date.available | 2023-10-06T09:17:35Z | -
dc.date.issued | 2012-12-05 |
dc.identifier.other | ukmvital:84861 |
dc.identifier.uri | https://ptsldigital.ukm.my/jspui/handle/123456789/476388 | -
dc.description | Text Categorization (TC) is the task of automatically assigning documents to a set of predefined categories. A central problem in TC is that taking all the terms (features) in the documents as the feature set leads to a high-dimensional feature space, which makes computation difficult and time-consuming. This work addresses that problem by applying text summarization (TS) as an effective feature selection technique in TC; specifically, it performs TC on the basis of a graph-based summarization approach. Feature selection plays an important role in TC by selecting informative features, but although current feature selection methods evaluate features well, they are not able to reduce the feature set size. In the preprocessing phase, WordNet, a lexical database for the English language, and a stemmer based on the Porter stemming algorithm were applied. A text summary is a shorter version of the original text that retains its main viewpoints and information, so summaries were used as replacements for the full texts. TS was performed with the TextRank model, a graph-based approach that ranks all the sentences in a document according to their importance. Summaries were constructed by selecting the most important 10%, 20%, and 30% of the sentences, and each summary was then used directly to select features. The classifier was trained with the k-nearest neighbour (KNN) machine learning algorithm, which classifies the unlabeled documents in a test set on the basis of the labeled documents in a training set and assigns each one to its relevant category. This work uses “hard categorization”, in which each document can be assigned to only one category. The corpus was collected from online news agencies; 60% of the collected documents were used as the training set and the remaining 40% as the test set. The results show that graph-based TS on the training set eases the TC process by reducing the feature set size, which in turn reduces classifier training time and computational complexity. The 10%, 20%, and 30% summaries were all tested with the proposed method, and the 20% summary showed the best performance. |
dc.description | Master / Sarjana |
dc.language.iso | eng |
dc.publisher | UKM, Bangi |
dc.relation | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
dc.rights | UKM |
dc.subject | Text Categorization |
dc.subject | Text processing (Computer science) |
dc.title | A text categorization based on the effect of text summarization |
dc.type | theses |
dc.format.pages | 93 |
dc.identifier.callno | QA76.9.T48A475 2013 3 tesis |
dc.identifier.barcode | 002015 |
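The method in the abstract has three stages. TextRank summaries replace the training documents, the summary vocabulary becomes the feature set, and KNN then assigns each test document to exactly one category. It can be sketched roughly as below.

This is a minimal, hypothetical Python sketch under stated assumptions. It is not the thesis's implementation, and all function names are illustrative:
- A naive regex tokenizer stands in for the WordNet and Porter-stemmer preprocessing.
- Sentence similarity uses the word-overlap measure from the original TextRank formulation, with a damping factor of 0.85.
- Documents are term-frequency vectors compared by cosine similarity.
- The value of k is not given in the record, so it is a free parameter.

The 60/40 train/test split is not shown.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Naive lowercase word tokenizer (assumption: stands in for the
    # WordNet- and Porter-stemmer-based preprocessing described in the abstract).
    return re.findall(r"[a-z]+", text.lower())

def split_sentences(text):
    # Naive splitter on sentence-final punctuation (assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def overlap_similarity(a, b):
    # TextRank word-overlap similarity: |shared words| / (log|a| + log|b|).
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0

def textrank_summary(text, ratio=0.2, damping=0.85, iterations=50):
    """Return the top `ratio` fraction of sentences, ranked by TextRank, in original order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []
    # Weighted undirected graph: one node per sentence, edge weight = similarity.
    w = [[overlap_similarity(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    strength = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: WS(i) = (1-d) + d * sum_j w[j][i] / strength[j] * WS(j).
        scores = [(1 - damping) + damping * sum(
                      w[j][i] / strength[j] * scores[j]
                      for j in range(n) if strength[j] > 0)
                  for i in range(n)]
    keep = max(1, round(ratio * n))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]

def vectorize(text, vocab):
    # Term-frequency vector restricted to the selected feature set.
    return Counter(t for t in tokenize(text) if t in vocab)

def cosine(u, v):
    dot = sum(c * v[t] for t, c in u.items() if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train(docs, ratio=0.2):
    """docs: list of (text, label). Each training text is replaced by its summary,
    and the feature set is the vocabulary of those summaries (assumed reading)."""
    summaries = [(" ".join(textrank_summary(text, ratio)), label) for text, label in docs]
    vocab = {t for summary, _ in summaries for t in tokenize(summary)}
    return vocab, [(vectorize(summary, vocab), label) for summary, label in summaries]

def knn_classify(text, vocab, train_vectors, k=3):
    # Hard categorization: majority vote among the k most similar training
    # summaries, so each document receives exactly one category.
    query = vectorize(text, vocab)
    ranked = sorted(train_vectors, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy, made-up training data for illustration only (not the thesis corpus).
    docs = [("The striker scored a late goal. The team won the match.", "sport"),
            ("The bank raised interest rates. Investors sold shares.", "economy")]
    vocab, train_vectors = train(docs, ratio=0.2)
    print(knn_classify("Interest rates rose and investors sold shares at the bank.",
                       vocab, train_vectors, k=1))  # prints "economy"
```

The `ratio` argument corresponds to the 10%, 20% and 30% summary sizes the abstract compares. Because only summary terms enter `vocab`, smaller ratios directly shrink the feature space the KNN step works in.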
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File | Description | Size | Format
ukmvital_84861+SOURCE1+SOURCE1.0.PDF | Restricted Access | 1.87 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.