Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476475
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Sabrina Tiun, Dr.
dc.contributor.author | Ali Abdulkadhim Hasan (P80531)
dc.date.accessioned | 2023-10-06T09:19:10Z | -
dc.date.available | 2023-10-06T09:19:10Z | -
dc.date.issued | 2017-04-25
dc.identifier.other | ukmvital:99004
dc.identifier.uri | https://ptsldigital.ukm.my/jspui/handle/123456789/476475 | -
dc.description | In the modern era of information technology, the use of short text has increased dramatically, with many applications relying on short text, such as mobile messaging, breaking news, social media and search queries. The key challenge with short text lies in the limited context information that can be acquired from it. This limitation increases both the sparsity and the ambiguity of the text. Traditional approaches used for classical (longer) text, such as bag-of-words, appear insufficient because too little information can be extracted from short text, which leads to a lack of semantic knowledge and of semantic relations between the words within it. Hence, this study proposes a new feature extraction method based on Interesting Term Count (ITC), using external knowledge from WordNet and a new weight (di) to identify the variation between classes on the basis of ITC. The proposed feature extraction approach aims to identify common terms without losing their semantics, with WordNet providing the semantic correspondences among the words within the short text. Two datasets are used for evaluation: the Search-snippet benchmark dataset (10,060 train/test snippets from multiple categories) and the Web-KB dataset. Three classifiers are used: Support Vector Machine, Decision Tree (J-48) and Naive Bayes. The evaluation compares the performance of the three classifiers using the enhanced ITC against the base ITC. Experimental results show that the classifiers perform better with the proposed feature extraction method (the enhanced ITC), which suggests that using an external knowledge source is effective for short text classification.
dc.description | Certification of Master's/Doctoral Thesis is not available
dc.language.iso | eng
dc.publisher | UKM, Bangi
dc.relation | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.rights | UKM
dc.subject | Short text
dc.subject | Instant messaging
dc.title | Enhanced ITC feature based on semantic similarity for short documents classification
dc.type | theses
dc.format.pages | 95
dc.identifier.callno | TK5105.73.H337 2017 3 tesis
dc.identifier.barcode | 003244(2018)
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File | Description | Size | Format
ukmvital_99004+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 73 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.