Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/476475
Title: | Enhanced ITC feature based on semantic similarity for short documents classification |
Authors: | Ali Abdulkadhim Hasan (P80531) |
Supervisor: | Sabrina Tiun, Dr. |
Keywords: | Short text; Instant messaging |
Issue Date: | 25-Apr-2017 |
Description: | In the modern era of information technology, the use of short text has increased dramatically, with many applications relying on it, such as mobile messaging, breaking news, social media, and search queries. The key challenge of short text lies in the limited context information that can be acquired from it. This limitation increases both the sparsity and the ambiguity of the text. Traditional approaches used for longer, conventional text, such as bag-of-words, appear insufficient because too little information can be extracted from short text. As a result, semantic knowledge and the semantic relations between the words within the short text are missing. Hence, this study proposes a new feature extraction method based on Interesting Term Count (ITC) that uses WordNet as external knowledge and introduces a new weight (di) to identify the variation between classes on the basis of ITC. The proposed feature extraction approach aims to identify the common terms without losing their semantics, with WordNet providing the semantic correspondences among the words within the short text. Two datasets are used for evaluation: the benchmark Search-snippet dataset (which consists of 10,060 train/test snippets from multiple categories) and the Web-KB dataset. Three classification methods are used: Support Vector Machine, Decision Tree (J-48), and Naive Bayes. The evaluation compares the performance of the three classifiers with the enhanced ITC against the base ITC. Experimental results show that the classifiers perform better with the proposed feature extraction method (the enhanced ITC). This implies that external knowledge sources are effective for short text classification. Note: "Certification of Master's/Doctoral Thesis" is not available. |
Pages: | 95 |
Call Number: | TK5105.73.H337 2017 3 tesis |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ukmvital_99004+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 73 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.