Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/476592
Title: | Hierarchical multi-label of short document classification using term expansion and label powerset |
Authors: | Zaid Farooq Salih (P84488) |
Supervisor: | Sabrina Tiun, Dr. |
Keywords: | Text processing (Computer science); Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia |
Issue Date: | 23-Mar-2018 |
Description: | Text classification assigns each document one or more categories from a predefined set according to its content. The task becomes more challenging when it must handle multi-label and hierarchical classes, where a single document receives several appropriate classes arranged in a hierarchy. It becomes harder still for short text: short documents contain few words, which makes them highly ambiguous because little contextual information can be extracted. Several approaches have been proposed for hierarchical text classification. However, these approaches use the One-against-all mechanism, which seems to be inefficient for short text classification. Therefore, this study proposes a combination of a term expansion method and the Label Powerset mechanism for hierarchical classification of short text using the Support Vector Machine (SVM) classifier. The term expansion addresses the ambiguity of short text by providing semantic correspondences from the WordNet dictionary. In addition, a feature extraction approach is applied together with the term expansion method to identify the most important terms within the expanded text. This feature extraction uses a modified version of TF-IDF, the Interesting Term Count (ITC). The Label Powerset mechanism is used with the SVM classifier to handle hierarchical text classification, and a discretization process is applied to convert the hierarchy into a flat structure. To test the proposed method, the experiments used a short-text ACM dataset containing a large number of titles and keywords of published articles. The evaluation compared the method with and without term expansion. Experimental results show that the proposed method (with term expansion) achieved the best F-measure of 93%. This indicates the effectiveness of term expansion with the Label Powerset approach in hierarchical multi-label classification of short documents. Degree: Master of Information Technology. The "Certification of Master's / Doctoral Thesis" is not available. |
Pages: | 88 |
Call Number: | QA76.9.T48S255 2018 3 tesis |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
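
The Description above mentions two steps that are easier to picture with a small example: flattening a label hierarchy and applying Label Powerset. The following is a minimal sketch under stated assumptions, not the thesis's implementation. The function names (`with_ancestors`, `flatten`, `label_powerset`), the dot-separated label codes, and the choice to flatten by adding every ancestor of a label are all illustrative assumptions; the record does not say how the discretization was actually done.

```python
def with_ancestors(label, sep="."):
    """Expand a hierarchical label such as 'H.3.3' into ['H', 'H.3', 'H.3.3']."""
    parts = label.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def flatten(labels):
    """Flatten one document's hierarchical labels into a sorted tuple of flat labels.

    Assumption: flattening means keeping each label together with its ancestors.
    """
    flat = set()
    for label in labels:
        flat.update(with_ancestors(label))
    return tuple(sorted(flat))


def label_powerset(label_sets):
    """Map each distinct (flattened) label set to a single class id.

    Returns the class id per document and the label-set -> id mapping.
    """
    class_of = {}
    y = []
    for labels in label_sets:
        key = flatten(labels)
        # A new label combination gets the next unused integer id.
        class_of.setdefault(key, len(class_of))
        y.append(class_of[key])
    return y, class_of


# Hypothetical label codes for three short documents.
docs_labels = [["H.3.3"], ["H.3.3", "I.2.7"], ["H.3.3"]]
y, class_of = label_powerset(docs_labels)

# Invert the mapping to decode a predicted class id back into its label set.
labels_of = {class_id: key for key, class_id in class_of.items()}
print(y)
print(labels_of[1])
```

In this sketch, each distinct combination of labels becomes one class, so the multi-label problem turns into an ordinary multi-class problem. The resulting ids in `y` could then be used as targets for any multi-class classifier, such as the SVM named in the Description, and `labels_of` converts a prediction back into its set of labels.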
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ukmvital_121716+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 15.02 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.