Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/513385
Title: | Text feature selection using enhanced binary bat algorithm |
Authors: | Aisha Adel Ahmed Al-Hajjana (P75140) |
Supervisor: | Nazlia Omar, Assoc. Prof. Dr. |
Keywords: | Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia; Internet; Algorithms; Data sets; Classification |
Issue Date: | 5-Mar-2019 |
Description: | Ph.D. Given the huge amount of textual data generated and shared on the internet, such as news reports, articles, tweets and product reviews, the need for effective Text-Feature Selection (TFS) is increasingly important. This is challenging due to the high dimensionality of text data. Most current TFS methods ignore feature dependencies, which reduces the quality of the selected feature set and affects classification performance. Other methods depend on population-based meta-heuristic algorithms, which improve the quality of the selected feature set and the classification results. However, methods of this type are classifier-dependent and produce a higher number of features. In addition, these algorithms are prone to premature convergence due to poor population diversity. Moreover, meta-heuristics are less efficient when tackling high-dimensional problems, and the population diversity needs to be controlled during the optimization process. To handle these problems, this research aims to develop a method based on the Binary Bat Algorithm (BBA) to improve TFS. First, rough set theory is adapted and used to evaluate the solutions produced by BBA. The proposed method is compared with a wrapper version of the BBA-based TFS method. Then, a modified version of the Latin Hypercube Sampling (LHS) method is proposed to initialize a diverse population. The proposed method is compared with random initialization in terms of the performance of the TFS method during the optimization process and the classification results. Experiments show that the proposed initialization method improves the diversity of the initial population and the final solution, but the population diversity decreases during the early stages of the optimization process. Thus, a cooperative co-evolutionary BBA is introduced to control the population diversity during the optimization process and to improve the performance of the BBA-based TFS method.
This is done by dividing the dimensions of the problem into several parts and optimizing each part in a separate sub-population. To evaluate the generality and capability of the proposed method, three classifiers, two standard benchmark datasets in English and one in Arabic were used. The results show that the proposed method consistently improves the classification performance in comparison with the best results reported in the literature. The improvement is obtained for both the English and Arabic datasets, which indicates that the proposed TFS method generalizes across dataset languages. |
Pages: | 213 |
Call Number: | ZA4201.H335 2019 3 tesis |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
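The abstract describes the cooperative co-evolutionary step as dividing the problem dimensions into parts and optimizing each part in its own sub-population. The Python sketch below illustrates that decomposition idea only; it is not the thesis's algorithm. All names (`split_features`, `assemble`, `cooperative_search`, `toy_fitness`) are hypothetical, the random grouping of features is an assumption (the record does not say how the dimensions are divided), and a single bit flip stands in for the actual BBA update, which the record does not specify.

```python
import random


def split_features(n_features, n_groups, rng):
    """Shuffle feature indices and deal them into n_groups disjoint groups."""
    if not 1 <= n_groups <= n_features:
        raise ValueError("n_groups must be between 1 and n_features")
    indices = list(range(n_features))
    rng.shuffle(indices)
    return [sorted(indices[g::n_groups]) for g in range(n_groups)]


def assemble(context, group, sub_solution):
    """Overlay one group's bits on a copy of the full-length context mask."""
    full = list(context)
    for idx, bit in zip(group, sub_solution):
        full[idx] = bit
    return full


def cooperative_search(n_features, n_groups, fitness,
                       pop_size=6, rounds=20, seed=0):
    rng = random.Random(seed)
    groups = split_features(n_features, n_groups, rng)
    # One sub-population of binary vectors per feature group.
    subpops = [[[rng.randint(0, 1) for _ in group] for _ in range(pop_size)]
               for group in groups]
    # Context vector: the best full-length feature mask found so far.
    context = [rng.randint(0, 1) for _ in range(n_features)]
    best = fitness(context)
    for _ in range(rounds):
        for group, pop in zip(groups, subpops):
            for i, member in enumerate(pop):
                # Placeholder move (assumption): flip one random bit.
                # A BBA would apply its own position update here instead.
                candidate = list(member)
                candidate[rng.randrange(len(candidate))] ^= 1
                old_score = fitness(assemble(context, group, member))
                new_score = fitness(assemble(context, group, candidate))
                # Accept the move if it does not lower fitness in this context.
                if new_score >= old_score:
                    pop[i] = candidate
                score = fitness(assemble(context, group, pop[i]))
                if score > best:
                    best = score
                    context = assemble(context, group, pop[i])
    return context, best


# Hypothetical toy objective: count the bits that agree with a target mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]


def toy_fitness(mask):
    return sum(a == b for a, b in zip(mask, TARGET))


if __name__ == "__main__":
    mask, score = cooperative_search(len(TARGET), 3, toy_fitness)
    print(mask, score)
```

Each sub-population only ever changes the bits of its own feature group; a candidate is scored by placing it into the shared context mask, so the other groups' current best bits supply the rest of the solution. In a real TFS setting the toy objective would be replaced by a feature-subset evaluator, such as the rough-set-based measure the abstract mentions.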
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ukmvital_122016+SOURCE1+SOURCE1.0.PDF | Restricted Access | 2.66 MB | Adobe PDF | View/Open |