Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/772430
Title: | An enhanced multi density micro clustering-grid based approach for clustering evolving data stream |
Authors: | Mayas Mohammed Mahdi Abd Ali Aljibawi (P83443) |
Supervisor: | Mohd Zakree Ahmad Nazri, Assoc. Prof. Dr. |
Keywords: | Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia; Data mining; Nonlinear theories |
Issue Date: | 3-Jan-2022 |
Abstract: | Density-based approaches have emerged as a valuable way to cluster streaming data. Their main advantages over other clustering categories are the ability to find clusters of arbitrary shape, to detect outliers, and to work without prior knowledge of the number of clusters. A data stream is unbounded and evolving, requires fast processing, and allows only one or a small number of scans, so traditional algorithms are not suitable. Many density-based algorithms have been proposed for clustering data streams, but they are not free of problems. The first problem is high memory usage as the streaming speed or the dimensionality increases. The second is a decline in clustering quality as the range of densities widens. The third is the high computation time and complexity of the algorithm. This study addresses these problems by improving the grid-based clustering method. It develops a hybrid micro-clustering grid-based method for clustering evolving data streams called HMG-Stream (Hybrid Micro Clustering Grid-based method for clustering data stream). HMG-Stream is an online-offline algorithm that keeps a summary of the data and performs a pruning process to remove outliers before the final clustering. In the online phase, the algorithm combines density micro-clustering and density grid clustering in a hybrid form: instead of storing whole points, it recursively stores only the important parameters as core mini clusters, and it maps outliers into the grids.
Moreover, the data points in a grid form a new core mini cluster if they reach a predefined density threshold before the pruning process removes the outliers to reduce memory usage. In the offline phase, the algorithm forms the final clustering from the core mini clusters, depending on a dynamic grid granularity. The algorithm's efficiency is measured on well-known real and synthetic datasets of different sizes and densities, using various quality metrics. The experimental results show that the proposed method uses less memory than the compared algorithms. Moreover, clustering quality on multi-density datasets improves relative to other state-of-the-art algorithms, and complexity is reduced, which makes the proposed method applicable to data stream clustering. |
Description: | Full-text |
Pages: | 256 |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
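The online phase described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class names, the parameters `radius`, `cell_size`, and `density_threshold`, and the choice of count plus linear sum as the stored summary are all assumptions made for illustration, and details such as time decay and the offline phase are left out.

```python
import math
from collections import defaultdict


class MiniCluster:
    """Hypothetical summary of absorbed points: a count and a linear sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def center(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        # Update the summary instead of storing the point itself.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


class HybridSketch:
    """Illustrative micro-cluster plus grid structure (all names assumed)."""

    def __init__(self, radius, cell_size, density_threshold):
        self.radius = radius
        self.cell_size = cell_size
        self.density_threshold = density_threshold
        self.mini_clusters = []
        self.grid = defaultdict(list)  # grid cell -> buffered outlier points

    def _cell(self, point):
        return tuple(math.floor(x / self.cell_size) for x in point)

    def insert(self, point):
        # Try to absorb the point into the nearest mini cluster.
        nearest = min(
            self.mini_clusters,
            key=lambda mc: math.dist(mc.center(), point),
            default=None,
        )
        if nearest is not None and math.dist(nearest.center(), point) <= self.radius:
            nearest.absorb(point)
            return
        # Otherwise treat it as a potential outlier and map it to a grid cell.
        key = self._cell(point)
        self.grid[key].append(point)
        # A cell that reaches the threshold is promoted to a new mini cluster.
        if len(self.grid[key]) >= self.density_threshold:
            points = self.grid.pop(key)
            mc = MiniCluster(points[0])
            for p in points[1:]:
                mc.absorb(p)
            self.mini_clusters.append(mc)

    def prune(self):
        # Drop every point still buffered in the grid; return how many.
        removed = sum(len(points) for points in self.grid.values())
        self.grid.clear()
        return removed


model = HybridSketch(radius=1.0, cell_size=1.0, density_threshold=3)
for p in [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.3), (9.5, 9.5)]:
    model.insert(p)
removed = model.prune()
```

In this run the first three points fill one grid cell and are promoted to a mini cluster, the fourth is absorbed into it, and the isolated fifth point stays in the grid until `prune` discards it. Storing only summaries and pruning the grid is what keeps memory bounded in this sketch.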
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
AN ENHANCED MULTI DENSITY MICRO.pdf (Restricted Access) | | 5.64 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.