Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/513375
Title: | Enhance online-offline varying density method for data stream clustering |
Authors: | Maryam Mousavi (P65600) |
Supervisor: | Azuraliza Abu Bakar, Prof. Dr. |
Keywords: | Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia; Data sets; Algorithms |
Issue Date: | 29-Oct-2018 |
Description: | The term data stream refers to a potentially large, continuous and fast-arriving sequence of data. Unlike traditional data, which are static and unchanging, a data stream has its own characteristics: it is massive, potentially infinite and continuous; it must be processed in a single scan; and it changes dynamically over time, so it usually requires a rapid, real-time response. Data stream clustering extracts valuable patterns in real time from dynamic streaming data in a single scan, which is very challenging. Because of the nature and characteristics of data streams, traditional clustering techniques cannot be applied, so it has become crucial to develop new and improved clustering techniques. Existing clustering techniques are generally grouped into five main categories: hierarchical, partitioning, grid-based, density-based and model-based. Density-based techniques are a prominent category for clustering data streams. They treat dense regions of objects as clusters, separated by sparse, low-density regions of the data set. They can detect clusters of arbitrary shape, can handle noise, and do not require a priori knowledge of the number of clusters. The main objective of this research is to propose a new online-offline density-based clustering method for data streams with varying density. In the online phase, a summary of the data (often known as micro-clusters) is created; in the offline phase, this synopsis is used to form the final clusters. The goal of the online phase is to find accurate micro-clusters. When a new data point arrives, finding the nearest, best-fitting micro-cluster is time-consuming and increases the execution time. To address this problem, a new merging algorithm is proposed to reduce the execution time.
To keep the number of micro-clusters limited, a pruning process is applied alongside the summarization process. In existing methods, this pruning process takes too long to remove micro-clusters that do not receive objects frequently, which increases memory usage. In this thesis, a new pruning algorithm is introduced to solve this problem and reduce memory usage. Another problem with density-based methods is that they apply global parameters to data sets with varying density, which can dramatically decrease clustering quality. In our work, a new density-based algorithm that relies only on the MinPts parameter is proposed for forming the final clusters, in order to increase clustering quality on data sets with varying density. Performance evaluation on both synthetic and real data sets illustrates the efficiency and effectiveness of the proposed method. The experimental results show that our method can increase clustering quality on data sets with varying density while keeping time and memory usage limited. (Ph.D.) |
Pages: | 185 |
Call Number: | Z692.D37M638 2018 3 tesis |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
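The online phase described in the abstract (summarizing arrivals into micro-clusters, then pruning micro-clusters that rarely receive objects) can be illustrated with a minimal Python sketch. It follows the general damped-window scheme popularized by DenStream, not the thesis's own merging and pruning algorithms, which this record does not detail. The names `MicroCluster`, `online_update` and `prune` and the parameters `eps` (maximum micro-cluster radius), `lam` (decay rate) and `min_weight` (pruning threshold) are hypothetical. The nearest micro-cluster is found by a linear scan over all micro-clusters, which is the per-arrival cost the abstract identifies as time-consuming.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Damped summary of absorbed points: weight, linear sum (ls), squared sum (ss)."""
    weight: float
    ls: list
    ss: list
    last_update: float

    def decay(self, now, lam):
        # Fade all statistics by 2^(-lam * elapsed time) so older points count less.
        factor = 2.0 ** (-lam * (now - self.last_update))
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        self.last_update = now

    def center(self):
        return [v / self.weight for v in self.ls]

    def radius_with(self, p):
        # RMS distance from the centre if point p were absorbed (sum of per-dimension variances).
        w = self.weight + 1.0
        var = 0.0
        for l, s, x in zip(self.ls, self.ss, p):
            mean = (l + x) / w
            var += (s + x * x) / w - mean * mean
        return math.sqrt(max(var, 0.0))

    def absorb(self, p):
        self.weight += 1.0
        self.ls = [l + x for l, x in zip(self.ls, p)]
        self.ss = [s + x * x for s, x in zip(self.ss, p)]


def online_update(mcs, p, now, eps, lam):
    """Absorb p into its nearest micro-cluster if it stays compact, else start a new one."""
    for mc in mcs:
        mc.decay(now, lam)
    # Linear scan over every micro-cluster: the costly step for each arriving point.
    nearest = min(mcs, key=lambda mc: math.dist(mc.center(), p), default=None)
    if nearest is not None and nearest.radius_with(p) <= eps:
        nearest.absorb(p)
    else:
        mcs.append(MicroCluster(1.0, list(p), [x * x for x in p], now))


def prune(mcs, now, lam, min_weight):
    """Drop micro-clusters whose decayed weight has fallen below min_weight."""
    for mc in mcs:
        mc.decay(now, lam)
    return [mc for mc in mcs if mc.weight >= min_weight]


# Usage: a small 2-D stream with one far-away point.
mcs = []
stream = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1]]
for t, p in enumerate(stream):
    online_update(mcs, p, float(t), eps=0.5, lam=0.01)
kept = prune(mcs, now=3.0, lam=0.01, min_weight=1.5)
```

Here the three points near the origin share one micro-cluster, while the point at (5, 5) starts its own and is pruned because its decayed weight stays below `min_weight`. Decaying every micro-cluster on each arrival keeps the sketch short; practical implementations usually decay lazily and prune only at intervals.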
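For the offline phase, the record states only that the proposed algorithm needs no parameter other than MinPts. As a hedged illustration of how a rank-based neighbourhood avoids a global distance threshold, the sketch below swaps in a well-known technique that is not the thesis's algorithm: two micro-cluster centres are linked only if each is among the other's `min_pts` nearest neighbours (a mutual k-nearest-neighbour graph), and connected components become clusters. The function name `mutual_knn_clusters` is hypothetical, and centres are treated as unweighted points.

```python
import math


def mutual_knn_clusters(centers, min_pts):
    """Label centres by connected components of the mutual min_pts-nearest-neighbour graph."""
    n = len(centers)
    # Each centre's min_pts nearest other centres (brute force; fine for small inputs).
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(centers[i], centers[j]))
        knn.append(set(others[:min_pts]))

    # Union-find: join i and j only when each is in the other's neighbour set.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:
                parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots, labels = {}, []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels


# Usage: a dense group (spacing 0.1), a sparse group (spacing 2) and one isolated centre.
centers = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
           (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0),
           (30.0, 0.0)]
labels = mutual_knn_clusters(centers, min_pts=2)
```

With the same `min_pts`, the dense group and the sparse group each form one cluster, while the isolated centre remains a singleton component that could be reported as noise. Because neighbourhoods are defined by rank rather than by a fixed radius, no distance parameter has to be tuned to either density.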
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ukmvital_121010+SOURCE1+SOURCE1.1.PDF (Restricted Access) | | 4.34 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.