Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/513469
Title: | Pattern discovery algorithms for time series data mining |
Authors: | Almahdi Mohammed Almahdi Ahmed (P48465) |
Supervisor: | Azuraliza Abu Bakar, Prof. Dr. |
Keywords: | Pattern Discovery Algorithms Data mining |
Issue Date: | 17-Dec-2012 |
Description: | Time series pattern discovery is the process of searching for patterns in streams of data. Searching for important patterns in time series data is crucial because the data is collected on a time basis and tends to accumulate in very large amounts. The main problems in time series pattern discovery are the high dimensionality of the data, the complexity of searching for subsequences of events, and the presence of noise and a large number of patterns. In this study, a novel approach is introduced for pattern discovery from time series data that comprises three main phases: i) time series data representation; ii) pattern detection; and iii) frequent and sequential pattern mining. In the time series data representation phase, an improved algorithm is proposed based on the integration of the Relative Frequency algorithm and the Harmony Search algorithm for symbolic-based representation. The performance of this proposed algorithm was evaluated using 20 public domain time series data sets and the Malaysia rainfall data set collected from 10 stations in Selangor over 35 years. The proposed symbolic representation of the rainfall data was then used for the detection phase. In this phase, two main approaches were adopted: the sliding window approach for data segmentation and an improved Case-based Reasoning (CBR) approach to classify the subsequences of the sequences. In the mining phase, two improved algorithms were proposed for pattern discovery that are variants of frequent pattern mining and the Apriori algorithm. These algorithms discover frequent and sequential patterns. The existing frequent pattern mining algorithm was integrated with the Allen intervals operation method to discover the frequent patterns. Then, the Apriori algorithm was employed to find sequential patterns of time series data.
The performance of the proposed algorithms was evaluated in relation to word size, alphabet size, accuracy, error rates, confidence, support, and expert assessment. The experimental results showed that the proposed data representation algorithm outperforms several existing algorithms, with optimal symbolic representation. The algorithm optimized the word and alphabet size of time series data and improved on the standard Symbolic Aggregate approXimation (SAX) algorithm. The CBR detection phase obtained better results, with high accuracy and lower error and misclassification rates. In the mining phase, the proposed integrated mining algorithms were able to generate frequent and sequential patterns with higher confidence and support. Overall, the study has shown its potential in producing methods that preserve important knowledge and thus reduce information loss. Therefore, for the weather prediction problem, more important knowledge can be revealed to support the decision-making process. Ph.D |
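As a rough illustration of the representation and segmentation steps named in the description, the sketch below applies standard SAX (not the thesis's improved Relative Frequency and Harmony Search variant) to a series, cuts the symbol string with a sliding window, and counts the support of each window. All function names and parameters are hypothetical. The support count is a plain frequency tally, which stands in for the Apriori and Allen interval methods and is not an implementation of them.

```python
from collections import Counter
from statistics import NormalDist, mean, pstdev


def paa(series, word_size):
    """Piecewise Aggregate Approximation: mean of each of word_size segments."""
    n = len(series)
    if not 0 < word_size <= n:
        raise ValueError("word_size must be between 1 and len(series)")
    return [
        mean(series[i * n // word_size:(i + 1) * n // word_size])
        for i in range(word_size)
    ]


def sax_word(series, word_size, alphabet_size):
    """Standard SAX: z-normalise, reduce with PAA, map segment means to letters."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma else 0.0 for x in series]
    # Breakpoints split the standard normal curve into equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(letters[sum(v > c for c in cuts)] for v in paa(z, word_size))


def sliding_windows(symbols, width):
    """All contiguous subsequences of the given width, in order."""
    return [symbols[i:i + width] for i in range(len(symbols) - width + 1)]


def frequent_subsequences(symbols, width, min_support):
    """Windows whose relative frequency reaches min_support (simple tally only)."""
    counts = Counter(sliding_windows(symbols, width))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items() if c / total >= min_support}


if __name__ == "__main__":
    # Made-up toy values, not data from the thesis.
    toy_series = [0, 0, 0, 0, 10, 10, 10, 10]
    word = sax_word(toy_series, word_size=2, alphabet_size=2)
    print(word)
    print(frequent_subsequences("abab", width=2, min_support=0.5))
```

Here `word_size` and `alphabet_size` are fixed by hand; according to the description, choosing them is what the thesis's representation algorithm optimizes.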
Pages: | 171 |
Call Number: | QA76.9.D343.A375 2012 3 |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ukmvital_74641+Source01+Source010.PDF (Restricted Access) | | 1.81 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.