Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513451
Title: Rough Set Based Algorithms For Knowledge Discovery In Databases (KDD)
Authors: Khadija Mohammed Hasan Al-Aidarous (P39456)
Supervisor: Azuraliza Abu Bakar, Prof. Dr.
Keywords: Based Algorithms
Knowledge Discovery In Databases
KDD
Data mining
Issue Date: 30-Jul-2013
Description: Rough Set Theory (RST) has attracted rapidly growing interest worldwide and has proven sound and useful in many real-life applications. Research has shown that RST is a good technique for data analysis and can be combined with complementary techniques, because it supports a variety of functions such as data reduction, discovery of attribute dependencies and relationships, evaluation of feature importance, and discovery of patterns in data. However, several areas within Knowledge Discovery in Databases (KDD) remain unexplored by RST and require further attention and research. Motivated by this fact, the main objective of this study is to investigate new uses of RST for solving existing problems in the KDD process. The KDD process can be grouped into three main phases: data preprocessing, data modeling, and post mining. This research proposes variants of rough set algorithms for the data preprocessing and modeling phases, specifically for missing value estimation and classification. Three main problems are addressed: missing values, which compromise data quality; the unrealistic assumptions of attribute independence and equal importance made by Naïve Bayes (NB), which most real-world data sets violate; and the feature weighting problem. Three rough set based algorithms are proposed to handle these problems: Estimating Missing values using VDM in RST (EMVR), Rough Naïve Bayes (RNB), and Tabu search with Rough Naïve Bayes (TRNB). To estimate missing values and improve data quality, EMVR is proposed as a sequential approach that partitions the given data set using rough set equivalence classes and uses a rough set distance-based metric to select the closest objects, which are then used to estimate the missing values.
In the modeling phase, Rough Naïve Bayes (RNB) is proposed to handle the impractical assumptions of Naïve Bayes. It uses the ability of rough sets to discover attribute relationships and dependencies. Finally, a Tabu Search (TS) based approach is proposed to improve the performance of RNB and to solve the problem of finding optimal weight values for weighted NB classifiers. Rough set analysis is embedded within the tabu search algorithm to optimize the weight vector of RNB. To demonstrate the validity and feasibility of the proposed algorithms, an empirical evaluation is conducted on several benchmark data sets from the UCI Machine Learning Repository. The proposed algorithms were implemented and then evaluated using cross-validation with a number of performance measures. A two-tailed paired t-test on the outcomes of the cross-validation runs is used to test whether the differences in the comparisons are significant. Experimental results show that the proposed algorithms achieve high performance, with significant improvements on most of the examined data sets. (PhD thesis)
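As an illustration of the rough set notions the description relies on (equivalence classes and attribute dependency), the following minimal Python sketch partitions a small decision table by indiscernibility and computes the standard dependency degree, that is, the fraction of objects in the positive region. It is not the EMVR, RNB, or TRNB algorithm from the thesis; the function names and the toy table are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Group row indices that share the same values on `attrs`.

    Each group is one equivalence class of the indiscernibility
    relation induced by the attribute subset `attrs`.
    """
    classes = defaultdict(list)
    for index, row in enumerate(table):
        key = tuple(row[a] for a in attrs)
        classes[key].append(index)
    return list(classes.values())


def dependency_degree(table, cond_attrs, dec_attr):
    """Return |POS| / |U| for the decision attribute `dec_attr`.

    An equivalence class belongs to the positive region when all of
    its objects carry the same decision value.
    """
    if not table:
        return 0.0
    positive = 0
    for cls in equivalence_classes(table, cond_attrs):
        decisions = {table[i][dec_attr] for i in cls}
        if len(decisions) == 1:
            positive += len(cls)
    return positive / len(table)


# Hypothetical toy decision table (not data from the thesis).
toy = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]

classes_by_outlook = equivalence_classes(toy, ["outlook"])
gamma_both = dependency_degree(toy, ["outlook", "windy"], "play")
gamma_outlook = dependency_degree(toy, ["outlook"], "play")
```

On this toy table, the two condition attributes together give a dependency degree of 0.5, because the two objects with outlook "sunny" and windy "no" carry conflicting decisions and therefore fall outside the positive region.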
Pages: 258
Call Number: QA76.9.D343 .A383 2013 3
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_71600+Source01+Source010.PDF
Access: Restricted Access
Size: 5.53 MB
Format: Adobe PDF

