Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/487025
Title: Histogram based machine learning algorithms for continuously updated data
Authors: Haider Omar Abduljawad Lawend (P72653)
Supervisor: Anuar Mikdad Muad, Dr.
Keywords: Machine learning; Algorithms; Data sets; Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia
Issue Date: 5-Mar-2019
Description: Machine learning systems often need to incorporate large amounts of new data over time. Because the data keep changing, this leads to the concept change problem. A solution is therefore needed that works fast on huge amounts of data and also continuously adapts and learns new information. The objective of this thesis is to develop a machine learning algorithm that deals with the concept change problem and with huge amounts of data. A new supervised learning algorithm, the partial histogram Bayes learning algorithm (PHBayes), is proposed. PHBayes is a Bayesian learning algorithm that represents the class distribution as a histogram probability distribution. Classification is performed by assigning an instance to the class with the highest posterior probability using Bayes' rule. Compared with other classifiers, PHBayes was faster, more accurate with large numbers of instances, required little memory and was very flexible to changes in the data. PHBayes is also more accurate than naive Bayes: PHBayes achieved an accuracy of 0.7965, while naive Bayes achieved 0.7219. Compared with naive Bayes, PHBayes is 3.395 times slower in the training phase but 2.307 times faster in the testing phase. A new unsupervised clustering algorithm is also proposed, the histogram based Gaussian mixture model (HGMM), which learns using a splitting technique instead of expectation maximization; the splitting is controlled by the distribution of the histograms in the clusters. HGMM was faster and more accurate than the traditional Gaussian mixture model. HGMM achieved a purity of 0.9955, while the Gaussian mixture model achieved 0.8444, and HGMM is 2.729 times faster than the Gaussian mixture model in the training phase.
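The histogram-based Bayesian classification described above can be illustrated with a minimal sketch. This is not the thesis's PHBayes implementation, whose details are not given in this record. The class name `HistogramBayes`, the fixed bin range `[lo, hi]`, the per-feature independence (naive Bayes) assumption, and the Laplace smoothing constant `alpha` are all assumptions made for illustration. The sketch keeps one histogram per class and feature, updates the counts batch by batch, and assigns each instance to the class with the highest posterior.

```python
import numpy as np


class HistogramBayes:
    """Illustrative histogram-based Bayes classifier (hypothetical, not PHBayes)."""

    def __init__(self, n_features, lo, hi, n_bins=16, alpha=1.0):
        # Assumption: every feature lies roughly in the known range [lo, hi].
        self.edges = np.linspace(lo, hi, n_bins + 1)
        self.n_features = n_features
        self.n_bins = n_bins
        self.alpha = alpha      # Laplace smoothing constant (assumption)
        self.counts = {}        # class label -> (n_features, n_bins) bin counts
        self.class_totals = {}  # class label -> number of instances seen

    def _bin(self, X):
        # Interior edges map each value to a bin index in [0, n_bins - 1];
        # out-of-range values fall into the first or last bin.
        return np.digitize(X, self.edges[1:-1])

    def partial_fit(self, X, y):
        """Add a new batch to the histograms without revisiting old data."""
        X = np.asarray(X, dtype=float)
        labels = np.asarray(y).tolist()
        cols = np.arange(self.n_features)
        for row, label in zip(self._bin(X), labels):
            if label not in self.counts:
                self.counts[label] = np.zeros((self.n_features, self.n_bins))
                self.class_totals[label] = 0
            self.counts[label][cols, row] += 1
            self.class_totals[label] += 1
        return self

    def predict(self, X):
        """Return, per instance, the class with the highest posterior."""
        bins = self._bin(np.asarray(X, dtype=float))  # (n_samples, n_features)
        labels = list(self.counts)
        total = sum(self.class_totals.values())
        cols = np.arange(self.n_features)
        scores = []
        for label in labels:
            c = self.counts[label]
            # Smoothed bin probabilities P(bin | class) for each feature.
            probs = (c + self.alpha) / (
                c.sum(axis=1, keepdims=True) + self.alpha * self.n_bins
            )
            log_lik = np.log(probs[cols, bins]).sum(axis=1)
            log_prior = np.log(self.class_totals[label] / total)
            scores.append(log_prior + log_lik)
        best = np.argmax(np.stack(scores, axis=1), axis=1)
        return [labels[i] for i in best]


# Usage: feed synthetic data in several batches, then classify two points.
rng = np.random.default_rng(0)
clf = HistogramBayes(n_features=2, lo=-5.0, hi=10.0)
for _ in range(5):
    a = rng.normal(0.0, 1.0, size=(200, 2))
    b = rng.normal(4.0, 1.0, size=(200, 2))
    clf.partial_fit(np.vstack([a, b]), ["a"] * 200 + ["b"] * 200)
print(clf.predict([[0.0, 0.0], [4.0, 4.0]]))
```

Because the model state is only a set of bin counts, each new batch costs one pass over that batch, and memory use does not grow with the number of instances seen. That is one plausible reading of why a histogram representation suits continuously updated data.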
HGMM is specifically designed to work with PHBayes, performing clustering on the fly in order to continuously update the class information. PHBayes is therefore extended to a semi-supervised learning algorithm that works with labeled and unlabeled data under limited supervision. Here, a new semi-supervised learning algorithm, the histogram Gaussian mixture model with Bayes based learning algorithm (HGBayes), is proposed. HGBayes combines supervised PHBayes with unsupervised HGMM using an active learning technique. HGBayes achieved highly accurate results using unlabeled data with few labeled instances; the ratio of labeled to unlabeled instances in HGBayes is 1:119. In addition, to improve the accuracy of the learning system in the vision environment, an edge detection algorithm based on the Canny edge detector is proposed, the multiple features edge detector (MFED). MFED tracks the edges and measures their lengths and degree of straightness, and it is suitable for organized edges. MFED is more accurate than the Canny edge detector: MFED achieved an accuracy of 0.9877, while the Canny edge detector achieved 0.8469. MFED is 3.467 times slower than the Canny edge detector. Finally, PHBayes, HGMM, HGBayes and MFED are combined into one learning system that is accurate, fast and flexible to changes in the data, with a focus on computer vision applications.
Degree: Ph.D.
Pages: 254
Call Number: Q325.5.L339 2019 3 tesis
Publisher: UKM, Bangi
Appears in Collections: Faculty of Engineering and Built Environment / Fakulti Kejuruteraan dan Alam Bina

Files in This Item:
File: ukmvital_120958+SOURCE1+SOURCE1.1.PDF (Restricted Access)
Size: 4.63 MB
Format: Adobe PDF

