Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513246
Full metadata record
dc.contributor.advisor: Zalinda Othman, Assoc. Prof. Dr.
dc.contributor.author: Seyyedali Fattahi (P57798)
dc.date.accessioned: 2023-10-16T04:34:58Z
dc.date.available: 2023-10-16T04:34:58Z
dc.date.issued: 2016-03-04
dc.identifier.other: ukmvital:83273
dc.identifier.uri: https://ptsldigital.ukm.my/jspui/handle/123456789/513246
dc.description: Classification is a significant data-mining task that assigns items in a collection to predefined classes or labels; it is a form of supervised learning. When one class (the majority) has far more instances than another class (the minority), the class imbalance problem (CIP) arises, which poses a significant challenge in classification research. The minority class is usually the class of greatest interest to researchers, even though some researchers ignore dataset balancing; classifying the minority class correctly avoids poor classifier performance. Individual traditional classifiers such as C4.5 and Random Forest tend to learn mainly from the majority class and show poor classification performance on the minority class, whereas ensemble classifiers improve classification performance by combining the decisions of all individual classifiers. There are two main challenges in addressing the CIP with ensemble methods: first, reducing and alleviating the imbalance ratio (the bias introduced by imbalance); second, obtaining a better composite ensemble model with a lower error rate. This work proposes new ensemble models for both two-class and multi-class classification. It provides a substantial comparison between existing methods and combinations at the major levels on both balanced and imbalanced data, and it characterizes classifier performance under changing class distributions. We propose ensemble models that combine the Synthetic Minority Over-sampling TEchnique (SMOTE) with the Rotation Forest (ROFO), AdaBoostM1, and Random Forest (RF) algorithms and with K-means clustering; these methods are called SMOTE-ROFO, SMOTE-RotBoost, SMOTE-(RF)2, and KCSMOTE-(RF)2, respectively (a minimal illustrative sketch of the SMOTE-plus-Random-Forest combination follows the metadata record below). All experiments were carried out on 66 imbalanced datasets from the KEEL and UCI repositories and were developed using Java-based WEKA, with Orange and KNIME used for validation. The proposed ensemble models were compared to classical classifiers and to existing ensemble models such as SMOTE-Boost, SMOTE-Bagging, and SMOTE-Random Subspace using imbalance evaluation metrics such as Overall Accuracy, Precision, Recall, F-score, and Area Under the Curve (AUC). Experimental results showed that SMOTE-ROFO, SMOTE-(RF)2, and KCSMOTE-(RF)2 perform effectively in dealing with the CIP. The approaches provide high-quality solutions for two-class and multi-class imbalanced classification, and the proposed ensemble models outperform the compared ensemble models in reducing the error rate and alleviating the cost of misclassification.
dc.description: "Certification of Master's/Doctoral Thesis" is not available.
dc.language.iso: eng
dc.publisher: UKM, Bangi
dc.relation: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.rights: UKM
dc.subject: Classification
dc.subject: SMOTE-based ensembles
dc.subject: Minority class
dc.subject: Dissertations, Academic -- Malaysia
dc.title: SMOTE-based ensembles for imbalanced classification problem
dc.type: Theses
dc.format.pages: 338
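
The abstract above describes combining SMOTE over-sampling with ensemble classifiers. The following is a minimal illustrative sketch of the simplest such combination (SMOTE followed by a Random Forest), written with Python's scikit-learn and imbalanced-learn rather than the Java/WEKA implementation used in the thesis; the synthetic dataset and parameter values are placeholders, not the thesis's experimental setup.

    # Minimal sketch: SMOTE over-sampling combined with a Random Forest classifier.
    # Uses scikit-learn / imbalanced-learn, NOT the Java/WEKA stack of the thesis;
    # the dataset and parameters below are illustrative placeholders only.
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_validate

    # Synthetic two-class imbalanced dataset (roughly 10% minority class).
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.9, 0.1], random_state=42)

    # Placing SMOTE inside the pipeline means it is applied only to the training
    # folds, so synthetic minority samples never leak into the evaluation folds.
    model = Pipeline([
        ("smote", SMOTE(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])

    # Evaluate with imbalance-aware metrics (F-score and AUC), as in the abstract.
    scores = cross_validate(model, X, y, cv=5, scoring=["f1", "roc_auc"])
    print("F1 :", scores["test_f1"].mean())
    print("AUC:", scores["test_roc_auc"].mean())

In the Java/WEKA setting mentioned in the abstract, a comparable construction would chain a SMOTE filter with an ensemble classifier; the rotation-forest, boosting, and clustering variants proposed in the thesis replace or extend the Random Forest component in this sketch.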
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_83273+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 479.15 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.