Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513246
Full metadata record
dc.contributor.advisor: Zalinda Othman, Assoc. Prof. Dr.
dc.contributor.author: Seyyedali Fattahi (P57798)
dc.date.accessioned: 2023-10-16T04:34:58Z
dc.date.available: 2023-10-16T04:34:58Z
dc.date.issued: 2016-03-04
dc.identifier.other: ukmvital:83273
dc.identifier.uri: https://ptsldigital.ukm.my/jspui/handle/123456789/513246
dc.description: Classification is a significant data-mining task that assigns items in a collection to predefined classes or labels; it is a form of supervised learning. When one class (the majority) has far more instances than another class (the minority), the class imbalance problem (CIP) arises, which poses a significant challenge in classification research. The minority class is usually the class of greatest interest to researchers, even though some researchers ignore dataset balancing; classifying the minority class correctly avoids poor classifier performance. Individual traditional classifiers such as C4.5 and Random Forest tend to learn mainly from the majority class and show poor classification performance on the minority class, whereas ensemble classifiers improve classification performance by combining the decisions of all individual classifiers. There are two main challenges in addressing the CIP with ensemble methods: first, reducing and alleviating the imbalance ratio (the bias introduced by imbalance); second, obtaining a better composite ensemble model with a lower error rate. This work proposes new ensemble models for both two-class and multi-class classification. It provides a substantial comparison between existing methods and combinations at the major levels on both balanced and imbalanced data, and it characterizes classifier performance under changing class distributions. We propose ensemble models that combine the Synthetic Minority Over-sampling TEchnique (SMOTE) with the Rotation Forest (ROFO), AdaBoostM1, and Random Forest (RF) algorithms and with K-means clustering; these methods are called SMOTE-ROFO, SMOTE-RotBoost, SMOTE-(RF)2, and KCSMOTE-(RF)2, respectively (a minimal illustrative sketch of the SMOTE-plus-Random-Forest combination follows the metadata record below). All experiments were carried out on 66 imbalanced datasets from the KEEL and UCI repositories and were developed using Java-based WEKA, with Orange and KNIME used for validation. The proposed ensemble models were compared to classical classifiers and to existing ensemble models such as SMOTE-Boost, SMOTE-Bagging, and SMOTE-Random Subspace using imbalance evaluation metrics such as Overall Accuracy, Precision, Recall, F-score, and Area Under the Curve (AUC). Experimental results showed that SMOTE-ROFO, SMOTE-(RF)2, and KCSMOTE-(RF)2 perform effectively in dealing with the CIP. The approaches provide high-quality solutions for two-class and multi-class imbalanced classification, and the proposed ensemble models outperform the compared ensemble models in reducing the error rate and alleviating the cost of misclassification.
dc.description: "Certification of Master's/Doctoral Thesis" is not available.
dc.language.iso: eng
dc.publisher: UKM, Bangi
dc.relation: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.rights: UKM
dc.subject: Classification
dc.subject: SMOTE-based ensembles
dc.subject: Minority class
dc.subject: Dissertations, Academic -- Malaysia
dc.title: SMOTE-based ensembles for imbalanced classification problem
dc.type: Theses
dc.format.pages: 338
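
The abstract above describes combining SMOTE over-sampling with ensemble classifiers. The following is a minimal illustrative sketch of the simplest such combination (SMOTE followed by a Random Forest), written with Python's scikit-learn and imbalanced-learn rather than the Java/WEKA implementation used in the thesis; the synthetic dataset and parameter values are placeholders, not the thesis's experimental setup.

    # Minimal sketch: SMOTE over-sampling combined with a Random Forest classifier.
    # Uses scikit-learn / imbalanced-learn, NOT the Java/WEKA stack of the thesis;
    # the dataset and parameters below are illustrative placeholders only.
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_validate

    # Synthetic two-class imbalanced dataset (roughly 10% minority class).
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.9, 0.1], random_state=42)

    # Placing SMOTE inside the pipeline means it is applied only to the training
    # folds, so synthetic minority samples never leak into the evaluation folds.
    model = Pipeline([
        ("smote", SMOTE(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])

    # Evaluate with imbalance-aware metrics (F-score and AUC), as in the abstract.
    scores = cross_validate(model, X, y, cv=5, scoring=["f1", "roc_auc"])
    print("F1 :", scores["test_f1"].mean())
    print("AUC:", scores["test_roc_auc"].mean())

In the Java/WEKA setting mentioned in the abstract, a comparable construction would chain a SMOTE filter with an ensemble classifier; the rotation-forest, boosting, and clustering variants proposed in the thesis replace or extend the Random Forest component in this sketch.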
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_83273+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 479.15 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.