Hybrid discernibility of rough set based algorithms for overlap clustering

Djoko Budiyanto Setyohadi (P51680)

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513510

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Azuraliza Abu Bakar, Prof. Dr.
dc.contributor.author	Djoko Budiyanto Setyohadi (P51680)
dc.date.accessioned	2023-10-16T04:37:27Z	-
dc.date.available	2023-10-16T04:37:27Z	-
dc.date.issued	2014-05-02
dc.identifier.other	ukmvital:79957
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/513510	-
dc.description	Overlap clustering is important in data mining because real-world data are often vague and uncertain. Under these conditions the distinction among classes is unclear, and conventional clustering algorithms often fail to find appropriate clusters. Fuzzy C-Means (FCM) is a well-known overlap clustering algorithm, valued for its simplicity and high performance. Its main advantage is that it is more natural: objects are not forced to belong fully to one class, and overlap clustering is performed using degrees of class membership. However, FCM relies on distance measurement for membership computation, which leads to two problems. First, it is sensitive to outliers, and its performance decreases as the data dimension increases. Second, its random initial seeds cause the local optima problem. More recently, the Rough K-Means (RKM) algorithm was developed to handle the vagueness of data in clustering, and it outperforms FCM when dealing with overlapping objects by using the boundary area concept. However, RKM is less descriptive than FCM, since it only separates the vague objects from the crisp objects.
Therefore, this study proposes a hybrid rough set based clustering method to address the overlap clustering problem. Two algorithms are proposed: the Discernibility of Rough Set (DR) based algorithm for initial seed computation, and the Rough K-Means Discernibility (RKMD) algorithm for overlap clustering. The DR algorithm is proposed to optimize the original RKM clustering, while RKMD is proposed to compute membership values for the overlapping objects within the boundary area. DR and RKMD are then hybridized into the DR-RKMD algorithm for overlap clustering. DR-RKMD is further enhanced so that its performance can be validated on the outlier detection task, for which a new outlier detection factor is proposed. The aim is to show that DR-RKMD can detect outliers significantly, since the performance of a clustering method affects the performance of a clustering-based outlier detection method.
The experiments on the proposed algorithms are conducted in three phases. First, DR is validated using the Davies-Bouldin index by simulating the threshold value that controls the vague objects. DR performs better than previous methods because it allows the vague objects to be adjusted efficiently. Second, DR-RKMD is validated in terms of the precision with which objects are assigned to the appropriate cluster, the compactness, and the separation of the clusters. The measures used are the Sum of Squared Errors, the Dunn index, and the Silhouette index. The results show that the proposed algorithm outperforms previous algorithms on several overlap datasets. The complexity, which represents the computation cost, is also reduced. Third, the improved DR-RKMD for outlier detection with the new detection factor produces a better detection rate than several previous methods when tested on several benchmark datasets.,PhD
dc.language.iso	eng
dc.publisher	UKM, Bangi
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.rights	UKM
dc.subject	Algorithms
dc.subject	Cluster analysis
dc.title	Hybrid discernibility of rough set based algorithms for overlap clustering
dc.type	Theses
dc.format.pages	167
dc.identifier.callno	QA278.D566 2014 3 tesis
dc.identifier.barcode	001064
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:

File	Description	Size	Format
ukmvital_79957+SOURCE1+SOURCE1.0.PDF Restricted Access		2.07 MB	Adobe PDF
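
The abstract contrasts FCM, which gives every object a degree of membership in each class, with RKM, which only separates crisp objects from vague objects in the boundary area. The following is a minimal sketch of that contrast using the commonly cited textbook forms of the two baselines. It is not the thesis's DR, RKMD, or DR-RKMD algorithms, whose details the record does not give. The function names, the fuzzifier `m`, and the ratio `threshold` are assumptions chosen for illustration.

```python
import math


def fcm_memberships(point, centroids, m=2.0):
    """Textbook FCM membership degrees of one point to each centroid.

    `m` is the fuzzifier (assumed value 2.0); the degrees sum to 1.
    """
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that centroid.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


def rough_assign(point, centroids, threshold=1.3):
    """Rough k-means style assignment (ratio-threshold variant, assumed).

    Returns (lower, boundary). `lower` is the index of the cluster whose
    lower approximation receives the point (a crisp object), or None.
    `boundary` lists the clusters sharing the point (a vague object).
    """
    dists = [math.dist(point, c) for c in centroids]
    nearest = min(range(len(dists)), key=dists.__getitem__)
    if dists[nearest] == 0.0:
        return nearest, []
    # Other centroids that are almost as close as the nearest one.
    close = [
        j for j, d in enumerate(dists)
        if j != nearest and d / dists[nearest] <= threshold
    ]
    if close:
        return None, sorted([nearest] + close)
    return nearest, []
```

For a point midway between two centroids, `fcm_memberships` returns equal degrees for both, while `rough_assign` places the point in the boundary area of both clusters without saying how strongly it belongs to each. This is the loss of descriptiveness the abstract attributes to RKM.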
