Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476625
Title: Integrating correlation clustering and agglomerative hierarchical clustering for holistic schema matching
Authors: Basel Mahmoud Alshaikhdeeb (P59517)
Supervisor: Kamsuriah Ahmad, Dr.
Keywords: Data integration (Computer science); Statistical matching; Cluster analysis; Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia
Issue Date: 15-Apr-2014
Description: With the dramatic growth of heterogeneous data sources accessible over the Web, data integration has become increasingly important in data warehousing, the Semantic Web and e-commerce. The purpose of data integration is to provide a unified view over heterogeneous sources. Schema matching plays an essential role in data integration by finding semantic correspondences between the elements of two schemas. Recently, large-scale schema matching, which matches many schemas concurrently rather than pair-wise, has attracted considerable attention. Holistic schema matching is a challenging form of large-scale schema matching: it takes an arbitrary number of schemas as input and finds the similarities among them. However, matching many input schemas may take a long time and may produce poor-quality matches. Reducing the large search space while achieving more accurate matching has therefore become a challenging issue. Many approaches have been proposed to reduce the search space using clustering techniques, either partitioning clustering, such as k-means and k-methods, or hierarchical clustering, such as agglomerative and divisive clustering. However, the current approaches still have drawbacks and their performance needs improvement. Thus, this research proposes an improved integrated clustering method that reduces the search space while avoiding random initial solutions, leading to effective holistic schema matching. The proposed clustering method integrates Correlation Clustering and Hierarchical Agglomerative Clustering: it maximizes the dissimilarity, and minimizes the similarity, between clusters to produce the initial solutions, and then assigns corresponding attributes to their relevant clusters. 
Furthermore, a pre-processing phase has been implemented that uses a domain dictionary and auxiliary information (such as synonyms and abbreviations). The experiments are carried out on the Airfare, Auto and Book data sets from the UIUC Web Integration Repository. Each data set consists of 20 web interfaces. The results of the experiments are compared with other matching approaches. They show that the proposed method achieves accuracies of 0.9, 0.93 and 0.9 on Airfare, Auto and Book respectively. The proposed method can contribute towards more effective and promising results in holistic schema matching. Master of Information Technology. "Certification of Master's / Doctoral Thesis" is not available.
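The agglomerative step and the dictionary-based pre-processing mentioned in the description can be illustrated with a minimal sketch. It is not the thesis implementation: the correlation-clustering stage is not shown, and the auxiliary dictionary (CANONICAL), the string-similarity measure (difflib.SequenceMatcher), the average-linkage rule, the 0.8 threshold and the sample attribute labels are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical auxiliary dictionary: abbreviations and synonyms mapped to one
# canonical term (assumed entries, not taken from the thesis).
CANONICAL = {
    "dept": "departure",
    "from": "departure",
    "dest": "destination",
    "to": "destination",
    "psgr": "passengers",
    "travelers": "passengers",
}


def normalize(label):
    """Lower-case a label and replace each known token with its canonical term."""
    tokens = label.lower().replace("_", " ").split()
    return " ".join(CANONICAL.get(token, token) for token in tokens)


def similarity(a, b):
    """String similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def average_linkage(c1, c2):
    """Mean pairwise similarity between the members of two clusters."""
    return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(labels, threshold=0.8):
    """Start with one cluster per label and repeatedly merge the most similar
    pair of clusters until no pair reaches the (assumed) threshold."""
    clusters = [[label] for label in labels]
    while len(clusters) > 1:
        (i, j), best = max(
            (
                ((i, j), average_linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    # Hypothetical attribute labels in the style of airfare query interfaces.
    sample = ["Dept city", "from city", "Departure City",
              "Dest city", "to city", "psgr", "Travelers"]
    for cluster in agglomerate(sample):
        print(cluster)
```

Because the merge order is driven only by the similarity values, the sketch needs no random initial solution. A real matcher would likely replace the plain string ratio with a measure that also uses data types, instances or a domain dictionary.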
Pages: 78
Call Number: QA76.9.D338A457 2014 3 tesis
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_122171+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 16.43 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.