Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476471
Title: Schema matching for large-scale data based on ontology clustering method
Authors: Harith Oraibi Jameel Alani (P78918)
Supervisor: Saidah Saad, Dr.
Keywords: Semantic; Metadatabases -- Management
Issue Date: 10-Aug-2017
Description: Holistic schema matching is the process of identifying semantic correspondences among multiple schemas at once. The key challenge lies in selecting a method that maintains both effectiveness and efficiency. Effectiveness refers to the quality of the matching, while efficiency refers to the time and memory consumed during the matching process. Several approaches to holistic schema matching have been proposed in the literature, most of them based on clustering techniques. The problem with clustering is that it addresses similarity among fields at a single level only (e.g. departure is similar to from). Fields in schemas involve more complicated semantic relations because of schema levels. An ontology, which is a hierarchy of taxonomies, can identify semantic correspondences at various levels. Hence, this study proposes an ontology-based clustering approach for holistic schema matching. Two datasets drawn from the ICQ query interfaces, consisting of 40 interfaces, have been used: Airfare and Job. The ontology used in this study has been built using both statistical and semantic methods. First, the statistical methods, which consist of Term Frequency – Inverse Document Frequency (TF-IDF) and term co-occurrence, have been used to create the main classes and sub-classes of the ontology. Second, the semantic method, which consists of the XBenchMatch dictionary, a benchmark lexicon that contains rich semantic correspondences for the field of schema matching, has been used to populate the ontology. To perform schema matching with the ontology, a rule-based clustering approach is used with multiple distance measures, including Dice, Cosine and Jaccard. The evaluation has been conducted using the common information retrieval metrics: precision, recall and f-measure.
To assess the performance of the proposed ontology-based clustering, two experiments have been compared. The first applies the ontology-based clustering approach (i.e. using the ontology and rule-based clustering), while the second applies traditional clustering approaches without an ontology. Results show that the proposed ontology-based clustering approach outperformed the traditional clustering approaches, achieving an f-measure of 94% for the Airfare dataset and 92% for the Job dataset. This emphasizes the strength of ontology in identifying correspondences with semantic level variation.
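The measures named in the description (Dice, Cosine, Jaccard, and precision, recall and f-measure) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the tokenization, the function names, and the treatment of field labels as sets of word tokens are all assumptions made for illustration only.

```python
from math import sqrt


def tokens(label):
    """Split a field label into a set of lowercase word tokens (assumed tokenization)."""
    return set(label.lower().replace("_", " ").split())


def dice(a, b):
    """Dice coefficient between two token sets."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def jaccard(a, b):
    """Jaccard coefficient between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity, treating each token set as a binary vector."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def precision_recall_f(predicted, reference):
    """Precision, recall and f-measure of predicted matches against reference matches."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    a, b = tokens("Departure City"), tokens("departure_date")
    print(dice(a, b), jaccard(a, b), cosine(a, b))
    # Purely lexical overlap cannot relate "departure" to "from".
    print(jaccard(tokens("departure"), tokens("from")))
```

Under these assumptions, token-overlap measures score "departure" against "from" at zero, which is the kind of correspondence the description says requires semantic knowledge such as an ontology.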
Pages: 108
Call Number: T58.6.A433 2017 3 tesis
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_99000+SOURCE1+SOURCE1.0.PDF
Description: Restricted Access
Size: 739.6 kB
Format: Adobe PDF

