Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513434
Title: Web search result clustering based on multiview multirepresentation ensemble clustering method
Authors: Ali Sabah Abdulameer (P88931)
Supervisor: Sabrina Tiun, Dr.
Keywords: Universiti Kebangsaan Malaysia -- Dissertations
Dissertations, Academic -- Malaysia
Metadatabases -- Management
Issue Date: 9-Jun-2021
Description: The huge and continually increasing amount of online information makes information retrieval both essential and difficult. As the amount of available online information grows daily, so does the number of search results returned for a user query. Nowadays, search engines respond to user queries with many results, of which only a few are relevant. Clustering, or more specifically Web search result clustering (WSRC), is an effective technique for organizing retrieval results into meaningful clusters. Existing WSRC methods produce low-quality results when clustering the short texts (snippets) of web documents, mostly because of the low frequencies of document terms. An effective short-text clustering method should consider text representation and enrich both the semantic space and the relationships between words. In addition, existing text clustering methods use only one representation at a time (a single view), whereas documents can be represented by multiple views. With single-view clustering, no single clustering algorithm is appropriate for all types of data, owing to the differing dynamics and nature of web search result datasets. Moreover, existing search result clustering methods suffer from poor cluster labels and poor clustering quality caused by overlapping clusters. To tackle these challenges, this research aims to design and develop a WSRC model that enhances WSRC performance. To handle the problem of short-text clustering and the limited number of snippets, a new wiki-KNN-based data representation is proposed and integrated with clustering algorithms; this representation is intended to overcome short-text problems. Accordingly, a new unsupervised distributed word representation scheme is designed in which each word is represented by a vector of its semantically related words.
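The abstract states only that each word is represented by a vector of its semantically related words. The following sketch illustrates why such a representation can help with sparse snippets, under clearly stated assumptions: `NEIGHBOURS` is a hypothetical, hand-written stand-in for the wiki-KNN step (whose actual construction is described in the full thesis, not here), words are weighted by plain counts, and snippets are compared by cosine similarity. It is not the thesis's implementation.

```python
import math
from collections import Counter

# HYPOTHETICAL neighbour table: each word maps to a few semantically related
# words. In the thesis these would come from the wiki-KNN step; here they
# are hand-written toy values for illustration only.
NEIGHBOURS = {
    "jaguar": ["cat", "car"],
    "cat": ["jaguar", "animal"],
    "car": ["jaguar", "vehicle"],
    "animal": ["cat"],
    "vehicle": ["car"],
}


def word_vector(word, neighbours):
    """Represent a word by a count vector over itself and its related words."""
    vec = Counter({word: 1.0})
    for related in neighbours.get(word, []):
        vec[related] += 1.0
    return vec


def snippet_vector(snippet, neighbours):
    """Sum the expanded word vectors of every whitespace token in a snippet."""
    total = Counter()
    for token in snippet.lower().split():
        total.update(word_vector(token, neighbours))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    raw = cosine(Counter("cat".split()), Counter("animal".split()))
    expanded = cosine(snippet_vector("cat", NEIGHBOURS),
                      snippet_vector("animal", NEIGHBOURS))
    print(raw)                 # prints 0.0
    print(round(expanded, 3))  # prints 0.816
```

The snippets "cat" and "animal" share no terms, so their raw bag-of-words cosine is 0. After neighbour expansion both vectors contain `cat` and `animal`, giving a similarity of 2/√6 ≈ 0.82, which is the kind of enrichment that sparse snippets need.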
The proposed model also includes a dynamic clustering method that combines multiple views of the data, namely a semantic view, a lexical view (word weighting), and a topic view, together with the number of clusters. This multiview combination, named enhanced multiview multirepresentation ensemble clustering (MMEC), combines n-gram weighting, Word2vec word embeddings, and a Dirichlet multinomial mixture to generate different candidate clustering solutions. In addition, the model includes a fuzzy-based overlap-handling method to deal with overlapping clusters in search results. Furthermore, a new integrated, unsupervised, multi-topic cluster labelling method is proposed for extracting representative cluster labels for WSRC. All the methods in the proposed WSRC model are evaluated extensively on the standard WSRC datasets ODP-239 and MORESQUE. The results obtained with the k-means and k-medoids clustering methods (F-measures of 85.44 and 87.07, respectively) indicate that all the proposed methods outperform the baseline methods. In conclusion, employing the new wiki-KNN-based representation, MMEC, and fuzzy-based overlap handling enhances the overall performance of WSRC. (Ph.D. thesis)
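The abstract does not specify how MMEC merges its candidate clustering solutions or how the fuzzy overlap step works. As a generic illustration only, the sketch below swaps in a well-known consensus technique, evidence accumulation over a co-association matrix: two snippets are linked when more than a threshold fraction of candidate clusterings place them together, and the connected components become consensus clusters. The `soft_memberships` function is a hypothetical stand-in for overlap handling, not the thesis's fuzzy method. The candidate labelings are toy values; in an MMEC-style setting they might come from clustering the n-gram, Word2vec, and topic views, but that wiring is an assumption.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of candidate clusterings that put items i and j together."""
    n, k = len(labelings[0]), len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / k for c in row] for row in counts]


def consensus(labelings, threshold=0.5):
    """Link items whose co-association exceeds `threshold` and return the
    connected components (found with union-find) as consensus clusters."""
    m = co_association(labelings)
    parent = list(range(len(m)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(m)), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(m)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def soft_memberships(labelings, clusters, min_membership=0.5):
    """HYPOTHETICAL overlap step: an item joins every consensus cluster whose
    members (including itself, if present) it co-occurs with at least
    `min_membership` of the time on average, so it may join several."""
    m = co_association(labelings)
    return [
        [c for c, members in enumerate(clusters)
         if sum(m[i][j] for j in members) / len(members) >= min_membership]
        for i in range(len(m))
    ]


if __name__ == "__main__":
    # Three toy candidate clusterings of five snippets.
    candidates = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 0, 0, 0]]
    clusters = consensus(candidates)
    print(clusters)  # prints [[0, 1], [2, 3, 4]]
    print(soft_memberships(candidates, clusters, min_membership=0.3))
```

With the lower membership threshold of 0.3, snippet 2 joins both consensus clusters, showing how a soft assignment can represent overlap that a hard partition would hide.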
Pages: 240
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
ukmvital_130620+Source01+Source010.PDF (Restricted Access), 2.34 MB, Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.