Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/563491
Title: | Automatic query expansion based word embedding using deep averaging networks and deep median networks for Arabic text retrieval |
Authors: | Yasir Hadi Farhan |
Supervisor: | Shahrul Azman Mohd Noah, Prof. Dr. |
Keywords: | Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia; Automatic Query Expansion |
Issue Date: | 7-Apr-2021 |
Abstract: | The mismatch between query terms and the documents in a collection is considered the most common problem in traditional Information Retrieval (IR) systems. Short queries and users' knowledge gaps are the main causes of this problem. Automatic Query Expansion (AQE) is one of the most common ways to address it. Word Embedding (WE) has recently gained interest among IR researchers. WE techniques aim to capture semantic and syntactic similarities between terms, on the basis that semantically and syntactically similar words often share similar contexts. Consequently, most recent AQE research relies on WE as a semantic modeling technique. However, most AQE approaches assume that each query term can select its best expansion candidates based on semantic closeness. This assumption fails to represent the semantics of the query terms with respect to the content of the whole query. Additionally, expanding the query on that basis may change the meaning of the context or increase the semantic ambiguity of the query. Moreover, expanding the whole query sentence has been less explored. Therefore, the sentence embedding techniques Deep Averaging Networks (DANs) and Deep Median Networks (DMNs) are employed for this purpose. Both take word vectors as input; the output of a DAN is the average of these input vectors, whereas the output of a DMN is their median vector. The proposed DANs and DMNs were incorporated into the Okapi BM25 probabilistic model and into two other query expansion approaches, the Prospect-Guided Query Expansion Strategy (V2Q) and the Embedding-Based Query Expansion approach (EQE1), with the aim of enhancing those models and making them more suitable for morphologically rich text such as Arabic.
The experimental results demonstrated that the proposed DANs and DMNs are able to improve retrieval performance for Arabic text, and that they are more accurate when incorporated into the V2Q and EQE1 approaches. The proposed DMNs technique improved Mean Average Precision by 18.3% over the baseline Okapi BM25 model, by 37.9% over the baseline EQE1 approach, and by 34.7% over the baseline V2Q strategy on the TREC 2001/2002 collection. |
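The pooling idea described in the abstract (representing a whole query by the average or the median of its word vectors, then choosing expansion terms close to that single vector) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy embeddings, the function names `average_vector`, `median_vector`, `cosine` and `expand`, the use of an element-wise median, and the cosine ranking are all assumptions made for illustration, and the feed-forward layers of a full deep averaging network are omitted.

```python
from math import sqrt
from statistics import fmean, median

# Toy 3-dimensional word vectors; the values are invented for illustration.
EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "bank": [0.5, 0.5, 0.1],
    "water": [0.8, 0.2, 0.1],
    "money": [0.1, 0.9, 0.2],
    "loan": [0.0, 0.8, 0.3],
}


def average_vector(vectors):
    """Average pooling: mean of each dimension across the word vectors."""
    return [fmean(dim) for dim in zip(*vectors)]


def median_vector(vectors):
    """Median pooling (assumed element-wise): median of each dimension."""
    return [median(dim) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(query_terms, pool, k=1):
    """Return the k vocabulary terms closest to the pooled query vector."""
    query_vec = pool([EMBEDDINGS[t] for t in query_terms])
    candidates = [t for t in EMBEDDINGS if t not in query_terms]
    ranked = sorted(
        candidates,
        key=lambda t: cosine(query_vec, EMBEDDINGS[t]),
        reverse=True,
    )
    return ranked[:k]
```

Calling `expand(["river", "bank"], average_vector)` ranks candidates against the query as a whole rather than against each term separately, which is the contrast the abstract draws with per-term expansion. Passing `median_vector` instead swaps in the median-based pooling.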
Description: | Fullpage |
Notes: | P90529 |
Pages: | 206 |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Yasir Hadi Thesis.pdf Restricted Access | Fullpage | 1.96 MB | Adobe PDF | View/Open |