Semantic representation approach based on lexical knowledge sources for semantic relatedness measurement

Abdulgabbar Mohammed Saleh Saif (P60052)

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513244

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Mohd Juzaiddin Ab Aziz, Assoc. Prof. Dr.	-
dc.contributor.author	Abdulgabbar Mohammed Saleh Saif (P60052)	-
dc.date.accessioned	2023-10-16T04:34:58Z	-
dc.date.available	2023-10-16T04:34:58Z	-
dc.date.issued	2015-06-25	-
dc.identifier.other	ukmvital:83271	-
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/513244	-
dc.description	Explicit Semantic Analysis (ESA) is a knowledge-based model that has received wide attention from researchers in computational linguistics and information retrieval. Using human-organized language resources, ESA builds the semantic representation of a word from the textual definitions of the concepts in a given knowledge source. However, the representation vectors produced by ESA are generally very large and high-dimensional, and may contain many redundant concepts. Furthermore, the representation vector of a word is populated by conflating all the textual definitions (contexts) in which the word occurs, which amounts to conflating all the different meanings of an ambiguous word. The main aim of this thesis is to propose a reduced-dimension representation method that constructs the semantic interpretation of words as vectors over latent topics derived from the original ESA representation vectors. To model the latent topics, Latent Dirichlet Allocation (LDA) is adapted to the ESA vectors so that topics are extracted as probability distributions over concepts rather than over words, as in the traditional model. The proposed method is applied to two knowledge sources widely used in computational semantic analysis: WordNet and Wikipedia. Using the English versions of these sources, which have a high degree of completeness, the method is evaluated on two natural language processing tasks: measuring the semantic relatedness between words/texts, and text clustering. The experimental results indicate that the reduced-dimension representation method outperforms the baseline models in semantic relatedness measurement and text clustering across several gold-standard evaluation data sets. 
Moreover, on the text clustering task, the proposed method improved the performance of the clustering algorithm relative to the conventional bag-of-words representation model, in terms of both the evaluation measures and computational aspects. Since knowledge-based methods depend mainly on the quality and quantity of the exploited knowledge sources, semantically poor non-English lexical sources, such as those for Arabic, cannot provide enough semantic evidence to address ambiguity and synonymy when measuring relatedness. To overcome the limitations of Arabic WordNet, a cross-lingual technique is proposed for mapping the synsets in Arabic WordNet to their corresponding concepts in Wikipedia. To evaluate this technique, an Arabic mapping data set containing 1,291 synset-article pairs was compiled. The proposed technique, which uses cross-lingual features, achieved a higher accuracy (93.6%) than state-of-the-art methods that rely only on monolingual features (77.0% to 82.7%). The experimental analysis shows that leveraging bilingual features is useful for improving the mapping task in handling both synonymy and ambiguity.	-
dc.description	"Certification of Master's/Doctoral Thesis" is not available	-
dc.language.iso	eng	-
dc.publisher	UKM, Bangi	-
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat	-
dc.rights	UKM	-
dc.subject	Semantic representation	-
dc.subject	Text clustering	-
dc.subject	Semantic interpretation	-
dc.subject	Lexical knowledge	-
dc.subject	Dissertations, Academic -- Malaysia	-
dc.title	Semantic representation approach based on lexical knowledge sources for semantic relatedness measurement	-
dc.type	Theses	-
dc.format.pages	161	-
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
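The abstract describes ESA as representing a word by a weighted vector over the concepts of a knowledge source, built from the concepts' textual definitions, with relatedness measured between such vectors. The following is a minimal Python sketch of that general ESA scheme, assuming TF-IDF weighting and cosine similarity as commonly used with ESA. The toy concept definitions and all function names are hypothetical; this is not the thesis's code and it does not include the proposed LDA-based dimension reduction.

```python
import math
from collections import Counter

# Hypothetical toy "knowledge source": concept title -> textual definition.
CONCEPTS = {
    "Bank (finance)": "a bank is a financial institution that accepts deposits and makes loans",
    "River bank": "the bank of a river is the land alongside the water",
    "Money": "money is a medium of exchange used to pay for goods and loans",
}

def tokenize(text):
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_esa_index(concepts):
    """Build an inverted index: word -> {concept: tf-idf weight}."""
    tfs = {c: Counter(tokenize(t)) for c, t in concepts.items()}
    n = len(concepts)
    # Document frequency: number of concept definitions containing each word.
    df = Counter(w for tf in tfs.values() for w in tf)
    index = {}
    for concept, tf in tfs.items():
        for word, count in tf.items():
            idf = math.log(n / df[word])
            # Words occurring in every definition get idf 0 and are dropped.
            if idf > 0:
                index.setdefault(word, {})[concept] = (1 + math.log(count)) * idf
    return index

def interpret(text, index):
    """ESA interpretation vector of a text: sum of its words' concept vectors."""
    vec = Counter()
    for word in tokenize(text):
        vec.update(index.get(word, {}))  # adds weights concept by concept
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def relatedness(a, b, index):
    """Semantic relatedness of two words/texts as cosine of their ESA vectors."""
    return cosine(interpret(a, index), interpret(b, index))

if __name__ == "__main__":
    index = build_esa_index(CONCEPTS)
    print(round(relatedness("deposits", "loans", index), 4))  # 0.7071
    print(relatedness("deposits", "river", index))            # 0.0
```

In this toy index, "deposits" and "loans" share the finance concept and are therefore related, while "deposits" and "river" share no concept. Note also that the vector for "bank" mixes the finance and river concepts, which illustrates the conflation of word senses that the abstract identifies as a limitation of ESA.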

Files in This Item:

File	Description	Size	Format
Semantic representation approach based on lexical knowledge sources for semantic relatedness measurement.pdf (Restricted Access)	Partial	306.9 kB	Adobe PDF
