Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513244
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Mohd Juzaiddin Ab Aziz, Assoc. Prof. Dr.
dc.contributor.author | Abdulgabbar Mohammed Saleh Saif (P60052)
dc.date.accessioned | 2023-10-16T04:34:58Z | -
dc.date.available | 2023-10-16T04:34:58Z | -
dc.date.issued | 2015-06-25
dc.identifier.other | ukmvital:83271
dc.identifier.uri | https://ptsldigital.ukm.my/jspui/handle/123456789/513244 | -
dc.description | Explicit Semantic Analysis (ESA) is a knowledge-based model that has received wide attention from researchers in computational linguistics and information retrieval. Drawing on human-organized language resources, ESA builds the semantic representation of a word from the textual definitions of the concepts in a given knowledge source. However, the representation vectors produced by the ESA model are generally very large and high-dimensional, and may contain many redundant concepts. Furthermore, the representation vector of a word is populated by conflating all the textual definitions (contexts) associated with it, which is ultimately equivalent to conflating all the different meanings of an ambiguous word. The main aim of this thesis is to propose a reduced-dimension representation method that constructs the semantic interpretation of words as vectors over latent topics derived from the original ESA representation vectors. To model the latent topics, Latent Dirichlet Allocation (LDA) is adapted to the ESA vectors so that topics are extracted as probability distributions over concepts rather than over words, as in the traditional model. The proposed method is applied to two knowledge sources widely used in computational semantic analysis: WordNet and Wikipedia. On the English sources, which have a high degree of completeness, the proposed method is evaluated on two natural language processing tasks: measuring the semantic relatedness between words/texts, and text clustering. The experimental results indicate that the reduced-dimension representation method outperforms the baseline models in measuring semantic relatedness and in text clustering across several gold-standard evaluation data sets. Moreover, on the text clustering task, the proposed method improved on the performance of the clustering algorithm based on the conventional bag-of-words representation model, in terms of both the evaluation measures and the computational aspects. Because knowledge-based methods depend mainly on the quality and quantity of the exploited knowledge sources, non-English lexical sources with poor semantics, such as those for Arabic, cannot provide enough semantic evidence to address the ambiguity and synonymy issues in measuring relatedness. To overcome the limitations of Arabic WordNet, a cross-lingual technique is proposed for mapping the synsets in Arabic WordNet to their corresponding concepts in Wikipedia. To evaluate this technique, an Arabic mapping data set containing 1,291 synset-article pairs was compiled. The proposed technique, which uses cross-lingual features, achieved a higher accuracy (93.6%) than state-of-the-art methods that depend only on monolingual features (between 77.0% and 82.7%). The experimental analysis shows that leveraging bilingual features is useful for improving the mapping task with respect to both the synonymy and the ambiguity issues.
dc.description | "Certification of Master's/Doctoral Thesis" is not available
dc.language.iso | eng
dc.publisher | UKM, Bangi
dc.relation | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.rights | UKM
dc.subject | Semantic representation
dc.subject | Text clustering
dc.subject | Semantic interpretation
dc.subject | Lexical knowledge
dc.subject | Dissertations, Academic -- Malaysia
dc.title | Semantic representation approach based on lexical knowledge sources for semantic relatedness measurement
dc.type | Theses
dc.format.pages | 161
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File | Description | Size | Format
ukmvital_83271+SOURCE1+SOURCE1.0.PDF (Restricted Access) |  | 306.9 kB | Adobe PDF

