Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/513416
Title: | Named-entity disambiguation for ontology population using embedding-based context-entity semantic relatedness |
Authors: | Mohamed Lubani (P84179) |
Supervisor: | Shahrul Azman Mohd Noah, Prof. Dr. |
Keywords: | Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia; Ontology population; Ontology; Knowledge acquisition (Expert systems) |
Issue Date: | 20-May-2020 |
Description: | Ontology population is the task of updating an ontology with new instances of concepts and relations extracted from unstructured text. To maintain a sound and useful ontology, duplicate assertions of the same instance should be avoided, and each instance should be asserted at its correct place in the ontology. For this reason, entities need to be disambiguated and linked to their correct senses in the ontology based on their contexts. To be used for context-entity semantic relatedness, text elements need to be associated with representations that reflect their properties. Distributed vector representations capture the semantic and syntactic properties of text elements and encode them as numerical vectors referred to as embeddings. Building these vectors for the task of entity disambiguation requires a tagged text corpus in which entities are detected and linked to their correct senses. However, building such a corpus manually is too expensive. In addition, the corpus should include entities that already exist in the ontology as well as other entities that co-occur with them. Furthermore, assessing context-entity semantic relatedness requires a context vector representation (embedding). Most existing methods either use pre-annotated text corpora or use the hyperlinks in Wikipedia pages to construct the training corpus. Such methods cannot automatically annotate new unstructured plain text in which Wikipedia hyperlinks are not present. For context representation, most existing methods simply combine the vector representations of the context's elements without considering the specific nature of the entity disambiguation task. This study proposes a method of building a tagged text corpus from unstructured plain text related to the entities in the ontology. In addition, a method of building context vector representations is proposed to enhance the features related to the entity disambiguation task.
To achieve these objectives, the Wikidata knowledge base is used to link entity mentions in the text to their correct senses. Entity acronyms, aliases and semantic relations are used to detect entities and link them to their senses. Once generated, the corpus is used to build vector representations of words and entities using an extended skip-gram model. A modified autoencoder is used to build the final context and entity representations and to map related representations to nearby points in the vector space. This helps build dedicated context vector representations in which features relevant to entity disambiguation are enhanced and noisy, irrelevant features are eliminated. An entity disambiguation model is proposed based on the similarities between these vector representations together with other context-independent entity features. The proposed model achieved near state-of-the-art disambiguation accuracy of 93.76% and outperformed recent embedding-based disambiguation methods when tested on the AIDA CoNLL-YAGO dataset. Ph.D. |
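The idea of context-entity semantic relatedness described above can be illustrated with a minimal sketch. It is not the method proposed in the thesis: it uses the simple baseline of averaging the context's word vectors, which the abstract attributes to most existing methods, and ranks candidate senses by cosine similarity. The two-dimensional vectors, the sense labels and the names `WORD_VECS`, `CANDIDATE_VECS` and `rank_candidates` are all hypothetical and chosen only for illustration.

```python
import math

# Toy 2-dimensional embeddings (hypothetical values, not taken from the thesis).
WORD_VECS = {
    "river": [1.0, 0.0],
    "water": [0.9, 0.1],
    "loan": [0.0, 1.0],
    "money": [0.1, 0.9],
}

# Hypothetical candidate senses for the ambiguous mention "bank".
CANDIDATE_VECS = {
    "bank_(river)": [1.0, 0.1],
    "bank_(finance)": [0.1, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_candidates(context_words, word_vecs, candidate_vecs):
    """Rank candidate senses by similarity to the averaged context vector.

    Baseline assumption: the context embedding is the plain mean of the
    vectors of the context words that have an embedding.
    """
    known = [word_vecs[w] for w in context_words if w in word_vecs]
    if not known:
        return []  # no usable context, so no ranking is produced
    context = mean_vector(known)
    scored = [(cosine(context, vec), sense) for sense, vec in candidate_vecs.items()]
    return [sense for _, sense in sorted(scored, reverse=True)]
```

Calling `rank_candidates(["river", "water"], WORD_VECS, CANDIDATE_VECS)` places `bank_(river)` first, while a context of `["loan", "money"]` places `bank_(finance)` first. According to the abstract, the thesis replaces the plain averaging step with dedicated context representations built by a modified autoencoder and combines the similarities with context-independent entity features; the sketch does not attempt either.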
Pages: | 228 |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ukmvital_130426+Source01+Source010.PDF (Restricted Access) | | 2.56 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.