Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476458
Title: Enhanced word embedding technique for biomedical named entity recognition
Authors: Maan Tareq Abd (P84263)
Supervisor: Masnizah Mohd, Assoc. Prof. Dr.
Keywords: Biomedical entity recognition; Word embedding; Molecular biology; Universiti Kebangsaan Malaysia -- Dissertations
Issue Date: 22-Jun-2017
Description: Biomedical named entity recognition is the process of identifying and classifying technical entities in the domain of molecular biology, such as protein and gene names, cell types, virus names and DNA sequences. Biomedical named entities have inherently complex structures, which pose a major challenge for their identification and classification. State-of-the-art supervised machine learning models still suffer from low performance on the biomedical entity recognition task: a wide gap remains between their performance in news-wire domains (≈ 91%) and in biomedical domains (≈ 78%). To address this problem, this research explores different word representations with the support vector machine learning method to deal with the special characteristics of biomedical named entities. First, this research identifies and evaluates a set of morphological and contextual features with the support vector machine learning method for biomedical named entity recognition. In addition, it studies the effect of the prototypical word embedding technique (PWE) on the performance of the support vector machine learning method. Furthermore, it proposes a new model based on a support vector machine and the extended distributed prototypical word embedding technique (EDRWE) for biomedical named entity recognition. These models are evaluated on a widely used standard biomedical named entity recognition dataset, the GENIA corpus. The results show that the support vector machine model with morphological and contextual features achieves good results, with an overall F-measure of 70.6%. The experimental results also show that the PWE and EDRWE word embedding techniques achieve higher performance, with F-measures of 76.97% and 82.8% respectively, and significantly improve the overall performance of support vector machine learning for biomedical named entity recognition over the traditional feature representation technique.
In general, the results show that word representation is a key factor in constructing a suitable recognition method.
"Certification of Master's/Doctoral Thesis" is not available.
Pages: 75
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_97956+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 295.8 kB
Format: Adobe PDF

