Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513418
Title: Adverse drug reaction (ADR) extraction and medical sentiment analysis through syntactic-based lexicon enhancement with word and document embedding approach
Authors: Rami Naim Mohammad Yousef (P88515)
Supervisor: Sabrina Tiun, Dr.
Keywords: Universiti Kebangsaan Malaysia -- Dissertations
Dissertations, Academic -- Malaysia
Social networks
Biomedical textual information
Adverse drug reactions
Information storage and retrieval systems -- Medicine
Issue Date: 6-Oct-2020
Description: The emergence of social networks has led to a considerable expansion of biomedical textual information. A new type of entity has been identified within such text, namely adverse drug reactions (ADRs). An ADR can be defined as a side effect of a drug that people mention when expressing their opinions. Extracting such entities and categorizing their polarities (i.e. positive or negative) offers the opportunity to obtain valuable feedback on important drugs and medicines. The earliest research efforts on ADR extraction relied on rules and trigger terms. Yet such rules and trigger terms need to be extended, as numerous new terms occur with ADRs; the syntactic aspect of the rules and trigger terms has not been adequately addressed in the literature. On the other hand, the use of a lexicon could contribute towards improving ADR extraction. However, lexicons in the literature have been used to change the whole text through text expansion. This, in turn, might affect further analysis of the medical review document, such as sentiment classification. Apart from ADR extraction, classifying medical review documents as positive or negative (i.e. sentiment analysis) is a challenging task because the text in such documents is written informally. This study therefore aims to propose a method for ADR extraction and sentiment classification. First, this study proposes an extension of trigger terms as features. Second, an enhancement of the medical lexicon is applied using a pre-trained biomedical word embedding model. Third, a new document embedding approach based on a Recurrent Neural Network (RNN) is proposed in order to improve the document vectors used for medical sentiment classification. Two benchmark datasets of medical review documents have been considered in the experiments. In addition, two baseline studies have been selected for comparison.
Furthermore, two vector space models have been considered in the experiments, namely Count Vector (CV) and Term Frequency-Inverse Document Frequency (TFIDF), alongside multiple language models including unigram, bigram, trigram and quadgram. Results show that the proposed extension of rules outperforms the baseline as well as the trigger terms, with the best results achieved by the combination of the proposed trigger terms and Support Vector Machine (SVM). This is on the basis of CV and quadgram, where the F1-score is 0.88 for the first dataset and 0.69 for the second dataset. On the other hand, results of applying the lexicon substitution method show that it outperforms both the baseline and the proposed extension of trigger terms, especially when using the Logistic Regression (LR) classifier with unigram and CV, where the F1-score is 0.87 for the first dataset and 0.91 for the second dataset. Finally, the document embedding method has been applied to perform sentiment classification, where LR and SVM obtain F1-scores of 0.90 and 0.89 respectively based on the proposed embedding. These results show that the proposed document embedding outperforms the baseline study. All these findings demonstrate the usefulness of the proposed methods for extracting ADRs and classifying sentiment polarity.
Degree: Ph.D
Pages: 236
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_130488+Source01+Source010.PDF
Access: Restricted Access
Size: 3.02 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.