Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513321
Title: Cross-lingual sentiment analysis from English to Arabic using supervised and semi-supervised approach
Authors: Adel Qasem Abdo Al-Shabi (P72760)
Supervisor: Nazlia Omar, Assoc. Prof. Dr.
Keywords: Cross-lingual sentiment; Computational linguistics
Issue Date: 5-Mar-2018
Description: Given the massive popularity and quantity of multilingual user-generated content on social media, effective multilingual and cross-lingual sentiment analysis (CLSA) is becoming increasingly important. The task is challenging because the training data available in languages other than English is too limited to support precise in-language sentiment classification. CLSA refers to the task of using annotated sentiment corpora in one language (e.g. English) as training data to predict the sentiment polarity of data in another language. Current state-of-the-art CLSA methods suffer from low performance because machine translation quality is far from satisfactory. In addition, these methods use traditional feature representations, which carry both noise and correct classification information from the translated data. Moreover, the significant difference in feature distribution between the source and target languages severely hinders the performance of these methods. To handle these problems, this research aims to design and develop new methods to improve the performance of cross-lingual sentiment classification from English to Arabic. First, to reduce the effect of erroneous translation of training or test data, this research designs an enhanced translation-based cross-lingual sentiment analysis model that improves translation quality through a bilingual lexicon of opinion expressions. This phase presents a modified multiclass label propagation algorithm for automatically extracting and labelling opinion expressions. However, a mechanism is still needed to filter out noisy features (negative transfer) while retaining useful sentiment features. Consequently, a new CLSA model is proposed, based on an effective language-independent data representation scheme that uses new opinion expression-based and graph-based features. This phase identifies new features from each set and combines them to represent each review.
In many practical cases, there is a significant difference in feature distribution between the source and target languages due to differences in culture, linguistic expressions, writing style and people's interests. Consequently, a new semi-supervised graph-based label propagation CLSA model with a Prior Supervised Induction Approach (CLSA-PSIA) is proposed that exploits the useful sentiment knowledge in unlabelled target-language data to bridge the gap between the source and target languages. CLSA-PSIA combines prior supervised learning from translated data with semi-supervised learning from unlabelled target data. The proposed model is evaluated using training data generated by translating an English sentiment classification dataset, i.e. the Amazon dataset. A manually corrected standard test set is generated to test the performance of the CLSA model. The results show that the enhanced translation-based cross-lingual sentiment analysis model outperforms the baseline translation-based models used in the comparison. In addition, the experimental results show that the new data representation scheme significantly improves the overall performance of cross-language sentiment analysis regardless of the artificial noise added by machine translation. Moreover, the CLSA-PSIA model successfully bridges the gap between the source and target languages and achieves results comparable to the upper bound model.
"Certification on Master's/Doctoral Thesis" is not available.
Pages: 179
Call Number: P98.A448 2018 3 tesis
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_100134+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 286.52 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.