Cross-lingual sentiment analysis from English to Arabic using supervised and semi-supervised approach

Adel Qasem Abdo Al-Shabi

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513321

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Nazlia Omar, Assoc. Prof. Dr.	-
dc.contributor.author	Adel Qasem Abdo Al-Shabi	-
dc.contributor.other	P72760	-
dc.date.accessioned	2023-10-16T04:35:30Z	-
dc.date.available	2023-10-16T04:35:30Z	-
dc.date.issued	2018-03-05	-
dc.identifier.other	P72760	-
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/513321	-
dc.description	Given the massive popularity and quantity of multilingual user-generated content on social media, the need for effective multilingual and cross-lingual sentiment analysis (CLSA) is becoming increasingly important. This is a challenging task because the amount of training data in languages other than English is too limited to support precise in-language sentiment classification. CLSA refers to the task of using annotated sentiment corpora in one language (e.g. English) as training data to predict the sentiment polarity of data in another language. Current state-of-the-art CLSA methods suffer from low performance because machine translation quality is far from satisfactory. In addition, these methods use traditional feature representations, which carry both noise and correct classification information from the translated data. Moreover, the significant difference in feature distributions between source and target languages severely hinders the performance of these methods. To handle these problems, this research aims to design and develop new methods to improve the performance of cross-lingual sentiment classification from English to Arabic. First, to reduce the effect of erroneous translation of training or test data, this research designs an enhanced translation-based cross-lingual sentiment analysis model that improves translation quality through a bilingual lexicon of opinion expressions. This phase presents a modified multiclass label propagation algorithm for automatically extracting and labelling opinion expressions. However, a mechanism is still needed to filter out noisy features (negative transfer) while retaining useful sentiment features. Consequently, a new CLSA model is proposed, based on an effective language-independent data representation scheme that uses new opinion expression-based and graph-based features. This phase identifies new features from each set and combines them to represent each review.
In many practical cases, there is a significant difference in feature distributions between source and target languages due to differences in culture, linguistic expressions, writing style and people's interests. Consequently, a new semi-supervised graph-based label propagation CLSA model with a Prior Supervised Induction Approach (CLSA-PSIA) is proposed that exploits the useful sentiment knowledge in unlabelled target-language data to bridge the gap between source and target languages. CLSA-PSIA combines prior supervised learning from translated data with semi-supervised learning from unlabelled target data. The proposed model is evaluated using training data generated by translating an English sentiment classification dataset, i.e. the Amazon dataset. A manually corrected standard test set is generated to test the performance of the CLSA model. The results show that the enhanced translation-based cross-lingual sentiment analysis model outperforms the baseline translation-based models employed in the comparison. In addition, the experimental results show that the new data representation scheme significantly improves the overall performance of cross-lingual sentiment analysis regardless of the artificial noise added by machine translation. Moreover, the CLSA-PSIA model successfully bridges the gap between source and target languages and achieves results comparable to the upper bound model.	-
dc.description	"Certification on Master's/Doctoral Thesis" is not available	-
dc.language.iso	eng	-
dc.publisher	UKM, Bangi	-
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat	-
dc.subject	Cross-lingual sentiment	-
dc.subject	Computational linguistics	-
dc.title	Cross-lingual sentiment analysis from English to Arabic using supervised and semi-supervised approach	-
dc.type	Theses	-
dc.rights.holder	UKM	-
dc.format.pages	179	-
dc.identifier.callno	P98.A448 2018 3 tesis	-
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:

File	Description	Size	Format
Cross-lingual sentiment analysis from English to Arabic using supervised and semi-supervised approach.pdf (Restricted Access)	Partial	286.52 kB	Adobe PDF
