Machine learning and lexicon-based approach for Arabic sentiment analysis

Tareq Abdo Abdullah Al-Moslmi

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476210

Title:	Machine learning and lexicon-based approach for Arabic sentiment analysis
Authors:	Tareq Abdo Abdullah Al-Moslmi
Supervisor:	Nazlia Omar, Prof. Dr.
Keywords:	Sentiment analysis; Semantic computing
Issue Date:	2014
Description:	The rapid growth of Web opinion data has created a need for automatic tools to analyse and understand people's sentiments toward different topics. However, opinions and sentiments are often conveyed implicitly, through latent semantics and informal language, which makes sentiment analysis more difficult. Sentiment analysis in Arabic is particularly challenging because of the language's morphological richness and the wide variation among Arabic dialects. In addition, most publicly available sentiment analysis resources, such as SentiWordNet, are mainly for English, so we believe such resources are needed for languages like Arabic. Sentiment lexicons of this kind can be used to design a cross-domain Arabic Sentiment Analyzer (ASA), since current approaches to Arabic sentiment analysis are typically based on machine learning techniques, which are domain-dependent. This research presents a new hybrid approach for Arabic sentiment analysis and opinion mining. First, we present a new open-source sentiment lexicon of Arabic sentiment words and phrases. The Arabic senti-lexicon is a list of 4,800 positive and negative words and phrases that occur frequently in Arabic online opinion data, manually annotated with their part of speech, polarity strength, subjectivity and intensity. We also present a lexicon-based method for Arabic sentiment analysis: the Arabic Sentiment Analyzer (ASA) uses lexicons of positive and negative words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. Second, we investigate seven feature selection methods (Information Gain, Principal Components Analysis, Relief-F, Gini index, Uncertainty, Chi-square and SVM) and three machine learning classifiers (SVM, Naive Bayes and KNN) for classifying the sentiment polarity of Arabic text.
Finally, we introduce a hybrid approach that integrates the lexicon-based approach with the machine learning approach. A wide range of comparative experiments is conducted on two Arabic data sets, including the Opinion Corpus for Arabic (OCA). The experimental results show that feature reduction improves the performance of the classifiers. In particular, the SVM classifier combined with the SVM-based feature selection method gives the best classification result, with 92.4% accuracy. The results also show that the proposed hybrid machine learning and lexicon-based method significantly improves on the baseline methods, reaching an overall performance of 94.98%. (Master/Sarjana)
Pages:	85
Call Number:	QA76.5913.M639 2014 3 tesis
Publisher:	UKM, Bangi
URI:	https://ptsldigital.ukm.my/jspui/handle/123456789/476210
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
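The abstract above describes a lexicon-based analyzer that scores text from word polarity and strength and adjusts for intensification and negation. The Python sketch below shows one common way to combine those three ingredients; it is an illustration under stated assumptions, not the thesis's implementation. The lexicon entries, the negator and intensifier lists, the multiplier values, the `window` parameter and the sign-flip rule for negation are all hypothetical, and the input is assumed to be tokenized on whitespace.

```python
from typing import Dict, List

# Hypothetical toy entries (word -> polarity strength); the actual senti-lexicon
# is described as holding about 4,800 annotated words and phrases.
LEXICON: Dict[str, int] = {
    "جيد": 2,    # good
    "ممتاز": 3,  # excellent
    "سيء": -2,   # bad
    "رديء": -3,  # poor / shoddy
}
# Hypothetical negation words and intensifier multipliers (assumptions).
NEGATORS = {"لا", "ليس", "غير", "لم"}
INTENSIFIERS: Dict[str, float] = {"جدا": 1.5, "للغاية": 2.0}


def sentiment_score(tokens: List[str], window: int = 2) -> float:
    """Sum lexicon polarities, scaling by a following intensifier and
    flipping the sign when a negator occurs within `window` tokens before."""
    score = 0.0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token)
        if polarity is None:
            continue  # not a sentiment word
        value = float(polarity)
        # Intensifier right after the word, e.g. "جيد جدا" ("very good").
        if i + 1 < len(tokens) and tokens[i + 1] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[i + 1]]
        # Negator shortly before the word, e.g. "غير جيد" ("not good").
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            value = -value
        score += value
    return score


def classify(text: str) -> str:
    """Map the aggregate score to a polarity label (whitespace tokenization)."""
    score = sentiment_score(text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_score(["غير", "جيد", "جدا"]))  # prints -3.0
```

Flipping the sign is the simplest negation rule; other analyzers shift or dampen the score instead, which this sketch does not attempt.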
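The abstract's second stage pairs feature selection methods with supervised classifiers. As a hedged illustration only, the plain-Python sketch below implements one of the listed combinations: chi-square term scoring, computed from the standard 2×2 document-count contingency table for each term, followed by a multinomial Naive Bayes classifier with Laplace smoothing. It does not reproduce the SVM classifier with SVM-based selection that the abstract reports as best. The toy corpus, the `pos`/`neg` labels and the helper names (`chi_square_scores`, `select_top_k`, `train_nb`, `predict_nb`) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Doc = Tuple[List[str], str]  # (tokens, class label)


def chi_square_scores(docs: List[Doc], target: str) -> Dict[str, float]:
    """Chi-square statistic for each term's presence versus membership in `target`."""
    n = len(docs)
    vocab = {t for tokens, _ in docs for t in tokens}
    scores: Dict[str, float] = {}
    for term in vocab:
        a = b = c = d = 0  # 2x2 contingency counts over documents
        for tokens, label in docs:
            has_term, in_class = term in tokens, label == target
            if has_term and in_class:
                a += 1  # term present, target class
            elif has_term:
                b += 1  # term present, other class
            elif in_class:
                c += 1  # term absent, target class
            else:
                d += 1  # term absent, other class
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Keep the k highest-scoring terms (ties broken by string order)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def train_nb(docs: List[Doc], features: Set[str]):
    """Count class priors and per-class frequencies of the selected terms."""
    class_docs = Counter(label for _, label in docs)
    term_counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(t for t in tokens if t in features)
    return class_docs, term_counts


def predict_nb(model, features: Set[str], tokens: List[str]) -> str:
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_docs, term_counts = model
    total = sum(class_docs.values())
    best_label, best_logp = "", -math.inf
    for label in sorted(class_docs):
        logp = math.log(class_docs[label] / total)
        denom = sum(term_counts[label].values()) + len(features)
        for t in tokens:
            if t in features:  # unselected terms are ignored
                logp += math.log((term_counts[label][t] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical toy corpus of whitespace-tokenized Arabic reviews.
docs: List[Doc] = [
    ("الفندق ممتاز".split(), "pos"),
    ("الخدمة ممتازة جدا".split(), "pos"),
    ("الطعام رديء".split(), "neg"),
    ("الخدمة سيئة".split(), "neg"),
]
scores = chi_square_scores(docs, target="pos")
features = select_top_k(scores, k=7)  # drops "الخدمة", which appears in both classes
model = train_nb(docs, features)
print(predict_nb(model, features, "ممتاز جدا".split()))  # prints: pos
```

Scoring against a single target class is enough for the binary positive/negative setting; with more classes, per-class chi-square scores are commonly combined by taking their maximum or average.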

Files in This Item:

File	Description	Size	Format
ukmvital_75397+Source01+Source010.PDF Restricted Access		1.63 MB	Adobe PDF
