Machine learning and lexicon-based approach for Arabic sentiment analysis

Tareq Abdo Abdullah Al-Moslmi

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476210

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Nazlia Omar, Prof. Dr.
dc.contributor.author	Tareq Abdo Abdullah Al-Moslmi
dc.contributor.other	P63385	-
dc.date.accessioned	2023-10-06T09:14:43Z	-
dc.date.available	2023-10-06T09:14:43Z	-
dc.date.issued	2014
dc.identifier.other	ukmvital:75397
dc.identifier.other	P63385	-
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/476210	-
dc.description	The rapid growth of Web opinion data has created a need for automatic tools that analyse and understand people's sentiments toward different topics. However, opinions and sentiments are often conveyed implicitly, through latent semantics and informal language, which makes sentiment analysis more difficult. Sentiment analysis in Arabic is particularly challenging because of the language's morphological richness and the wide variation among Arabic dialects. In addition, most publicly available research resources for sentiment analysis, such as SentiWordNet, target English, and we therefore believe there is a need for comparable resources for languages like Arabic. Such sentiment lexicons can be used to design a cross-domain Arabic Sentiment Analyzer (ASA), as current approaches applied to Arabic sentiment analysis are typically based on machine learning approaches, which are domain-independent techniques. This research presents a new hybrid approach for Arabic sentiment analysis and opinion mining. First, we present a new open source sentiment lexicon of Arabic sentiment words and phrases. The Arabic senti-lexicon is a list of 4,800 positive and negative words and phrases that occur frequently in Arabic online opinion data, manually annotated with their part-of-speech, polarity strength, subjectivity and intensity. In addition, we present a lexicon-based method for Arabic sentiment analysis. The Arabic Sentiment Analyzer (ASA) uses lexicons of positive and negative words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. Secondly, we have investigated seven feature selection methods (Information Gain, Principal Components Analysis, Relief-F, Gini index, Uncertainty, Chi-square and SVM) and three machine learning classifiers (SVM, Naive Bayes and KNN) for Arabic sentiment polarity classification.
Finally, a hybrid approach that integrates the lexicon-based approach with the machine learning approach is introduced. A wide range of comparative experiments is conducted on two Arabic data sets, including the Opinion Corpus for Arabic (OCA). Experimental results show that feature reduction methods improve the performance of the classifiers. The SVM classifier with the SVM-based feature selection method yields the best classification performance, with 92.4% accuracy. Experimental results also show that the proposed combined machine learning and lexicon-based method significantly improves on the baseline methods, reaching an overall performance of 94.98%.
dc.description	Master/Sarjana
dc.language.iso	eng
dc.publisher	UKM, Bangi
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.subject	Sentiment analysis
dc.subject	Semantic computing
dc.title	Machine learning and lexicon-based approach for Arabic sentiment analysis
dc.type	theses
dc.rights.holder	UKM	-
dc.format.pages	85
dc.identifier.callno	QA76.5913.M639 2014 3 tesis
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
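
The abstract describes a lexicon-based analyzer that sums the semantic orientation of sentiment words and adjusts it for intensification and negation. The following is a minimal sketch of that general idea, not the thesis's actual ASA implementation. The lexicon entries, scores, intensifier weight, negator list, and the function names `score` and `classify` are all hypothetical and chosen only for illustration; the real lexicon contains about 4,800 manually annotated entries and its scoring rules are not given in this record.

```python
# Toy lexicon: hypothetical entries and scores, NOT taken from the thesis lexicon.
LEXICON = {
    "جيد": 1.0,     # "good"
    "ممتاز": 2.0,   # "excellent"
    "سيء": -1.0,    # "bad"
}

# Assumed intensifier: multiplies the score of the sentiment word just before it.
INTENSIFIERS = {"جدا": 1.5}  # "very", as in "جيد جدا"

# Assumed negators: flip the polarity of the sentiment word just after them.
NEGATORS = {"غير", "ليس"}


def score(tokens):
    """Sum lexicon scores, applying intensification and negation locally."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        # Intensifier directly after the sentiment word strengthens it.
        if i + 1 < len(tokens) and tokens[i + 1] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[i + 1]]
        # Negator directly before the sentiment word flips its sign.
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        total += value
    return total


def classify(text):
    """Map the summed score to a polarity label (whitespace tokenization only)."""
    s = score(text.split())
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


print(classify("الفيلم جيد جدا"))   # positive
print(classify("الفيلم غير جيد"))   # negative
```

A realistic system would also need Arabic-specific normalization and stemming before lexicon lookup, since the morphological richness noted in the abstract means surface forms rarely match lexicon entries exactly; this sketch skips that step entirely.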

Files in This Item:

File	Description	Size	Format
ukmvital_75397+Source01+Source010.PDF (Restricted Access)		1.63 MB	Adobe PDF
