Part of speech tagging model for Arabic tweet based on machine learning

Mustafa Abdulkareem Hussein (P80533)

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476454

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Sabrina Tiun, Dr.
dc.contributor.author	Mustafa Abdulkareem Hussein (P80533)
dc.date.accessioned	2023-10-06T09:18:46Z	-
dc.date.available	2023-10-06T09:18:46Z	-
dc.date.issued	2016-10-21
dc.identifier.other	ukmvital:96935
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/476454	-
dc.description	Social media text, such as tweets, is a challenge for natural language processing. Twitter text is difficult to part-of-speech (POS) tag because it is noisy, with linguistic errors and an idiosyncratic style. In addition, although most tweets are linguistically fairly well-formed, words in sentences do not follow proper syntax. Furthermore, Arabic creates additional challenges for NLP researchers, since tweets are written in several spoken Arabic dialects that lack standardization, are written as free text, and show significant variation from Modern Standard Arabic (MSA). This research aims to design and implement POS tagging models for Arabic tweets based on an investigation of several machine learning models: Naive Bayes (NB), k-nearest neighbors (KNN), Support Vector Machine (SVM), and Decision Tree (DT). In addition, this research investigates the contribution of domain-independent features to the performance of the POS tagging models. To evaluate different state-of-the-art POS tagging models, this work uses a new Arabic Twitter corpus and a Modern Standard Arabic corpus as datasets. Results show that when the classifiers are trained on the standard Arabic corpus, the highest result is obtained by the DT classifier, with an F-measure of 85.54%. When the Arabic Twitter corpus is used as training data, the highest result is obtained by the SVM classifier, with an F-measure of 91.36%. Thus, we conclude that classifiers trained on the Twitter corpus perform better than classifiers trained on the standard Arabic corpus, achieving the highest F-measure of 91.36% for POS tagging of Arabic tweets.
dc.language.iso	eng
dc.publisher	UKM, Bangi
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.rights	UKM
dc.subject	Arabic tweet
dc.subject	Machine learning
dc.subject	Speech tagging
dc.subject	Universiti Kebangsaan Malaysia -- Dissertations
dc.title	Part of speech tagging model for Arabic tweet based on machine learning
dc.type	theses
dc.format.pages	63
dc.identifier.barcode	002831(2017)
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:

File	Description	Size	Format
ukmvital_96935+SOURCE1+SOURCE1.0.PDF Restricted Access		394.55 kB	Adobe PDF
