Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476454
Full metadata record
DC Field: Value
dc.contributor.advisor: Sabrina Tiun, Dr.
dc.contributor.author: Mustafa Abdulkareem Hussein (P80533)
dc.date.accessioned: 2023-10-06T09:18:46Z
dc.date.available: 2023-10-06T09:18:46Z
dc.date.issued: 2016-10-21
dc.identifier.other: ukmvital:96935
dc.identifier.uri: https://ptsldigital.ukm.my/jspui/handle/123456789/476454
dc.description: Social media text, such as tweets, is a challenge for natural language processing. Twitter text is difficult to part-of-speech (POS) tag because it is noisy, with linguistic errors and an idiosyncratic style. In addition, most tweets are not linguistically well-formed, and the words in their sentences do not follow proper syntax. Furthermore, Arabic creates additional challenges for NLP researchers, since tweets are written in several spoken Arabic dialects that lack standardization, are written as free text, and show significant variation from Modern Standard Arabic (MSA). This research aims to design and implement POS tagging models for Arabic tweets based on an investigation of several machine learning models: Naive Bayes (NB), k-nearest neighbors (KNN), Support Vector Machine (SVM), and Decision Tree (DT). The research also investigates the contribution of domain-independent features to the performance of the POS tagging models. To evaluate different state-of-the-art POS tagging models, this work uses a new Arabic Twitter corpus and a Modern Standard Arabic corpus as datasets. The results show that when the classifiers are trained on the standard Arabic corpus, the highest result is obtained by the DT classifier, with an F-measure of 85.54%. When the Arabic Twitter corpus is used as training data, the highest result is obtained by the SVM classifier, with an F-measure of 91.36%. Thus, we conclude that classifiers trained on the Twitter corpus perform better than classifiers trained on the standard Arabic corpus, achieving the highest F-measure of 91.36% for POS tagging of Arabic tweets.
dc.description: "Certification of Master's/Doctoral Thesis" is not available
dc.language.iso: eng
dc.publisher: UKM, Bangi
dc.relation: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.rights: UKM
dc.subject: Arabic tweet
dc.subject: Machine learning
dc.subject: Speech tagging
dc.subject: Universiti Kebangsaan Malaysia -- Dissertations
dc.title: Part of speech tagging model for Arabic tweet based on machine learning
dc.type: theses
dc.format.pages: 63
dc.identifier.barcode: 002831(2017)
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
ukmvital_96935+SOURCE1+SOURCE1.0.PDF (Restricted Access, 394.55 kB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.