Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/476454
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Sabrina Tiun, Dr. | |
dc.contributor.author | Mustafa Abdulkareem Hussein (P80533) | |
dc.date.accessioned | 2023-10-06T09:18:46Z | - |
dc.date.available | 2023-10-06T09:18:46Z | - |
dc.date.issued | 2016-10-21 | |
dc.identifier.other | ukmvital:96935 | |
dc.identifier.uri | https://ptsldigital.ukm.my/jspui/handle/123456789/476454 | - |
dc.description | Social media text, such as tweets, is a challenge for natural language processing. Twitter text is difficult to Part Of Speech (POS) tag because it is noisy, with linguistic errors and an idiosyncratic style. In addition, although most tweets are linguistically fairly well-formed, words in sentences are not in proper syntactic order. Furthermore, dealing with Arabic creates additional challenges for researchers working on NLP, since tweets are written in several spoken Arabic dialects that lack standardization, are written as free text, and show significant variation from Modern Standard Arabic (MSA). This research aims to design and implement Part Of Speech tagging models for Arabic tweets based on an investigation of several machine learning models: Naive Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Decision Tree (DT). In addition, this research investigates the contribution of domain-independent features to the performance of the POS tagging models. To evaluate the different state-of-the-art POS tagging models, this work uses a new Arabic Twitter corpus and a Modern Standard Arabic corpus as datasets. Results show that when the classifiers are trained on the standard Arabic corpus, the highest result is obtained by the DT classifier, with an F-measure of 85.54%. When the Arabic Twitter corpus is used as training data, the highest result is obtained by the SVM classifier, with an F-measure of 91.36%. Thus, we conclude that classifiers trained on the Twitter corpus perform better than classifiers trained on the standard Arabic corpus, achieving the highest F-measure of 91.36% for POS tagging of Arabic tweets. | |
dc.description | "Certification of Master's/Doctoral Thesis" is not available | |
dc.language.iso | eng | |
dc.publisher | UKM, Bangi | |
dc.relation | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat | |
dc.rights | UKM | |
dc.subject | Arabic tweet | |
dc.subject | Machine learning | |
dc.subject | Speech tagging | |
dc.subject | Universiti Kebangsaan Malaysia -- Dissertations | |
dc.title | Part of speech tagging model for Arabic tweet based on machine learning | |
dc.type | theses | |
dc.format.pages | 63 | |
dc.identifier.barcode | 002831(2017) | |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ukmvital_96935+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 394.55 kB | Adobe PDF |