Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476387
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Nazlia Omar, Prof. Madya Dr. |
dc.contributor.author | Kadhim Mahmood H. (P59480) |
dc.date.accessioned | 2023-10-06T09:17:34Z | -
dc.date.available | 2023-10-06T09:17:34Z | -
dc.date.issued | 2013-01-14 |
dc.identifier.other | ukmvital:84859 |
dc.identifier.uri | https://ptsldigital.ukm.my/jspui/handle/123456789/476387 | -
dc.description | Automatic Text Categorization (ATC) is the task of automatically assigning an electronic document to one of a set of predefined categories based on its content. Many supervised Machine Learning (ML) techniques have been used to solve the Text Categorization (TC) problem. Statistical learning is one family of ML techniques; it estimates the probability that a given document belongs to each category. One common statistical learning technique is Bayesian learning, which is based on Bayes' theorem. Many researchers are currently interested in Arabic ATC, and most of the methods used in this area are based on Bayesian learning algorithms. However, some Bayesian learning techniques are still under investigation. This work addresses the Arabic ATC problem using probabilistic Bayesian learning. The Bayesian classifiers applied are Multivariate Gauss Naive Bayes (MGNB), Flexible Bayes (FB), Multivariate Bernoulli Naive Bayes (MBNB), and Multinomial Naive Bayes (MNB). The proposed method has three parts: text pre-processing, which includes Bag-of-Words (BOW); text representation, which includes word-level n-grams (1-gram, 2-gram, and 3-gram); and feature selection, which includes the Chi-Square statistic, Odds Ratio, Mutual Information, and the GSS coefficient. For Arabic stemming, the prototype uses a simple stemmer, the TREC-2002 Light Stemmer. The Arabic corpus was collected from online newspapers and consists of 3172 documents of varying length in four predefined categories: Art, Economy, Politics, and Sport. Of these, 1732 documents are allocated to the training set and 1440 to the test set. The results show that FB outperforms MNB, MBNB, and MGNB. The experimental results also show that using word-level n-grams for Bayesian ATC gives acceptable results, although BOW (1-gram) gives the best overall performance. |
dc.description | Master / Sarjana |
dc.language.iso | eng |
dc.publisher | UKM, Bangi |
dc.relation | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
dc.rights | UKM |
dc.subject | Automatic Text Categorization |
dc.subject | Bayesian learning |
dc.subject | Text processing (Computer science) |
dc.title | Automatic Arabic text categorization using Bayesian learning |
dc.type | theses |
dc.format.pages | 76 |
dc.identifier.callno | QA76.9.T48K335 2013 |
dc.identifier.barcode | 002014 |
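The word-level n-gram representation named in the abstract (1-gram, 2-gram, and 3-gram) can be illustrated with a minimal Python sketch. It assumes the text has already been tokenised and, for Arabic, light-stemmed; the whitespace tokenisation and the helper names word_ngrams and ngram_features are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def word_ngrams(tokens: List[str], n: int) -> List[str]:
    """Return contiguous word-level n-grams, each joined with a single space."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # A sequence shorter than n yields an empty list (the range is empty).
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text: str, sizes: Sequence[int] = (1, 2, 3)) -> List[str]:
    """Hypothetical helper: whitespace tokenisation, then all n-grams for each size.

    Assumes the input is already normalised/stemmed; sizes=(1,) is plain BOW.
    """
    tokens = text.split()
    features: List[str] = []
    for n in sizes:
        features.extend(word_ngrams(tokens, n))
    return features


if __name__ == "__main__":
    print(word_ngrams("the team won the match".split(), 2))
    # prints ['the team', 'team won', 'won the', 'the match']
```

With sizes=(1,) the features reduce to the bag-of-words (1-gram) case that the abstract reports as performing best overall.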
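The four feature-selection measures listed in the abstract (Chi-Square, Odds Ratio, Mutual Information, GSS coefficient) are commonly computed from a 2x2 table of document counts for one term t and one category c. The Python sketch below uses common textbook formulations; the Contingency class, the 0.5 smoothing added to every cell in the odds ratio, and the choice of pointwise mutual information are assumptions for illustration, not necessarily the exact variants used in the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class Contingency:
    """Hypothetical container for document counts of term t and category c.

    a: docs in c containing t      b: docs outside c containing t
    c: docs in c without t         d: docs outside c without t
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def chi_square(k: Contingency) -> float:
    # N (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)); 0 when a marginal is empty
    num = k.n * (k.a * k.d - k.c * k.b) ** 2
    den = (k.a + k.c) * (k.b + k.d) * (k.a + k.b) * (k.c + k.d)
    return num / den if den else 0.0


def mutual_information(k: Contingency) -> float:
    # Pointwise MI: log(P(t,c) / (P(t) P(c))) = log(A N / ((A+B)(A+C)))
    if k.a == 0:
        return float("-inf")
    return math.log(k.a * k.n / ((k.a + k.b) * (k.a + k.c)))


def odds_ratio(k: Contingency, smoothing: float = 0.5) -> float:
    # P(t|c)(1 - P(t|~c)) / ((1 - P(t|c)) P(t|~c)) simplifies to AD / (BC);
    # the smoothing constant (an assumption) avoids division by zero.
    a, b, c, d = (x + smoothing for x in (k.a, k.b, k.c, k.d))
    return (a * d) / (b * c)


def gss(k: Contingency) -> float:
    # P(t,c) P(~t,~c) - P(t,~c) P(~t,c) = (AD - BC) / N^2
    return (k.a * k.d - k.b * k.c) / k.n ** 2


if __name__ == "__main__":
    independent = Contingency(a=10, b=90, c=10, d=90)
    print(chi_square(independent))  # prints 0.0 (term carries no category signal)
    related = Contingency(a=40, b=10, c=60, d=890)
    print(chi_square(related), mutual_information(related),
          odds_ratio(related), gss(related))
```

In a typical pipeline each term is scored per category and the highest-scoring terms are kept as features before training the classifier.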
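Of the four classifiers compared, Multinomial Naive Bayes (MNB) is the simplest to show compactly. The following is a minimal Python sketch assuming token-count features and add-one (Laplace) smoothing; it is not the thesis's implementation, and it does not cover MBNB, MGNB, or Flexible Bayes (the abstract reports FB as the best performer). The class name, the toy documents, and the rule of ignoring unseen tokens are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one (Laplace) smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        # Class priors P(c) estimated from document frequencies.
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        # Total occurrences of each token within each class.
        token_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)
        v = len(self.vocab)
        # P(t|c) = (count(t, c) + 1) / (total tokens in c + |V|)
        self.log_likelihood: Dict[str, Dict[str, float]] = {}
        for c in self.classes:
            total = sum(token_counts[c].values())
            self.log_likelihood[c] = {
                tok: math.log((token_counts[c][tok] + 1) / (total + v))
                for tok in self.vocab
            }
        return self

    def predict(self, doc: List[str]) -> str:
        def score(c: str) -> float:
            # Log-space sum avoids underflow; tokens unseen in training are ignored.
            return self.log_prior[c] + sum(
                self.log_likelihood[c][tok] for tok in doc if tok in self.vocab
            )
        return max(self.classes, key=score)


if __name__ == "__main__":
    docs = [["match", "goal", "team"], ["team", "win", "match"],
            ["market", "price", "bank"], ["bank", "price", "oil"]]
    labels = ["sport", "sport", "economy", "economy"]
    model = MultinomialNB().fit(docs, labels)
    print(model.predict(["goal", "team"]))  # prints sport
```

The token lists passed to fit and predict could equally be stemmed words or the n-gram strings produced by a word-level n-gram step.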
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File | Description | Size | Format
ukmvital_84859+SOURCE1+SOURCE1.0.PDF | Restricted Access | 2.01 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.