A hybrid method of linguistic approach and statistical method for nested noun compound extraction

Hamed Hamdoon Ali Al-Balushi (P65643)

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476185

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Mohd. Juzaiddin Ab Aziz, Prof. Dr.
dc.contributor.author	Hamed Hamdoon Ali Al-Balushi (P65643)
dc.date.accessioned	2023-10-06T09:14:27Z	-
dc.date.available	2023-10-06T09:14:27Z	-
dc.date.issued	2014-06-09
dc.identifier.other	ukmvital:75235
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/476185	-
dc.description	Arabic noun compound extraction remains a challenging problem in NLP. Several approaches have been proposed for extracting Arabic noun compounds: some use a linguistic approach, some use statistical methods, and the rest combine the two in a hybrid. However, the accuracy of nested Arabic noun compound extraction still needs improvement. This research proposes a hybrid of a linguistic approach and a statistical method to improve the extraction of nested Arabic noun compounds. The dataset was collected from the online Arabic newspaper archives of Aljazeara.net and Almotamar.net. Several pre-processing steps were applied to the data, including transformation, normalization, stemming and POS tagging. N-grams were then used to generate bi-gram, tri-gram, 4-gram and 5-gram noun compound candidates. Three association measures, NC-value, PMI and LLR, were then used to rank the candidates. The evaluation was performed using the n-best method against human annotation (manual selection by experts). NC-value outperformed PMI and LLR in extracting nested noun compounds.
dc.description	Master of Information Technology
dc.language.iso	eng
dc.publisher	UKM, Bangi
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.rights	UKM
dc.subject	Hybrid method
dc.subject	Universiti Kebangsaan Malaysia -- Dissertations
dc.subject	Dissertations, Academic -- Malaysia
dc.title	A hybrid method of linguistic approach and statistical method for nested noun compound extraction
dc.type	theses
dc.format.pages	84
dc.identifier.callno	P98.A434 2014 3 tesis
dc.identifier.barcode	001232
dc.identifier.barcode	005656(2021)(PL2)
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
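
The candidate-generation and ranking steps named in the description can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the `NN` tag, the noun-only filter and the English toy tokens are all assumptions made for illustration, and only PMI over bi-grams is shown (NC-value and LLR are omitted).

```python
import math
from collections import Counter


def ngrams(items, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def noun_candidates(tagged, n_min=2, n_max=5, noun_tag="NN"):
    """Count 2- to 5-gram candidates whose tokens are all tagged as nouns.

    The noun-only pattern is an assumed linguistic filter, not the
    filter used in the thesis.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for gram in ngrams(tagged, n):
            if all(tag == noun_tag for _, tag in gram):
                counts[tuple(word for word, _ in gram)] += 1
    return counts


def pmi_bigrams(tagged, candidates):
    """Rank bi-gram candidates by pointwise mutual information (base 2)."""
    words = [word for word, _ in tagged]
    unigram = Counter(words)
    bigram = Counter(ngrams(words, 2))
    n_uni = len(words)
    n_bi = max(len(words) - 1, 1)
    scores = {}
    for cand in candidates:
        if len(cand) != 2:
            continue  # longer candidates need a measure such as NC-value
        p_xy = bigram[cand] / n_bi
        p_x = unigram[cand[0]] / n_uni
        p_y = unigram[cand[1]] / n_uni
        scores[cand] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy POS-tagged input (hypothetical; the thesis works on Arabic news text).
tagged = [
    ("oil", "NN"), ("price", "NN"), ("rose", "VB"),
    ("oil", "NN"), ("price", "NN"), ("index", "NN"), ("fell", "VB"),
]
candidates = noun_candidates(tagged)
ranked = pmi_bigrams(tagged, candidates)
for cand, score in ranked:
    print(" ".join(cand), round(score, 2))
```

In this toy input the tri-gram "oil price index" contains the bi-grams "oil price" and "price index", which is the nesting the description refers to; a measure that accounts for nesting would be needed to score such candidates.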

Files in This Item:

File	Description	Size	Format
ukmvital_75235+Source01+Source010.PDF	Restricted Access	1.7 MB	Adobe PDF
