A hybrid method of linguistic approach and statistical method for nested noun compound extraction

Hamed Hamdoon Ali Al-Balushi

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/475807

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Mohd. Juzaiddin Ab Aziz, Prof. Dr.	-
dc.contributor.author	Hamed Hamdoon Ali Al-Balushi	-
dc.contributor.other	P65643	-
dc.date.accessioned	2023-10-05T06:42:01Z	-
dc.date.available	2023-10-05T06:42:01Z	-
dc.date.issued	2013-05-27	-
dc.identifier.other	P65643	-
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/475807	-
dc.description	Arabic noun compound extraction remains a challenging problem in natural language processing (NLP). Several approaches to extracting Arabic noun compounds have been proposed: some are linguistic, some are statistical, and others are hybrids of the two. However, the accuracy of nested Arabic noun compound extraction still needs improvement. This research proposes a hybrid of a linguistic approach and a statistical method to improve the extraction of nested Arabic noun compounds. The dataset was collected from the online Arabic newspaper archives of Aljazeara.net and Almotamar.net. The data were pre-processed through transformation, normalization, stemming, and POS tagging. N-gram generation was then used to produce bigram, trigram, 4-gram, and 5-gram noun compound candidates. Three association measures, NC-value, PMI, and LLR, were used to rank the candidates. Evaluation used the n-best method against human annotation (manual selection by experts). NC-value outperformed PMI and LLR in extracting nested noun compounds. Note: "Certification of Master's/Doctoral Thesis" is not available.	-
dc.language.iso	eng	-
dc.publisher	UKM, Bangi	-
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat	-
dc.subject	Arabic noun	-
dc.subject	Universiti Kebangsaan Malaysia -- Dissertations.	-
dc.title	A hybrid method of linguistic approach and statistical method for nested noun compound extraction	-
dc.type	Theses	-
dc.rights.holder	UKM	-
dc.format.pages	84	-
dc.identifier.callno	P98.A434 2014 3 tesis	-
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
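
The description field above outlines a pipeline of n-gram candidate generation followed by ranking with association measures. The following is a minimal sketch of the candidate-generation step and of PMI ranking for bigrams only, not the thesis's implementation. It assumes the input is already a tokenized list, omits the linguistic (POS-based) filtering, and does not cover NC-value or LLR. The function names (`ngrams`, `candidates`, `pmi_bigrams`) and the placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-token tuple, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidates(tokens, sizes=(2, 3, 4, 5)):
    """Count n-gram candidates for each requested size (bigram to 5-gram)."""
    return {n: Counter(ngrams(tokens, n)) for n in sizes}


def pmi_bigrams(tokens):
    """Rank bigrams by PMI = log2(P(x, y) / (P(x) * P(y))), highest first.

    Probabilities are simple relative frequencies (an assumption of this
    sketch; no smoothing or frequency cutoff is applied).
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(ngrams(tokens, 2))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigram_counts.items():
        p_xy = count / n_bigrams
        p_x = unigram_counts[x] / n_unigrams
        p_y = unigram_counts[y] / n_unigrams
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Placeholder tokens, not data from the thesis.
    sample = ["a", "b", "a", "b", "c"]
    print(candidates(sample)[3])
    for bigram, score in pmi_bigrams(sample):
        print(bigram, round(score, 3))
```

In a hybrid setup of the kind the description mentions, a POS-pattern filter would typically be applied to the candidates before scoring, so that only noun sequences are ranked.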

Files in This Item:

There are no files associated with this item.
