Compound binarisation with hybrid feature extraction for Arabic font recognition via handphone camera

Musab Kasim Soliman Alqudah (P49991)

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513258

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Mohammad Faidzul Nasrudin, Assoc. Prof. Dr.
dc.contributor.author	Musab Kasim Soliman Alqudah (P49991)
dc.date.accessioned	2023-10-16T04:35:04Z	-
dc.date.available	2023-10-16T04:35:04Z	-
dc.date.issued	2016-11-11
dc.identifier.other	ukmvital:96520
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/513258	-
dc.description	Document image analysis and recognition (DIAR) involves many steps to process and recognize objects in document images. Hand-phone cameras are now widely used to capture information from the user's surroundings, and Arabic Font Recognition (AFR) is one such application. Despite this popularity, document images captured with hand-phone cameras are susceptible to low resolution, perspective distortion, skew, blur, and uneven illumination. Binarisation is a pre-processing stage that prepares an image for the recognition stage, and current binarisation techniques fail to solve the most common problem in hand-phone camera images, namely illumination. Likewise, existing AFR techniques fail to achieve high recognition accuracy because they rely on contour information. This study therefore has three objectives: first, to propose a compound binarisation that can overcome most problematic binarisation cases; second, to propose a hybrid feature extraction based on stroke information to increase AFR accuracy; and third, to propose an AFR framework. The compound binarisation is based on the idea of applying multiple thresholds to different regions of the image, which can yield better results in complex and common binarisation cases. The proposed hybrid feature is extracted from the maximum length of connected components, the circle direction and length direction of pixels between the vertical and horizontal directions, and statistical equations of skeleton strokes from Arabic text images. The experiments used 66 images from the DIBCO benchmark datasets published from 2009 to 2013 and 350 self-collected document images captured with a five-megapixel hand-phone camera.
The proposed compound binarisation technique is evaluated and compared with six well-known benchmark techniques, namely Otsu, Niblack, Sauvola, Wolf, Nick and Bataineh. On the DIBCO benchmarks, the proposed compound technique achieved an F-measure of 86.14%, whereas the Bataineh, Niblack, Nick, Otsu, Sauvola and Wolf methods achieved 85.19%, 36.11%, 77.81%, 79.76%, 73.84% and 67.62% respectively. On the self-collected dataset, the proposed compound technique achieved an F-measure of 84.59%, which also surpassed the other benchmark techniques. The proposed feature extraction technique is evaluated by comparing it with the gray level co-occurrence matrix (GLCM), Local Binary Pattern (LBP), Edge Direction Matrixes (EDMS) and Gabor methods. In the experiments, the proposed method surpassed GLCM, LBP, EDMS and Gabor: with a Multilayer Neural Network (MNN) classifier on the self-collected dataset, the accuracies of the proposed method, GLCM, LBP, EDMS and Gabor are 100%, 87.77%, 91.23%, 97.16.7% and 97.8% respectively. These results show that the proposed AFR, based on the proposed binarisation and feature techniques, is able to recognize Arabic font types in hand-phone captured document images with high accuracy.
dc.description	"Certification of Master's/Doctoral Thesis" is not available
dc.language.iso	eng
dc.publisher	UKM, Bangi
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
dc.rights	UKM
dc.subject	Compound binarisation
dc.subject	Hybrid feature
dc.subject	Arabic font
dc.subject	Handphone camera
dc.subject	Optical character recognition
dc.title	Compound binarisation with hybrid feature extraction for Arabic font recognition via handphone camera
dc.type	Theses
dc.format.pages	242
dc.identifier.callno	TA1640.A449 2016 3 tesis
dc.identifier.barcode	002728(2017)
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
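
The abstract describes compound binarisation as applying multiple thresholds to different regions of an image and reports results as F-measure. The sketch below illustrates that general idea only and is not the thesis's algorithm: it assumes fixed square tiles with a plain Otsu threshold per tile, and the names `otsu_threshold`, `binarise_by_region`, `f_measure` and the `tile` parameter are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, integer values 0..255


def otsu_threshold(pixels: List[int]) -> int:
    """Return the Otsu threshold t; pixels <= t form the dark class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class so far
    sum0 = 0  # intensity sum of the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalised)
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarise_by_region(image: Image, tile: int) -> Image:
    """Threshold each tile x tile region separately (assumed region scheme).

    Output pixels are 1 for text (dark) and 0 for background.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            rows = range(top, min(top + tile, h))
            cols = range(left, min(left + tile, w))
            t = otsu_threshold([image[r][c] for r in rows for c in cols])
            for r in rows:
                for c in cols:
                    out[r][c] = 1 if image[r][c] <= t else 0
    return out


def f_measure(pred: Image, truth: Image) -> float:
    """F-measure of predicted text pixels against ground-truth text pixels."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A per-region threshold lets a dimly lit part of the page use a lower cut-off than a brightly lit part, which a single global threshold cannot do. In this simplified form a tile with no text is still split into two classes when its pixels vary, and a perfectly uniform tile is thresholded at 0, so a practical method would need extra handling for such regions.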

Files in This Item:

File	Description	Size	Format
ukmvital_96520+SOURCE1+SOURCE1.0.PDF Restricted Access		963 kB	Adobe PDF
