Offline OCR system for machine-printed Turkish using template

Dena Rafaa Ahmed

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476516

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Md Jan Nordin, Assoc. Prof. Dr.	-
dc.contributor.author	Dena Rafaa Ahmed	-
dc.contributor.other	P50142	-
dc.date.accessioned	2023-10-06T09:20:03Z	-
dc.date.available	2023-10-06T09:20:03Z	-
dc.date.issued	2011-03-30	-
dc.identifier.other	P50142	-
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/476516	-
dc.description	Optical character recognition (OCR) is one of the most important current applications of pattern recognition (PR). An OCR system converts scanned images of printed or handwritten text into a machine-readable, editable format such as a text document. OCR software takes an image file as input, processes it, and compares its contents with a template set stored in the OCR database. OCR systems can be integrated into devices such as mobile phones to convert any image file, whether captured by a camera or produced by a scanner, into machine-readable, editable text. The main motivation of this study is to build an OCR system for offline machine-printed Turkish characters that converts an image file into a readable and editable format. The system starts with a preprocessing step that converts the image file into a binary format with less noise, ready for recognition. Preprocessing includes digitization, binarization, thresholding, and noise removal. Next, the horizontal projection method is used for line detection and word allocation, and an 8-connected neighbors scheme is used to extract characters as sets of connected components. Template matching is then used to match the segmented characters against the template set stored in the OCR database in order to recognize the text. Unlike neural networks and other approaches, template matching takes less time and requires no training samples, but it has many disadvantages. For example, it cannot recognize some letters with similar shapes, or combined letters. For this reason, modern systems use it together with other approaches and additional features, such as feature extraction methods or the size of the segmented character, to get more accurate results. This OCR system combines template matching with the size of the segmented characters to achieve accurate results. 
Finally, once the OCR has run successfully, the recognized patterns are displayed in Notepad as readable and editable text. The Turkish machine-printed database consists of a list of 600 names of cities in Turkey, written in Arial font in uppercase, in lowercase, and with the first character of each word capitalized. The results show that the accuracy of the proposed OCR system ranges from 96% to 100%. “Certification of Master’s/Doctoral Thesis” is not available. Master Information Technology	-
dc.language.iso	eng	-
dc.publisher	UKM, Bangi	-
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat	-
dc.subject	Optical character recognition devices	-
dc.subject	Pattern recognition systems	-
dc.subject	Template matching (Digital image processing)	-
dc.subject	Image processing -- Digital techniques	-
dc.title	Offline OCR system for machine-printed Turkish using template	-
dc.type	theses	-
dc.rights.holder	UKM	-
dc.format.pages	88	-
dc.identifier.callno	TK7895.O6A378 2011 3 tesis	-
dc.identifier.barcode	002472(2011)	-
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:

File	Description	Size	Format
ukmvital_114891+SOURCE1+SOURCE1.0.PDF Restricted Access		10.62 MB	Adobe PDF
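
The pipeline outlined in the abstract (binarization, horizontal projection for line detection, 8-connected component extraction, then template matching combined with character size) can be illustrated with a minimal sketch. This is not the thesis implementation, which is under restricted access. Every function name, the fixed threshold of 128, and the use of size as an exact-match filter are assumptions made for illustration only. Noise removal and word allocation are omitted.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map grayscale rows (0-255) to 1 = ink, 0 = background (assumed fixed threshold)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def find_text_lines(binary):
    """Horizontal projection: rows with no ink separate text lines.

    Returns (top, bottom) row ranges with bottom exclusive.
    """
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = sum(row) > 0
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(binary)))
    return lines


def connected_components(binary):
    """Find 8-connected ink regions; return boxes (top, left, bottom, right),
    ends exclusive, sorted left to right."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                # Visit all 8 neighbours (the centre pixel is already marked seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom + 1, right + 1))
    return sorted(boxes, key=lambda b: b[1])


def crop(binary, box):
    top, left, bottom, right = box
    return [row[left:right] for row in binary[top:bottom]]


def match(glyph, templates):
    """Return the label of the same-sized template with the fewest differing
    pixels, or None if no template has the glyph's size."""
    size = (len(glyph), len(glyph[0]))
    best_label, best_cost = None, None
    for label, tmpl in templates.items():
        if (len(tmpl), len(tmpl[0])) != size:  # size feature (assumed exact filter)
            continue
        cost = sum(a != b for grow, trow in zip(glyph, tmpl)
                   for a, b in zip(grow, trow))
        if best_cost is None or cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


def recognize(gray, templates):
    """Run the sketch end to end; unmatched glyphs become '?'."""
    binary = binarize(gray)
    out = []
    for top, bottom in find_text_lines(binary):
        line = binary[top:bottom]
        out.append("".join(match(crop(line, box), templates) or "?"
                           for box in connected_components(line)))
    return "\n".join(out)
```

One caveat of this sketch: plain connected-component extraction splits dotted Turkish letters such as i, ö, and ü into separate components. The abstract does not say how the thesis handles this, so the sketch makes no attempt to merge them.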
