Offline OCR system for machine-printed Turkish using template

Dena Rafaa Ahmed

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476516

Title:	Offline OCR system for machine-printed Turkish using template
Authors:	Dena Rafaa Ahmed
Supervisor:	Md Jan Nordin, Assoc. Prof. Dr.
Keywords:	Optical character recognition devices; Pattern recognition systems; Template matching (Digital image processing); Image processing -- Digital techniques
Issue Date:	30-Mar-2011
Description:	One of the most important applications of Pattern Recognition (PR) today is Optical Character Recognition (OCR), a system that converts scanned images of printed or handwritten text into a machine-readable and editable format such as a text document. OCR software receives an image file as input, processes it, and compares its contents with a template set stored in the OCR database. OCR systems can be integrated into devices such as mobile phones to convert any image file, whether captured by a camera or produced by a scanner, into a machine-readable and editable format. The main motivation behind this study is to build an OCR system for offline machine-printed Turkish characters that converts an image file into a readable and editable format. The system starts with a preprocessing step that converts the image file into a binary format with less noise, ready for recognition. Preprocessing includes digitization, binarization, thresholding, and noise removal. Next, the horizontal projection method is used for line detection and word allocation, and an 8-connected neighbors scheme is used to extract characters as a set of connected components. Template matching is then used to match the segmented characters against the template set stored in the OCR database in order to recognize the text. Unlike neural networks and other approaches, template matching takes less time and does not require training samples, but it has several disadvantages. For example, it cannot recognize some letters with similar shapes or combined letters. For this reason, modern systems use it together with other approaches and additional features, such as feature extraction or the size of the segmented character, to obtain more accurate results. This OCR system combines template matching with the size of the segmented characters to achieve accurate results.
Finally, upon successful implementation of the OCR, the recognized patterns are displayed in Notepad as readable and editable text. The Turkish machine-printed database consists of a list of 600 names of cities in Turkey, written in Arial font in uppercase, in lowercase, and with the first character of each word capitalized. The results show that the accuracy of the proposed OCR system ranges from 96% to 100%.
“Certification of Master’s/Doctoral Thesis” is not available.
Master Information Technology
Pages:	88
Call Number:	TK7895.O6A378 2011 3 tesis
Publisher:	UKM, Bangi
URI:	https://ptsldigital.ukm.my/jspui/handle/123456789/476516
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
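
The pipeline outlined in the Description field (binarization, horizontal projection for line detection, 8-connected component extraction, and template matching combined with character size) can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which is not available in this record. All function names, the fixed threshold of 128, the pixel-agreement score, and the use of size as an exact-dimension filter are assumptions made for illustration. The sketch also treats each connected component as one character, so dotted or accented Turkish letters such as i, ö, ü, ç, ş, and ğ would need an extra step to merge their marks with the letter body.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Map a grayscale grid (0-255) to 1 for ink (dark) and 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def find_lines(binary):
    """Horizontal projection: return (top, bottom) row ranges that contain ink."""
    lines, start = [], None
    for y, row in enumerate(binary):
        has_ink = sum(row) > 0
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(binary) - 1))
    return lines

def connected_components(binary):
    """Find 8-connected ink regions; return bounding boxes
    (left, top, right, bottom) sorted from left to right."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                # Visit all 8 neighbours (the pixel itself is already seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)

def crop(binary, box):
    """Cut the bounding box out of the binary image."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in binary[top:bottom + 1]]

def match_template(glyph, templates):
    """Return the label of the same-size template with the most agreeing
    pixels, or None if no template has the glyph's dimensions."""
    best_label, best_score = None, -1.0
    gh, gw = len(glyph), len(glyph[0])
    for label, tpl in templates.items():
        if len(tpl) != gh or len(tpl[0]) != gw:
            continue  # assumed size feature: skip templates of another size
        agree = sum(g == t
                    for grow, trow in zip(glyph, tpl)
                    for g, t in zip(grow, trow))
        score = agree / (gh * gw)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

if __name__ == "__main__":
    # Toy 5x10 "scan": '#' is dark ink, '.' is white background.
    rows = ["..........", ".#..#.###.", ".#..#..#..", ".##.#..#..", ".........."]
    gray = [[0 if c == "#" else 255 for c in r] for r in rows]
    templates = {  # hypothetical tight-cropped templates
        "I": [[1], [1], [1]],
        "L": [[1, 0], [1, 0], [1, 1]],
        "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    }
    binary = binarize(gray)
    text = "".join(match_template(crop(binary, b), templates)
                   for b in connected_components(binary))
    print(find_lines(binary), text)  # [(1, 3)] LIT
```

Filtering by exact size before scoring is one simple way to use size as a second feature: in this toy example it keeps the one-pixel-wide "I" from being compared against wider glyphs. A practical system would more likely normalize glyphs to a common size and use a size tolerance.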

Files in This Item:

File	Description	Size	Format
ukmvital_114891+SOURCE1+SOURCE1.0.PDF Restricted Access		10.62 MB	Adobe PDF
