Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476516
Title: Offline OCR system for machine-printed Turkish using template
Authors: Dena Rafaa Ahmed (P50142)
Supervisor: Md Jan Nordin, Assoc. Prof. Dr.
Keywords: Optical character recognition devices
Pattern recognition systems
Template matching (Digital image processing)
Image processing -- Digital techniques
Issue Date: 30-Mar-2011
Description: One of the most important applications of Pattern Recognition (PR) today is Optical Character Recognition (OCR), a system that converts scanned images of printed or handwritten text into a machine-readable, editable format such as a text document. OCR software receives an image file as input, processes it, and compares its contents with a template set stored in the OCR database. OCR systems can be integrated into devices such as mobile phones to convert any image file, whether captured by a camera or produced by a scanner, into machine-readable, editable text. The main motivation behind this study is to build an OCR system for offline machine-printed Turkish characters that converts an image file into a readable and editable format. The system starts with a preprocessing step that converts the image file into a binary format with less noise so that it is ready for recognition. The preprocessing step includes digitization, binarization, thresholding, and noise removal. Next, the horizontal projection method is used for line detection and word allocation, and an 8-connected neighbors scheme is used to extract characters as a set of connected components. The template matching method is then used to match the segmented characters against the template set stored in the OCR database in order to recognize the text. Unlike neural networks and other approaches, template matching takes less time and does not require training samples, but it has many disadvantages. For example, it cannot recognize some letters with similar shapes, or combined letters. For this reason, modern systems use it together with other approaches and additional features, such as feature extraction approaches or the size of the segmented character, to get more accurate results. This OCR system combines template matching with the size of the segmented characters to achieve accurate results.
Finally, upon successful recognition, the recognized patterns are displayed in Notepad as readable and editable text. The Turkish machine-printed database consists of a list of 600 names of cities in Turkey, written in the Arial font in uppercase, in lowercase, and with the first character of each word capitalized. The results show that the accuracy of the proposed OCR system ranges from 96% to 100%.
“Certification of Master’s/Doctoral Thesis” is not available. Master Information Technology.
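The segmentation and matching steps named in the description can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the tiny test image, and the two templates are hypothetical, and the way the size feature is applied (only templates with the same dimensions as the segmented character are compared) is an assumption, since the record does not say how size and template matching are combined.

```python
from collections import deque

def find_lines(img):
    """Horizontal projection: return (top, bottom) ranges of rows that contain ink."""
    lines, start = [], None
    for y, row in enumerate(img):
        ink = sum(row)  # number of foreground pixels in this row
        if ink and start is None:
            start = y
        elif not ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(img)))
    return lines

def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected regions, left to right."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):      # visit all 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom + 1, right + 1))
    return sorted(boxes, key=lambda b: b[1])

def match(glyph, templates):
    """Return the label of the same-size template that agrees on the most pixels."""
    best_label, best_score = None, -1
    for label, tmpl in templates.items():
        if len(tmpl) != len(glyph) or len(tmpl[0]) != len(glyph[0]):
            continue  # assumed size feature: skip templates of a different size
        score = sum(a == b for r1, r2 in zip(glyph, tmpl) for a, b in zip(r1, r2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def recognize(img, templates):
    """Segment a binary image into lines and characters, then match each character."""
    result = []
    for top, bottom in find_lines(img):
        strip = img[top:bottom]
        chars = []
        for t, l, b, r in connected_components(strip):
            glyph = [row[l:r] for row in strip[t:b]]
            chars.append(match(glyph, templates) or "?")
        result.append("".join(chars))
    return result

# Hypothetical 3-pixel-high templates and a one-line test image (1 = ink).
TEMPLATES = {"I": [[1], [1], [1]], "L": [[1, 0], [1, 0], [1, 1]]}
IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(recognize(IMAGE, TEMPLATES))  # ['IL']
```

A plain connected-components pass treats each disconnected blob as a separate character, so dotted Turkish letters such as i, ö, and ü would be split into several components; a fuller system would need to merge a dot with the body below it before matching.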
Pages: 88
Call Number: TK7895.O6A378 2011 3 tesis
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_114891+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 10.62 MB
Format: Adobe PDF

