Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/577636
Title: Factored statistical machine translation system for English to Tamil language
Authors: Anand Kumar M.
Soman K. P.
Dhanalakshmi V.
Rajendran S.
Keywords: Statistical machine translation
Preprocessing
English-Tamil machine translation
Linguistic tools
Morphologically rich language
Issue Date: Dec-2014
Description: This paper proposes a morphology-based Factored Statistical Machine Translation (SMT) system for translating English sentences into Tamil. Automatic translation from English into morphologically rich languages like Tamil is a challenging task. Morphologically rich languages need extensive morphological preprocessing before SMT training to make the source language structurally similar to the target language. English and Tamil have disparate morphological and syntactic structures. Because of the highly rich morphological nature of Tamil, a simple lexical mapping alone is not sufficient for retrieving and mapping all the morpho-syntactic information from the English sentences. The main objective of the proposed work is to develop a machine translation system from English to Tamil using a novel pre-processing methodology. This methodology pre-processes the English sentences so that they conform to Tamil structure. The pre-processed sentences are given to the factored Statistical Machine Translation models for training. Finally, a Tamil morphological generator is used to generate a new surface word-form from the output factors of the SMT system. Experiments are conducted with nine different types of models, which are trained, tuned and tested with the help of general-domain corpora and the developed linguistic tools. These models are different combinations of the developed pre-processing tools with baseline models and factored models, and their accuracies are evaluated using the well-known evaluation metrics BLEU and METEOR. In addition, the accuracies are compared with the existing online “Google Translate” machine translation system. Results show that the proposed method significantly outperforms the other models and the existing system.
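Illustrative note: the factored representation mentioned in the description can be sketched as follows. This is only a minimal sketch, not the authors' implementation. It assumes the pipe-separated token layout commonly used for factored SMT corpora (for example in the Moses toolkit); the record does not say which toolkit was used. The factor names (surface, lemma, pos, morph), the example tags, and the helper functions are hypothetical.

```python
from typing import List, NamedTuple


class FactoredToken(NamedTuple):
    """One word annotated with linguistic factors (hypothetical factor set)."""
    surface: str  # word form as it appears in the sentence
    lemma: str    # dictionary form
    pos: str      # part-of-speech tag
    morph: str    # morphological information


def to_factored_line(tokens: List[FactoredToken], sep: str = "|") -> str:
    """Join each token's factors with `sep`, and tokens with single spaces."""
    return " ".join(sep.join(tok) for tok in tokens)


def parse_factored_line(line: str, sep: str = "|") -> List[FactoredToken]:
    """Inverse of to_factored_line; expects exactly four factors per token."""
    return [FactoredToken(*tok.split(sep)) for tok in line.split()]


# Hypothetical annotated English fragment; the tags are placeholders only.
example = [
    FactoredToken("boys", "boy", "NNS", "pl"),
    FactoredToken("played", "play", "VBD", "past"),
]
line = to_factored_line(example)
print(line)
```

In a pipeline of the kind the description outlines, lines like this would be produced by the pre-processing tools, and the factors output by the translation models would be passed to a morphological generator to build the final Tamil word forms.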
News Source: Pertanika Journal of Social Sciences & Humanities
ISSN: 0128-7702
Volume: 22
Pages: 1045-1061
Publisher: Universiti Putra Malaysia Press
Appears in Collections: Journal Content Pages/ Kandungan Halaman Jurnal

Files in This Item:
File: ukmvital_78474+Source01+Source010.PDF
Size: 509.85 kB
Format: Adobe PDF

