Multiple System Combination for PersoArabic-Latin Transliteration

Nima Hemmati, Heshaam Faili, Jalal Maleki

In this paper, we model PersoArabic-to-Latin transliteration by combining grapheme-to-phoneme (G2P) and word lattice methods with statistical machine translation (SMT). Persian belongs to the Indo-Iranian branch of the Indo-European language family and is written in an Arabic-based script. Our transliteration model is induced from a parallel corpus consisting of the PersoArabic text of a Persian book together with its Romanized transcription in Dabire. We manually aligned the sentences of the book across the two scripts and used the result as the parallel corpus. Our results indicate that adding grapheme-to-phoneme and word lattice methods for out-of-vocabulary (OOV) handling to the monotonic statistical machine transliteration system improves its performance. In addition, the final performance on the test corpus shows that our system achieves results comparable to those of other state-of-the-art systems.
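The idea of falling back to grapheme-level candidates and a lattice for out-of-vocabulary words can be illustrated with a toy sketch. This is not the authors' system: the SMT component is replaced by a plain dictionary lookup, the G2P model by a hand-written candidate table, and lattice scoring by a count of known character bigrams. All names (`WORD_TABLE`, `GRAPHEME_CANDIDATES`, `transliterate`) and all romanizations are hypothetical and are not claimed to follow Dabire.

```python
from itertools import product

# Hypothetical stand-in for the word-level (SMT) component: known words only.
WORD_TABLE = {"کتاب": "ketab"}

# Hypothetical grapheme candidates. Short vowels are usually unwritten in the
# PersoArabic script, so a consonant may map to itself or to itself + vowel.
GRAPHEME_CANDIDATES = {
    "ک": ["k", "ke"],
    "ت": ["t", "te"],
    "ا": ["a"],
    "ب": ["b"],
}


def build_lattice(word):
    """Return one list of candidate Latin strings per input grapheme."""
    return [GRAPHEME_CANDIDATES.get(ch, [ch]) for ch in word]


def known_bigrams():
    """Collect character bigrams from the Latin side of the word table."""
    return {
        latin[i:i + 2]
        for latin in WORD_TABLE.values()
        for i in range(len(latin) - 1)
    }


def score(candidate, bigrams):
    """Toy scorer: count how many bigrams of the candidate are known."""
    return sum(candidate[i:i + 2] in bigrams for i in range(len(candidate) - 1))


def transliterate(word):
    """Use the word table if possible, otherwise search the lattice."""
    if word in WORD_TABLE:
        return WORD_TABLE[word]
    bigrams = known_bigrams()
    paths = ("".join(p) for p in product(*build_lattice(word)))
    return max(paths, key=lambda c: score(c, bigrams))
```

With this toy data, `transliterate("کتاب")` returns `"ketab"` directly from the word table, while the unseen word `"تاب"` is expanded into the lattice paths `"tab"` and `"teab"`, and `"tab"` is chosen because more of its bigrams appear in the known Latin forms.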
