15 May 2012

Ann Clifton took her PhD depth examination on April 12, 2012.

Title: Multilingual Statistical Machine Translation


We examine approaches to using multilingual information in statistical machine translation (SMT). Multilingual information has been successfully leveraged for a variety of monolingual natural language processing tasks, such as word-sense disambiguation. SMT is an inherently multilingual task, since model training requires parallel data in the source and target languages, but most work has focused on training translation models in the bilingual setting only. As a result, robust SMT systems largely exist only for the few language pairs for which extensive parallel corpora are available. By injecting multilingual information into the models, however, we can hope to exploit the orthogonality of ambiguity across different source languages to improve SMT models; in particular, we can make it possible to train viable translation models for resource-poor language pairs.

We categorize multilingual translation methods based on the level at which the multilingual information is combined: at the word alignment level, the phrasal level, or the source-side sentence level. We characterize the methods at each level of combination in terms of their robustness to data sparsity, as well as their extensibility to broader multilingual settings.