21 Sep 2012

Majid will defend his thesis proposal on 21 September 2012 at 1:30 PM.

Razmara, M. Combining Diverse Sources In Statistical Machine Translation.

Statistical machine translation often faces the problem of combining training data from many diverse sources into a single translation model. We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step. We evaluate performance in a domain adaptation setting in which a model trained on large parliamentary-domain data is adapted to the medical domain and then used to translate sentences from the medical domain. Our experimental results show that ensemble decoding outperforms various strong baselines, including mixture models, the current state of the art for domain adaptation in machine translation. Moreover, we propose a number of extensions to ensemble decoding, in both experiments and methods. These include combining an arbitrary number of (heterogeneous) translation models at decoding time and studying the characteristics of different mixture operations. In addition, we propose new methods for adjusting the contribution of each component model (i.e., tuning the component hyper-parameters). We also propose approaches for extending our method to the multi-parallel-corpus scenario, where a number of pivot languages can be used to improve translation between language pairs with scarce parallel data.
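As a rough illustration of what combining component models at decoding time can look like, the Python sketch below merges the probabilities that several models assign to the same candidate translation. The function name `combine_scores`, the operation names (`wsum`, `wmax`, `prod`), and the example numbers are assumptions made for illustration; the abstract does not say which mixture operations the proposal studies or how they are implemented.

```python
import math


def combine_scores(probs, weights, op="wsum"):
    """Combine the probabilities that several component models assign
    to one candidate translation (hypothetical helper, not from the thesis).

    probs   -- one probability per component model
    weights -- one non-negative weight per component model
    op      -- illustrative mixture operation: "wsum", "wmax", or "prod"
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    if not probs:
        raise ValueError("at least one component model is required")

    if op == "wsum":
        # Weighted sum: a linear mixture of the component models.
        return sum(w * p for w, p in zip(weights, probs))
    if op == "wmax":
        # Weighted max: the strongest weighted component wins.
        return max(w * p for w, p in zip(weights, probs))
    if op == "prod":
        # Weighted product: a log-linear mixture, computed in log space.
        # A zero probability under any weighted component zeroes the result.
        if any(p == 0.0 and w > 0.0 for w, p in zip(weights, probs)):
            return 0.0
        return math.exp(
            sum(w * math.log(p) for w, p in zip(weights, probs) if w > 0.0)
        )
    raise ValueError(f"unknown mixture operation: {op}")


# Two hypothetical component models (e.g. general-domain and in-domain)
# scoring the same candidate translation, with equal weights.
example_probs = [0.2, 0.6]
example_weights = [0.5, 0.5]
for name in ("wsum", "wmax", "prod"):
    print(name, combine_scores(example_probs, example_weights, name))
```

The component weights here are fixed by hand; adjusting them is the kind of tuning that the abstract refers to as adjusting the contribution of each component model.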