Baskaran will give a lab practice talk on Wednesday, 20 July. The paper, on Bayesian rule extraction for Hiero-style SMT, has been accepted at WMT-2011.
Date & Time: 11 am on 20 July 2011
Room: TASC1 9408
Title: Bayesian Extraction of Minimal SCFG Rules for Hierarchical Phrase-based Translation
We present a novel approach for extracting a minimal synchronous context-free grammar (SCFG) for Hiero-style statistical machine translation using a non-parametric Bayesian framework. Our approach is designed to extract rules that are licensed by the word alignments and heuristically extracted phrase pairs. Our Bayesian model limits the number of SCFG rules extracted by sampling from the space of all possible hierarchical rules; additionally, our informed prior based on the lexical alignment probabilities biases the grammar to extract high-quality rules, leading to improved generalization and the automatic identification of commonly re-used rules. We show that our Bayesian model is able to extract a minimal set of hierarchical phrase rules without impacting the translation quality as measured by the BLEU score.