There were two papers from the lab at ACL this year. Ann presented at the MT: Methods session, and Marzieh presented at the poster session.
This paper extends the training and tuning regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages (we build an English-to-Finnish translation system). Our methods are not language specific, and we use unsupervised morphology induction. Unlike previous work, we focus on morphologically productive phrase pairs – our decoder can combine morphemes across phrase boundaries. Morphemes in the target language may not have a corresponding morpheme or word in the source language. Therefore, we propose a novel combination of post-processing morphology prediction with morpheme-based translation. We show, using both automatic evaluation scores and linguistically motivated analyses of the output, that our methods outperform previously proposed ones and provide the best known results on the English-Finnish EuroParl translation task. Because our methods are mostly language independent, they should also improve translation into other target languages with complex morphology.
We combine multiple word representations, based on semantic clusters extracted with the Brown algorithm (Brown et al., 1992) and syntactic clusters obtained from the Berkeley parser (Petrov et al., 2006), in order to improve discriminative dependency parsing in the MSTParser framework (McDonald et al., 2005). We also provide an ensemble method for combining diverse cluster-based models, which is a discriminative parsing analog to the generative product of experts model for parsing in (Petrov, 2010). Together, the two contributions significantly improve unlabeled dependency accuracy from 90.82% to 92.13%.
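To give a rough feel for what combining diverse models can look like, here is a minimal sketch in Python. It is not the paper's ensemble method, which the abstract does not spell out. It assumes each model produces a matrix of arc scores, sums those matrices (a product of experts when the scores are read as log values), and then picks each token's best head greedily. The function names, the toy score matrices, and the greedy decoding step are all hypothetical; a real MST-style parser would run a spanning-tree search instead of the greedy step.

```python
import numpy as np

def combine_arc_scores(score_matrices, weights=None):
    """Weighted sum of per-model arc scores.

    Each matrix has shape (n, n); entry [h, m] scores token h as head of token m.
    Summing scores corresponds to multiplying experts in log space.
    """
    stacked = np.stack(score_matrices).astype(float)  # (num_models, n, n)
    if weights is None:
        weights = np.ones(len(score_matrices))
    return np.tensordot(weights, stacked, axes=1)      # (n, n)

def greedy_heads(scores):
    """Pick the highest-scoring head per token (index 0 is the artificial root).

    Assumption: this greedy step stands in for a proper spanning-tree decoder
    and is not guaranteed to produce a well-formed tree.
    """
    masked = scores.astype(float).copy()
    np.fill_diagonal(masked, -np.inf)   # a token cannot head itself
    heads = masked.argmax(axis=0)       # best head for each column (modifier)
    heads[0] = -1                       # the root has no head
    return heads.tolist()

# Toy scores for a root plus two tokens, from two hypothetical models.
model_a = np.array([[0.0, 2.0, 1.0],
                    [0.0, 0.0, 3.0],
                    [0.0, 1.0, 0.0]])
model_b = np.array([[0.0, 1.0, 4.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 0.5, 0.0]])

combined = combine_arc_scores([model_a, model_b])
heads_a = greedy_heads(model_a)
heads_combined = greedy_heads(combined)
```

In this toy case the first model alone attaches the second token to the first, while the combined scores attach it to the root, which illustrates how a second model's evidence can change a parsing decision.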