This week we will have Ann Clifton and Majid Razmara talk about their Johns Hopkins University (JHU) CLSP summer workshop experience, along with the presentation they gave at the final wrap-up of the workshop.
You can learn more about the JHU summer workshop series.
Ann and Majid were on the team exploring domain adaptation for machine translation.
Previous work has shown that document-level information such as provenance or topics can be helpful for statistical machine translation (SMT). We are interested in the case of domain adaptation where there is little parallel data in the new domain. We consider various topic-conditioned lexical weighting models for this task, using both document- and token-level topic distributions, which can be used to take advantage of either parallel or monolingual new-domain data.
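As a rough illustration of the idea, topic-conditioned lexical weighting can be thought of as mixing per-topic lexical translation probabilities according to a document's topic distribution. The sketch below is purely illustrative: the table, topic mixture, and word pairs are invented for this example, not taken from the workshop system.

```python
# Hypothetical sketch of topic-conditioned lexical weighting.
# TOPIC_TABLES holds per-topic lexical probabilities p_k(e | f);
# both the entries and the topic mixture below are made up.
TOPIC_TABLES = {
    0: {("maison", "house"): 0.9, ("maison", "home"): 0.1},
    1: {("maison", "house"): 0.3, ("maison", "home"): 0.7},
}

def lex_weight(f, e, doc_topics):
    """Mix the per-topic tables by the document-level topic distribution."""
    return sum(w * TOPIC_TABLES[k].get((f, e), 0.0)
               for k, w in doc_topics.items())

# A document weighted toward topic 1 shifts probability toward "home".
print(round(lex_weight("maison", "home", {0: 0.2, 1: 0.8}), 2))  # 0.58
```

A token-level variant would do the same mixing per source token, using that token's own topic distribution instead of one shared document vector.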
Standard phrase-based MT systems use little or no context information when translating source phrases. In this work, we try to use some context information when choosing between translation options for each phrase. We extract features in the phrase-sense-disambiguation (PSD) style and train a classifier on them. We use the score of this PSD-style classifier as an additional feature in the log-linear framework of Moses. In particular, I will talk about different approaches for adapting the PSD model from an old domain (Hansard) to new domains (EMEA and Science).
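To give a feel for the PSD-style setup, the sketch below extracts simple context features for a phrase pair and scores them with a logistic classifier, whose probability would then be added as one more feature in the decoder's log-linear model. The feature templates and weights are invented for illustration and are not the actual features or classifier from the talk.

```python
import math

def psd_features(src_tokens, i, j, target_phrase):
    """Context features for translating src_tokens[i:j] as target_phrase.
    These templates (left/right neighbor, phrase pair) are illustrative."""
    left = src_tokens[i - 1] if i > 0 else "<s>"
    right = src_tokens[j] if j < len(src_tokens) else "</s>"
    return {
        f"left={left}": 1.0,
        f"right={right}": 1.0,
        f"pair={' '.join(src_tokens[i:j])}|{target_phrase}": 1.0,
    }

def psd_score(features, weights):
    """Logistic classifier score; this probability would be one
    additional feature in the log-linear framework."""
    z = sum(weights.get(k, 0.0) * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Toy weights standing in for a trained classifier.
weights = {"left=la": 1.2, "pair=maison|house": 0.8}
feats = psd_features(["la", "maison"], 1, 2, "house")
print(psd_score(feats, weights))
```

Domain adaptation then amounts to deciding how the classifier trained on the old domain (Hansard) is reused or retrained when scoring phrase pairs in the new domains (EMEA and Science).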