Natural Language Laboratory
School of Computing Science
Simon Fraser University

Hassan Shavarani MSc Thesis Defence

06 May 2016

On May 9th at 2pm, Hassan will defend his thesis on the topic “Training Data Annotation for Segmentation Classification in Simultaneous Translation”.

Abstract

Segmenting the incoming speech stream and translating the segments incrementally is a commonly used technique for improving latency in spoken language translation. Previous work (Oda et al. 2014) has explored creating training data for segmentation by finding segments that maximize translation quality under a user-defined bound on segment length. In this work, we provide a new algorithm, using Pareto-optimality, for finding good segment boundaries that balance the trade-off between latency and translation quality. We compare against the state-of-the-art greedy algorithm of Oda et al. (2014). Our experimental results show that we can improve latency by up to 12% without harming the BLEU score for the same average segment length. Another benefit is that, for any segment size, Pareto-optimal segments jointly optimize latency and translation quality.
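
As a rough illustration of the Pareto-optimality idea mentioned in the abstract, the sketch below filters a set of candidate segmentations down to those that no other candidate beats on both criteria at once. This is not the algorithm from the thesis: the candidate names, the numbers, and the convention that latency is a cost (lower is better) while quality is a score (higher is better) are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical candidate: (name, latency cost, quality score).
# Assumption for this sketch: lower latency is better, higher quality is better.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both criteria and strictly better on one."""
    _, lat_a, q_a = a
    _, lat_b, q_b = b
    return lat_a <= lat_b and q_a >= q_b and (lat_a < lat_b or q_a > q_b)


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Toy values, invented purely for illustration.
candidates = [
    ("A", 2.0, 20.0),
    ("B", 3.0, 24.0),
    ("C", 3.5, 22.0),  # worse than B on both criteria
    ("D", 5.0, 25.0),
    ("E", 2.0, 18.0),  # same latency as A, lower quality
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

Every candidate left on the front represents a different trade-off: moving from one to another improves one criterion only by giving up some of the other.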

M.Sc. Examining Committee:

Dr. Anoop Sarkar, Senior Supervisor
Dr. Fred Popowich, Supervisor
Dr. William D. Lewis, Examiner, Microsoft Research and University of Washington
Dr. Arrvindh Shriraman, Chair