Natural Language Laboratory
School of Computing Science
Simon Fraser University

Hasan's presentation on Training Data Annotation for Segmentation Classification

28 Sep 2017

In the lab meeting on Thursday, September 28, Hasan will talk about his M.Sc. thesis on Training Data Annotation for Segmentation Classification in simultaneous translation. Here is the abstract of his talk:

Segmenting the incoming speech stream and translating the segments incrementally is a commonly used technique for reducing latency in spoken language translation. Previous work has explored creating training data for segmentation by finding segments that maximize translation quality, subject to a user-defined bound on segment length.

In this work, we provide a new algorithm, based on Pareto optimality, for finding good segment boundaries that balance the trade-off between latency and translation quality. Our experimental results show that we can produce qualitatively better segments that improve latency without substantially hurting translation quality.
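The Pareto-optimality idea can be illustrated with a small sketch. Assume each candidate segmentation of a sentence has already been scored with a latency value (lower is better) and a translation-quality value (higher is better). The candidates worth keeping are then those that no other candidate beats on both measures at once. The Python below is a generic Pareto-frontier filter, not the algorithm from the thesis. The `Candidate` type, the `dominates` and `pareto_front` helpers, and the scores in the example are all hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate segmentation of one sentence.

    boundaries: token indices after which a segment ends (assumed representation).
    latency: lower is better (assumed proxy, e.g. average segment length).
    quality: higher is better (assumed metric, e.g. a BLEU-like score).
    """
    boundaries: tuple
    latency: float
    quality: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.latency <= b.latency and a.quality >= b.quality
    strictly_better = a.latency < b.latency or a.quality > b.quality
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only non-dominated candidates, ordered by increasing latency."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: (c.latency, -c.quality))


# Made-up scores for four candidate segmentations of one sentence.
cands = [
    Candidate((3, 7), latency=3.5, quality=0.30),
    Candidate((7,), latency=5.0, quality=0.34),
    Candidate((2, 4, 7), latency=2.7, quality=0.25),
    Candidate((5,), latency=5.5, quality=0.33),  # dominated by (7,)
]

for c in pareto_front(cands):
    print(c.boundaries, c.latency, c.quality)
```

Each point on the resulting frontier trades some quality for lower latency, or the reverse. Choosing one of them as a training label (for example, the lowest-latency candidate within some quality tolerance) is a separate design choice that this sketch leaves open.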