In the lab meeting on Thursday, September 28, Hasan will talk about his M.Sc. thesis on Training Data Annotation for Segmentation Classification in Simultaneous Translation. Here’s the abstract of his talk:

Abstract: Segmenting the incoming speech stream and translating the segments incrementally is a commonly used technique that improves latency in spoken language translation. Previous work has explored creating training data for segmentation by finding segments that maximize translation quality with a user-defined bound on segment length.
In this work, we provide a new algorithm, using Pareto-optimality, for finding good segment boundaries that balance the trade-off between latency and translation quality. Our experimental results show that we can provide qualitatively better segments that improve latency without substantially hurting translation quality.
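
For readers unfamiliar with Pareto-optimality, here is a minimal sketch of the general idea, not the algorithm from the thesis. It assumes each candidate segmentation has already been scored with a latency value (lower is better) and a translation quality value (higher is better). The candidate names, scores, and the `pareto_front` helper are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate segmentation: (name, latency, quality).
# Lower latency is better; higher quality is better.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A candidate is dominated if some other candidate is at least as good
    on both criteria and strictly better on at least one.
    """
    front = []
    for name, lat, qual in candidates:
        dominated = any(
            (o_lat <= lat and o_qual >= qual) and (o_lat < lat or o_qual > qual)
            for _, o_lat, o_qual in candidates
        )
        if not dominated:
            front.append((name, lat, qual))
    return front


# Made-up scores, purely illustrative (not results from the thesis).
candidates = [
    ("A", 3.0, 20.0),
    ("B", 5.0, 25.0),
    ("C", 6.0, 24.0),  # dominated by B: slower and lower quality
    ("D", 8.0, 30.0),
    ("E", 4.0, 18.0),  # dominated by A: slower and lower quality
]
front = pareto_front(candidates)
print([name for name, _, _ in front])
```

Every candidate left on the front represents a different trade-off: moving along it buys lower latency only at the cost of some quality, or the reverse.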