In our lab meeting tomorrow, Ashkan will go over various methods for end-to-end speech translation. Here are the title and abstract of his talk:
Towards supervised speech-to-text translation without transcription.
Abstract: Speech translation has traditionally been approached with cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, followed by a machine translation system trained on parallel texts. Several recent works have shown the feasibility of collapsing the cascade into a single, direct model that can be trained end-to-end on a corpus of translated speech (Sperber et al., 2019). End-to-end Speech Translation (ST) models have several potential advantages over the cascade of Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, including lower inference latency and the avoidance of compounding errors (Jia et al., 2019). In the talk, we will discuss various approaches to making end-to-end models more accurate.
Tuesday, October 1st, 10:30 a.m., TASC1 9408.