News

Pooya and Zhenqi will present their recent papers, due to appear at EMNLP 2019.
21 Oct 2019

In our lab meeting tomorrow, Zhenqi and Pooya will present their EMNLP 2019 papers. Here are the titles and abstracts of their talks:

Interrogating the Explanatory Power of Attention in Neural Machine Translation

Abstract: Attention models have become a crucial component in neural machine translation (NMT). They are often implicitly or explicitly used to justify the model’s decision in generating a specific token, but it has not yet been rigorously established to what extent attention is a reliable source of information in NMT. To evaluate the explanatory power of attention for NMT, we examine the possibility of yielding the same prediction but with counterfactual attention models that modify crucial aspects of the trained attention model. Using these counterfactual attention mechanisms, we assess the extent to which they still preserve the generation of function and content words in the translation process. Compared to a state-of-the-art attention model, our counterfactual attention models produce 68% of function words and 21% of content words in our German-English dataset. Our experiments demonstrate that attention models by themselves cannot reliably explain the decisions made by an NMT model.
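
To make the counterfactual-attention idea a bit more concrete, here is a minimal, self-contained sketch (not the paper's actual code; all tensors, sizes, and the vocabulary projection are invented stand-ins): the trained attention distribution at one decoding step is swapped for a counterfactual one, here a uniform distribution, and we check whether the predicted token stays the same.

```python
# Illustrative sketch only: does one decoding step predict the same token when
# the trained attention weights are replaced by a counterfactual (uniform)
# distribution? All shapes, modules, and values are hypothetical stand-ins.
import torch

def decode_step(decoder_state, encoder_states, attn_weights, output_proj):
    # context vector = attention-weighted sum of the encoder states
    context = torch.einsum("s,sd->d", attn_weights, encoder_states)
    logits = output_proj(torch.cat([decoder_state, context]))
    return logits.argmax().item()

src_len, hidden = 7, 256
encoder_states = torch.randn(src_len, hidden)      # stand-in encoder outputs
decoder_state = torch.randn(hidden)                # stand-in decoder state
output_proj = torch.nn.Linear(2 * hidden, 32000)   # stand-in vocabulary projection

trained_attn = torch.softmax(torch.randn(src_len), dim=0)  # "learned" attention
uniform_attn = torch.full((src_len,), 1.0 / src_len)       # counterfactual attention

same = (decode_step(decoder_state, encoder_states, trained_attn, output_proj)
        == decode_step(decoder_state, encoder_states, uniform_attn, output_proj))
print("prediction preserved under counterfactual attention:", same)
```

Repeating this kind of check over a test set and counting how often function and content words survive the swap is, roughly speaking, the measurement the abstract reports.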

Deconstructing Supertagging into Multi-task Sequence Prediction

Abstract: In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. MT-DNN not only leverages large amounts of cross-task data, but also benefits from a regularisation effect that leads to more general representations to help adapt to new tasks and domains. MT-DNN extends the model proposed in Liu et al. (2015) by incorporating a pre-trained bidirectional transformer language model, known as BERT (Devlin et al., 2018). MT-DNN obtains new state-of-the-art results on ten NLU tasks, including SNLI, SciTail, and eight out of nine GLUE tasks, pushing the GLUE benchmark to 82.7% (2.2% absolute improvement) as of February 25, 2019 on the latest GLUE test set. We also demonstrate using the SNLI and SciTail datasets that the representations learned by MT-DNN allow domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations. Our code and pre-trained models will be made publicly available.
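
As a rough illustration of the multi-task setup described above, here is a small sketch of a shared encoder with one lightweight head per task, trained on task-specific mini-batches. It is not the released MT-DNN code: the encoder is a generic Transformer stand-in for BERT, and the task names, sizes, and hyperparameters are placeholders.

```python
# Illustrative multi-task sketch: one shared encoder, one output head per task.
# The encoder is a stand-in for BERT; all sizes and task names are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskModel(nn.Module):
    def __init__(self, task_num_labels, vocab_size=30522, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)  # shared across tasks
        # one classification head per task on top of the shared encoder
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_num_labels.items()}
        )

    def forward(self, input_ids, task):
        enc_out = self.encoder(self.embed(input_ids))
        cls = enc_out[:, 0]                      # first-token pooling, BERT-style
        return self.heads[task](cls)

tasks = {"snli": 3, "scitail": 2, "cola": 2}     # hypothetical task set
model = MultiTaskModel(tasks)
optim = torch.optim.Adam(model.parameters(), lr=5e-5)

# one multi-task training step: pick a task, then a mini-batch for that task
input_ids = torch.randint(0, 30522, (8, 32))     # fake token ids
labels = torch.randint(0, tasks["snli"], (8,))
loss = F.cross_entropy(model(input_ids, "snli"), labels)
loss.backward()
optim.step()
```

The shared encoder is where the regularisation effect mentioned in the abstract comes from: every task's gradients update the same underlying representation.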

Tuesday, October 22nd, 10:30 a.m. TASC1 9408.

Pooya will present his WNGT 2019 paper.
15 Oct 2019

In our lab meeting tomorrow, Pooya will present on the interpretability of the attention mechanism in NMT. Here is the title and abstract of his talk:

Interrogating the Explanatory Power of Attention in Neural Machine Translation

Abstract: Attention models have become a crucial component in neural machine translation (NMT). They are often implicitly or explicitly used to justify the model’s decision in generating a specific token, but it has not yet been rigorously established to what extent attention is a reliable source of information in NMT. To evaluate the explanatory power of attention for NMT, we examine the possibility of yielding the same prediction but with counterfactual attention models that modify crucial aspects of the trained attention model. Using these counterfactual attention mechanisms, we assess the extent to which they still preserve the generation of function and content words in the translation process. Compared to a state-of-the-art attention model, our counterfactual attention models produce 68% of function words and 21% of content words in our German-English dataset. Our experiments demonstrate that attention models by themselves cannot reliably explain the decisions made by an NMT model.
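
As a complement to the decoding-step sketch further up this page, here is a toy illustration of the evaluation the abstract describes: measuring what fraction of the baseline translation's function and content words a counterfactual-attention model still produces. The function-word list and the two example outputs are invented.

```python
# Toy evaluation sketch: what fraction of the baseline output's function and
# content words does the counterfactual-attention output reproduce?
# The function-word list and both token sequences are invented examples.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "was"}

def preserved_fraction(baseline_tokens, counterfactual_tokens, word_set, in_set):
    selected = [w for w in baseline_tokens if (w in word_set) == in_set]
    if not selected:
        return 0.0
    kept = sum(1 for w in selected if w in counterfactual_tokens)
    return kept / len(selected)

baseline = "the cat was sitting on the mat".split()
counterfactual = "the cat was on a chair".split()

func_rate = preserved_fraction(baseline, counterfactual, FUNCTION_WORDS, in_set=True)
cont_rate = preserved_fraction(baseline, counterfactual, FUNCTION_WORDS, in_set=False)
print(f"function words preserved: {func_rate:.0%}, content words preserved: {cont_rate:.0%}")
```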

Tuesday, October 15th, 10:30 a.m. TASC1 9408.

Nadia will talk about text summarization. Pooya will talk about the interpretability of the attention mechanism in NLP.
08 Oct 2019

In our lab meeting tomorrow, Nadia will present on text summarization with pretrained encoders. Here is the title and abstract of her talk:

Text Summarization with Pretrained Encoders

Abstract: Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings.
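
A small sketch of the two-optimizer fine-tuning schedule mentioned in the abstract: the pretrained encoder gets a gentler learning rate than the randomly initialised decoder so the two do not drift apart during fine-tuning. The modules, learning rates, and loss below are placeholders rather than the paper's actual settings.

```python
# Illustrative sketch: two separate optimizers with different learning rates
# for a pretrained encoder and a randomly initialised decoder.
# All modules, shapes, and hyperparameters are placeholders.
import torch
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=6
)  # stand-in for the pretrained document encoder
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=6
)  # stand-in for the randomly initialised abstractive decoder

enc_opt = torch.optim.Adam(encoder.parameters(), lr=2e-5)  # small steps: already pretrained
dec_opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)  # larger steps: trained from scratch

def training_step(src, tgt, loss_fn):
    enc_opt.zero_grad()
    dec_opt.zero_grad()
    memory = encoder(src)                # encode the "document"
    out = decoder(tgt, memory)           # decode the "summary"
    loss = loss_fn(out, tgt)             # placeholder loss on continuous outputs
    loss.backward()
    enc_opt.step()
    dec_opt.step()
    return loss.item()

src = torch.randn(4, 128, 512)           # fake document representations
tgt = torch.randn(4, 40, 512)            # fake summary representations
print(training_step(src, tgt, nn.MSELoss()))
```

In practice each optimizer would also get its own warmup schedule; this sketch only shows the separation of the two parameter groups.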

Tuesday, October 8th, 10:30 a.m. TASC1 9408.

Ashkan will talk about end-to-end speech translation.
01 Oct 2019

In our lab meeting tomorrow, Ashkan will go over various methods for end-to-end speech translation. Here is the title and abstract of his talk:

Towards supervised speech-to-text translation without transcription

Abstract: Speech translation has traditionally been approached through cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, and a machine translation system trained on parallel texts. Several recent works have shown the feasibility of collapsing the cascade into a single, direct model that can be trained in an end-to-end fashion on a corpus of translated speech (Sperber et al., 2019). End-to-end Speech Translation (ST) models have many potential advantages when compared to the cascade of Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, including lowered inference latency and the avoidance of error compounding (Jia et al., 2019). In our meeting we will talk about various approaches to make end-to-end models more accurate.
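
To make the cascaded vs. end-to-end contrast concrete, here is a minimal sketch; the asr, mt, and direct models are toy stand-ins, not any particular system. The cascade pipes a transcript from a speech recognizer into a text translation model, while the direct model maps speech features straight to target text.

```python
# Illustrative sketch of the two architectures discussed above.
# asr_model, mt_model, and direct_st_model are toy stand-ins.

def cascade_translate(audio_features, asr_model, mt_model):
    """Cascaded ST: speech -> source transcript -> target text.
    Recognition errors propagate into the MT step (error compounding)."""
    transcript = asr_model(audio_features)   # trained on transcribed speech
    return mt_model(transcript)              # trained on parallel text

def direct_translate(audio_features, direct_st_model):
    """End-to-end ST: one model trained on (speech, translation) pairs,
    with no intermediate transcript and a single decoding pass."""
    return direct_st_model(audio_features)

# toy stand-ins so the sketch runs
fake_asr = lambda feats: "hallo welt"
fake_mt = lambda text: {"hallo welt": "hello world"}.get(text, "<unk>")
fake_direct = lambda feats: "hello world"

audio = [0.1, 0.2, 0.3]                      # placeholder acoustic features
print(cascade_translate(audio, fake_asr, fake_mt))   # hello world
print(direct_translate(audio, fake_direct))          # hello world
```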

Tuesday, October 1st, 10:30 a.m. TASC1 9408.

Hassan will practice his depth presentation.
23 Sep 2019

In our lab meeting this week, Hassan will talk about handling out-of-domain and out-of-vocabulary words in NMT. The title and abstract of the talk:

Imposing Bilingual Lexical Constraints to Neural Machine Translation

Abstract: Neural Machine Translation (NMT) models have achieved astonishing results in recent years, and yet infrequent words remain a problem for them, as do out-of-domain terminologies. Bilingual lexical resources can be used to guide the model through difficulties when it faces infrequent and out-of-domain vocabulary words. Hence, imposing bilingual lexical preferences (constraints) on NMT models has received rising attention in the past few years. In this talk, we summarize different threads of work on constrained NMT, including approaches that modify the input or output of the model (pre-processing and post-processing) without changing the model itself, as well as approaches that change the inference algorithm and the target vocabulary set.
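
As a concrete example of the pre-/post-processing family of approaches mentioned above, here is a toy sketch; the lexicon, placeholder scheme, and translate stub are all invented for illustration. Source terms found in a bilingual lexicon are masked with placeholder tokens before translation, and the corresponding target-side terms are substituted back afterwards.

```python
# Toy sketch of constraint injection via pre- and post-processing:
# mask lexicon terms with placeholders, translate, then restore the
# target-side terms. The lexicon and the translate() stub are invented.
LEXICON = {"Fachbegriff": "technical term"}   # hypothetical bilingual entry

def preprocess(source_tokens, lexicon):
    constraints, masked = {}, []
    for tok in source_tokens:
        if tok in lexicon:
            placeholder = f"<term{len(constraints)}>"
            constraints[placeholder] = lexicon[tok]   # remember the target-side term
            masked.append(placeholder)
        else:
            masked.append(tok)
    return masked, constraints

def postprocess(target_tokens, constraints):
    return [constraints.get(tok, tok) for tok in target_tokens]

def fake_translate(tokens):
    # stand-in for an NMT system that copies placeholder tokens through
    table = {"das": "that", "ist": "is", "ein": "a"}
    return [table.get(t, t) for t in tokens]

src = "das ist ein Fachbegriff".split()
masked, constraints = preprocess(src, LEXICON)
print(" ".join(postprocess(fake_translate(masked), constraints)))
# toy output: "that is a technical term" (a real NMT model would also reorder)
```

The inference-time alternatives the abstract mentions instead enforce such constraints inside the decoding algorithm, e.g. with constrained beam search.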

Tuesday, September 23rd, 10:30 a.m. TASC1 9408.
