News

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
04 Aug 2020

In our lab meeting tomorrow, Vincent will introduce the paper Don't Stop Pretraining from ACL 2020 (honorable mention for best paper).

A Zoom link will be posted to Twist on the morning of the meeting.

Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks

Abstract: Language models pretrained on text from a wide variety of sources form the foundation of today’s NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains, under both high- and low-resource settings. Moreover, adapting to the task’s unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining. Finally, we show that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable. Overall, we consistently find that multi-phase adaptive pretraining offers large gains in task performance.

https://www.aclweb.org/anthology/2020.acl-main.740.pdf
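The core recipe is simply a second phase of masked-LM training on unlabeled domain or task text before fine-tuning. Below is a minimal sketch of that phase, assuming the Hugging Face transformers and datasets libraries; the checkpoint name and the file domain_corpus.txt are placeholders for illustration, not artifacts from the paper.

```python
# Minimal sketch of domain-adaptive pretraining (DAPT): continue masked-LM
# training of RoBERTa on unlabeled in-domain text before task fine-tuning.
# "domain_corpus.txt" is a placeholder for your own unlabeled domain text.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Unlabeled domain (or task) text, one document per line.
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

# Standard masked-LM objective (15% of tokens masked).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="roberta-dapt",
                         per_device_train_batch_size=8,
                         num_train_epochs=1,
                         learning_rate=1e-4)

Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
# The resulting checkpoint is then fine-tuned on the labeled target task;
# task-adaptive pretraining (TAPT) is the same loop run on the task's own
# unlabeled text.
```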

Tuesday, Aug 4th, 09:30 a.m.

Neural Module Networks
28 Jul 2020

In our lab meeting tomorrow, Anoop will cover two papers on neural module networks.

A Zoom link will be posted to Twist on the morning of the meeting.

Neural Module Networks

Abstract: Anoop will cover two papers on neural module networks, which have previously been applied to VQA and to reading comprehension tasks. In my estimation, their potential applicability in NLP is quite broad, even though the idea was introduced some time ago (in 2015) and has not yet taken off. We will cover the following papers (a toy sketch of the module-composition idea follows the list):

Neural Module Networks. Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein. https://arxiv.org/abs/1511.02799.

Neural Module Networks for Reasoning over Text. Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner. https://arxiv.org/abs/1912.04971.
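As a primer for the discussion, here is a toy PyTorch sketch of the module-composition idea: small differentiable modules are assembled according to a question-specific layout. The module set, shapes, and layout below are illustrative assumptions, not taken from either paper.

```python
# Toy sketch of the neural-module-network idea: small differentiable modules
# (here "find" and "count") are composed according to a question-specific
# layout. Shapes and the module inventory are illustrative only.
import torch
import torch.nn as nn

class Find(nn.Module):
    """Produce an attention over N context items conditioned on a query vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)
    def forward(self, context, query):           # context: (N, dim), query: (dim,)
        q = query.expand(context.size(0), -1)
        return torch.softmax(self.score(context, q).squeeze(-1), dim=0)  # (N,)

class Count(nn.Module):
    """Map an attention distribution to a scalar count estimate."""
    def __init__(self, n_items):
        super().__init__()
        self.lin = nn.Linear(n_items, 1)
    def forward(self, attention):
        return self.lin(attention)

dim, n_items = 16, 10
find, count = Find(dim), Count(n_items)

context = torch.randn(n_items, dim)   # e.g. encoded image regions or text spans
query = torch.randn(dim)              # e.g. encoding of a question phrase

# A layout predicted from the question, e.g. count(find(context, query)):
answer = count(find(context, query))
print(answer.shape)                   # torch.Size([1])
```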

Tuesday, July 28th, 09:30 a.m.

Speech synthesis from neural decoding of spoken sentences
21 Jul 2020

In our lab meeting tomorrow, Jetic will lead the reading of a recent paper from Nature on Neural Speech Synthesis.

https://www.nature.com/articles/s41586-019-1119-1

A Zoom link will be posted to Twist on the morning of the meeting.

Speech synthesis from neural decoding of spoken sentences

Abstract: Technology that translates neural activity into speech would be transformative for people who are unable to communicate as a result of neurological impairments. Decoding speech from neural activity is challenging because speaking requires very precise and rapid multi-dimensional control of vocal tract articulators. Here we designed a neural decoder that explicitly leverages kinematic and sound representations encoded in human cortical activity to synthesize audible speech. Recurrent neural networks first decoded directly recorded cortical activity into representations of articulatory movement, and then transformed these representations into speech acoustics. In closed vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. Intermediate articulatory dynamics enhanced performance even with limited data. Decoded articulatory representations were highly conserved across speakers, enabling a component of the decoder to be transferrable across participants. Furthermore, the decoder could synthesize speech when a participant silently mimed sentences. These findings advance the clinical viability of using speech neuroprosthetic technology to restore spoken communication.
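For intuition ahead of the discussion, the sketch below mimics the two-stage structure described in the abstract using bidirectional LSTMs in PyTorch: cortical activity is first decoded into articulatory kinematics, which are then mapped to acoustic features. All layer sizes are illustrative placeholders, not the paper's actual configuration.

```python
# Rough two-stage decoder: recurrent network 1 maps recorded cortical activity
# to articulatory kinematics; recurrent network 2 maps kinematics to acoustic
# features. Dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class TwoStageDecoder(nn.Module):
    def __init__(self, n_electrodes=256, n_articulatory=32, n_acoustic=32, hidden=128):
        super().__init__()
        # Stage 1: cortical activity -> articulatory movement representations
        self.ecog_to_artic = nn.LSTM(n_electrodes, hidden, batch_first=True,
                                     bidirectional=True)
        self.artic_head = nn.Linear(2 * hidden, n_articulatory)
        # Stage 2: articulatory representations -> speech acoustics
        self.artic_to_acoustic = nn.LSTM(n_articulatory, hidden, batch_first=True,
                                         bidirectional=True)
        self.acoustic_head = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, ecog):                      # ecog: (batch, time, electrodes)
        h, _ = self.ecog_to_artic(ecog)
        articulation = self.artic_head(h)         # (batch, time, n_articulatory)
        h, _ = self.artic_to_acoustic(articulation)
        acoustics = self.acoustic_head(h)         # (batch, time, n_acoustic)
        return articulation, acoustics            # acoustics would feed a vocoder

decoder = TwoStageDecoder()
fake_ecog = torch.randn(2, 100, 256)              # 2 trials, 100 time steps
articulation, acoustics = decoder(fake_ecog)
print(articulation.shape, acoustics.shape)
```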

Tuesday, July 21st, 09:30 a.m.

Training with Adversaries to Improve Faithfulness of Attention in Neural Machine Translation
14 Jul 2020

In our lab meeting tomorrow, Pooya will practice his thesis defence. A Zoom link will be posted to Twist on the morning of the meeting.

Training with Adversaries to Improve Faithfulness of Attention in Neural Machine Translation

Abstract: Can we trust that the attention heatmaps produced by a neural machine translation (NMT) model reflect its true internal reasoning? We isolate and examine in detail the notion of faithfulness in NMT models. We provide a measure of faithfulness for NMT based on a variety of stress tests in which model parameters are perturbed, measuring faithfulness by how often the model output changes. We show that our proposed faithfulness measure for NMT models can be improved using a novel differentiable objective that rewards faithful behaviour by the model through probability divergence. Our experimental results on multiple language pairs show that our objective function is effective in increasing faithfulness and can lead to a useful analysis of NMT model behaviour and more trustworthy attention heatmaps. Our proposed objective improves faithfulness without reducing translation quality; it also appears to have a useful regularization effect on the NMT model and can even improve translation quality in some cases.
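As a rough illustration of the perturbation-based measure described in the abstract, the toy function below adds small noise to a model's parameters several times and reports how often the prediction changes. The noise scale, trial count, and stand-in classifier are assumptions for illustration; the stress tests and differentiable training objective in the thesis are more involved.

```python
# Toy perturbation-based stability check: perturb parameters repeatedly and
# record how often the model's argmax output changes. Illustrative only.
import copy
import torch

def output_change_rate(model, inputs, n_trials=20, noise_scale=1e-3):
    """Fraction of perturbations under which the argmax prediction changes."""
    model.eval()
    with torch.no_grad():
        baseline = model(inputs).argmax(dim=-1)
        changed = 0
        for _ in range(n_trials):
            perturbed = copy.deepcopy(model)
            for p in perturbed.parameters():
                p.add_(noise_scale * torch.randn_like(p))
            if not torch.equal(perturbed(inputs).argmax(dim=-1), baseline):
                changed += 1
    return changed / n_trials

# Example with a stand-in classifier (a real NMT model would replace this):
toy_model = torch.nn.Linear(8, 4)
rate = output_change_rate(toy_model, torch.randn(5, 8))
print(f"output changed in {rate:.0%} of perturbations")
```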

Tuesday, July 14th, 09:30 a.m.

Synthesizer: Rethinking Self-Attention in Transformer Models
07 Jul 2020

In our lab meeting tomorrow, Nishant will introduce Synthesizer, a paper that rethinks self-attention in Transformer models. A Zoom link will be posted to Twist on the morning of the meeting.

Synthesizer: Rethinking Self-Attention in Transformer Models

Abstract: The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models. But is it really required? This paper investigates the true importance and contribution of the dot product-based self-attention mechanism on the performance of Transformer models. Via extensive experiments, we find that (1) random alignment matrices surprisingly perform quite competitively and (2) learning attention weights from token-token (query-key) interactions is not that important after all. To this end, we propose Synthesizer, a model that learns synthetic attention weights without token-token interactions. Our experimental results show that Synthesizer is competitive against vanilla Transformer models across a range of tasks, including MT (EnDe, EnFr), language modeling (LM1B), abstractive summarization (CNN/Dailymail), dialogue generation (PersonaChat) and Multi-task language understanding (GLUE, SuperGLUE).
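To make the contrast with dot-product attention concrete, here is a minimal PyTorch sketch of a single "dense" Synthesizer head, in which each position predicts its own row of attention logits from its hidden state alone, with no query-key interaction. The sizes and maximum sequence length are illustrative assumptions; the paper also studies a "random" variant where the attention matrix is simply a learned constant.

```python
# Minimal "dense" Synthesizer head: attention weights come from a per-token
# feed-forward projection rather than query-key dot products.
import torch
import torch.nn as nn

class DenseSynthesizerHead(nn.Module):
    def __init__(self, d_model=64, max_len=32):
        super().__init__()
        # Each position predicts its own row of attention logits (length max_len)
        # from its hidden state alone -- no token-token interaction.
        self.attn_proj = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                       nn.Linear(d_model, max_len))
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                               # x: (batch, seq, d_model)
        seq_len = x.size(1)
        logits = self.attn_proj(x)[:, :, :seq_len]      # (batch, seq, seq)
        weights = torch.softmax(logits, dim=-1)
        return weights @ self.value(x)                  # (batch, seq, d_model)

head = DenseSynthesizerHead()
out = head(torch.randn(2, 16, 64))
print(out.shape)   # torch.Size([2, 16, 64])
```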

Tuesday, July 7th, 09:30 a.m.
