Two Papers on Multi-Task Learning
10 Mar 2020

In our lab meeting tomorrow, Golnar will discuss two papers about multi-task learning. The titles and abstracts follow:

BAM! Born-Again Multi-Task Networks for Natural Language Understanding

Abstract: It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training.
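The teacher-annealing idea in the abstract can be sketched in a few lines: the training target interpolates from the single-task teacher's predictions toward the gold labels as training progresses. This is an illustrative sketch under our own naming (the function and schedule here are not taken from the paper's code):

```python
import numpy as np

def annealed_target(gold_onehot, teacher_probs, step, total_steps):
    """Interpolate the training target from teacher predictions to gold labels.

    lam grows linearly from 0 to 1, so early training mostly distills the
    single-task teacher and late training is mostly supervised learning.
    """
    lam = step / total_steps
    return lam * gold_onehot + (1 - lam) * teacher_probs

gold = np.array([0.0, 1.0, 0.0])      # gold label as a one-hot vector
teacher = np.array([0.1, 0.7, 0.2])   # single-task teacher's distribution

start = annealed_target(gold, teacher, step=0, total_steps=100)   # pure distillation
end = annealed_target(gold, teacher, step=100, total_steps=100)   # pure supervision
```

The paper uses a linear schedule of this flavor; the key point is that the multi-task student is eventually free to surpass its teachers because the teacher signal fades out.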

Soft Representation Learning for Sparse Transfer

Abstract: Transfer learning is effective for improving the performance of related tasks, and multi-task learning (MTL) and cross-lingual learning (CLL) are important instances. This paper argues that hard parameter sharing, that is, hard-coding which layers are shared across different tasks or languages, cannot generalize well when sharing with a loosely related task. Such a case, which we call sparse transfer, may actually hurt performance, a phenomenon known as negative transfer. Our contribution is to use adversarial training across tasks to “soft-code” shared and private spaces, preventing the shared space from becoming too sparse. In CLL, our proposed architecture also considers the additional challenge of dealing with low-quality input.
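Adversarial training over a shared space is commonly implemented with a gradient-reversal layer (as in Ganin et al.'s domain-adversarial training); whether this paper uses exactly that mechanism is an assumption here. A minimal numerical sketch of the idea, with illustrative names:

```python
import numpy as np

def grad_reversal(grad, lam=1.0):
    """Gradient reversal: identity on the forward pass, negated gradient on
    the backward pass, so the shared encoder learns to *confuse* a task
    discriminator and keep task-specific signal out of the shared space."""
    return -lam * grad

# Illustrative backward step: the shared encoder receives the task gradient
# plus the reversed discriminator gradient.
task_grad = np.array([0.2, -0.1])
disc_grad = np.array([0.5, 0.3])
encoder_grad = task_grad + grad_reversal(disc_grad, lam=0.1)
```

The private spaces, by contrast, receive only their own task's gradient, so task-specific features have somewhere to live without polluting the shared space.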

Tuesday, Mar 10th, 09:30 a.m. TASC1 9408.

Sequential Neural Networks as Automata
03 Mar 2020

In our lab meeting tomorrow, Logan will discuss recent work by Merrill (2019) which attempts to find equivalences between neural network architectures and automata.

Here are the title and abstract:

Sequential Neural Networks as Automata

Abstract: This work attempts to explain the types of computation that neural networks can perform by relating them to automata. We first define what it means for a real-time network with bounded precision to accept a language. A measure of network memory follows from this definition. We then characterize the classes of languages acceptable by various recurrent networks, attention, and convolutional networks. We find that LSTMs function like counter machines and relate convolutional networks to the subregular hierarchy. Overall, this work attempts to increase our understanding and ability to interpret neural networks through the lens of theory. These theoretical insights help explain neural computation, as well as the relationship between neural networks and natural language grammar.
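The claim that LSTMs function like counter machines can be made concrete with a tiny example: a one-counter machine accepts the non-regular language aⁿbⁿ, which an LSTM can also recognize by using a memory cell as a counter. This sketch is our own illustration, not the paper's formal construction:

```python
def accepts_anbn(s):
    """A one-counter machine for a^n b^n (n >= 1): increment on 'a',
    decrement on 'b', reject out-of-order symbols, and accept iff the
    counter returns exactly to zero."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == 'a':
            if seen_b:          # an 'a' after a 'b' is out of order
                return False
            count += 1
        elif ch == 'b':
            seen_b = True
            count -= 1
            if count < 0:       # more b's than a's so far
                return False
        else:
            return False        # alphabet is {a, b}
    return seen_b and count == 0
```

Merrill's point is that bounded-precision LSTM cells behave like such counters, while (for example) convolutional networks sit lower, in the subregular hierarchy.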

Tuesday, Mar 3rd, 09:30 a.m. TASC1 9408.

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
25 Feb 2020

In our lab meeting tomorrow, Ashkan will give the presentation that was postponed due to the snow day, discussing Arivazhagan et al. (2019) on attention for simultaneous machine translation.

Here are the title and abstract:

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation

Abstract: Simultaneous machine translation begins to translate each source sentence before the source speaker is finished speaking, with applications to live and streaming scenarios. Simultaneous systems must carefully schedule their reading of the source sentence to balance quality against latency. We present the first simultaneous translation system to learn an adaptive schedule jointly with a neural machine translation (NMT) model that attends over all source tokens read thus far. We do so by introducing Monotonic Infinite Lookback (MILk) attention, which maintains both a hard, monotonic attention head to schedule the reading of the source sentence, and a soft attention head that extends from the monotonic head back to the beginning of the source. We show that MILk’s adaptive schedule allows it to arrive at latency-quality trade-offs that are favorable compared to those of the recently proposed wait-k strategy for many latency values.
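The core mechanics of MILk attention can be sketched simply: the hard monotonic head decides how many source tokens have been read, and the soft head computes a softmax only over that prefix. The sketch below is illustrative (in the actual model the monotonic head's position is learned jointly with translation, not passed in by hand):

```python
import numpy as np

def milk_soft_attention(energies, mono_head):
    """Soft attention restricted to the prefix ending at the hard monotonic
    head's position: a numerically stable softmax over tokens 0..mono_head,
    with zero weight on unread tokens."""
    prefix = energies[: mono_head + 1]
    weights = np.exp(prefix - prefix.max())   # subtract max for stability
    weights /= weights.sum()
    out = np.zeros_like(energies)
    out[: mono_head + 1] = weights
    return out

e = np.array([0.5, 1.0, 2.0, 3.0])            # attention energies per source token
attn = milk_soft_attention(e, mono_head=2)    # only the 3 read tokens get weight
```

This is what gives the "infinite lookback": unlike purely monotonic attention, the soft head can always reach back to the start of the source, while still never peeking past what has been read.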

Tuesday, Feb 25th, 09:30 a.m. TASC1 9408.

Lab meeting cancelled for reading break
18 Feb 2020

Due to the spring reading break, there is no lab meeting scheduled for tomorrow, 18 February. Meetings will resume as usual next week.

Nested Named Entity Recognition
11 Feb 2020

In our lab meeting tomorrow, Vincent will give a brief talk about his research in Nested Named Entity Recognition.

Here are the title and abstract:

Nested Named Entity Recognition

Abstract: Many named entities contain other named entities inside them, especially in the biomedical domain. For instance, “Bank of China” and “University of Washington” are both organizations with nested locations. However, for technical reasons, this nested structure was long ignored: only the outermost entities were considered, which loses part of the original semantics. Since Finkel and Manning first proposed a discriminative constituency parser for nested named entity recognition in 2009, many methods have been employed to detect nested entities successfully. In this talk, we will review the main datasets for nested NER and some approaches published at recent conferences.
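To make the nesting concrete, entities are typically represented as typed character- or token-level spans, and a flat NER model must drop every span contained in another. A small sketch with illustrative field names:

```python
# Token-level spans over "Bank of China": the LOC "China" sits inside the ORG.
sentence = ["Bank", "of", "China"]
entities = [
    {"type": "ORG", "start": 0, "end": 3},  # Bank of China
    {"type": "LOC", "start": 2, "end": 3},  # China
]

def is_nested(inner, outer):
    """True if `inner` lies entirely within `outer` (and is a distinct span)."""
    return (inner is not outer
            and outer["start"] <= inner["start"]
            and inner["end"] <= outer["end"])
```

A flat tagger keeping only outermost spans would emit just the ORG here, silently discarding the LOC, which is exactly the information loss the talk addresses.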

Tuesday, Feb 11th, 09:30 a.m. TASC1 9408.
