News

Blank Language Models
01 Dec 2020

In our lab meeting tomorrow, Logan will present Blank Language Models by Shen and Quach et al.

Blank Language Models

Abstract: We propose Blank Language Model (BLM), a model that generates sequences by dynamically creating and filling in blanks. The blanks control which part of the sequence to expand, making BLM ideal for a variety of text editing and rewriting tasks. The model can start from a single blank or partially completed text with blanks at specified locations. It iteratively determines which word to place in a blank and whether to insert new blanks, and stops generating when no blanks are left to fill. BLM can be efficiently trained using a lower bound of the marginal data likelihood. On the task of filling missing text snippets, BLM significantly outperforms all other baselines in terms of both accuracy and fluency. Experiments on style transfer and damaged ancient text restoration demonstrate the potential of this framework for a wide range of applications.

https://arxiv.org/pdf/2002.03079.pdf
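The abstract describes a simple generation loop: pick a blank, fill it with a word, optionally open new blanks beside it, and repeat until no blanks remain. Below is a minimal sketch of that loop; predict_word and predict_new_blanks are hypothetical stand-ins for the trained model's predictions, not the authors' code.

```python
# Minimal sketch of the BLM generation loop described in the abstract.
# The two predict_* functions are hypothetical placeholders for the model.
import random

BLANK = "__"

def predict_word(tokens, blank_idx):
    # Placeholder: a trained BLM would score vocabulary words for this blank.
    return random.choice(["the", "old", "cat", "sat", "quietly"])

def predict_new_blanks(tokens, blank_idx, word):
    # Placeholder: the model also decides whether to open new blanks
    # to the left and/or right of the word it just filled in.
    return random.choices(
        [(False, False), (True, False), (False, True), (True, True)],
        weights=[6, 1, 1, 1],
    )[0]

def generate(tokens):
    """Iteratively fill blanks; stop when no blanks are left (as in the abstract)."""
    tokens = list(tokens)
    while BLANK in tokens:
        i = tokens.index(BLANK)                 # choose a blank to expand
        word = predict_word(tokens, i)          # pick a word for it
        left, right = predict_new_blanks(tokens, i, word)
        tokens[i:i + 1] = ([BLANK] if left else []) + [word] + ([BLANK] if right else [])
    return " ".join(tokens)

print(generate(["They", BLANK, "to", "the", BLANK]))
```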

Tuesday, Dec 1st, 09:30 a.m.

Federated Machine Learning: Concept and Applications
24 Nov 2020

In our lab meeting tomorrow, Vincent will give us a brief introduction to Federated Machine Learning, based on the paper by Qiang Yang et al. A Zoom link will be posted to Twist on the morning of the meeting.

Federated Machine Learning: Concept and Applications

Abstract: Today’s AI still faces two major challenges. One is that in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated learning framework, which includes horizontal federated learning, vertical federated learning and federated transfer learning. We provide definitions, architectures and applications for the federated learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy.

https://arxiv.org/pdf/1902.04885.pdf
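For horizontal federated learning (parties share a feature space but hold different samples), the core idea is that each party trains locally and only model parameters, never raw data, are aggregated. The sketch below is a generic FedAvg-style illustration of that idea, not code from the paper; the logistic-regression model and synthetic data are assumptions made purely for illustration.

```python
# Toy horizontal federated learning round: local training + weighted averaging.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One party's local training: plain logistic-regression gradient steps."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (preds - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, parties):
    """Server round: collect locally trained weights, average by data size."""
    sizes = np.array([len(y) for _, y in parties], dtype=float)
    local_ws = [local_update(global_w, X, y) for X, y in parties]
    return np.average(local_ws, axis=0, weights=sizes)

# Two parties with isolated data ("data islands"); labels follow the first feature.
rng = np.random.default_rng(0)
parties = []
for _ in range(2):
    X = rng.normal(size=(100, 3))
    y = (X[:, 0] > 0).astype(float)
    parties.append((X, y))

w = np.zeros(3)
for _ in range(20):
    w = federated_round(w, parties)
print("global weights after 20 rounds:", w)
```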

Tuesday, November 24th, 09:30 a.m.

Moving down the tail of the wrong monster!
10 Nov 2020

In our lab meeting tomorrow, Golnar will introduce her ongoing work.

A Zoom link will be posted to Twist on the morning of the meeting.

Moving down the tail of the wrong monster!

Abstract: I’ll argue that some recent WSD methods have been too fixated on improving over benchmark datasets, when alternative evaluation shows that this may not always be the best approach.

Tuesday, November 10th, 09:30 a.m.

Why is Attention Not So Interpretable?
03 Nov 2020

In our lab meeting tomorrow, Jetic will introduce a paper on the attention mechanism.

A Zoom link will be posted to Twist on the morning of the meeting.

Why is Attention Not So Interpretable?

Abstract: Attention-based methods have played an important role in model interpretations, where the calculated attention weights are expected to highlight the critical parts of inputs (e.g., keywords in sentences). However, recent research points out that attention-as-importance interpretations often do not work as well as we expect. For example, learned attention weights sometimes highlight less meaningful tokens like “[SEP]”, “,”, and “.”, and are frequently uncorrelated with other feature importance indicators like gradient-based measures. Consequently, a debate on the effectiveness of attention-based interpretations has been raised. In this paper, we reveal that one root cause of this phenomenon can be ascribed to combinatorial shortcuts: in addition to the highlighted parts, the attention weights themselves may carry extra information that can be exploited by downstream models of the attention layers. As a result, the attention weights are no longer pure importance indicators. We theoretically analyze the combinatorial shortcuts, design one intuitive experiment to demonstrate their existence, and propose two methods to mitigate this issue. Empirical studies on attention-based interpretation models are conducted, and the results show that the proposed methods can effectively improve the interpretability of attention mechanisms on a variety of datasets.

https://arxiv.org/abs/2006.05656
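As a reminder of the setup the paper critiques, the sketch below computes scaled dot-product attention over a toy sentence and then ranks tokens by their attention weights, i.e. the usual attention-as-importance reading. The toy data and readout are illustrative only; the paper's analysis and mitigation methods are not reproduced here.

```python
# Standard attention-as-importance readout on toy data.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return the attended output and the attention weights."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = ["[CLS]", "the", "movie", "was", "great", "[SEP]"]
d = 8
Q = rng.normal(size=(1, d))            # a single query (e.g. the [CLS] position)
K = rng.normal(size=(len(tokens), d))
V = rng.normal(size=(len(tokens), d))

_, weights = scaled_dot_product_attention(Q, K, V)
# The usual interpretation step: rank tokens by attention mass. The paper
# argues this can mislead, because the weights also act as an extra signal
# (a "combinatorial shortcut") for the layers consuming the weighted sum.
for tok, w in sorted(zip(tokens, weights[0]), key=lambda p: -p[1]):
    print(f"{tok:>7s}  {w:.3f}")
```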

Tuesday, November 3rd, 09:30 a.m.

Better Neural Machine Translation by Extracting Linguistic Information from BERT
27 Oct 2020

In our lab meeting tomorrow, Hassan will introduce his work on Neural Machine Translation, which is also his submission to EACL 2021.

A Zoom link will be posted to Twist on the morning of the meeting.

Better Neural Machine Translation by Extracting Linguistic Information from BERT

Abstract: Work on adding linguistic information (syntax or semantics) to neural machine translation (NMT) has mostly focused on using point estimates from pre-trained models. Directly using the capacity of massive pre-trained contextual word embedding models such as BERT has been marginally useful in NMT because effective fine-tuning is difficult to obtain for NMT without making training brittle and unreliable. We augment NMT by extracting dense fine-tuned vector-based linguistic information from BERT instead of using point estimates. Experimental results show that our method of incorporating linguistic information helps NMT to generalize better in a variety of training contexts and is no more difficult to train than conventional Transformer-based NMT.
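To make the general idea concrete, the sketch below extracts contextual token vectors from a frozen BERT and exposes them as source-side features that an NMT encoder could consume. It uses the Hugging Face transformers API as an assumed interface and does not reproduce the paper's fine-tuned extraction or its NMT integration.

```python
# Extract dense contextual vectors from a frozen BERT as extra NMT features.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
bert = BertModel.from_pretrained("bert-base-cased")
bert.eval()  # keep BERT frozen so NMT training stays stable

def bert_features(sentence: str) -> torch.Tensor:
    """Return one contextual vector per subword token (shape: [seq_len, 768])."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # dense vectors, not point estimates
    return hidden.squeeze(0)

features = bert_features("The cat sat on the mat.")
print(features.shape)  # e.g. torch.Size([9, 768]); feed these to the NMT encoder
```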

Tuesday, October 27th, 09:30 a.m.

Recent Publications