In our lab meeting tomorrow, Golnar will give us a review of a paper on word sense induction.
Towards better substitution-based word sense induction
Abstract: Word sense induction (WSI) is the task of unsupervised clustering of word usages within a sentence to distinguish senses. Recent work obtains strong results by clustering lexical substitutes derived from pre-trained RNN language models (ELMo). Adapting the method to BERT improves the scores even further. We extend the previous method to support a dynamic rather than a fixed number of clusters as supported by other prominent methods, and propose a method for interpreting the resulting clusters by associating them with their most informative substitutes. We then perform extensive error analysis revealing the remaining sources of errors in the WSI task.
Tuesday, Jan 19th, 09:30 a.m.
In our lab meeting tomorrow, Nishant will give us a review of recent papers in EMNLP.
Notes from EMNLP 2020
Abstract: EMNLP has just wrapped up. Of the many interesting papers published this year, we will take a quick look at a few. Multilingual models, while recording ever-increasing performance scores, suffer severely when subjected to controlled test cases, especially those that focus on model behaviour with respect to certain languages. In this presentation, we look at how improving vocabulary generation can lead to better model generalization. We will also look at the factors essential for multilinguality in the popular mBERT model. Finally, we will look at the adequacy of the reference translations used to evaluate our favourite NMT models.
Tuesday, Dec 7th, 09:30 a.m.
In our lab meeting tomorrow, Logan will present Blank Language Models, from Shen and Quach et al.
Blank Language Models
Abstract: We propose Blank Language Model (BLM), a model that generates sequences by dynamically creating and filling in blanks. The blanks control which part of the sequence to expand, making BLM ideal for a variety of text editing and rewriting tasks. The model can start from a single blank or partially completed text with blanks at specified locations. It iteratively determines which word to place in a blank and whether to insert new blanks, and stops generating when no blanks are left to fill. BLM can be efficiently trained using a lower bound of the marginal data likelihood. On the task of filling missing text snippets, BLM significantly outperforms all other baselines in terms of both accuracy and fluency. Experiments on style transfer and damaged ancient text restoration demonstrate the potential of this framework for a wide range of applications.
Tuesday, Dec 1st, 09:30 a.m.
In our lab meeting tomorrow, Vincent will give us a brief introduction to Federated Machine Learning, based on the paper by Qiang Yang et al. A Zoom link will be posted to Twist on the morning of the meeting.
Federated Machine Learning: Concept and Applications
Abstract: Today’s AI still faces two major challenges. One is that in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated learning framework, which includes horizontal federated learning, vertical federated learning and federated transfer learning. We provide definitions, architectures and applications for the federated learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allow knowledge to be shared without compromising user privacy.
Tuesday, November 24th, 09:30 a.m.
In our lab meeting tomorrow, Golnar will introduce her ongoing work.
A Zoom link will be posted to Twist on the morning of the meeting.
Moving down the tail of the wrong monster!
Abstract: I’ll argue that some recent WSD methods have been too fixated on improving scores on benchmark datasets, when alternative evaluations show that this may not always be the best approach.
Tuesday, November 10th, 09:30 a.m.