In our lab meeting tomorrow, Nishant will introduce Synthesizer, a rethinking of self-attention in Transformer models. A Zoom link will be posted to Twist on the morning of the meeting.
Synthesizer: Rethinking Self-Attention in Transformer Models
Abstract: Dot-product self-attention is known to be central and indispensable to state-of-the-art Transformer models. But is it really required? This paper investigates the true importance and contribution of the dot-product-based self-attention mechanism to the performance of Transformer models. Via extensive experiments, we find that (1) random alignment matrices surprisingly perform quite competitively and (2) learning attention weights from token-token (query-key) interactions is not that important after all. To this end, we propose Synthesizer, a model that learns synthetic attention weights without token-token interactions. Our experimental results show that Synthesizer is competitive against vanilla Transformer models across a range of tasks, including machine translation (EnDe, EnFr), language modeling (LM1B), abstractive summarization (CNN/DailyMail), dialogue generation (PersonaChat), and multi-task language understanding (GLUE, SuperGLUE).
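The two Synthesizer variants the abstract mentions can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation: the weight matrices are randomly initialized stand-ins for learned parameters, and the value projection is omitted. The key point is that neither variant computes a query-key dot product.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
L, d = 6, 8                       # toy sequence length and model dimension
X = rng.normal(size=(L, d))       # token representations

# Dense Synthesizer: each token predicts its own row of attention logits
# through a two-layer MLP -- no interaction with other tokens.
W1, b1 = rng.normal(size=(d, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, L)), np.zeros(L)
dense_logits = np.maximum(X @ W1 + b1, 0) @ W2 + b2   # shape (L, L)

# Random Synthesizer: a single (learned, here random) L x L matrix,
# completely independent of the input.
random_logits = rng.normal(size=(L, L))

V = X  # values; a real model would apply a learned projection here
out_dense = softmax(dense_logits) @ V
out_random = softmax(random_logits) @ V
```

In both cases the resulting alignment matrix is applied to the values exactly as in vanilla attention; only the source of the logits differs.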
Tuesday, July 7th, 09:30 a.m.
In our lab meeting tomorrow, Hassan will introduce online privacy. A Zoom link will be posted to Twist on the morning of the meeting.
Does Privacy Exist When We Are Online?
Tuesday, June 30th, 09:30 a.m.
In our lab meeting tomorrow, Ashkan will introduce Automatic Speech Translation. A Zoom link will be posted to Twist on the morning of the meeting.
Recent trends in Automatic Speech Translation
Abstract: Automatic Speech Translation (AST) aims to directly translate audio signals in the source language into text in the target language. For many years, the standard approach to speech translation was a pipeline that transcribes speech with an ASR component and then translates the transcript with an MT component. In recent years, it has been shown that we can remove the transcription step and build an end-to-end model strong enough to compete with the cascaded model. In this talk, I will go through the most influential ideas in this research direction.
Tuesday, June 23rd, 09:30 a.m.
In our lab meeting tomorrow, Pooya will introduce Efficient Transformer. A Zoom link will be posted to Twist on the morning of the meeting.
Reformer: The Efficient Transformer
Abstract: Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention with one that uses locality-sensitive hashing, changing its complexity from O(L²) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
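The reversible-residual idea from the abstract can be demonstrated in a few lines. This is a minimal sketch, not the Reformer code: the functions F and G below are arbitrary stand-ins for the attention and feed-forward sublayers. Because the inputs can be recomputed exactly from the outputs, a backward pass does not need to store intermediate activations.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
Wf = rng.normal(size=(d, d))
Wg = rng.normal(size=(d, d))
F = lambda x: np.tanh(x @ Wf)   # stand-in for the attention sublayer
G = lambda x: np.tanh(x @ Wg)   # stand-in for the feed-forward sublayer

def rev_forward(x1, x2):
    # Reversible residual block: the input is split into two streams.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2):
    # Recover the inputs from the outputs by subtracting in reverse order,
    # so activations never need to be stored during training.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.normal(size=(d,)), rng.normal(size=(d,))
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```

Stacking N such blocks keeps memory for activations constant in N, which is where the "store once instead of N times" saving comes from.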
Tuesday, June 9th, 09:30 a.m.
In our lab meeting tomorrow, Vincent will present a review on Nested Named Entity Recognition. A Zoom link will be posted to Twist on the morning of the meeting.
A review on Nested Named Entity Recognition
Abstract: Named entity recognition (NER) is the task of extracting certain semantic entities, such as persons, organizations, etc., from a sentence or paragraph. In other words, it detects each entity's span and the semantic category it belongs to. NER plays an important role in many downstream tasks such as relation extraction, co-reference resolution, and entity linking. Nested NER refers to the situation where some entities contain others. Due to technical rather than semantic difficulties, nested NER was ignored for a long time. However, nested entities are very common, especially in the biomedical domain, and fine-grained entities provide necessary, detailed information for downstream tasks. This review mainly focuses on three parts: i. sequence labeling models with multi-label classification; ii. sequence labeling models with a modified decoder; iii. other models apart from sequence labeling.
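A tiny example makes the "entities containing others" situation concrete, and shows why flat sequence labeling struggles with it: a token like "China" below would need two labels at once. The sentence, spans, and labels are illustrative, not from the review.

```python
# Nested entities as (start, end, label) spans over tokens, end exclusive.
sentence = "The Bank of China opened a branch in Paris".split()
entities = [
    (1, 4, "ORG"),   # "Bank of China"
    (3, 4, "LOC"),   # "China" -- nested inside the ORG span
    (8, 9, "LOC"),   # "Paris"
]

def nested_pairs(ents):
    """Return (inner, outer) pairs where one entity span contains another."""
    pairs = []
    for inner in ents:
        for outer in ents:
            if inner is not outer and outer[0] <= inner[0] and inner[1] <= outer[1]:
                pairs.append((inner, outer))
    return pairs

print(nested_pairs(entities))
# -> [((3, 4, 'LOC'), (1, 4, 'ORG'))]
```

A flat BIO tagger assigns one label per token, so "China" can be tagged as part of the ORG span or as a LOC, but not both; the multi-label and modified-decoder approaches the review surveys are different ways of lifting that restriction.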
Tuesday, May 19th, 09:30 a.m.