24 Jul 2018

In our lab meeting this week, Nishant and Fariha will each give a 30-minute practice talk. Nishant will defend his master's thesis on Thursday this week, and Fariha will have her depth exam on Wednesday next week. Here are the titles and abstracts of their presentations:

Nishant: Decipherment of Substitution Ciphers with Neural Language Models

Abstract: The decipherment of homophonic substitution ciphers using language models (LMs) is a well-studied task in Natural Language Processing (NLP). Previous work on this topic scores short local spans of possible plaintext decipherments using n-gram LMs. The most widely used technique is beam search with n-gram LMs, proposed by Nuhn et al. (2013). We propose a new approach to decipherment using a beam search algorithm that scores the entire candidate plaintext at each step with a neural LM. We augment beam search with a novel rest cost estimation that exploits the predictive power of a neural LM. This work, to our knowledge, is the first to use a large pretrained neural language model for decipherment. Our neural decipherment approach outperforms the state-of-the-art n-gram-based methods on many different ciphers. On challenging ciphers such as the Beale cipher, our system reports significantly lower error rates with much smaller beam sizes.

Fariha: Generating Textual Description from Time Series Data

Abstract: Natural language generation (NLG), a subfield of natural language processing (NLP), deals with non-linguistic and linguistic representations to construct written text in natural language. The generated text can be presented in the form of reports, summaries, explanations, messages, etc. Various approaches have been proposed for analyzing numerical or time series data to produce a written text description. In this presentation, we will explore different approaches from the area of NLG and data-to-text technology that have been proposed for time series data to automatically generate natural language responses that reflect subject-matter expertise. We will shed light on popular approaches, including knowledge-based and machine learning techniques, applied to identify relevant content from time series datasets to generate textual descriptions.

Wednesday, July 25th, 10:00 a.m. TASC1 9408.