02 Oct 2018

In our lab meeting this week, Nishant and Hassan will present their work for 30 minutes each. Nishant will go over his accepted EMNLP paper and Hassan will summarize his internship work at Riken. Here are the titles and abstracts of their presentations:

Nishant: Decipherment of Substitution Cipher Using Neural Language Models

Abstract: The decipherment of homophonic substitution ciphers using language models (LMs) is a well-studied task in Natural Language Processing (NLP). Previous work on this topic scores short local spans of possible plaintext decipherments using n-gram LMs. The most widely used technique is beam search with n-gram LMs, proposed by Nuhn et al. (2013). We propose a new approach to decipherment using a beam search algorithm that scores the entire candidate plaintext at each step with a neural LM. We augment beam search with a novel rest cost estimation that exploits the predictive power of a neural LM. This work, to our knowledge, is the first to use a large pretrained neural language model for decipherment. Our neural decipherment approach outperforms the state-of-the-art n-gram based methods on many different ciphers. On challenging ciphers such as the Beale cipher, our system reports significantly lower error rates with much smaller beam sizes.

Hassan: SHINRA: A Dataset for Multi-Labeled Multilingual Classification of Wikipedia Articles

Abstract: In order to construct a language understanding system, such as Question-Answering, which can also explain its decisions in language, we need world knowledge that can be processed by machines. Wikipedia is a great resource of such knowledge, but it is difficult for machines to process. In order to create a machine-processable knowledge base, we are trying to structure Wikipedia. Our first step towards this goal is to classify Wikipedia entities into predefined categories. This project summarizes our attempt to prepare the data for the Wikipedia entity classification task.

Tuesday, October 2nd, 11:30 a.m. TASC1 9408.