In the lab meeting this week, 15th of Oct, Maryam will give a talk about Expressive Hierarchical Rule Extraction for Left-to-Right Translation. The meeting will be at TASC1 9408 from 1030 hours. Following is the abstract of the paper:
Left-to-right (LR) decoding (Watanabe et al., 2006) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero) that visits input spans in arbitrary order, producing the output translation in left-to-right order. This leads to far fewer language model calls. But the constrained SCFG grammar used in LR-Hiero (GNF) with at most two non-terminals is unable to account for some complex phrasal reordering. Allowing more non-terminals in the rules results in a more expressive grammar. LR-decoding can be used to decode with SCFGs with more than two non-terminals, but the CKY decoders used for Hiero systems cannot deal with such expressive grammars due to a blowup in computational complexity. In this paper we present a dynamic programming algorithm for GNF rule extraction which efficiently extracts sentence-level SCFG rule sets with an arbitrary number of non-terminals. We analyze the performance of the obtained grammar for statistical machine translation on three language pairs.
In the lab meeting this week, 8th of Oct, Maryam and Ramtin will give a talk about their recent paper. The meeting will be at TASC1 9408 from 1030 hours.
Abstract: Hierarchical phrase-based machine translation (Hiero) is a prominent approach for Statistical Machine Translation, usually comparable to or better than conventional phrase-based systems. But Hiero typically uses the CKY decoding algorithm, which requires the entire input sentence before decoding begins, as it produces the translation in a bottom-up fashion. Left-to-right (LR) decoding is a promising decoding algorithm for Hiero that produces the output translation in left-to-right order. In this paper we focus on simultaneous translation using the Hiero translation framework. In simultaneous translation, translations are generated incrementally as source language speech input is processed. We propose a novel approach for incremental translation by integrating segmentation and decoding in LR-Hiero. We compare two incremental decoding algorithms for LR-Hiero and present translation quality scores (BLEU) and the latency of generating translations for both decoders on audio lectures from the TED collection.
In the lab meeting this week, 1st of Oct, Milan will give an overview of the evaluation of visual text analytics. The talk will focus on characterizing the target problem space which forms the framework for the evaluation process. The meeting will be at TASC1 9408 from 1030 hours.
In the lab meeting this week, 24th of Sep, we will discuss paper ideas/plans that lab members have or what they are currently working on for the upcoming TACL or *ACL conference deadlines. The meeting will be at TASC1 9408 from 1030 hours.
In the lab meeting on Wednesday, 17th September in TASC1 9408 at 1030 hours, Golnar will give a talk about Detecting Health Related Discussions in Everyday Telephone Conversations for Studying Medical Events in the Lives of Older Adults. Following is the abstract of the paper:
We apply semi-supervised topic modeling techniques to detect health-related discussions in everyday telephone conversations, which has applications in large-scale epidemiological studies and for clinical interventions for older adults. The privacy requirements associated with utilizing everyday telephone conversations preclude manual annotations; hence, we explore semi-supervised methods in this task. We adopt a semi-supervised version of Latent Dirichlet Allocation (LDA) to guide the learning process. Within this framework, we investigate a strategy to discard irrelevant words in the topic distribution and demonstrate that this strategy improves the average F-score on the in-domain task and an out-of-domain task (Fisher corpus). Our results show that the increase in discussion of health-related conversations is statistically associated with actual medical events obtained through weekly self-reports.