Improving Statistical Machine Translation with a Multilingual Paraphrase Database - Accepted long paper at EMNLP 2015
13 Aug 2015

The list of accepted papers for EMNLP 2015 is now available.

Natlang students Ramtin and Maryam have the following long paper accepted at EMNLP 2015:

Improving Statistical Machine Translation with a Multilingual Paraphrase Database
Abstract:
The multilingual Paraphrase Database (PPDB) is a freely available automatically created resource of paraphrases in multiple languages. In statistical machine translation, paraphrases can be used to provide translation for out-of-vocabulary (OOV) phrases. In this paper, we show that a graph propagation approach that uses PPDB paraphrases can be used to improve overall translation quality. We provide an extensive comparison with previous work and show that our PPDB-based method improves the BLEU score by up to 1.79 percentage points. We show that our approach improves on the state of the art in three different settings: when faced with a limited amount of parallel training data; a domain shift between training and test data; and handling a morphologically complex source language. Our PPDB-based method outperforms the use of distributional profiles from monolingual source data.

Out of 1315 valid submissions, only 312 were accepted, which gives an acceptance rate of 24% for EMNLP 2015.

Congratulations to Ramtin, Maryam and Anoop.

Leveraging submetered electricity loads to disaggregate household water use
04 Aug 2015

On Tuesday, August 11th, Bradley will give a practice talk on “Leveraging submetered electricity loads to disaggregate household water use”. The talk will be in TASC1 9408 at 11:30. The abstract follows.

The world’s water supply is rapidly dwindling. Informing homeowners of their water use patterns can help them reduce consumption. Today’s smart meters only show a whole house’s water consumption over time. People need to be able to see where they are using water most to be able to change their habits. The task of inferring the breakdown of water use from smart meter data is called water disaggregation. Water disaggregation has been dominated by studies that rely on high-frequency data, proprietary meters, and/or labelled datasets. In contrast, this thesis uses low-frequency data from standardized meters and does not rely on labelled data. To accomplish this, we leverage information from non-intrusive load monitoring (the electricity counterpart of this task). We propose a modification of the Viterbi Algorithm that applies a supervised method to an unsupervised disaggregation problem. Using this, we are able to achieve mean squared errors of under 0.02.

Keywords: water disaggregation; water conservation; smart homes; sustainability
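
For readers unfamiliar with the Viterbi algorithm mentioned in the abstract, here is a minimal sketch of the standard, unmodified algorithm for a hidden Markov model. It is not the modification proposed in the thesis. The two-state fixture model ("off"/"on"), the "low"/"high" flow readings, and every probability below are invented purely for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def log_table(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical toy model: one water fixture that is either off or on.
states = ["off", "on"]
log_start = {"off": math.log(0.8), "on": math.log(0.2)}
log_trans = log_table({"off": {"off": 0.9, "on": 0.1},
                       "on": {"off": 0.3, "on": 0.7}})
log_emit = log_table({"off": {"low": 0.9, "high": 0.1},
                      "on": {"low": 0.2, "high": 0.8}})
obs = ["low", "high", "high", "low"]

print(viterbi(obs, states, log_start, log_trans, log_emit))
# → ['off', 'on', 'on', 'off']
```

Working in log space avoids numerical underflow on long sequences, which matters for meter readings collected over days or weeks.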

Optimizing Multivariate Performance Measures for Learning Relation Extraction Models
02 Jul 2015

Dr. Reza Haffari will visit our lab tomorrow, Friday, July 3rd. He will give a talk on Optimizing Multivariate Performance Measures for Learning Relation Extraction Models. The talk will be at 1 PM in TASC1 9204 West. Here are the abstract and a short bio:

Title: Optimizing Multivariate Performance Measures for Learning Relation Extraction Models

Abstract: We describe a novel max-margin learning approach to optimize non-linear performance measures for distantly-supervised relation extraction models. Our approach can be generally used to learn latent variable models under multivariate non-linear performance measures, such as Fβ-score. Our approach interleaves the Concave-Convex Procedure (CCCP) for populating latent variables with dual decomposition to factorize the original hard problem into smaller independent sub-problems. The experimental results demonstrate that our learning algorithm is more effective than the ones commonly used in the literature for distant supervision of information extraction models. On several data conditions, we show that our method outperforms the baseline and results in up to 8.5% improvement in the F1-score.
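
As background on the Fβ-score named in the abstract, the sketch below computes it from confusion counts using the standard definition. The function name and the example counts are illustrative assumptions and are not taken from the talk. The score depends on counts over the whole dataset rather than on a sum of per-example losses, which is why it is described as a multivariate, non-linear measure.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from true positives, false positives and false negatives.

    beta > 1 weights recall more heavily, beta < 1 weights precision more.
    Returns 0.0 when the score is undefined (all counts are zero).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts: precision = 6/8 = 0.75, recall = 6/10 = 0.6
print(round(f_beta(6, 2, 4), 4))          # F1 → 0.6667
print(round(f_beta(6, 2, 4, beta=2), 4))  # F2 → 0.625
```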

Bio: Reza Haffari is an Assistant Prof. in the Faculty of IT, Monash University. His research is at the intersection of Machine Learning and Natural Language Processing (NLP). His primary research is developing new models and learning algorithms for real-life problems, particularly those that arise in NLP. This includes topics like structured prediction, domain adaptation, and semi-supervised learning for problems such as machine translation, parsing, language modelling, and information extraction.

Training in Big Text Data Workshop
29 Apr 2015

On Wed 4/28 we will have an extended “lab meeting” with special guests Evangelos Milios and Axel Soto from Dalhousie University, who will be joining us for a collection of presentations and discussion in the natural language lab.

Schedule for Training in Big Text Data Workshop from 9:30am to 2:30pm

  • 9:30 – 12:00: Student Presentations (Ellert, Odilinye, Marques, Sabharwal, Tofiloski)
  • 12:00 – 1:30: Lunch
  • 1:30 – 2:30: General Research Discussions

New Features of Lensing Wikipedia + Apache Spark
14 Apr 2015

In the lab meeting this week, on the 15th of April, Anoop will give a demo of new features in the Lensing Wikipedia project. In the second half of the meeting, Anoop will talk about how to use Apache Spark for distributed computing. The meeting will be at the same location at the usual time.

Recent Publications