05 Dec 2010

The depth examination of Baskaran Sankaran was held on 25 Nov 2010.

Title: A Survey of Unsupervised Grammar Induction

Abstract

The automatic induction of natural language grammars without human guidance has been an important challenge in Computational Linguistics. In recent years, there has been a steady improvement in what can be learned from raw text using such methods. We present a survey of the core approaches to the unsupervised learning of formal grammars for natural language. This task poses an interesting challenge for machine learning methods for two reasons: the output of learning is a complex structure, and typical supervised approaches need a very large annotated sample to generalize accurately to unseen input, a sample that an unsupervised learner does not have. Without any such annotations to learn from, unsupervised grammar induction seeks to automatically uncover the hidden regularities of language. We show how three techniques are used to learn natural language grammars effectively from text: constraining the estimation of the model's hidden parameters, employing parametric and structural search, and using prior knowledge in building a probabilistic model of grammar. We also survey unsupervised grammar induction in a multilingual setting, where a translation into another language can provide useful information to help constrain the learning of a monolingual grammar.