11 Mar 2009

New publication:

Training Global Linear Models for Chinese Word Segmentation. Dong Song and Anoop Sarkar. In Proceedings of the 22nd Canadian Conference on Artificial Intelligence, Canadian AI 2009. Kelowna, BC. May 25-27, 2009.

This paper examines how to obtain state-of-the-art Chinese word segmentation using global linear models. We provide experimental comparisons that give a detailed roadmap for obtaining state-of-the-art accuracy on various datasets. In particular, we compare reranking against full beam search; we compare various methods for learning weights for full-sentence features, such as language model features; and we compare an Averaged Perceptron global linear model with the Exponentiated Gradient max-margin algorithm.