In our lab meeting this week, Fatemeh Torabi will talk about word embeddings. Here is the abstract of her talk:
Abstract: Word embeddings obtained from neural networks trained on large text corpora have become popular representations of word meaning in computational linguistics. The currently most popular model, word2vec, simultaneously learns a set of word embeddings and a set of context embeddings; the latter are usually discarded after training. We demonstrate how these two layers of distributional representation can be used to predict taxonomic similarity versus asymmetric association between words. Our study combines artificial-language experiments with evaluations on word similarity and relatedness datasets collected through crowdsourcing and psycholinguistic experiments. In particular, we use two recently published datasets: SimLex-999 (Hill et al., 2015), which contains explicitly instructed ratings of word similarity, and the explicitly instructed production norms of Jouravlev and McRae (2016) for word relatedness. We find that when people are explicitly asked to generate thematically related words, they respond with words that are closer to the cue in the context embedding space rather than in the word embedding space. Taxonomic similarity ratings, however, are better predicted by word embeddings alone. This suggests that the distributional information encoded in different layers of the neural network reflects different aspects of word meaning. Our experiments also elaborate on word2vec as a model of human lexical memory by showing that both types of semantic relations among words are encoded within a unified network through reinforcement learning. Finally, we introduce recommendations for biasing the model to organize words based on either taxonomic similarity or relatedness in practical applications.
Wednesday, April 4th, 10-11 AM, Location: TASC1 9408.
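For readers curious about the two embedding layers mentioned in the abstract: in gensim's word2vec implementation, both the input (word) embeddings and the output (context) embeddings trained with negative sampling are accessible after training. The snippet below is a minimal sketch of how one might compare a cue word against each space; the toy corpus, parameter settings, and the specific similarity comparisons are illustrative assumptions, not the setup used in the talk.

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus; the models discussed in the talk are trained on much larger corpora.
sentences = [
    ["dog", "barks", "at", "the", "cat"],
    ["cat", "chases", "the", "mouse"],
    ["dog", "and", "cat", "are", "pets"],
]

# Skip-gram with negative sampling (sg=1, negative>0): gensim then keeps
# a separate output layer alongside the usual input-layer word vectors.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                 sg=1, negative=5, epochs=50, seed=42)

word_vecs = model.wv.vectors   # input layer: the "word embeddings"
context_vecs = model.syn1neg   # output layer: the "context embeddings"

def cosine(word_a, word_b, space_a, space_b):
    """Cosine similarity between word_a taken from space_a and word_b from space_b."""
    va = space_a[model.wv.key_to_index[word_a]]
    vb = space_b[model.wv.key_to_index[word_b]]
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Within the word-embedding space (illustrating taxonomic similarity).
print(cosine("dog", "cat", word_vecs, word_vecs))
# Word vector of the cue against the context vector of a candidate response
# (illustrating thematic relatedness across the two layers).
print(cosine("dog", "barks", word_vecs, context_vecs))
```

On such a tiny corpus the numbers are noisy; the point is only that the two layers give different similarity structures one can query separately.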