02 Feb 2021

In our lab meeting on Tuesday, Logan will present his research on the undeciphered proto-Elamite script.

Compositionality of Complex Graphemes in the Undeciphered Proto-Elamite Script using Image and Text Embedding Models

Abstract: We introduce a language modeling architecture which operates over sequences of images, or over multimodal sequences of images with associated labels. We use this architecture alongside other embedding models to investigate a category of signs called complex graphemes (CGs) in the undeciphered proto-Elamite script. We argue that CGs have meanings which are at least partly compositional, and we demonstrate quantifiable differences in the distribution of two categories of signs used in CGs. We find that a language model over sign images produces more interpretable results than a model over text or over sign images and text, which suggests that the names given to signs may be obscuring signals in the corpus. Our results indicate the presence of previously unknown regularities in proto-Elamite sign use, and our image-aware language model provides a novel way to abstract away from biases introduced by human annotators.

Tuesday, Feb 2nd, 09:30 a.m.