Natural Language Laboratory
School of Computing Science
Simon Fraser University

Our recent progress on the Undeciphered Proto-Elamite Script

08 Jun 2021

This week, Logan will discuss parts of his upcoming publication in Findings of ACL 2021, as well as his submission to EMNLP 2021.

Compositionality of Complex Graphemes in the Undeciphered Proto-Elamite Script using Image and Text Embedding Models

We introduce a language modeling architecture which operates over sequences of images, or over multimodal sequences of images with associated labels. We use this architecture alongside other embedding models to investigate a category of signs called complex graphemes (CGs) in the undeciphered proto-Elamite script. We argue that CGs have meanings which are at least partly compositional, and we discover novel rules governing the construction of CGs. We find that a language model over sign images produces more interpretable results than a model over text or over sign images and text, which suggests that the names given to signs may be obscuring signals in the corpus. Our results reveal previously unknown regularities in proto-Elamite sign use that can inform future decipherment efforts, and our image-aware language model provides a novel way to abstract away from biases introduced by human annotators.
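One common way to probe whether an embedding space treats a complex grapheme as compositional is to compare the CG's own vector with a combination of its components' vectors. The sketch below assumes additive composition and cosine similarity. It is not necessarily the procedure used in the paper, and the sign names and toy vectors are hypothetical placeholders, not labels or values from the proto-Elamite corpus.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def additive_composition(parts: Sequence[Vector]) -> Vector:
    """Element-wise sum of component embeddings (an assumed composition function)."""
    return [sum(values) for values in zip(*parts)]


def compositionality_score(
    embeddings: Dict[str, Vector], cg: str, components: Sequence[str]
) -> float:
    """Similarity between a CG's embedding and the sum of its components' embeddings."""
    composed = additive_composition([embeddings[c] for c in components])
    return cosine(embeddings[cg], composed)


# Hypothetical toy embeddings; the names are placeholders, not real sign labels.
toy = {
    "SIGN_A": [1.0, 0.0, 0.0],
    "SIGN_B": [0.0, 1.0, 0.0],
    "SIGN_A+B": [0.9, 0.8, 0.1],
    "SIGN_C": [0.0, 0.0, 1.0],
}

print(compositionality_score(toy, "SIGN_A+B", ["SIGN_A", "SIGN_B"]))
print(compositionality_score(toy, "SIGN_C", ["SIGN_A", "SIGN_B"]))
```

In this toy setup the CG built from SIGN_A and SIGN_B scores close to 1, while the unrelated SIGN_C scores 0. A score near 1 is consistent with (though not proof of) additive composition in the embedding space; in practice one would also compare against scores for randomly chosen component sets.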

Creating a Signlist from Sign Images in an Undeciphered Script using Deep Clustering

We propose an architecture for revising transliterations of an undeciphered script by clustering sign images from that script. The clustering is optimized on a multi-part objective that combines unsupervised terms, such as the entropy of the sign labels and the visual similarity between signs, with partial supervision that exploits existing transliterations for language modeling. This allows us to learn revised labelings for an undeciphered script, which can be difficult for human annotators to transliterate because whether a distinction between two signs is relevant may depend on contextual information spread across the entire corpus. By automating this process, we obtain a simplified signlist that we find gives better results than the existing transliterations on downstream tasks.
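A multi-part objective of this kind can be sketched as a weighted sum over soft cluster assignments. This is a minimal, hypothetical illustration, not the authors' implementation: the function names, the weights, the pairwise form of the visual-similarity term, and the choice to minimize (rather than maximize) label entropy are all assumptions. The language-modeling term is passed in as a precomputed number because the abstract does not describe the language model.

```python
import math
from typing import List, Sequence, Tuple

Dist = List[float]  # soft cluster assignment for one sign image (sums to 1)


def label_entropy(assignments: Sequence[Dist]) -> float:
    """Entropy (in nats) of the average cluster distribution over all sign images."""
    k = len(assignments[0])
    marginal = [sum(p[j] for p in assignments) / len(assignments) for j in range(k)]
    return -sum(q * math.log(q) for q in marginal if q > 0.0)


def similarity_penalty(
    assignments: Sequence[Dist], pairs: Sequence[Tuple[int, int, float]]
) -> float:
    """Penalty for visually similar pairs (i, j, sim) whose assignments disagree.

    sum_k p_i[k] * p_j[k] is the probability that both images receive the same
    label when each label is drawn independently from its own distribution.
    """
    total = 0.0
    for i, j, sim in pairs:
        same = sum(a * b for a, b in zip(assignments[i], assignments[j]))
        total += sim * (1.0 - same)
    return total


def combined_objective(
    assignments: Sequence[Dist],
    pairs: Sequence[Tuple[int, int, float]],
    lm_loss: float,
    w_entropy: float = 1.0,
    w_sim: float = 1.0,
    w_lm: float = 1.0,
) -> float:
    """Hypothetical weighted sum of the three terms; lower is better.

    A positive w_entropy rewards a low-entropy label distribution (a smaller
    signlist); the real model's sign and weighting for this term are assumptions.
    """
    return (
        w_entropy * label_entropy(assignments)
        + w_sim * similarity_penalty(assignments, pairs)
        + w_lm * lm_loss
    )


# Toy example: three sign images, two candidate labels; images 0 and 1 look alike.
split = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
merged = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
similar_pairs = [(0, 1, 0.9)]

print(combined_objective(split, similar_pairs, lm_loss=2.0))
print(combined_objective(merged, similar_pairs, lm_loss=2.0))
```

In this sketch, merging all three images into one label drives both unsupervised terms to zero, which illustrates why a counterweight such as the language-modeling term is needed to keep the clustering from collapsing into a single sign.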

Tuesday, June 8th, 09:30 a.m.