What Do Long Text Semantic Representations Capture?

Supervisor(s): Andrianos Michail, Dr. Juri Opitz & Dr. Simon Clematide

Summary

We can now embed texts up to 8,192 tokens long. That is a lot of meaning to squeeze into one representation. Is the entire text represented fairly, or are there biases in the representation based on the order of the text or other nuances? We aim to understand which parts of a text long-context multilingual representations capture, and how, through a series of evaluations on augmented texts.
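One way such an evaluation could look is sketched below. This is only an illustration under stated assumptions, not the project's actual method: `toy_embed` is a hypothetical stand-in (a hashed bag-of-words vector) for a real long-context multilingual encoder, and the helper names `split_segments`, `positional_profile` and `ablation_profile` are invented here. The sketch compares the embedding of a full text with the embeddings of its segments, and measures how far the embedding moves when one segment is removed.

```python
import hashlib
import math


def toy_embed(text, dim=64):
    """Hypothetical stand-in encoder: L2-normalised hashed bag-of-words.
    A real study would call a long-context embedding model here instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def split_segments(text, n_segments):
    """Split text into at most n_segments contiguous chunks of whitespace tokens."""
    tokens = text.split()
    size = max(1, math.ceil(len(tokens) / n_segments))
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def positional_profile(text, embed, n_segments=4):
    """Similarity between the full-text embedding and each segment's embedding.
    A profile that favours early segments would hint at a position bias."""
    doc_vec = embed(text)
    return [cosine(doc_vec, embed(seg)) for seg in split_segments(text, n_segments)]


def ablation_profile(text, embed, n_segments=4):
    """Embedding shift (1 - cosine) when each segment is removed in turn.
    This is one simple kind of augmented text: the original minus one part."""
    segments = split_segments(text, n_segments)
    doc_vec = embed(text)
    shifts = []
    for i in range(len(segments)):
        ablated = " ".join(segments[:i] + segments[i + 1:])
        shifts.append(1.0 - cosine(doc_vec, embed(ablated)))
    return shifts


if __name__ == "__main__":
    sample = "alpha beta gamma delta epsilon zeta eta theta"
    print(positional_profile(sample, toy_embed))
    print(ablation_profile(sample, toy_embed))
```

Because the toy encoder ignores word order, it cannot show a real position bias; replacing `toy_embed` with an actual encoder and averaging the profiles over many documents is where any systematic preference for particular text positions would become visible.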

If you are interested, please send an email addressed to all three of us for maximum visibility.

Requirements

Machine Learning
Python/PyTorch

Department of Computational Linguistics
