Automatic Subtitling for Romansh TV

Supervisors: Annette Rios, Jannis Vamvas

Summary

Build a dataset of Romansh audio/video with (German) subtitles, obtained via the SRG API, and use it to develop a system that automatically generates subtitles in Romansh idioms. You will test two approaches (or more, if you have further ideas):

  1. combine Automatic Speech Recognition (ASR) and Machine Translation (MT) at inference time
  2. adapt a multimodal model (audio and text) to generate Romansh subtitles

Tasks:

  1. Create the dataset
  2. Define an evaluation method: as far as we know, there are no existing Romansh subtitles to compare against, so you will have to find another way to evaluate
  3. Train/adapt models
  4. Evaluate approaches across Romansh idioms
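
One possible way to approach tasks 2 and 4, sketched under a strong assumption: that a small set of reference subtitles is created by hand for each idiom. The project description does not prescribe this, and other evaluation designs are equally possible. The code computes a simplified character n-gram F-score (in the spirit of chrF, but not the official implementation) and averages it per idiom. The function names and the `(idiom, hypothesis, reference)` sample layout are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after collapsing runs of whitespace."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def char_f_score(hyp: str, ref: str, max_n: int = 3, beta: float = 2.0) -> float:
    """Simplified character n-gram F-beta, averaged over n = 1..max_n.

    N-gram orders for which either string is too short are skipped.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp_counts = char_ngrams(hyp, n)
        ref_counts = char_ngrams(ref, n)
        if not hyp_counts or not ref_counts:
            continue
        overlap = sum((hyp_counts & ref_counts).values())
        precision = overlap / sum(hyp_counts.values())
        recall = overlap / sum(ref_counts.values())
        if precision + recall == 0:
            scores.append(0.0)
        else:
            scores.append(
                (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
            )
    return sum(scores) / len(scores) if scores else 0.0


def score_by_idiom(samples: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Average the score per idiom over (idiom, hypothesis, reference) samples."""
    per_idiom = defaultdict(list)
    for idiom, hyp, ref in samples:
        per_idiom[idiom].append(char_f_score(hyp, ref))
    return {idiom: sum(vals) / len(vals) for idiom, vals in per_idiom.items()}
```

Reporting one score per idiom rather than a single pooled number makes it visible when an approach works well for one idiom and poorly for another. A character-level metric is used here only because it tolerates spelling variation better than exact word matching; whether it is appropriate for Romansh subtitles would need to be checked as part of the project.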

Expected Outcome:

 A system that can provide subtitles in Romansh idioms

You will deepen the following skills:

  • HuggingFace Transformers
  • Basics of multimodal processing (audio, video)
  • Python/PyTorch