Automatic Subtitling for Romansh TV
Supervisors: Annette Rios, Jannis Vamvas
Summary
Build a dataset of Romansh audio/video paired with (German) subtitles, obtained via the SRG API, and use it to develop a system that automatically generates subtitles in Romansh idioms. You will test two approaches (or more, if you have further ideas):
- combine Automatic Speech Recognition (ASR) and Machine Translation (MT) at inference time
- adapt a multimodal model (audio and text) to generate Romansh subtitles
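The first approach can be pictured as a two-stage cascade. The sketch below is only one possible reading, since the description fixes neither the order nor the languages of the two stages: it assumes a speech model that turns Romansh audio into German text (the pairing available in the dataset), followed by German-to-Romansh MT. `Segment`, `Subtitle`, `cascade_subtitles`, and the two callables are hypothetical names introduced for illustration; in practice the callables would wrap real models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """A timed chunk of audio (hypothetical container, not an SRG format)."""
    start: float  # seconds
    end: float    # seconds
    audio: bytes  # placeholder for raw audio samples


@dataclass
class Subtitle:
    start: float
    end: float
    text: str


def cascade_subtitles(
    segments: List[Segment],
    speech_to_text: Callable[[bytes], str],  # assumption: Romansh audio -> German text
    translate: Callable[[str], str],         # assumption: German text -> Romansh text
) -> List[Subtitle]:
    """Run both stages segment by segment, keeping the original timing."""
    subtitles = []
    for seg in segments:
        intermediate = speech_to_text(seg.audio)
        if not intermediate.strip():
            # Skip segments where the first stage produced no text (e.g. silence).
            continue
        subtitles.append(Subtitle(seg.start, seg.end, translate(intermediate)))
    return subtitles
```

Keeping the two stages as plain callables makes it easy to swap models independently, which is the main practical difference from the second approach, where a single adapted model would replace both.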
Tasks:
- Create the dataset
- Define an evaluation method: as far as we know, no Romansh subtitles exist, so you will have to find another way to evaluate the output
- Train/adapt models
- Evaluate approaches across Romansh idioms
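Whatever evaluation method is chosen, comparing approaches across idioms comes down to computing a score per example and aggregating it per idiom. The following is a minimal sketch under that assumption: `scores_by_idiom` is a hypothetical helper, and `token_overlap` is a toy placeholder metric, not a proposed evaluation method. The "reference" stands for whatever proxy the evaluation ends up using.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple


def scores_by_idiom(
    examples: Iterable[Tuple[str, str, str]],  # (idiom, hypothesis, reference)
    score: Callable[[str, str], float],
) -> Dict[str, float]:
    """Average a per-example score separately for each idiom."""
    per_idiom = defaultdict(list)
    for idiom, hypothesis, reference in examples:
        per_idiom[idiom].append(score(hypothesis, reference))
    return {idiom: mean(values) for idiom, values in per_idiom.items()}


def token_overlap(hypothesis: str, reference: str) -> float:
    """Toy placeholder: fraction of reference tokens that occur in the hypothesis."""
    ref_tokens = reference.split()
    if not ref_tokens:
        return 0.0
    hyp_tokens = set(hypothesis.split())
    return sum(tok in hyp_tokens for tok in ref_tokens) / len(ref_tokens)
```

Reporting one number per idiom, rather than a single pooled average, keeps a weak result on a low-resource idiom from being hidden by idioms with more data.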
Expected Outcome:
A system that can provide subtitles in Romansh idioms
You will deepen the following skills:
- HuggingFace Transformers
- Basics of multimodal processing (audio, video)
- Python/PyTorch