I am a PhD student under the supervision of Prof. Rico Sennrich. My research is part of the NCCR Evolving Language, where I will specialize in knowledge representations in multilingual LLMs.
Research Interests
- Subword tokenization, especially robustness towards noise, multilingual tokenizers, and token-free models
- Factual knowledge in LLMs, especially its multilingual alignment and mechanistic interpretability
- MT metrics, especially terminology in MT:
  - term consistency metric at WMT 2022: text
  - terminology shared task at WMT 2023 (organizer): text
  - terminology shared task at WMT 2025 (organizer): text, Github repo
- Corpus linguistics and language data resources:
  - manager of the Russian-Chinese parallel corpus (2018-2021)
  - contributed to data collection for the Russian dialect corpus (Ustja River Basin, 2017)
  - contributed to data collection and annotation for the Russian learner corpus (2017-2018)
Publications
Teaching
University of Zurich:
- HS 2025
Other Universities:
- Teaching the "General Morphology" course at the Faculty of Liberal Arts and Sciences, Montenegro
Popular Science and Talks
- "Do we need linguists for IT in the ChatGPT era?". At Auditoria Budva, 17 August 2025. video [in Russian]
- Talk with Boris Orekhov about Charles University. Ayva League podcast, 9 July 2024. audio [in Russian]
- Russian-Chinese Parallel Corpus. Laowaicast, 12 January 2021. audio, podcast page [in Russian]