I am a final-year PhD candidate working in the Impresso II – Media Monitoring of the Past project under the supervision of Simon Clematide and Rico Sennrich, with external supervision by Mrinmaya Sachan. My research focuses on the controlled evaluation of (cross-lingual) semantic search with multilingual embedding models on long, noisy, digitized historical newspapers.
Education
- September 2023 - Now: PhD Candidate at the Department of Computational Linguistics, University of Zurich
- 2020 - 2023: MSc in Computing and Economics (90) and Data Science (30) at the University of Zurich
- 2017 - 2020: BSc in Artificial Intelligence and Computer Science (180) at the University of Sheffield
Selected Publications
- Elias Schuhmacher*, Andrianos Michail*, Juri Opitz, Rico Sennrich, and Simon Clematide. Information Representation Fairness in Long-Document Embeddings: The Peculiar Interaction of Positional and Language Bias. To appear in ACL 2026 Findings.
- Juri Opitz*, Andrianos Michail*, Lucas Moeller, Sebastian Padó, and Simon Clematide. Similar, but why? A Toolkit for Explaining Text Similarity. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations), pp. 203-214. 2026.
- Andrianos Michail, Simon Clematide, and Rico Sennrich. Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples. In Findings of the Association for Computational Linguistics: EMNLP 2025, pp. 2161-2170. 2025.
- Juri Opitz, Lucas Moeller, Andrianos Michail, Sebastian Padó, and Simon Clematide. Interpretable text embeddings and text similarity explanation: A survey. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 22314-22330. 2025.
- Hongji Li, Andrianos Michail, Reto Gubelmann, Simon Clematide, and Juri Opitz. Sentence Smith: Controllable Edits for Evaluating Text Embeddings. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 26439-26456. 2025.
- Andrianos Michail, Juri Opitz, Yining Wang, Robin Meister, Rico Sennrich, and Simon Clematide. 2025. Cheap Character Noise for OCR-Robust Multilingual Embeddings. In Findings of the Association for Computational Linguistics: ACL 2025, pages 11705–11716, Vienna, Austria. Association for Computational Linguistics.
- Andrianos Michail, Corina Julia Raclé, Juri Opitz, and Simon Clematide. Adapting Multilingual Embedding Models to Historical Luxembourgish. In 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, vol. 2025, pp. 291-298. 2025.
- Andrianos Michail, Simon Clematide, and Juri Opitz. 2025. PARAPHRASUS: A Comprehensive Benchmark for Evaluating Paraphrase Detection Models. In Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), pages 8749–8762, Abu Dhabi, UAE. Association for Computational Linguistics.
- Ahmet Yavuz Uluslu*, Andrianos Michail*, and Simon Clematide. 2024. Utilizing large language models to identify evidence of suicidality risk through analysis of emotionally charged posts. In Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024), pages 264-269, St. Julians, Malta. Association for Computational Linguistics.
- Andrianos Michail, Stefanos Konstantinou, and Simon Clematide. 2023. UZH_CLyp at SemEval-2023 Task 9: Head-First Fine-Tuning and ChatGPT Data Generation for Cross-Lingual Learning in Tweet Intimacy Prediction. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 1021–1029, Toronto, Canada. Association for Computational Linguistics.
Teaching
| HS 2026 | Lecturer for Information Retrieval (S) (link once published) |
| FS 2021-2026 | (x5) TA for Machine Learning for Natural Language Processing II |
| HS 2020-2025 | (x5) TA for Machine Learning for Natural Language Processing I |
| HS 2021 | |
| FS 2021 | TA for Informatics II |