Portrait and Research Interests
I am Titular Professor and postdoc staff member in Computational Linguistics (CL) at the Department of Computational Linguistics, head of the NLP group in Linguistics Research Infrastructure (LiRI Tech NLP) and of the Text Crunching Center (TCC), which offers computational linguistics services to the University and other partners.
I am a senior researcher in the URPP Digital Religion(s), in Project 8, where we advance hate speech detection tools, detect intolerance, and apply content analysis methods to important social and religious issues.
I have been a senior lecturer and computing scientist (wissenschaftlicher Informatiker) at the English Department of the University of Zurich (Gerold Schneider's homepage at the English Department).
My research interests include corpus linguistics, semantic mining, automated media content analysis, cognitive linguistics, digital humanities, robust parsing, syntax, and formal grammar.
I am involved in research on automated media content analysis, and on Text Mining in the biomedical and many other domains. I am also doing research on Digital Humanities, learner language, variationist linguistics (genre, regions, contrastive, typology), and statistical methods.
I have published over 130 peer-reviewed articles and a coursebook on Statistics.
In the winter term 2017/18 I worked as Substituting Professor for German Linguistics at TU Dortmund University.
From 2015 to 2017 I worked at the linguistics department of the University of Konstanz as Professor of Computational and General Linguistics, substituting for Prof. Dr. Miriam Butt.
Selected articles in bibliographical databases can be downloaded from ZORA or from my Google Scholar profile.
I co-supervise the doctoral theses of Michi Amsler, Peter Makarov, Janis Goldzycher, and Maud Reveilhac.
I have written my cumulative habilitation on using computational linguistics methods for descriptive linguistics, text mining and psycholinguistics.
I have written a low-complexity, broad-coverage probabilistic Dependency Parser for English, as a part of
I have also ported it to German, together with Rico Sennrich.
My Recent Publications related to the Department of Computational Linguistics (ZORA)
-
Combining Collocation Measures and Distributional Semantics to Detect Idioms In M. Laitinen & P. Rautionaho (Eds.), Data-Intensive Investigations of English (pp. 104–135). Cambridge University Press. https://doi.org/10.1017/9781009415682.005
-
Measuring language complexity about European politics in Swiss parliamentary debates In A. Pawłowski, S. Embleton, J. Mačutek, & A. Xanthos (Eds.), Mathematical Modelling in Linguistics and Text Analysis (Vol. 370, pp. 191–206). John Benjamins Publishing. https://doi.org/10.1075/cilt.370
-
Entropy as a Lens: Exploring Visual Behavior Patterns in Architects Journal of Eye Movement Research, 18, 43. https://doi.org/10.3390/jemr18050043
-
Evaluating a transparent and interpretable approach to stance detection using linguistic markers in social media data International Journal of Corpus Linguistics, 30, 195–233. https://doi.org/10.1075/ijcl.24132.rev
-
PreClinIE: An Annotated Corpus for Information Extraction in Preclinical Studies (D. Demner-Fushman, S. Ananiadou, M. Miwa, & J. Tsujii, Eds.; pp. 74–87). Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.bionlp-1.8
-
Detecting and Mapping Hate in Religious Contexts In T. Schlag & K. Yadav (Eds.), Religious Communication, Interaction and Transformation in a Culture of Digitality : Insights into the Zurich University Research Priority Program “Digital Religion(s)” (pp. 153–183). De Gruyter. https://doi.org/10.1515/9783111721729
-
How stable are multivariate findings about register variation across varieties of English? On the replicability of Geometric Multivariate Analysis ICAME Journal, 49, 23–45. https://doi.org/10.2478/icame-2025-0003
-
The ‘Spiritual’ and the ‘Religious’ in the Twittersphere: A Topic Model and Semantic Map Journal of Religion, Media & Digital Culture, 14, 1–22. https://doi.org/10.1163/21659214-bja10123
-
Investigating Linguistic Abilities of LLMs for Native Language Identification (No. 14). 81. https://hdl.handle.net/10062/107173
-
Robust Native Language Identification through Agentic Decomposition In C. Christodoulopoulos, Tanmoy Chakraborty, C. Rose, & Violet Peng (Eds.), Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (pp. 8398–8414). Association for Computational Linguistics. https://aclanthology.org/2025.emnlp-main.423/
-
Digital Dickens: An automated content analysis of Charles Dickens’ novels In S. Buschfeld, P. Ronan, T. Neumaier, A. Wellinghoff, & L. Westermayer (Eds.), Crossing Boundaries through Corpora: Innovative corpus approaches within and beyond linguistics (pp. 62–98). John Benjamins Publishing. https://doi.org/10.1075/scl.119
-
Automatically detecting directives with SPICE Ireland In M. Schweinberger & P. Ronan (Eds.), Socio-Pragmatic Variation in Ireland: Using Pragmatic Variation to Construct Social Identities (No. 378; pp. 205–234). De Gruyter. https://doi.org/10.1515/9783110791457-011
-
Evaluating Transformers on the Ethical Question of Euthanasia 241–246. https://aclanthology.org/2024.swisstext-1.55.pdf
-
The Visualisation and Evaluation of Semantic and Conceptual Maps In M. Laitinen & J. Tyrkkö (Eds.), Linguistics across Disciplinary Borders: The March of Data (pp. 67–94). Bloomsbury Publishing. https://doi.org/10.5040/9781350362291.0009
-
Native Language Identification Improves Authorship Attribution 289–296. https://aclanthology.org/2024.icnlsp-1.0
-
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset 4405–4424. https://doi.org/10.18653/v1/2024.naacl-long.248
-
Investigating child language acquisition from a joint perspective: A comparison of traditional and new L1 speakers of English In M. Schmalz, M. Vida-Mannl, & S. Buschfeld (Eds.), Acquisition and Variation in World Englishes: Bridging Paradigms and Rethinking Approaches (No. 69; pp. 133–157). De Gruyter. https://doi.org/10.1515/9783110733723-007
-
Turkish Native Language Identification 303–307. https://aclanthology.org/2023.icnlsp-1.0.pdf
-
Exploring Hybrid Linguistic Features for Turkish Text Readability 223–232. https://aclanthology.org/2023.icnlsp-1.0.pdf
-
The LiRI Corpus Platform In K. Linden, J. Niemi, & T. Kontino (Eds.), CLARIN Annual Conference Proceedings (pp. 145–149). CLARIN ERIC. http://hdl.handle.net/10138/570996
-
“To boldly go where no man has gone before”: how iconic is the Star Trek split infinitive? Linguistics Vanguard, 9, 247–255. https://doi.org/10.1515/lingvan-2022-0168
-
Exploring the role of AI in classifying, analyzing, and generating case reports on assisted suicide cases: feasibility and ethical implications Frontiers in Artificial Intelligence, 6, 1328865. https://doi.org/10.3389/frai.2023.1328865
-
Colloquialisation, compression and democratisation in British parliamentary debates In M. Korhonen, H. Kotze, & J. Tyrkkö (Eds.), Exploring Language and Society with Big Data: Parliamentary discourse across time and space (pp. 336–372). John Benjamins Publishing. https://doi.org/10.1075/scl.111.12sch
-
Swissdox@ LiRI–a large database of media articles made accessible to researchers In K. Linden, J. Niemi, & T. Kontino (Eds.), CLARIN Annual Conference Proceedings (pp. 111–115). CLARIN ERIC. https://helda.helsinki.fi/bitstreams/6aa6e46b-697e-45da-b0f0-d5211d4e78bc/download#page=120
-
Differences in syntactic annotation affect retrieval International Journal of Corpus Linguistics, 28, 378–406. https://doi.org/10.1075/ijcl.21104.zeh
-
Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data 187–201. https://doi.org/10.18653/v1/2023.woah-1.19
-
Detecting and Analysing Learner Difficulties Using a Learner Corpus Without Error Tagging In K. Harrington & P. Ronan (Eds.), Demystifying Corpus Linguistics for English Language Teaching (pp. 229–257). Palgrave Macmillan. https://doi.org/10.1007/978-3-031-11220-1_12
-
Replicable semi-supervised approaches to state-of-the-art stance detection of tweets Information Processing & Management, 60, 103199. https://doi.org/10.1016/j.ipm.2022.103199
-
Do Non-native Speakers Read Differently? Predicting Reading Times with Surprisal and Language Models of Native and Non-native Eye Tracking Data In B. Busse, N. Dumrukcic, & I. Kleiber (Eds.), Language and Linguistics in a Complex World (pp. 153–188). De Gruyter. https://doi.org/10.1515/9783111017433-008
-
Scaling Native Language Identification with Transformer Adapters 5th International Conference on Natural Language and Speech Processing (ICNLSP), Trento. https://doi.org/10.48550/arXiv.2211.10117
-
Complementing Kernel Density Estimation and Topic Modelling to Visualise Political Discourse (J. H. Jantunen et al., Eds.; pp. 12–27). University of Jyväskylä. https://jyx.jyu.fi/handle/123456789/84140
-
Assessing How Attitudes to Migration in Social Media Complement Public Attitudes Found in Opinion Surveys SPELL: Swiss Papers in English Language and Literature, 41, 119–153. https://doi.org/10.33675/SPELL/2022/41/10
-
Systematically Detecting Patterns of Social, Historical and Linguistic Change: The Framing of Poverty in Times of Poverty Transactions of the Philological Society, 120, 447–473. https://doi.org/10.1111/1467-968X.12252
-
Hypothesis Engineering for Zero-Shot Hate Speech Detection 75–90. https://aclanthology.org/2022.trac-1.10
-
Comparing the coverage of the “marriage for all” vote on Twitter and in the newspapers (I. Rehbein, G. Lapesa, C. Klamm, & S. Ponzetto, Eds.; pp. 55–62). CPSS. https://old.gscl.org/media/pages/arbeitskreise/cpss/cpss-2022/workshop-programme-2022/254133848-1662996927/cpss-2022-proceedings.pdf
-
Challenges and best practices for digital unstructured data enrichment in health research: a systematic narrative review (No. 22278137; MedRxiv). https://doi.org/10.1101/2022.07.28.22278137
-
Correlations and predictions of reading times using language models and surprisal In M. Krug, O. Schützler, F. Vetter, & V. Werner (Eds.), Perspectives on Contemporary English : Structure, Variation, Cognition (pp. 209–243). Peter Lang. https://doi.org/10.3726/b19739
-
Medical topics and style from 1500 to 2018 In T. Hiltunen & I. Taavitsainen (Eds.), Corpus pragmatic studies on the history of medical discourse (pp. 49–78). Benjamins. https://doi.org/10.1075/pbns.330.03sch
-
Comparing data-driven to corpus-based approaches for diachronic variation: document-classification and overuse metrics In J. Schlüter & O. Schützler (Eds.), Data and Methods in Corpus Linguistics: Comparative Approaches (pp. 291–322). Cambridge University Press.
-
Recent changes in spoken British English according to spoken BNC2014 In S. Flach & M. Hilpert (Eds.), Broadening the spectrum of corpus linguistics: New approaches to variability and change (No. 105; pp. 173–195). John Benjamins Publishing. https://doi.org/10.1075/scl.105.06sch
-
Syntactic changes in verbal clauses and noun phrases from 1500 onwards In B. Los, C. Cowie, & P. Honeybone (Eds.), English Historical Linguistics: Change in Structure and Meaning (No. 361; pp. 163–200). John Benjamins Publishing. https://doi.org/10.1075/cilt.358.07sch
-
Measuring Attitudes to Migration in the Media automatically with Complementary Data Sources and Methods In P. Ronan & E. Ziegler (Eds.), Approaches to Migration and Language Identity (pp. 207–252). Peter Lang. https://www.peterlang.com/document/1183598
-
With a little help from familiar interlocutors: real-world language use in young and older adults Aging & Mental Health, 25, 2310–2319. https://doi.org/10.1080/13607863.2020.1822288
-
Changes in society and language: charting poverty In P. Rautionaho, A. Nurmi, & J. Klemola (Eds.), Corpora and the changing society: studies in the evolution of English (No. 96; pp. 29–56). John Benjamins Publishing. https://doi.org/10.1075/scl.96.02sch
-
Using Multilingual Resources to Evaluate CEFRLex for Learner Applications 346–355. https://www.aclweb.org/anthology/2020.lrec-1.43.pdf
-
Spelling normalisation of Late Modern English: comparison and combination of VARD and character-based statistical machine translation In M. Kytö & E. Smitterberg (Eds.), Late Modern English: novel encounters (No. 214; pp. 243–268). John Benjamins Publishing. https://doi.org/10.1075/slcs.214.11sch
-
A Man who Was Just an Incredible Man, an Incredible Man: Age Factors and Coherence in Donald Trump’s Spontaneous Speech In U. Schneider & M. Eitelmann (Eds.), Linguistic Inquiries into Donald Trump’s Language : From ‘Fake News’ to ‘Tremendous Success’ (pp. 62–84). Bloomsbury. https://doi.org/10.5040/9781350115545.0009
-
Statistics for Linguists: A patient, slow-paced introduction to statistics and to the programming language R Digitale Lehre und Forschung UZH. https://dlf.uzh.ch/openbooks/statisticsforlinguists/
-
Cognitive Aging Effects on Language Use in Real-Life Contexts: A Naturalistic Observation Study The 41st Annual Meeting of the Cognitive Science Society, Montreal.
-
Topics of eighteenth-century medical writing with triangulation of methods: LMEMT and the underlying reality In I. Taavitsainen & T. Hiltunen (Eds.), Late Modern English medical texts: writing medicine in the eighteenth century (Including the LMEMT Corpus) (pp. 31–74). John Benjamins Publishing. https://doi.org/10.1075/z.221.03taa
-
Statistical MWE-aware parsing In Y. Parmentier & J. Waszczuk (Eds.), Representation and parsing of multiword expressions: current trends (No. 3; pp. 147–182). Language Science Press. https://doi.org/10.5281/zenodo.2579043
-
Scholastic argumentation in Early English medical writing and its afterlife: new corpus evidence In C. Suhr, T. Nevalainen, & I. Taavitsainen (Eds.), From data to evidence in English language research (Vol. 83, pp. 191–221). Brill. https://doi.org/10.1163/9789004390652_010
-
NLP Corpus Observatory – Looking for Constellations in Parallel Corpora to Improve Learners’ Collocational Skills 69–78. https://spraakbanken.gu.se/eng/icall/7th-nlp4call#prog
-
Detecting innovations in a parsed corpus of learner English In S. C. Deshors, S. Götz, & S. Laporte (Eds.), Rethinking linguistic creativity in non-native Englishes (No. 98; pp. 47–74). John Benjamins Publishing. https://doi.org/10.1075/bct.98.03sch
-
Differences between Swiss High German and German German via data-driven methods (M. Cieliebak & F. Benites, Eds.).
-
Differences between Swiss High German and German High German via data-driven methods In M. Cieliebak, D. Tuggener, & F. Benites (Eds.), CEUR Workshop Proceedings (No. 2226; pp. 17–25). CEUR-WS. http://ceur-ws.org/Vol-2226/
-
From Lexical Bundles to Surprisal and Language Models: measuring the idiom principle on native and learner language In J. Kopaczyk & J. Tyrkkö (Eds.), Applications of Pattern-driven Methods in Corpus Linguistics (No. 82; Vol. 82, pp. 15–56). Benjamins. https://doi.org/10.1075/scl.82.02sch
-
Tools and Methods for Processing and Visualizing Large Corpora Studies in Variation, Contacts and Change in English, 19, online. http://www.helsinki.fi/varieng/series/volumes/19/schneider_el-assady_lehmann/
-
Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production 1779–1783. https://doi.org/10.21437/Interspeech.2017-337
-
Saying Whatever It Takes: Creating and Analyzing Corpora from US Presidential Debate Transcripts 537–544. http://paulslals.org.uk/ccr/CL2017ExtendedAbstracts.pdf
-
Crossing the Border Twice: Reimporting Prepositions to Alleviate L1-Specific Transfer Errors Linköping Electronic Conference Proceedings, 18–26. http://www.ep.liu.se/ecp/article.asp?issue=134&article=003
-
Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts (No. 133). 40–46. http://www.ep.liu.se/ecp/article.asp?issue=133&article=008&volume=#
-
Part-Of-Speech in Historical Corpora: Tagger Evaluation and Ensemble Systems on ARCHER KONVENS 2016, Bochum.
-
Detecting innovations in a parsed corpus of learner English International Journal of Learner Corpus Research, 2, 177–204. https://doi.org/10.1075/ijlcr.2.2.03sch
-
Review of Automatic Treatment of Learner Corpus Data, Ana Diaz Negrillo, Nicolas Ballier and Paul Thompson, eds. (2013) International Journal of Learner Corpus Research, 172–177. https://benjamins.com/#catalog/journals/ijlcr.1.1.07sch/details
-
Determining light verb constructions in contemporary British and Irish English International Journal of Corpus Linguistics, 20, 326–354. https://doi.org/10.1075/ijcl.20.3.03ron
-
Parsing early and late modern English corpora Literary and Linguistic Computing, 30, 423–439. https://doi.org/10.1093/llc/fqu001
-
Automated Media Content Analysis from the Perspective of Computational Linguistics In K. Sommer, M. Wettstein, W. Wirth, & J. Matthes (Eds.), Automatisierung in der Inhaltsanalyse (pp. 40–54). Herbert von Halem Verlag.
-
Measuring the public accountability of new modes of governance 38–43. http://www.aclweb.org/anthology/W14-2512
-
Applying Computational Linguistics and Language Models: From Descriptive Linguistics to Text Mining and Psycholinguistics (Habilitation, University of Zurich) https://doi.org/10.5167/uzh-108379
-
ODIN: a customizable literature curation tool 1, 219–223. http://www.biocreative.org/media/store/files/2013/bc4_v1_30.pdf
-
Of-genitive versus s-genitive: A corpus-based analysis of possessive constructions in 20th-century English In P. Bennett, M. Durrell, S. Scheible, & R. J. Whitt (Eds.), New Methods in Historical Corpora (pp. 163–180). Narr Verlag.
-
Exploiting Synergies Between Open Resources for German Dependency Parsing, POS-tagging, and Morphological Analysis 601–609. http://www.aclweb.org/anthology/R/R13/R13-1079.pdf
-
UZH in BioNLP 2013 116–120. http://www.aclweb.org/anthology/W13-2016
-
Using the OntoGene pipeline for the triage task of BioCreative 2012 Database, 2013, bas053. https://doi.org/10.1093/database/bas053
-
Investigating Irish English With ICE-Ireland Cahiers de l’institut de Linguistique et Des Sciences Du Langage, 38, 137–162. https://doi.org/10.26034/la.cdclsl.2013.749
-
Notes about the OntoGene pipeline AAAI-2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, Arlington. http://www.aaai.org/Symposia/Fall/fss12symposia.php
-
Using syntax features and document discourse for relation extraction on PharmGKB and CTD 52–57. https://doi.org/10.5167/uzh-64476
-
Dependency parsing for interaction detection in pharmacogenomics Proceedings of LREC 2012: The Eighth International Conference on Language Resources and Evaluation. LREC 2012: The eighth international conference on Language Resources and Evaluation, Istanbul.
-
Using semantic resources to improve a syntactic dependency parser (V. Barbu Mititelu, O. Popescu, & V. Pekar, Eds.; pp. 67–76). http://www.lrec-conf.org/proceedings/lrec2012/workshops/10.Semantic%20Relations%20II%20Proceedings.pdf
-
Dependency bank 23–28. http://www.lrec-conf.org/proceedings/lrec2012/workshops/05.CMLC-Proceedings.pdf
-
Relation Mining Experiments in the Pharmacogenomics Domain Journal of Biomedical Informatics, 45, 851–861. https://doi.org/10.1016/j.jbi.2012.04.014
-
Adapting a parser to historical English 10. http://www.helsinki.fi/varieng/journal/volumes/10/schneider/
-
Using automatically parsed corpora to discover lexico-grammatical features of English varieties 251–258. http://infolingu.univ-mlv.fr/Colloques/lgc/index.php?year=2011&lang=en&page=1
-
Detection of interaction articles and experimental methods in biomedical literature BMC Bioinformatics, 12, S13. https://doi.org/10.1186/1471-2105-12-S8-S13
-
Text-Mining-Methoden im Semantic Web Wirtschaftsinformatik und Management, 3, 28–35. http://www.wirtschaftsinformatik.de/index.php;do=show/site=wi/sid=d7e6638e9c1f514de6389e5cb0c4b23e/alloc=12/id=2895
-
A large-scale investigation of verb-attached prepositional phrases Methodological and Historical Dimensions of Corpus Linguistics, 6. http://www.helsinki.fi/varieng/journal/volumes/06/lehmann_schneider/
-
A data-driven approach to alternations based on protein-protein interactions 597–607. http://www.upv.es/pls/obib/sic_publ.FichPublica?P_ARM=6032
-
OntoGene (Team 65): preliminary analysis of participation in BioCreative III BioCreative III workshop, Bethesda. http://www.biocreative.org/events/biocreative-iii/
-
OntoGene in BioCreative II.5 IEEE - ACM Transactions on Computational Biology and Bioinformatics, 7, 472–480. https://doi.org/10.1109/TCBB.2010.50
-
Text Mining Methoden im Semantic Web HMD Praxis der Wirtschaftsinformatik, 35–46. http://hmd.dpunkt.de/271/index.html
-
Using a parser as a heuristic tool for the description of New Englishes online. http://www.liv.ac.uk/english/CL2009/index.htm
-
UZurich in the BioNLP 2009 Shared Task 28–36. http://www.aclweb.org/anthology/W/W09/W09-1400.pdf
-
Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing (No. 5449; pp. 406–417). Springer. https://doi.org/10.1007/978-3-642-00382-0_33
-
Detecting Protein-Protein Interactions in Biomedical Literature Using a Parser In S. Clematide, M. Klenner, & M. Volk (Eds.), Searching Answers (pp. 109–118). MV Verlag.
-
Parser-based analysis of syntax-lexis interactions In A. H. Jucker, D. Schreier, & M. Hundt (Eds.), Corpora: Pragmatics and Discourse (No. 68; pp. 477–502). Rodopi. http://www.rodopi.nl/senj.asp?BookId=LC+68
-
A New Hybrid Dependency Parser for German In C. Chiarcos, R. E. de Castilho, & M. Stede (Eds.), Von der Form zur Bedeutung: Texte automatisch verarbeiten / From Form to Meaning: Processing Texts Automatically. Proceedings of the Biennial GSCL Conference 2009 (pp. 115–124). Narr.
-
Hybrid long-distance functional dependency parsing (Dissertation, University of Zurich) https://doi.org/10.5167/uzh-7188
-
A Broad-Coverage, Representationally Minimalist LFG Parser: Chunks and F-Structures Are Enough LFG05, Bergen. http://cslipublications.stanford.edu/LFG/10/lfg05schneider.pdf
My Recent Publications related to the English Department (ZORA)
-
Digital Dickens: An automated content analysis of Charles Dickens’ novels In S. Buschfeld, P. Ronan, T. Neumaier, A. Wellinghoff, & L. Westermayer (Eds.), Crossing Boundaries through Corpora: Innovative corpus approaches within and beyond linguistics (pp. 62–98). John Benjamins Publishing. https://doi.org/10.1075/scl.119
-
Automatically detecting directives with SPICE Ireland In M. Schweinberger & P. Ronan (Eds.), Socio-Pragmatic Variation in Ireland: Using Pragmatic Variation to Construct Social Identities (No. 378; pp. 205–234). De Gruyter. https://doi.org/10.1515/9783110791457-011
-
The Visualisation and Evaluation of Semantic and Conceptual Maps In M. Laitinen & J. Tyrkkö (Eds.), Linguistics across Disciplinary Borders: The March of Data (pp. 67–94). Bloomsbury Publishing. https://doi.org/10.5040/9781350362291.0009
-
Investigating child language acquisition from a joint perspective: A comparison of traditional and new L1 speakers of English In M. Schmalz, M. Vida-Mannl, & S. Buschfeld (Eds.), Acquisition and Variation in World Englishes: Bridging Paradigms and Rethinking Approaches (No. 69; pp. 133–157). De Gruyter. https://doi.org/10.1515/9783110733723-007
-
“To boldly go where no man has gone before”: how iconic is the Star Trek split infinitive? Linguistics Vanguard, 9, 247–255. https://doi.org/10.1515/lingvan-2022-0168
-
Colloquialisation, compression and democratisation in British parliamentary debates In M. Korhonen, H. Kotze, & J. Tyrkkö (Eds.), Exploring Language and Society with Big Data: Parliamentary discourse across time and space (pp. 336–372). John Benjamins Publishing. https://doi.org/10.1075/scl.111.12sch
-
Differences in syntactic annotation affect retrieval International Journal of Corpus Linguistics, 28, 378–406. https://doi.org/10.1075/ijcl.21104.zeh
-
Detecting and Analysing Learner Difficulties Using a Learner Corpus Without Error Tagging In K. Harrington & P. Ronan (Eds.), Demystifying Corpus Linguistics for English Language Teaching (pp. 229–257). Palgrave Macmillan. https://doi.org/10.1007/978-3-031-11220-1_12
-
Replicable semi-supervised approaches to state-of-the-art stance detection of tweets Information Processing & Management, 60, 103199. https://doi.org/10.1016/j.ipm.2022.103199
-
Assessing How Attitudes to Migration in Social Media Complement Public Attitudes Found in Opinion Surveys SPELL: Swiss Papers in English Language and Literature, 41, 119–153. https://doi.org/10.33675/SPELL/2022/41/10
-
Systematically Detecting Patterns of Social, Historical and Linguistic Change: The Framing of Poverty in Times of Poverty Transactions of the Philological Society, 120, 447–473. https://doi.org/10.1111/1467-968X.12252
-
Medical topics and style from 1500 to 2018 In T. Hiltunen & I. Taavitsainen (Eds.), Corpus pragmatic studies on the history of medical discourse (pp. 49–78). Benjamins. https://doi.org/10.1075/pbns.330.03sch
-
Comparing data-driven to corpus-based approaches for diachronic variation: document-classification and overuse metrics In J. Schlüter & O. Schützler (Eds.), Data and Methods in Corpus Linguistics: Comparative Approaches (pp. 291–322). Cambridge University Press.
-
Recent changes in spoken British English according to spoken BNC2014 In S. Flach & M. Hilpert (Eds.), Broadening the spectrum of corpus linguistics: New approaches to variability and change (No. 105; pp. 173–195). John Benjamins Publishing. https://doi.org/10.1075/scl.105.06sch
-
Syntactic changes in verbal clauses and noun phrases from 1500 onwards In B. Los, C. Cowie, & P. Honeybone (Eds.), English Historical Linguistics: Change in Structure and Meaning (No. 361; pp. 163–200). John Benjamins Publishing. https://doi.org/10.1075/cilt.358.07sch
-
Measuring Attitudes to Migration in the Media automatically with Complementary Data Sources and Methods In P. Ronan & E. Ziegler (Eds.), Approaches to Migration and Language Identity (pp. 207–252). Peter Lang. https://www.peterlang.com/document/1183598
-
With a little help from familiar interlocutors: real-world language use in young and older adults Aging & Mental Health, 25, 2310–2319. https://doi.org/10.1080/13607863.2020.1822288
-
Pluralized non-count nouns across Englishes: a corpus-linguistic approach to dialect typology Corpus Linguistics and Linguistic Theory, 16, 515–546. https://doi.org/10.1515/cllt-2018-0068
-
Changes in society and language: charting poverty In P. Rautionaho, A. Nurmi, & J. Klemola (Eds.), Corpora and the changing society: studies in the evolution of English (No. 96; pp. 29–56). John Benjamins Publishing. https://doi.org/10.1075/scl.96.02sch
-
Using Multilingual Resources to Evaluate CEFRLex for Learner Applications 346–355. https://www.aclweb.org/anthology/2020.lrec-1.43.pdf
-
Spelling normalisation of Late Modern English: comparison and combination of VARD and character-based statistical machine translation In M. Kytö & E. Smitterberg (Eds.), Late Modern English: novel encounters (No. 214; pp. 243–268). John Benjamins Publishing. https://doi.org/10.1075/slcs.214.11sch
-
A Man who Was Just an Incredible Man, an Incredible Man: Age Factors and Coherence in Donald Trump’s Spontaneous Speech In U. Schneider & M. Eitelmann (Eds.), Linguistic Inquiries into Donald Trump’s Language : From ‘Fake News’ to ‘Tremendous Success’ (pp. 62–84). Bloomsbury. https://doi.org/10.5040/9781350115545.0009
-
Statistics for Linguists: A patient, slow-paced introduction to statistics and to the programming language R Digitale Lehre und Forschung UZH. https://dlf.uzh.ch/openbooks/statisticsforlinguists/
-
Enhancing the linguistic discovery potential of historical corpora: a twin-track approach using ARCHER CL 2019 International Corpus Linguistics Conference, Cardiff. https://eprints.lancs.ac.uk/id/eprint/135949/1/CL2019_ARCHER_annotation_abstract.pdf
-
Topics of eighteenth-century medical writing with triangulation of methods: LMEMT and the underlying reality In I. Taavitsainen & T. Hiltunen (Eds.), Late Modern English medical texts: writing medicine in the eighteenth century (Including the LMEMT Corpus) (pp. 31–74). John Benjamins Publishing. https://doi.org/10.1075/z.221.03taa
-
Statistical MWE-aware parsing In Y. Parmentier & J. Waszczuk (Eds.), Representation and parsing of multiword expressions: current trends (No. 3; pp. 147–182). Language Science Press. https://doi.org/10.5281/zenodo.2579043
-
Scholastic argumentation in Early English medical writing and its afterlife: new corpus evidence In C. Suhr, T. Nevalainen, & I. Taavitsainen (Eds.), From data to evidence in English language research (Vol. 83, pp. 191–221). Brill. https://doi.org/10.1163/9789004390652_010
-
NLP Corpus Observatory – Looking for Constellations in Parallel Corpora to Improve Learners’ Collocational Skills 69–78. https://spraakbanken.gu.se/eng/icall/7th-nlp4call#prog
-
Detecting innovations in a parsed corpus of learner English In S. C. Deshors, S. Götz, & S. Laporte (Eds.), Rethinking linguistic creativity in non-native Englishes (No. 98; pp. 47–74). John Benjamins Publishing. https://doi.org/10.1075/bct.98.03sch
-
Differences between Swiss High German and German German via data-driven methods (M. Cieliebak & F. Benites, Eds.).
-
Differences between Swiss High German and German High German via data-driven methods In M. Cieliebak, D. Tuggener, & F. Benites (Eds.), CEUR Workshop Proceedings (No. 2226; pp. 17–25). CEUR-WS. http://ceur-ws.org/Vol-2226/
-
From Lexical Bundles to Surprisal and Language Models: measuring the idiom principle on native and learner language In J. Kopaczyk & J. Tyrkkö (Eds.), Applications of Pattern-driven Methods in Corpus Linguistics (No. 82; Vol. 82, pp. 15–56). Benjamins. https://doi.org/10.1075/scl.82.02sch
-
Tools and Methods for Processing and Visualizing Large Corpora Studies in Variation, Contacts and Change in English, 19, online. http://www.helsinki.fi/varieng/series/volumes/19/schneider_el-assady_lehmann/
-
Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production 1779–1783. https://doi.org/10.21437/Interspeech.2017-337
-
Saying Whatever It Takes: Creating and Analyzing Corpora from US Presidential Debate Transcripts 537–544. http://paulslals.org.uk/ccr/CL2017ExtendedAbstracts.pdf
-
Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts (No. 133). 40–46. http://www.ep.liu.se/ecp/article.asp?issue=133&article=008&volume=#
-
Part-Of-Speech in Historical Corpora: Tagger Evaluation and Ensemble Systems on ARCHER KONVENS 2016, Bochum.
-
The use of the be-passive in academic Englishes: local versus global usage in an international language Corpora, 11, 31–63. https://doi.org/10.3366/cor.2016.0084
-
Detecting innovations in a parsed corpus of learner English International Journal of Learner Corpus Research, 2, 177–204. https://doi.org/10.1075/ijlcr.2.2.03sch
-
Review of Automatic Treatment of Learner Corpus Data, Ana Diaz Negrillo, Nicolas Ballier and Paul Thompson, eds. (2013) International Journal of Learner Corpus Research, 172–177. https://benjamins.com/#catalog/journals/ijlcr.1.1.07sch/details
-
Determining light verb constructions in contemporary British and Irish English International Journal of Corpus Linguistics, 20, 326–354. https://doi.org/10.1075/ijcl.20.3.03ron
-
Parsing early and late modern English corpora Literary and Linguistic Computing, 30, 423–439. https://doi.org/10.1093/llc/fqu001
-
Of-genitive versus s-genitive: A corpus-based analysis of possessive constructions in 20th-century English In P. Bennett, M. Durrell, S. Scheible, & R. J. Whitt (Eds.), New Methods in Historical Corpora (pp. 163–180). Narr Verlag.
-
Investigating Irish English With ICE-Ireland Cahiers de l’institut de Linguistique et Des Sciences Du Langage, 38, 137–162. https://doi.org/10.26034/la.cdclsl.2013.749
-
Discovering new verb-preposition combinations in New Englishes Studies in Variation, Contacts and Change in English, 13, online. http://www.helsinki.fi/varieng/series/volumes/13/schneider_zipp/
-
Using semantic resources to improve a syntactic dependency parser (V. Barbu Mititelu, O. Popescu, & V. Pekar, Eds.; pp. 67–76). http://www.lrec-conf.org/proceedings/lrec2012/workshops/10.Semantic%20Relations%20II%20Proceedings.pdf
-
Dependency bank 23–28. http://www.lrec-conf.org/proceedings/lrec2012/workshops/05.CMLC-Proceedings.pdf
-
Semantic corpus trawling: Expressions of “courtesy” and “politeness” in the Helsinki Corpus In C. Suhr & I. Taavitsainen (Eds.), Developing Corpus Methodology for Historical Pragmatics (No. 11; pp. 1–1). Research Unit for Variation, Contacts and Change in English. http://www.helsinki.fi/varieng/journal/volumes/11/jucker_taavitsainen_schneider/
-
BNC Dependency Bank 1.0 In S. Oksefjell, J. Ebeling, & H. Hasselgard (Eds.), Aspects of corpus linguistics: compilation, annotation, analysis (No. 12; p. online). Research Unit for Variation, Contacts, and Change in English. http://www.helsinki.fi/varieng/journal/volumes/12/lehmann_schneider/
-
Relative complexity in scientific discourse English Language and Linguistics, 16, 209–240. https://doi.org/10.1017/S1360674312000032
-
Retrieving relatives from historical data Literary and Linguistic Computing, 27, 3–16. https://doi.org/10.1093/llc/fqr049
-
Adapting a parser to historical English 10. http://www.helsinki.fi/varieng/journal/volumes/10/schneider/
-
Using automatically parsed corpora to discover lexico-grammatical features of English varieties 251–258. http://infolingu.univ-mlv.fr/Colloques/lgc/index.php?year=2011&lang=en&page=1
-
Detection of interaction articles and experimental methods in biomedical literature BMC Bioinformatics, 12, S13. https://doi.org/10.1186/1471-2105-12-S8-S13
-
Text-Mining-Methoden im Semantic Web Wirtschaftsinformatik und Management, 3, 28–35. http://www.wirtschaftsinformatik.de/index.php;do=show/site=wi/sid=d7e6638e9c1f514de6389e5cb0c4b23e/alloc=12/id=2895
-
A large-scale investigation of verb-attached prepositional phrases Methodological and Historical Dimensions of Corpus Linguistics, 6. http://www.helsinki.fi/varieng/journal/volumes/06/lehmann_schneider/
-
A data-driven approach to alternations based on protein-protein interactions 597–607. http://www.upv.es/pls/obib/sic_publ.FichPublica?P_ARM=6032
-
OntoGene (Team 65): preliminary analysis of participation in BioCreative III BioCreative III workshop, Bethesda. http://www.biocreative.org/events/biocreative-iii/
-
OntoGene in BioCreative II.5 IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7, 472–480. https://doi.org/10.1109/TCBB.2010.50
-
Text Mining Methoden im Semantic Web HMD Praxis der Wirtschaftsinformatik, 35–46. http://hmd.dpunkt.de/271/index.html
-
Using a parser as a heuristic tool for the description of New Englishes online. http://www.liv.ac.uk/english/CL2009/index.htm
-
Multi-verbal expressions of ‘giving’ in Old English and Old Irish 116. http://ucrel.lancs.ac.uk/publications/cl2009/
-
UZurich in the BioNLP 2009 Shared Task 28–36. http://www.aclweb.org/anthology/W/W09/W09-1400.pdf
-
Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing (No. 5449; pp. 406–417). Springer. https://doi.org/10.1007/978-3-642-00382-0_33
-
Detecting Protein-Protein Interactions in Biomedical Literature Using a Parser In S. Clematide, M. Klenner, & M. Volk (Eds.), Searching Answers (pp. 109–118). MV Verlag.
-
Parser-based analysis of syntax-lexis interactions In A. H. Jucker, D. Schreier, & M. Hundt (Eds.), Corpora: Pragmatics and Discourse (No. 68; pp. 477–502). Rodopi. http://www.rodopi.nl/senj.asp?BookId=LC+68
-
A New Hybrid Dependency Parser for German In C. Chiarcos, R. E. de Castilho, & M. Stede (Eds.), Von der Form zur Bedeutung: Texte automatisch verarbeiten / From Form to Meaning: Processing Texts Automatically. Proceedings of the Biennial GSCL Conference 2009 (pp. 115–124). Narr.
-
Fishing for compliments: precision and recall in corpus-linguistic compliment research In A. H. Jucker & I. Taavitsainen (Eds.), Speech acts in the history of English (No. 176; pp. 273–294). John Benjamins. http://www.benjamins.com/cgi-bin/t_bookview.cgi?bookid=P%26bns_176
-
A Broad-Coverage, Representationally Minimalist LFG Parser: Chunks and F-Structures Are Enough LFG05, Bergen. http://cslipublications.stanford.edu/LFG/10/lfg05schneider.pdf
Research Interests
My research interests include:
- Natural Language Processing (NLP)
- Corpus Linguistics
- Robust Fast Broad-Coverage Parsing
- Dependency Grammar
- Text Mining, Information Extraction
- Semantic Web
- Information Retrieval
- BioMedical Parsing Applications
- Automated Media Content Analysis
- Formal Grammar
My interests also include UNIX and Mac OS X system administration, Prolog and Perl programming, desktop publishing, travelling, literature, jogging and cycling. I have taught Prolog, theoretical computer science, and Semantic Web at Fernfachhochschule Schweiz (Swiss distance learning UAS). I have also taught Prolog and Perl at the CL department of the University of Geneva.
Dependency Grammar and Robust Parsing
I have written a low-complexity, broad-coverage probabilistic Dependency Parser for English, Pro3Gres, as part of my doctoral thesis.
I wrote my Master's Paper on Dependency Grammar and the partly dependency-based Link Grammar. I am currently developing Pro3Gres: a robust, probabilistic parser for a Dependency Grammar. In winter 2003/2004 and winter 2005/2006 I taught Dependency Grammar Parsing. In winter 2006/2007/2014 I taught Parsing Technology.
Corpus Linguistics
Both the English Seminar and the Department of Computational Linguistics have a long tradition in Corpus Linguistics research. I am a member of the ARCHER consortium. At the English Department, I am involved in the compilation of several corpora and in providing web interface access to them. In summer 2003, I taught a seminar on Corpus Linguistics. In summer 2006, I taught a colloquium on Corpus Linguistics. In spring 2008, I taught a lecture on Corpus Linguistics, together with Fabio Rinaldi. In spring 2008, I also taught the workshop at the ICAME conference, together with Hans Martin Lehmann and Nelleke Oostdijk. In autumn 2012, I taught a BA seminar on Corpus Linguistics.
BioMedical Parsing and Relation Finding
Our research on an important application of my high-precision robust parser started in 2005 and was an NFS project from 2008 to 2013: OntoGene, relation finding in the biomedical domain.
Automated Media Content Analysis
We are using parsing and Opinion Mining in Automated Media Content Analysis projects. I am leader of subproject I.6 in the Swiss NCCR democracy project and part of the scientific network of the European ERC project POLCON.
Information Retrieval
From 2000 to 2004, I worked on an unsupervised text classification project at the CL department of the University of Geneva.
Question Answering
From 1999 to 2000, I worked on the ExtrAns Project in Zurich.
Formal Grammars
Since the winter term 1999/2000, I have occasionally taught the syntax course of the Zurich CL curriculum. We focus on GB, LFG and HPSG.