Doing research with text corpora
136038 UE 2025S
Lecturers from German Studies (Germanistik):
Next session
Wednesday, 07.05.2025, 09:45-13:00, Hörsaal 5, Hauptgebäude, Tiefparterre, Stiege 9, Hof 5
Aims, contents and method of the course
The course introduces students to the study of text corpora. A corpus, in its broadest sense, is a structured collection of texts. In modern usage, this usually refers to a digital text collection that is annotated with respect to a pre-defined set of analytically relevant features. Although the systematic study of machine-readable text corpora as an empirically based method has mostly been developed within the field of linguistics, text corpora can be useful for investigating all sorts of research questions within the Digital Humanities.
The course will first introduce students to a number of corpora that are available online. Students will learn to apply various browser-based search and analysis tools. Next, students will learn how to compile and annotate their own corpus using machine-learning-based tools. Students will become acquainted with the formats in which different corpora are encoded, with a particular focus on XML formats. Students will learn how to apply various methods for analysing corpus-derived data, including basic statistical testing, regression modelling, and sentiment analysis. They will also learn how to visualize their results and present their research in the form of a poster or oral presentation.
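As a rough illustration of what working with an XML-encoded, annotated corpus can look like, the following minimal Python sketch counts lemmas in a tiny sample. The element and attribute names (`corpus`, `text`, `w`, `lemma`, `pos`), the sample sentences, and the helper functions are assumptions made up for this example; they are not taken from any particular corpus or from the course materials.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature corpus: every token (<w>) carries a lemma and a
# part-of-speech annotation. Real corpora use richer schemas (e.g. TEI).
SAMPLE_XML = """
<corpus>
  <text id="t1">
    <w lemma="the" pos="DET">The</w>
    <w lemma="corpus" pos="NOUN">corpus</w>
    <w lemma="contain" pos="VERB">contains</w>
    <w lemma="text" pos="NOUN">texts</w>
  </text>
  <text id="t2">
    <w lemma="text" pos="NOUN">Texts</w>
    <w lemma="be" pos="VERB">are</w>
    <w lemma="annotate" pos="VERB">annotated</w>
  </text>
</corpus>
"""


def lemma_frequencies(xml_string, pos=None):
    """Count lemmas over all <w> elements, optionally for one POS tag only."""
    root = ET.fromstring(xml_string.strip())
    counts = Counter()
    for w in root.iter("w"):
        if pos is None or w.get("pos") == pos:
            counts[w.get("lemma")] += 1
    return counts


def relative_frequency(count, total, per=1000):
    """Normalise a raw count to a frequency per `per` tokens."""
    return count / total * per


all_counts = lemma_frequencies(SAMPLE_XML)
noun_counts = lemma_frequencies(SAMPLE_XML, pos="NOUN")
total_tokens = sum(all_counts.values())

print(noun_counts.most_common())
print(round(relative_frequency(all_counts["text"], total_tokens), 2))
```

Normalised frequencies of this kind are a common starting point for comparing texts of different lengths before any statistical testing is applied.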
Type of assessment and permitted materials
Attendance and participation in class
Home exercises and assignments
Oral or poster presentation
Written project portfolio
Reading list
Braun, Christian & Scherr, Elisabeth (eds.). (2023). Methoden zur Erforschung grammatischer Strukturen in historischen Quellen: Vom Einzelfall zum System. De Gruyter. https://doi.org/10.1515/9783110784282
Gillings, Mathew, Mautner, Gerlinde & Baker, Paul. (2023). Corpus-assisted discourse studies. Cambridge University Press. https://doi.org/10.1017/9781009168144
Gries, Stefan T. (2021). Statistics for linguistics with R: A practical introduction (Third edition). De Gruyter.
Levshina, Natalia. (2015). How to do linguistics with R. Data exploration and statistical analysis. John Benjamins.
McEnery, Tony & Wilson, Andrew. (2022). Corpus linguistics. Edinburgh University Press.
McDonnell, Duncan & Ondelli, Stefano. (2022). The language of right-wing populist leaders: Not so simple. Perspectives on Politics, 20(3), 828–841. https://doi.org/10.1017/S1537592720002418
Meyer, Charles F. (2023). English corpus linguistics: An introduction (Second edition). Cambridge University Press.
Winter, Bodo. (2019). Statistics for linguists: An introduction using R. Routledge.
Examination topics
There is no exam for the course. In addition to regular reading assignments, students complete a research project showcasing the skills taught in class, on which the grade is mostly based.
Minimum requirements and assessment criteria
Assessment will be based on:
regular attendance and participation (20%)
home exercises and assignments (20%)
oral or poster presentation (30%)
written project portfolio (30%)
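Abbreviations: ÄdL: Ältere deutsche Sprache und Literatur (Older German Language and Literature); DaF/Z: Deutsch als Fremd- und Zweitsprache (German as a Foreign and Second Language); FD: Fachdidaktik Deutsch (German Subject Didactics); NdL: Neuere deutsche Literatur (Modern German Literature); SpraWi: Sprachwissenschaft (Linguistics)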