Quantitative Approaches to Style: Literary Texts and Computer Stylometry 3003-C4TN-JK1

The course covers the following topics:
1. Basic concepts of linguistic statistics.
Students learn the basic concepts of quantitative methods in linguistics.
2. Corpus construction.
Work with the Korpusomat tool (https://korpusomat.eu): preparing and cleaning texts; importing, processing, and downloading the corpus; the format of the output data; corpus queries.
3. Analysis of corpus data.
Work with ready-to-use corpora (KWJP) and corpora compiled by participants.
4. Introduction to RStudio and the R language.
5. What is stylometry, and what are its methods?
This module is a brief overview of stylometry, its problems and applications.
6. Stylometric discriminators.
What is a stylometric discriminator? An overview of discriminators and the information they carry.
7. Large-scale stylometry.
The module introduces the concepts of distant reading and large-scale stylometry, together with large-scale stylometric methods. It also includes an introduction to the stylo package for R and the interpretation of analyses conducted with it.
8. Preparation and stylometric analysis of the corpus; visualisation of data.
Students work in teams: they prepare a corpus, analyse it according to selected indicators, and prepare a presentation of the principles of their analysis. The module also introduces students to data visualisation.
9. Presentation of results.
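
As an illustration of topics 6 and 7, the sketch below shows one common kind of stylometric discriminator, the relative frequencies of the most frequent words, and a distance built on it in the manner of Burrows's Delta (the mean absolute difference of z-scored frequencies). It is a minimal sketch in Python, not the course's R workflow and not the stylo package's API. All function names are hypothetical, the tokenizer is a deliberately naive assumption, and the code assumes at least two non-empty texts.

```python
import re
from collections import Counter
from statistics import mean, stdev


def tokenize(text):
    """Lowercase the text and return runs of letters (naive tokenizer, an assumption)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one non-empty token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def most_frequent_words(token_lists, n):
    """The n most frequent words of the pooled corpus; ties are broken alphabetically."""
    pooled = Counter()
    for tokens in token_lists:
        pooled.update(tokens)
    ranked = sorted(pooled.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def burrows_delta(corpus, n_mfw=100):
    """Pairwise Delta-style distances for a dict mapping text name to raw text.

    Each word's relative frequency is z-scored across the texts (sample
    standard deviation, so at least two texts are required); the distance
    between two texts is the mean absolute difference of their z-scores.
    """
    names = list(corpus)
    tokens = {name: tokenize(corpus[name]) for name in names}
    vocabulary = most_frequent_words(tokens.values(), n_mfw)
    freqs = {name: relative_frequencies(tokens[name], vocabulary) for name in names}

    z_scores = {name: [] for name in names}
    for i in range(len(vocabulary)):
        column = [freqs[name][i] for name in names]
        mu, sigma = mean(column), stdev(column)
        for name in names:
            # A word with identical frequency in every text carries no signal.
            z = (freqs[name][i] - mu) / sigma if sigma > 0 else 0.0
            z_scores[name].append(z)

    return {
        (a, b): mean(abs(x - y) for x, y in zip(z_scores[a], z_scores[b]))
        for k, a in enumerate(names)
        for b in names[k + 1:]
    }


if __name__ == "__main__":
    # Toy texts invented for illustration only.
    toy = {
        "A": "the cat and the dog",
        "B": "the cat and the dog",
        "C": "a bird or a fish or a frog",
    }
    for pair, distance in burrows_delta(toy, n_mfw=5).items():
        print(pair, round(distance, 3))
```

A smaller distance means that two texts use the most frequent words in more similar proportions. Real analyses use much longer texts and more careful preprocessing than this toy tokenizer.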

Participants perform tasks on the Campus platform and take an active part in compiling a thematic, genre, or author corpus, conducting its stylometric analysis, and presenting the results. They complete no more than two team assignments during the semester.

Type of course

elective courses

Mode

Classroom

Prerequisites (description)

The course is an introduction to large-scale stylometry and computational methods of stylometry. It introduces students to the basics of quantitative methods in linguistics, the basic concepts of stylistics, and the basic methods of stylometry. Participants also learn how to create corpora for stylistic research and how to conduct basic stylistic analyses and tests on these corpora.
The course includes ongoing assignments on the Campus platform, no more than two team assignments during the semester, and a team assignment in which participants create a thematic, genre, or author corpus, conduct its stylometric analysis, and report on the results.
We will use RStudio, a widely used program in the field, for coding and the R programming language for calculations. Knowledge of simple Python scripts and command-line commands is not required but may be helpful. Students must be able to install the software and the necessary packages themselves, open a console window in Windows (or whichever operating system they use), save completed tasks, and upload them to the Campus platform. Access to a computer with R and RStudio is needed to complete the assignments and the final project.
The estimated student workload is 30h + 60h = 90h. Because of the number of computers in the computer laboratory, the number of participants is limited to ten.

Course coordinators

Marcin Będkowski
Jarosław Łachnik
Magdalena Sykurska-Derwojed

Learning outcomes

Upon completion of the course, the student:
— knows what a language corpus is, particularly one for stylistic research.
— knows how a corpus for stylistic research differs from corpora for other research.
— knows how to create a suitable corpus using the available tools.
— understands and interprets the results of calculations made with the available tools.
— distinguishes between basic types of data and knows how to select appropriate tests for them.
— knows and understands basic stylometric tests and can carry them out.
— can use the programs' documentation.
— knows the basics of the R language for stylometric analysis.
— knows and understands the limitations of the methods and data types used.
— knows how to design stylometric tests, carry them out, and report the results.
— has a basic knowledge of the visualisation of data of different types.

Assessment criteria

1. Completion of assignments and running of tests in the classroom and on the platform: 20%.
2. Team projects during class (no more than two): 30%.
3. Final team project: 50%.

Absences
1. A student is entitled to two unexcused absences per semester.
2. If a student has more than two unexcused absences, he/she fails the course.
3. If a student wishes to have absences excused, he/she must document objective reasons for them within one week (e.g. with a medical certificate).
4. The instructor indicates how the student is to make up excess excused absences.

Use of AI tools:
1. If a student wishes to use artificial intelligence tools for a credit (final) assignment or a partial assignment, he/she must:
a. obtain the instructor's permission to do so,
b. agree with the instructor on the objectives and scope of the use of AI tools.
2. A student may use artificial intelligence tools to edit papers in Polish only if the instructor agrees.
3. If a student uses artificial intelligence tools:
a. without the consent of the instructor or
b. in a manner not agreed with the instructor,
the instructor applies procedures analogous to the anti-plagiarism procedure (cf. Resolution No. 14 of the University Education Council).

Practical placement

None.

Bibliography

A selection from the list.
R. Harald Baayen (2014), Analyzing Linguistic Data: A Practical Introduction to Statistics Using R, Cambridge University Press.
David M. Blei (2012), Probabilistic topic models, Communications of the ACM, Vol. 55, No. 4, Association for Computing Machinery (ACM), pp. 77-84. http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf
Maciej Eder (2014), Metody ścisłe w literaturoznawstwie i pułapki pozornego obiektywizmu – przykład stylometrii, Teksty Drugie, No. 2.
Maciej Eder, Mike Kestemont, Jan Rybicki (2013), Stylometry with R: a suite of tools, Digital Humanities 2013: Conference Abstracts, Lincoln: University of Nebraska-Lincoln, pp. 487-489.
Rafał L. Górski, Magdalena Król, Maciej Eder (2019), Zmiana w języku. Studia kwantytatywno-korpusowe, Kraków: IJP PAN.
Magdalena Kądzioła (2019), Czynniki różnicujące wypowiedzi informatorów — analiza stylometryczna wywiadów biograficznych, Wrocławski Rocznik Historii Mówionej, Vol. 8, Ośrodek Pamięć i Przyszłość, pp. 63-80.
Witold Kieraś, Łukasz Kobyliński, Maciej Ogrodniczuk (2018), Korpusomat — a Tool for Creating Searchable Morphosyntactically Tagged Corpora, Computational Methods in Science and Technology, 24/1, pp. 21-27. https://korpusomat.eu/
Władysław Kuraszkiewicz and Józef Łukaszewicz (1951), Ilość różnych wyrazów w zależności od długości tekstu, Pamiętnik Literacki: czasopismo kwartalne poświęcone historii i krytyce literatury polskiej, Vol. 42, No. 1, pp. 168-182.
Natalia Levshina (2015), How to do linguistics with R, John Benjamins.
Małgorzata Marciniak, Witold Kieraś, Krystyna Bojałkowska, Piotr Borkowski, Monika Borys, Wiktor Eźlakowski, Wojciech Guz, Łukasz Kobyliński, Dorota Komosińska, Katarzyna Krasnowska-Kieraś, Marek Łaziński, Martyna Miernecka, Bartłomiej Nitoń, Maciej Ogrodniczuk, Michał Rudolf, Aleksandra Tomaszewska, Marcin Woliński, Joanna Wołoszyn, Beata Wójtowicz, Alina Wróblewska, Natalia Zawadzka-Paluektau (2023). Korpus Współczesnego Języka Polskiego. https://kwjp.pl/
Franco Moretti (2013), Distant Reading, Verso.
Adam Pawłowski, ed. (2023), Od Gutenberga do Zuckerberga.
Jan Rybicki (2013), Stylometryczna niewidzialność tłumacza, Przekładaniec 27, pp. 61-87.
Jadwiga Sambor (1977), Słowa i liczby, Wrocław-Warszawa-Kraków-Gdańsk.
Jadwiga Sambor and Rolf Hammerl (1990), Statystyka dla językoznawców, Warszawa: WUW.
StatSoft (2006), Elektroniczny Podręcznik Statystyki PL, Kraków. http://www.statsoft.pl/textbook/stathome.html
