Quantitative Approaches to Style: Literary Texts and Computer Stylometry 3003-C4TN-JK1
The course covers the following topics:
1. Basic concepts of linguistic statistics.
Students will learn the basic concepts of quantitative methods in linguistics.
2. Corpus construction.
Work with the Korpusomat tool (https://korpusomat.eu): preparing and cleaning the texts, then importing, processing, and downloading the corpus. Format of the output data. Corpus queries.
3. Analysis of corpus data.
Work with ready-to-use corpora (KWJP) and corpora compiled by participants.
4. Introduction to RStudio and the R language.
5. What is stylometry, and what are its methods?
This module is a brief overview of stylometry, its problems and applications.
6. Stylometric discriminators.
What is a stylometric discriminator? An overview of discriminators and the information they carry.
7. Large-scale stylometry.
The module introduces the concepts of distant reading, large-scale stylometry, and large-scale stylometric methods. It also includes an introduction to the stylo package and the interpretation of analyses conducted with this set of linguistic tools.
8. Preparation and stylometric analysis of the corpus; visualisation of data.
Students work in teams, preparing a corpus, analysing it according to selected indicators, and preparing a presentation of the principles behind their analysis. The module also introduces students to data visualisation.
9. Presentation of results.
Participants perform tasks on the Campus platform and actively participate in compiling a thematic, genre, or author corpus, conducting its stylometric analysis, and presenting the results. Participants complete no more than two team assignments during the semester.
Learning outcomes
Upon completion of the course, the student:
— knows what a language corpus is, particularly one for stylistic research.
— knows how a corpus for stylistic research differs from corpora for other research.
— knows how to create a suitable corpus using the available tools.
— understands and interprets the results of calculations made with the available tools.
— distinguishes between basic types of data and knows how to select appropriate tests for their use.
— knows and understands basic stylometric tests and can carry them out.
— can use the programs' documentation.
— knows the basics of the R language for stylometric analysis.
— knows and understands the limitations of the methods and data types used.
— knows how to design stylometric tests, carry them out, and report the results.
— has a basic knowledge of the visualisation of data of different types.
Assessment criteria
1. Completion of assignments and tests in class and on the platform: 20%.
2. Team projects during class (no more than two): 30%.
3. Final team project: 50%.
Absences
1. A student is entitled to two unexcused absences per semester.
2. A student with more than two unexcused absences fails the course.
3. A student who wishes to have absences excused must document objective reasons for them within one week (e.g. with a medical certificate).
4. The instructor indicates how the student makes up the excess excused absences.
Use of AI tools:
1. A student who wishes to use artificial intelligence tools for a final (credit) assignment or a partial assignment must:
a. obtain the instructor's permission to do so,
b. agree with the instructor on the objectives and scope of the use of AI tools.
2. A student may only use artificial intelligence tools to edit papers in Polish if the instructor agrees.
3. If a student uses artificial intelligence tools:
a. without the consent of the instructor or
b. in a manner not agreed with the instructor,
the instructor applies procedures analogous to those of the anti-plagiarism procedure (cf. Resolution No. 14 of the University Education Council).
Practical placement
None.
Bibliography
A selection from the list.
R. Harald Baayen (2014), Analyzing linguistic data. A practical introduction to statistics using R, Cambridge University Press.
David M. Blei (2012), Probabilistic topic models, Communications of the ACM, Vol. 55, No. 4, Association for Computing Machinery (ACM), pp. 77-84. http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf
Maciej Eder (2014), Metody ścisłe w literaturoznawstwie i pułapki pozornego obiektywizmu – przykład stylometrii, Teksty Drugie, no. 2.
Maciej Eder, Mike Kestemont, Jan Rybicki (2013), Stylometry with R: a suite of tools, Digital Humanities 2013: Conference Abstracts, Lincoln: University of Nebraska-Lincoln, pp. 487-489.
Rafał L. Górski, Magdalena Król, Maciej Eder (2019), Zmiana w języku. Studia kwantytatywno-korpusowe, Kraków: IJP PAN.
Magdalena Kądzioła (2019), Czynniki różnicujące wypowiedzi informatorów – analiza stylometryczna wywiadów biograficznych, Wrocławski Rocznik Historii Mówionej, vol. 8, Ośrodek Pamięć i Przyszłość, pp. 63-80.
Witold Kieraś, Łukasz Kobyliński, Maciej Ogrodniczuk (2018), Korpusomat – a Tool for Creating Searchable Morphosyntactically Tagged Corpora, Computational Methods in Science and Technology, 24/1, pp. 21-27. https://korpusomat.eu/
Władysław Kuraszkiewicz and Józef Łukaszewicz (1951), Ilość różnych wyrazów w zależności od długości tekstu, Pamiętnik Literacki: czasopismo kwartalne poświęcone historii i krytyce literatury polskiej, Vol. 42, No. 1, pp. 168-182.
Natalia Levshina (2015), How to do linguistics with R, John Benjamins.
Małgorzata Marciniak, Witold Kieraś, Krystyna Bojałkowska, Piotr Borkowski, Monika Borys, Wiktor Eźlakowski, Wojciech Guz, Łukasz Kobyliński, Dorota Komosińska, Katarzyna Krasnowska-Kieraś, Marek Łaziński, Martyna Miernecka, Bartłomiej Nitoń, Maciej Ogrodniczuk, Michał Rudolf, Aleksandra Tomaszewska, Marcin Woliński, Joanna Wołoszyn, Beata Wójtowicz, Alina Wróblewska, Natalia Zawadzka-Paluektau (2023). Korpus Współczesnego Języka Polskiego. https://kwjp.pl/
Franco Moretti (2013), Distant Reading, Verso.
Adam Pawłowski, ed. (2023), Od Gutenberga do Zuckerberga.
Jan Rybicki (2013), Stylometryczna niewidzialność tłumacza, Przekładaniec 27, pp. 61-87.
Jadwiga Sambor (1977), Słowa i liczby, Wrocław-Warszawa-Kraków-Gdańsk.
Jadwiga Sambor and Rolf Hammerl (1990), Statystyka dla językoznawców, Warszawa: WUW.
StatSoft (2006), Elektroniczny Podręcznik Statystyki PL, Kraków. http://www.statsoft.pl/textbook/stathome.html
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system.