Corpus Linguistics 3200-M1-2LK
The course program covers theoretical knowledge about the construction of language corpora, techniques for collecting language data, and the possibilities of their practical use:
1. The concept of a language corpus. A language corpus and a collection of texts. Theoretical and material-based research: the role of linguistic data in linguistics.
2. Corpus typology: monolingual, multilingual, parallel and comparable corpora. The concepts of representativeness and adequacy of corpora.
3. Basic information on the annotation (indexing) of linguistic corpora; the ability to interpret data and use the morphological and syntactic information obtained.
4. Basic corpora of the Polish language. The NKJP corpus (Narodowy Korpus Języka Polskiego) with accompanying tools.
5. Basic corpora of the languages of the language areas taught.
6. Available tools for text corpus analysis (AntConc, Jasnopis, and others).
7. Tools for the parametric analysis of text; practical applications of measuring text modality.
8. Possible applications of corpora in linguistic practice:
(a) examination of specialised languages
(b) the corpus as a tool to assist the translator
(c) foreign language teaching
(d) dictionaries and dictionary models on different data media
9. Text corpora and parallel texts: working with translation support software (CAT tools).
The aim of the course is to familiarize students with basic IT tools that support the process of collecting, verifying and using lexical, stylistic and syntagmatic structures appropriate for general or specialist language. After the course, the student should be able to use basic programs in the field of digital linguistics and to search for the necessary linguistic data using the tools learned.
The course is taught through presentations and demonstrations of how individual programs work, depending on the equipment available.
Moreover, the course emphasizes independent searching for software and drawing individual conclusions from the linguistic analyses conducted and their visualizations.
The student (or a group of students) uses the acquired knowledge to create a text corpus project, formally defined during the classes, which is the basis for passing the course. The project is also meant to increase students' competence in producing independent lexicographic and corpus work (including analyses for the diploma thesis).
Student workload (3 ECTS):
30 hours: class attendance (1 ECTS)
30 hours: preparation of the text corpus (1 ECTS)
15 hours: independent work with software (0.5 ECTS)
15 hours: reading, preparation for classes and preparation for the exam (0.5 ECTS)
Type of course
Mode
Prerequisites (description)
Course coordinators
Learning outcomes
The student improves his or her qualifications with regard to the following criteria:
Knowledge:
The student knows the terminology used in linguistics and related fields at an advanced level; is familiar with the most important directions and methods of linguistic research; understands grammatical terminology; has knowledge of selected pragmatic conditions of given language systems;
has in-depth knowledge of the methodology and conduct of linguistic or literary research; knows scientific style and vocabulary; has knowledge of databases for linguistics; has basic knowledge of the interpretation of data obtained from analysis;
knows popular computer programs supporting the translator's work (CAT tools) as well as selected programs for frequency and stylometric analysis; knows the possibilities of using machine translation;
Skills:
uses computer programs useful in the translator's work, can properly format text in Polish and at least one foreign language; is able to efficiently use spreadsheets and charts; is able to use generally available scientific databases (including terminological and corpus databases); is able to efficiently search for information, uses expert knowledge, encyclopaedic, linguistic, general scientific, general technical, interdisciplinary and industry dictionaries, language corpora, databases, parallel texts;
can identify gaps in scientific research and directions of its continuation; formulates research problems, selects adequate methods, constructs research tools, develops, presents and interprets research results, draws conclusions;
Social competences:
is aware of the need to constantly search for new dictionary and text sources, as well as to follow modern scientific theories; responds quickly to changing realities;
draws conclusions from feedback, knows how to manage time; maintains contact with the translator community, works in a multicultural environment; knows the translator's working environment;
can work in a group, collaborate with others and take on appropriate roles (functions); can manage a small team (groups of 3-4 people);
Assessment criteria
Methods of assessment of student's work
- assessment of in-class activity and ongoing preparation for classes;
- project (thematic textual corpus);
- final written assessment (test or stylometric analysis of the corpus).
Evaluation criteria (components of the final evaluation):
- continuous assessment in the classroom: 10%
- project (thematic textual corpus): 40%
- final credit: 50%
Examination (final credit):
In the case of a multiple-choice or single-choice test, the result is determined by the number of points obtained.
In the case of a stylometric analysis of the corpus, the result is determined by the number of points obtained for each stylometric parameter.
Grading scale for continuous assessment and for the exam/credit:
55%-69% = 3
70%-74% = 3+
75%-84% = 4
85%-89% = 4+
90%-100% = 5
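The weights and the grading scale above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that each component is scored as a percentage, that the three components are combined as a simple weighted sum, and that each band starts at its listed lower bound (the syllabus does not say how fractional percentages are rounded). The names `WEIGHTS`, `GRADE_BANDS`, `weighted_score` and `to_grade` are hypothetical, and a result below 55% returns `None` because no grade is listed for that range.

```python
# Weights taken from "Evaluation criteria"; combining them as a
# weighted sum is an assumption, not a rule stated in the syllabus.
WEIGHTS = {"continuous": 0.10, "project": 0.40, "final": 0.50}

# Lower bounds of the listed grade bands, highest first.
GRADE_BANDS = [(90, "5"), (85, "4+"), (75, "4"), (70, "3+"), (55, "3")]


def weighted_score(scores):
    """Combine per-component percentages (0-100) into one percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def to_grade(percent):
    """Map a percentage to a grade; None means below the lowest listed band."""
    for lower_bound, grade in GRADE_BANDS:
        if percent >= lower_bound:
            return grade
    return None


# Hypothetical component results for one student.
example = {"continuous": 80, "project": 90, "final": 70}
total = weighted_score(example)
print(round(total, 1), to_grade(total))
```

The sketch does not model the additional condition that the final credit requires a passed project and a positive continuous assessment.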
Principles of cooperation between the teacher and students:
1. Absences: 3 unexcused absences per semester are allowed (in accordance with the regulations).
2. The final credit can be awarded after the project has been passed and the continuous assessment has been positive.
3. The student has the right to retake each written examination twice. Failure to take the exam on the first date without justification results in the loss of that attempt.
Practical placement
---
Bibliography
Basic literature:
- Gruszczyńska E., Leńko-Szymańska A. (eds.), Polskojęzyczne korpusy równoległe / Polish-language Parallel Corpora, WLS UW, Warszawa, 2016.
- Karpiński Ł., Systemy leksykalno-komunikacyjne, Campidoglio, Warszawa, 2017.
- "Prace filologiczne", vol. LXIII, WP UW, Warszawa 2012 (a volume containing a collection of papers on corpus linguistics).
Supplementary literature:
- Biber D., Conrad S., Reppen R., Corpus Linguistics: Investigating Language Structure and Use, Cambridge University Press, 1998.
- Celiński P., 2013a, Postmedia. Cyfrowy kod i bazy danych, Wydawnictwo UMCS, Lublin.
- Hebal-Jezierska M., Podstawowe zasady korzystania z korpusów przy badaniu języka, [in:] "Prace Etnograficzne", 2018, vol. 46, no. 1, pp. 30-49.
- Kamińska-Szmaj I., 1989, Słownictwo tekstów popularnonaukowych w ujęciu statystycznym, [in:] „Rozprawy Komisji Językowej”, vol. XVI, Wrocławskie Towarzystwo Naukowe, Wyd. PAN, Wrocław, pp. 69-87.
- Karpiński Ł., Zarys leksykografii terminologicznej, KJS UW, Warszawa, 2008.
- Karpiński Ł., 2009a, Wybrane założenia komputerowej analizy tekstów i gromadzenia danych, [in:] „Języki Specjalistyczne 9 – Kulturowy i leksykograficzny obraz języków specjalistycznych”, (ed. eidem), KJS UW, Warszawa.
- Karpiński Ł., 2012a, Analiza parametryczna tekstu a translacja maszynowa – wybrane zagadnienia, [in:] „The Linguistic Journal of Applied Linguistics”, (ed.), Lingwistyczna Szkoła Wyższa w Warszawie, Warszawa.
- Karpiński Ł., 2017, Maszynowa charakterystyka tekstów specjalistycznych na potrzeby terminologicznych baz danych, [in:] "Komunikacja Specjalistyczna", vol. 14/2017, pp. 139-163.
- Karpiński Ł., Michałowski P., 2012, Wybrane metody analizy terminologii specjalistycznej (na przykładzie technolektu geografii), [in:] „Edukacja dla Przyszłości”, vol. IX, 2012, Wydawnictwo Wyższej Szkoły Finansów i Zarządzania w Białymstoku, Białystok, pp. 19-46.
- Lewandowska-Tomaszczyk B., Gramatyka angielska na autentycznych materiałach językowych, Łódź, WSSM 2004.
- Lewandowska-Tomaszczyk B., Podstawy językoznawstwa korpusowego, Wyd. Uniwersytetu Łódzkiego, Łódź 2005.
- Ludskanow A., 1973, Tłumaczy człowiek i maszyna cyfrowa, WNT, Warszawa.
- McEnery T., Wilson A., Corpus Linguistics: An Introduction, Edinburgh University Press, Edinburgh, 2001.
- Pawłowski A., 2001, Metody kwantytatywne w sekwencyjnej analizie tekstu, Uniwersytet Warszawski Katedra Lingwistyki Formalnej, Warszawa.
- Przepiórkowski A., Bańko M., Górski R., Lewandowska-Tomaszczyk B., Narodowy Korpus Języka Polskiego, PWN, Warszawa 2012.
- Sambor J., 1969, Badania statystyczne nad słownictwem. Na materiale „Pana Tadeusza”, Wrocław-Warszawa.
- Świdziński M., 2006, Lingwistyka korpusowa w Polsce – źródła, stan, perspektywy, [in:] „LingVaria”, no. 1, Wydział Polonistyki UJ, Kraków.
- Tognini-Bonelli E., Corpus Linguistics at Work, John Benjamins, Amsterdam/Philadelphia 2001.
The instructor's own materials, collections of analyses, visualizations of stylometric data.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system: