Corpus Linguistics 3223-1ULK
The course covers the theory of building language corpora, techniques for collecting language data, and the practical uses of both:
1. Definition of a linguistic corpus. A language corpus versus a collection of texts. Theoretical and material research: the role of language data in linguistics.
2. Corpus typology: monolingual and multilingual, parallel and comparable corpora. The concepts of corpus representativeness and adequacy.
3. Basics of indexing linguistic corpora; interpreting the data and using the morphological and syntactic information obtained.
4. Main corpora of the Polish language: NKJP, PWN, IPI PAN, PELCRA. The NKJP corpus and its accompanying tools.
5. Main corpora of the languages of the taught language areas (preferred languages: English, German, Russian, Italian).
6. A lexicographic database as the basis for creating various dictionary models.
7. Available tools for analyzing text corpora (AntConc, Bright Writing, etc.).
8. Tools for parametric text analysis; practical applications for measuring the modality of a text.
9. Possible applications of corpora in linguistic practice:
(a) the study of specialised languages
(b) the corpus as a tool supporting the translator
(c) foreign language teaching
(d) dictionaries and dictionary models on different data carriers
10. Text corpora and parallel texts for translation support programs (CAT tools).
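The list above mentions corpus analysis tools such as AntConc. As a rough illustration of the kind of output such tools produce (a word frequency list, keyword-in-context lines, and a simple lexical diversity measure), here is a minimal Python sketch. It is not how AntConc or any other named tool is implemented; the function names `tokenize`, `frequency_list`, `kwic` and `type_token_ratio` and the sample text are hypothetical and chosen only for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens, top=None):
    """Return (word, count) pairs ordered from most to least frequent."""
    return Counter(tokens).most_common(top)


def kwic(tokens, keyword, window=3):
    """Return keyword-in-context lines with up to `window` tokens per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical two-sentence "corpus", used only to exercise the functions.
sample = ("A corpus is a collection of texts. "
          "A parallel corpus aligns texts in two languages.")
tokens = tokenize(sample)

print(frequency_list(tokens, top=3))
print(kwic(tokens, "corpus", window=2))
print(round(type_token_ratio(tokens), 2))
```

The type-token ratio is only one common, very simple measure; the stylometric parameters actually used in the course are not specified here.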
The aim of this course is to introduce the basic IT tools that support collecting, verifying and applying lexical, stylistic and syntagmatic structures appropriate for a general or specialist language. After the course, the student should be able to use basic programs in the field of digital linguistics and find the necessary linguistic data with the tools learned.
The course is taught through presentations and demonstrations of how individual programs work, as far as the available hardware allows.
In addition, the course emphasises searching for software independently and drawing one's own conclusions from the linguistic analyses carried out and from their visualization.
The student (or a group of students) is to use the acquired knowledge to create a formally defined text corpus, which forms the basis for course credit, and to build competence in producing independent lexicographic and corpus work (including analyses for the diploma thesis).
Student workload (3 ECTS in total):
30 hours: class attendance (1 ECTS)
30 hours: preparation of the text corpus (1 ECTS)
15 hours: independent work with software (0.5 ECTS)
15 hours: reading and preparation for classes, preparation for the exam (0.5 ECTS)
Learning outcomes
Knowledge:
knows the terminology used in linguistics and related fields at an extended level, is familiar with the most important directions and methods of linguistic research; understands grammatical terminology; has knowledge of selected pragmatic determinants of the language systems in question;
has in-depth knowledge of the methodology and conduct of linguistic or literary research; knows the scientific style and lexis; has knowledge of databases for linguistics, has basic knowledge of the interpretation of data obtained from analysis;
is familiar with popular computer-assisted translation (CAT) software and selected programs for frequency and stylometric analysis; is familiar with the possibilities of using machine translation;
Skills:
uses computer programs useful in a translator's work, can format text appropriately in Polish and at least one foreign language, can use spreadsheets and diagrams efficiently, can use generally available scientific databases (including terminological and corpus databases), can search for information efficiently, draws on expert knowledge, on encyclopedic, language, general scientific terminology, general technical, interdisciplinary and industry dictionaries, and on language corpora, databases and parallel texts;
is able to identify gaps in scientific research and directions for continuing it; formulates research problems, selects adequate methods, constructs research tools, develops, presents and interprets research results, draws conclusions;
Social competences:
is able to supplement and improve the acquired knowledge of at least one language of specialisation and of one's own language; is aware of the need to constantly search for new dictionary and text sources, as well as to follow modern scientific theories; reacts quickly to changing realities;
draws conclusions from feedback, knows how to manage time; maintains contact with the translation community, works in a multicultural environment; knows the working environment of a translator;
can work in a group and cooperate with others in appropriate roles (functions); can manage a small team (3-4 people in practice groups);
Assessment criteria
Methods of evaluating students' work:
- evaluation of activity and ongoing preparation for classes;
- project (thematic text corpus);
- final written work (stylometric analysis of the corpus).
Evaluation criteria (final evaluation components):
- continuous assessment in classes: 10%
- project (thematic text corpus): 40%
- final assessment: 50%
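The three weights above suggest that the final result is a weighted sum of the component scores. A minimal sketch of that arithmetic follows, assuming each component is expressed as a percentage from 0 to 100 and that the components combine linearly (the syllabus does not state this explicitly). The names `WEIGHTS` and `final_percentage` and the example scores are hypothetical.

```python
# Weights copied from the evaluation criteria above; the dictionary keys
# are hypothetical labels introduced for this sketch.
WEIGHTS = {"continuous": 0.10, "project": 0.40, "final": 0.50}


def final_percentage(scores):
    """Combine component scores (each assumed 0-100) into one percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for illustration only.
example = {"continuous": 80, "project": 70, "final": 90}
print(final_percentage(example))
```

With these made-up scores the contributions are 8, 28 and 45 percentage points, for a total of 81.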
Examination (final assessment):
For the stylometric analysis of the corpus, the result is determined by the number of points obtained for the individual stylometric parameters.
Grading scale for continuous assessment and the examination/credit:
55%-69% = 3
70%-74% = 3+
75%-84% = 4
85%-89% = 4+
90%-100% = 5
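The scale above can be read as a lookup from a percentage to a grade. The sketch below shows one way to express it, under two assumptions that the syllabus does not state: a result below 55% is treated as not passing (returned as `None`), and a fractional result falling between two listed bands (for example 69.5%) stays in the lower band. The names `BANDS` and `grade_for` are hypothetical.

```python
# Lower bound of each band, copied from the scale above, highest first.
BANDS = [(90, "5"), (85, "4+"), (75, "4"), (70, "3+"), (55, "3")]


def grade_for(percent):
    """Return the grade for a percentage, or None below 55% (assumed fail)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for lower_bound, grade in BANDS:
        if percent >= lower_bound:
            return grade
    return None  # assumption: the scale lists no grade below 55%


for value in (54, 55, 72, 84, 85, 100):
    print(value, grade_for(value))
```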
Principles of cooperation between the teacher and students:
1. Absences: 3 unexcused absences per semester are allowed (in accordance with the regulations).
2. The final assessment may be attempted only after passing the project and receiving a positive continuous assessment. All tests.
3. The student has the right to take each written test twice. Failing to take the test on the first date without justification forfeits that attempt.
Practical placement
None
Bibliography
Basic literature:
- Gruszczyńska E., Leńko-Szymańska A. (eds.), Polskojęzyczne korpusy równoległe / Polish-language Parallel Corpora, WLS UW, Warszawa, 2016.
- Karpiński Ł., Systemy leksykalno-komunikacyjne, Campidoglio, Warszawa, 2017.
- "Prace filologiczne", vol. LXIII, WP UW, Warszawa 2012 (a volume containing a collection of papers on corpus linguistics).
Supplementary literature:
- Biber D., Conrad S., Reppen R., Corpus Linguistics. Investigating Language Structure and Use, Cambridge University Press 1998.
- Celiński P., 2013a, Postmedia. Cyfrowy kod i bazy danych, Wydawnictwo UMCS, Lublin.
- Kamińska-Szmaj I., 1989, Słownictwo tekstów popularnonaukowych w ujęciu statystycznym, [in:] „Rozprawy Komisji Językowej”, vol. XVI, Wrocławskie Towarzystwo Naukowe, Wyd. PAN, Wrocław, pp. 69-87.
- Karpiński Ł., Zarys leksykografii terminologicznej, KJS UW, Warszawa, 2008
- Karpiński Ł., 2009a, Wybrane założenia komputerowej analizy tekstów i gromadzenia danych, [in:] „Języki Specjalistyczne 9 – Kulturowy i leksykograficzny obraz języków specjalistycznych”, (ed. eidem), KJS UW, Warszawa.
- Karpiński Ł., 2012a, Analiza parametryczna tekstu a translacja maszynowa – wybrane zagadnienia, [in:] „The Linguistic Journal of Applied Linguistics”, (ed.), Lingwistyczna Szkoła Wyższa w Warszawie, Warszawa.
- Karpiński Ł., Michałowski P., 2012, Wybrane metody analizy terminologii specjalistycznej (na przykładzie technolektu geografii), [in:] „Edukacja dla Przyszłości”, vol. IX, 2012, Wydawnictwo Wyższej Szkoły Finansów i Zarządzania w Białymstoku, Białystok, pp. 19-46.
- Lewandowska-Tomaszczyk B., Gramatyka angielska na autentycznych materiałach językowych, Łódź, WSSM 2004.
- Lewandowska-Tomaszczyk B., Podstawy językoznawstwa korpusowego, Wyd. Uniwersytetu Łódzkiego, Łódź 2005.
- Ludskanow A., 1973, Tłumaczy człowiek i maszyna cyfrowa, WNT, Warszawa
- McEnery T., Wilson A., Corpus Linguistics: An Introduction, Edinburgh University Press, Edinburgh 2001.
- Pawłowski A., 2001, Metody kwantytatywne w sekwencyjnej analizie tekstu, Uniwersytet Warszawski Katedra Lingwistyki Formalnej, Warszawa.
- Przepiórkowski A., Bańko M., Górski R., Lewandowska-Tomaszczyk B., Narodowy Korpus Języka Polskiego, PWN, Warszawa 2012
- Sambor J., 1969, Badania statystyczne nad słownictwem. Na materiale „Pana Tadeusza”, Wrocław-Warszawa.
- Świdziński M., 2006, Lingwistyka korpusowa w Polsce – źródła, stan, perspektywy, [in:] „LingVaria”, no. 1, Wydział Polonistyki UJ, Kraków.
- Tognini-Bonelli E., Corpus Linguistics at Work, John Benjamins, Amsterdam/Philadelphia 2001
- The instructor's own resources: collections of analyses and visualizations of stylometric data.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system.