Corpus linguistics 3223-LK-OG
The course covers the theory of building language corpora, techniques for collecting language data, and ways of using such data in practice:
1. The concept of a language corpus. Language corpus vs. collection of texts. Theoretical vs. data-based research: the role of linguistic data in social sciences, marketing, media and linguistics.
2. Typology of corpora: monolingual and multilingual, parallel and comparable corpora. The concepts of representativeness and adequacy of corpora.
3. Basics of indexing language corpora; interpreting the resulting data.
4. Major corpora of the Polish language. The National Corpus of Polish (NKJP) and its accompanying tools.
5. Available tools for analysis of text corpora (AntConc, Jasnopis, etc.).
6. Parametric text analysis tools, practical applications of text modality measurement.
7. Possible applications of corpora in practice:
a) study of specialized languages
b) corpus as a tool to assist the translator
c) foreign language didactics
d) dictionaries and dictionary models on various data carriers
8. Text corpora and parallel texts in work with computer-assisted translation (CAT) tools.
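As a rough illustration of the kind of analysis that corpus tools such as AntConc automate (topics 5 and 6), the sketch below builds a frequency list, a type-token ratio and a keyword-in-context (KWIC) concordance in Python. It is not taken from the course materials: the sample text, the function names and the simple regular-expression tokenizer are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical two-sentence sample standing in for a real corpus.
SAMPLE = (
    "A corpus is a collection of texts. "
    "A parallel corpus aligns texts in two languages."
)


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens, top_n=5):
    """Return the top_n most frequent tokens with their counts."""
    return Counter(tokens).most_common(top_n)


def type_token_ratio(tokens):
    """Ratio of distinct word forms (types) to running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def kwic(tokens, keyword, window=3):
    """Keyword-in-context lines: each hit with up to `window` tokens per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


if __name__ == "__main__":
    tokens = tokenize(SAMPLE)
    print(frequency_list(tokens, top_n=3))
    print(round(type_token_ratio(tokens), 2))
    for line in kwic(tokens, "corpus", window=2):
        print(line)
```

A real corpus would normally need language-specific tokenization and lemmatization, which matters especially for a highly inflected language such as Polish; this sketch deliberately leaves both out.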
The course aims to familiarize students with basic computer tools for exploring a chosen text, analyzing its content statistically and stylistically, and using reliable data for their own (research) purposes. After the course, students should be able to use basic digital linguistics programs and to find the data they need with the tools covered. Classes rely on presentations and live demonstrations of individual programs, as far as the available hardware allows. Students are also expected to learn to draw their own conclusions from the linguistic analyses they conduct and to visualize them. Students, individually or in groups, apply this knowledge to build a text corpus project, formally specified in class, which forms the basis for credit. The project or the acquired skills can also be used to enrich research in other subjects or to strengthen a thesis. Alternatively, the course can be passed by demonstrating subject knowledge in a credit test covering the course content and literature.
Student workload (3 ECTS):
30 hours of class attendance (1 ECTS)
30 hours of preparing the text corpus or an alternative study (1 ECTS)
15 hours of independent work with software (0.5 ECTS)
15 hours of reading and preparing for classes and for the exam or project discussion (0.5 ECTS)
Learning outcomes
The student improves their qualifications in the following areas:
Knowledge:
is familiar, at an advanced level, with the terminology used in linguistics and related fields; knows the most important directions and methods of linguistic research; understands grammatical terminology; has knowledge of selected pragmatic aspects of the language systems in question;
has in-depth knowledge of the methodology and conduct of linguistic or literary research; is familiar with scientific style and vocabulary; knows databases used in linguistics; has basic knowledge of how to interpret data obtained from analysis;
is familiar with popular computer-aided translation (CAT) programs and selected programs for frequency and stylistic analysis; knows what machine translation can be used for;
Skills:
uses computer programs useful in a translator's work; is able to format text properly in Polish and in at least one foreign language; uses spreadsheets and charts proficiently; is able to use publicly available scientific databases (including terminology and corpus databases); searches for information proficiently, using expert knowledge as well as encyclopedic, linguistic, general scientific, general technical, interdisciplinary and industry-specific terminology dictionaries, language corpora, databases and parallel texts;
is able to identify gaps in scientific research and directions for its continuation; formulates research problems, selects appropriate methods, constructs research tools, develops, presents and interprets research results, and draws conclusions;
Social competencies:
is able to supplement and improve their knowledge of at least one language of the specialty and of their own language; is aware of the need to keep looking for new dictionary and textual sources and to follow emerging scientific theories; responds quickly to changing circumstances;
draws conclusions from feedback and knows how to manage time; maintains contact with the translation community and works in a multicultural environment; is familiar with the translator's work environment;
is able to work in a group and cooperate with others, taking on appropriate roles (functions); is able to lead a small team (3-4 people in practice groups);
Assessment criteria
Methods of evaluation of student work
- Assessment of in-class participation and ongoing preparation for classes;
- Project (thematic text corpus or alternative);
- Final written credit (test or stylometric analysis of the corpus).
Assessment criteria (components of the final assessment):
- continuous in-class assessment: 10%
- project (thematic text corpus): 40%
- final credit: 50%
Examination (final credit):
If the final credit takes the form of a mixed test (single- and/or multiple-choice questions), the grade is determined by the number of points obtained.
If it takes the form of an analysis project on selected text corpora, the grade is determined by the number of points obtained for each part of the study.
Grading scale for continuous assessment and for the exam/final credit:
55%-69% = 3
70%-74% = 3+
75%-84% = 4
85%-89% = 4+
90%-100% = 5
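The weights and the grading scale above can be combined as in the following Python sketch. It is only an illustration under stated assumptions: each component is taken to be scored as a percentage from 0 to 100, the scale is applied to the weighted total, fractional results fall into the band whose lower bound they reach, and anything below 55% is treated as a fail. None of these details are spelled out in the syllabus, and the names `WEIGHTS`, `GRADE_BANDS`, `weighted_score` and `grade` are hypothetical.

```python
# Component weights as listed in the assessment criteria.
WEIGHTS = {"continuous": 0.10, "project": 0.40, "final": 0.50}

# Lower bound of each grade band from the scale above, highest first.
GRADE_BANDS = [(90, "5"), (85, "4+"), (75, "4"), (70, "3+"), (55, "3")]


def weighted_score(scores):
    """Combine component percentages (0-100) into one weighted percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def grade(percent):
    """Map a percentage to a grade label using the lower bounds of each band."""
    for lower, label in GRADE_BANDS:
        if percent >= lower:
            return label
    # Assumption: results below 55% do not pass; the syllabus lists no grade here.
    return "fail"


if __name__ == "__main__":
    # Hypothetical student results, in percent.
    example = {"continuous": 100, "project": 80, "final": 70}
    total = weighted_score(example)
    print(f"{total:.1f}% -> {grade(total)}")
```

The sketch does not model the rule that the final credit may only be taken after the project and continuous assessment have been passed.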
Rules of cooperation of the instructor with students:
1. Absences: 3 unexcused absences per semester are allowed (in accordance with the regulations).
2. The final credit can be taken after passing the project (assignment) and receiving a positive grade for continuous assessment.
3. The student has the right to retake each written test twice. Failing to take the test on the first date without an excuse forfeits that attempt.
Bibliography
See the Polish version of this page.
Additional information
Information on the level of this course, the year of study and semester in which it is delivered, and the types and number of class hours can be found in the course structure diagrams of the appropriate study programmes. This course is related to the following study programmes:
- Inter-faculty Studies in Bioinformatics and Systems Biology
- Bachelor's degree, first cycle programme, Computer Science
- Bachelor's degree, first cycle programme, Mathematics
- Master's degree, second cycle programme, Bioinformatics and Systems Biology
- Master's degree, second cycle programme, Computer Science
- Master's degree, second cycle programme, Mathematics
Additional information (registration calendar, instructors, location and schedule of classes) may be available in the USOSweb system.