Experimental semantics - corpus analysis module 3501-KOG-SE-MAK
1. Corpora and search engines
Lesson 1. Available English and Polish corpora
- discussion on the structure of the following corpora: NKJP, BNC and COCA
- text structure of the corpora (balance of types and sources of texts)
- additional information in corpora (metadata, tagsets etc.)
- available search engines
- practical aim: a student can search for words and phrases using different search engines for Polish and English corpora
Lesson 2. Advanced features of search engines. Corpus Query Language
- syntax of CQL
- regular expressions
- searching using metadata
- practical aim: a student can construct complex search queries using regular expressions and metadata
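The regular-expression part of this lesson can be illustrated with a minimal Python sketch. The token list is made up for illustration, and the helper cql_token is hypothetical: it only builds a query string of the form [attribute="regex"]. The attribute names word and lemma are assumptions, since the attributes actually available depend on the corpus and its tagset.

```python
import re

# Made-up token list standing in for corpus text (illustration only).
tokens = ["colour", "colors", "coloured", "collar", "Colouring", "color"]

# Optional "u" plus an optional inflectional ending.
# fullmatch() requires the whole token to match, not just a prefix.
regex = r"colou?r(s|ed|ing)?"
pattern = re.compile(regex, re.IGNORECASE)

matches = [t for t in tokens if pattern.fullmatch(t)]
print(matches)


def cql_token(attribute, value_regex):
    """Hypothetical helper: build one CQL-style token condition as a string."""
    return f'[{attribute}="{value_regex}"]'


# One-token query: any word form matching the regular expression.
single = cql_token("word", regex)

# Two-token query: a lemma followed by any token (empty condition []).
sequence = cql_token("lemma", "strong") + " []"

print(single)
print(sequence)
```

The same regular expression is first tested locally on a handful of tokens and then embedded in a query string, which is a convenient way to debug a pattern before sending it to a search engine.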
2. Collocations
Lesson 1. Association measures
- t-score
- χ²
- Mutual Information
- logDice
- statistical hypothesis testing using association measures
- practical aim: given frequency data for the words, a student can calculate and interpret different association measures, understands the practical and theoretical differences among them, and can use them to statistically test hypotheses on the co-occurrence of particular semantic units
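The measures listed above can be computed directly from four numbers: the co-occurrence frequency of the pair, the frequencies of the two words, and the corpus size. The sketch below uses commonly cited textbook formulations; individual search engines may define or scale these measures differently. The frequency values and all function names are made up for illustration.

```python
import math


def expected(f_x, f_y, n):
    """Expected co-occurrence frequency if the two words were independent."""
    return f_x * f_y / n


def t_score(f_xy, f_x, f_y, n):
    # (observed - expected) / sqrt(observed)
    return (f_xy - expected(f_x, f_y, n)) / math.sqrt(f_xy)


def mutual_information(f_xy, f_x, f_y, n):
    # log2(observed / expected)
    return math.log2(f_xy / expected(f_x, f_y, n))


def log_dice(f_xy, f_x, f_y):
    # 14 + log2(2 * f_xy / (f_x + f_y)); does not depend on corpus size
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def chi_squared(f_xy, f_x, f_y, n):
    # Pearson chi-squared for the 2x2 contingency table of the word pair.
    o11 = f_xy                    # x and y together
    o12 = f_x - f_xy              # x without y
    o21 = f_y - f_xy              # y without x
    o22 = n - f_x - f_y + f_xy    # neither x nor y
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    return numerator / denominator


# Made-up frequencies (illustration only).
f_xy, f_x, f_y, n = 30, 1000, 500, 1_000_000

t = t_score(f_xy, f_x, f_y, n)
mi = mutual_information(f_xy, f_x, f_y, n)
ld = log_dice(f_xy, f_x, f_y)
chi2 = chi_squared(f_xy, f_x, f_y, n)

# Critical value of chi-squared for 1 degree of freedom at alpha = 0.05.
CRITICAL_CHI2 = 3.841
reject_independence = chi2 > CRITICAL_CHI2

print(t, mi, ld, chi2, reject_independence)
```

If the χ² statistic exceeds the critical value, the null hypothesis that the two words occur independently is rejected at the chosen significance level.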
Lesson 2. Association measures in corpus search engines
- using corpus search engines to compute association statistics and extract frequency data
- practical aim: a student can use available tools to compute association statistics; when a given measure is not available, the student can extract the frequency data and compute the statistic herself
3. SketchEngine
Lesson 1. What is SketchEngine and what it can do for you?
- discussion on corpora available in SketchEngine
- searching and saving results of the queries
- association measures available in SketchEngine
- WordSketches
- parallel corpora
- practical aim: a student can apply her knowledge and skills from previous lessons when working with SketchEngine
4. WordNets
Lesson 1. WordNet and Słowosieć
- structure of WordNet - different semantic relations between semantic units
- using WordNets in combination with corpora
- practical aim: a student can use information from WordNets in her work with text corpora
5. Scripting your work with corpora (for volunteers)
Lesson 1. Access to SketchEngine using Python
- discussion on SketchEngine API
- short introduction to JSON data format and simplejson Python library
- practical aim: a student can access the full functionality of SketchEngine using Python
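As a short introduction to JSON handling, the sketch below parses a response body and prints concordance-style lines. It uses the standard-library json module, whose loads and dumps functions are also offered by simplejson. The response structure and all field names (corpus, query, lines, left, kwic, right) are hypothetical and do not describe the actual SketchEngine API; no network request is made.

```python
import json

# Hypothetical response; a real API defines its own field names.
hypothetical_response = {
    "corpus": "example_corpus",
    "query": '[lemma="tea"]',
    "lines": [
        {"left": "a cup of", "kwic": "tea", "right": "with milk"},
        {"left": "she poured the", "kwic": "tea", "right": "slowly"},
    ],
}

# Serialize to text, as it would arrive over HTTP, then parse it back.
raw_text = json.dumps(hypothetical_response)
data = json.loads(raw_text)

# Build keyword-in-context lines from the parsed structure.
kwic_lines = [
    f'{line["left"]} [{line["kwic"]}] {line["right"]}'
    for line in data["lines"]
]

for kwic_line in kwic_lines:
    print(kwic_line)
```

Once a response is parsed into dictionaries and lists, frequency data can be extracted with ordinary Python code and passed on to further calculations.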
Assessment criteria
Each week the students will receive an assignment (7 assignments in total, with a maximum of 10 points for each assignment). The final mark depends only on successfully completing the assignments.
0-35 points: 2
35-50 points: 3
50-60 points: 4
60-70 points: 5
Additional information
Additional information (registration calendar, instructors, location and schedule of classes) may be available in the USOSweb system: