Introduction to natural language processing 3003-C3N-JK1
The aim of the course is to give participants a practical introduction to natural language processing, computational linguistics and programming, in particular to processing text corpora with the natural language processing techniques available in the Python programming language.
Participants are not required to have any prior knowledge of programming languages or any programming experience, but they are expected to have the motivation and commitment needed to acquire programming skills for natural language processing.
Topics defining the scope of the course:
1. Basics of programming in Python: variable types, data structures, conditions and loops, functions and classes, working with files and using packages.
2. Using Python to collect and process text data (web scraping, API querying, OCR and audio transcription).
3. spaCy and the different levels of linguistic annotation: morphosyntactic analysis and tagging, dependency parsing.
4. Vector semantics and language models.
5. Models for sequence classification and token classification in spaCy.
6. Searching text with spaCy: rule-based search, search over annotation layers and semantic search.
7. Stylometric analysis of texts using StyloMetrix, pandas and scikit-learn.
8. Topic modelling using BERTopic.
9. Visualisation of corpus processing results.
Learning outcomes
The student:
- is familiar with the Python tools for processing and analysing text data
- knows the basics of programming in Python and the Python packages for processing and analysing text data
- knows the most important concepts and techniques of natural language processing
- is able to analyse a text corpus using Python packages
- is able to formulate a hypothesis about a text corpus and test it using natural language processing techniques
- is able to visualise the results of a text corpus analysis
- is able to critically evaluate information about artificial intelligence systems based on text data
- understands the importance of natural language processing for solving both theoretical and practical problems and is able to apply the methods of this field to achieve their own research goals
Assessment criteria
Attendance in class (two absences allowed).
Regular completion of programming and natural language processing assignments.
Completion of a small individual or group project using natural language processing methods.
Bibliography
Altinok, D. (2021). Mastering spaCy: An end-to-end practical guide to implementing NLP applications using the Python ecosystem. Birmingham: Packt Publishing.
Lane, H., Howard, C., Hapke, H. (2021). Przetwarzanie języka naturalnego w akcji. Rozumienie, analiza i generowanie tekstu w Pythonie na przykładzie języka angielskiego. Warszawa: PWN.
Mattingly, W. (2022). Introduction to Python for Digital Humanities. URL: www.python-textbook.pythonhumanities.com.
Mattingly, W. (2021). Introduction to spaCy 3. URL: www.spacy.pythonhumanities.com.
Sweigart, A. (2020). Automatyzacja nudnych zadań z Pythonem. Nauka programowania. Gliwice: Helion.
Additional information
Additional information (registration calendar, class instructors, class locations and schedules) may be available in the USOSweb system.