Data Extraction and Text Data Analysis with Python 2400-ZEWW980
Class Schedule:
Collecting data from the Mastodon platform using the API
Collecting news articles from online portals
Overview of natural language processing (NLP) methods
NLTK library: word tokenization, stemming, n-grams, lemmatization, part-of-speech tagging
Vector representation of words
Hugging Face platform and working with large language models (LLMs)
Sentiment analysis
Topic modeling using BERTopic
Consultations on topic and methodology selection
Project presentations (2 sessions)
Learning outcomes
KNOWLEDGE
The student understands the basic principles of popular natural language processing (NLP) methods.
The student is familiar with core Python libraries used for text analysis.
The student understands the possibilities and limitations associated with text data analysis.
SKILLS
The student can create and manage a database using Python.
The student can analyze a large text dataset using text mining techniques.
The student can effectively visualize data using best practices in presentation.
The student can work with models available on the Hugging Face platform.
SOCIAL COMPETENCES
The student understands the principles of ethical and legal data processing.
The student can present their work and draw conclusions based on data analysis.
Assessment criteria
Requirements:
Final project
Attendance (maximum of 2 absences allowed)
The course is assessed on the basis of a completed team project. The project focuses on using Python for text data analysis, for example analyzing a selected social phenomenon based on an empirical study. The final deliverables are a Python script and a short presentation.
Bibliography
Hobson Lane, Cole Howard, Hannes Max Hapke (2021). Przetwarzanie języka naturalnego w akcji. Rozumienie, analiza i generowanie tekstu w Pythonie na przykładzie języka angielskiego [Polish edition of Natural Language Processing in Action]. Wydawnictwo Naukowe PWN.
Original teaching materials prepared based on various sources (e.g., library documentation).
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system.