1. Introductory matters (labs 1).
a. What is topic modelling?
b. What is the procedure for obtaining topics and drawing conclusions?
c. Example practical applications of topic modelling.
2. Collecting textual data for topic modelling (labs 2).
a. Review of web scraping and crawling techniques.
b. Most common technical issues.
c. Ethics and possible legal problems.
d. Review of Python libraries: Selenium and Beautiful Soup, with code examples.
3. Textual data preprocessing (labs 3).
a. Tokenization.
b. Stemming.
c. Lemmatization.
d. Stopwords.
e. N-grams.
f. Term Frequency (TF).
g. Inverse Document Frequency (IDF).
h. TF-IDF.
4. Semantic topic modelling algorithms (labs 4-5).
a. Latent Semantic Analysis (LSA).
b. Non-Negative Matrix Factorization (NNMF).
5. Probabilistic topic modelling algorithms (labs 5-6).
a. Probabilistic Latent Semantic Analysis (PLSA).
b. Latent Dirichlet Allocation (LDA).
6. Measures of models’ performance (labs 7).
a. Topic coherence.
b. Perplexity.
c. Optimisation of models’ hyperparameters.
7. BERTopic algorithm (labs 8).
8. Supervised topic models (labs 9).
a. Supervised LDA (sLDA).
b. Making predictions with the BERTopic algorithm.
9. Hierarchical topic models (labs 10).
a. Hierarchical Dirichlet Process (HDP).
b. Hierarchical LDA (hLDA).
c. ‘Hierarchical’ BERTopic.
10. Time series analysis of the topic model’s output (labs 11).
11. Correlated topic models (labs 12).
a. Correlated Topic Model (CTM).
b. Pachinko Allocation Model (PAM).
12. Dynamic topic models (labs 13).
13. Students’ presentations (labs 14-15).
Learning outcomes: Students will learn how to collect textual data and prepare it for further analysis. They will also get to know the theoretical basis of various topic modelling algorithms and will be able to build different topic models depending on the problem they face. Furthermore, they will know how to measure a model's performance and compare it across the algorithms applied. Finally, by the end of the course, students will be aware of current challenges and open problems in topic modelling.
Methods and criteria of evaluation: The final grade is based on the points awarded for a take-home project (80%) and its presentation (20%).
