Large language models 2400-ZEWW973
The course will cover the following topics in sequence:
1. Introduction to text data processing:
a. tokenisation,
b. stemming,
c. lemmatisation,
d. stopwords,
e. n-grams,
f. TF, IDF, and TF-IDF metrics (sketched in code after this topic list).
2. Introduction to deep learning:
a. neurons, layers, activations,
b. perceptron, architecture of Multi-Layer Perceptron,
c. activation functions: ReLU, Sigmoid, Tanh,
d. backpropagation algorithm,
e. gradient and optimization: SGD, Adam,
f. regularization, dropout,
g. batch normalization,
h. training improvements: LR scheduling, data augmentation,
i. filters, convolutions, convolutional layers,
j. max pooling, CNN structure,
k. CNN architectures: LeNet, AlexNet, ResNet.
3. Word2Vec:
a. CBOW,
b. Skip-gram (a training sketch follows the topic list).
4. Word2Vec’s extensions and alternatives:
a. FastText,
b. GloVe,
c. Negative Sampling,
d. Hierarchical Softmax.
5. Recurrent neural networks:
a. vanishing gradient problem and solutions,
b. RNNs, LSTMs, GRUs,
c. biLSTM, ELMo.
6. Transformers:
a. self-attention mechanism, positional encoding (self-attention is sketched after the topic list),
b. transformer architecture,
c. BERT,
d. GPT,
e. T5,
f. fine-tuning,
g. transfer learning,
h. RLHF.
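To illustrate topic 1f, below is a minimal sketch of TF-IDF scoring in plain Python. It assumes the classic tf · log(N/df) weighting on a toy corpus; the exact variant and tooling used in class may differ.

```python
# Minimal TF-IDF sketch (topic 1f) on a toy corpus, using tf * log(N / df).
import math
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]
docs = [doc.split() for doc in corpus]      # naive whitespace tokenisation
N = len(docs)

# document frequency: in how many documents does each term occur?
df = Counter()
for tokens in docs:
    df.update(set(tokens))

def tf_idf(term, tokens):
    tf = tokens.count(term) / len(tokens)   # term frequency in this document
    idf = math.log(N / df[term])            # inverse document frequency
    return tf * idf

print(tf_idf("cat", docs[0]))   # higher weight: "cat" is rare across documents
print(tf_idf("the", docs[0]))   # lower weight: "the" occurs in most documents
```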
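Topics 3-4 (skip-gram with negative sampling) can be tried out with the gensim library. gensim is not part of the course bibliography, and the hyperparameters below are illustrative only, not the course's settings.

```python
# Skip-gram Word2Vec with negative sampling via gensim (topics 3-4), toy corpus.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["cats", "and", "dogs", "are", "friends"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # embedding dimensionality
    window=2,         # context window size
    min_count=1,      # keep every word in this tiny corpus
    sg=1,             # 1 = skip-gram (0 would be CBOW)
    negative=5,       # negative sampling with 5 noise words
    hs=0,             # hierarchical softmax disabled when negative > 0
    epochs=100,
)

print(model.wv["cat"].shape)          # (50,) dense vector for "cat"
print(model.wv.most_similar("cat"))   # nearest neighbours in embedding space
```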
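Topic 6a, scaled dot-product self-attention, reduces to the formula softmax(QK^T / sqrt(d_k)) V. The NumPy sketch below shows a single head without masking, multiple heads, or positional encoding, which real transformer layers add on top.

```python
# Scaled dot-product self-attention (topic 6a): single head, no masking.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project inputs to queries/keys/values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # pairwise similarity between positions
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted mixture of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))       # 4 toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8): one output vector per token
```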
Type of course
Prerequisites (description)
Course coordinators
Learning outcomes
Students will learn how to prepare textual data for further analysis. They will gain a solid understanding of the theoretical foundations of deep learning algorithms, with particular emphasis on those that enable the extraction and processing of numerical representations of text, such as word and sentence embeddings. Naturally, participants will also become familiar with the practical aspects of implementing these methods through programming. By the end of the course, students will be able to obtain and apply contextual representations of text, selecting appropriate algorithms in accordance with the specific nature of the problem at hand. Furthermore, they will acquire knowledge of how to assess the performance of models tailored to different tasks. Students will also develop an awareness of the current challenges and limitations associated with the use of large language models.
Assessment criteria
The final grade will be determined based on a take-home project (70% of the grade) and a project presentation (30% of the grade).
Bibliography
Basic:
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Tunstall, L., Von Werra, L., & Wolf, T. (2022). Natural language processing with transformers: Building language applications with Hugging Face. 1st edition. O'Reilly Media.
Supplementary:
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint, DOI: 10.48550/arXiv.1810.04805.
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv preprint, DOI: 10.48550/arXiv.1607.01759.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint, DOI: 10.48550/arXiv.1301.3781.
Pennington, J., Socher, R., & Manning, C. D. (2014, October). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543), DOI: 10.3115/v1/D14-1162.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint, DOI: 10.48550/arXiv.1802.05365.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. Retrieved from: cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint, DOI: 10.48550/arXiv.1908.10084.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system: