Large-scale machine learning 1000-319bBML

- Hardware: from a GPU to a datacenter, and why the architecture matters at scale
- Parallel and distributed optimization: how to parallelize algorithms and how to reason about their performance
- Parallelizing classic ML algorithms
- Introduction to LLMs: motivation, transformers and scaling laws
- Parallelizing LLM training: parallelization types, bottlenecks, common memory optimizations
- Datasets and benchmarking LLMs
- Handling data: an introduction to data engineering
- ML in production: risks, rewards, common problems
- Case study: ML in computational infrastructure

Course coordinators

Marek Cygan
Krzysztof Rządca

Type of course

elective monographs

Requirements

Deep neural networks
Natural language processing
Statistical machine learning

Prerequisites (description)

parallel programming, computer networks, algorithms and data structures

Assessment criteria

The final score is based on programming assignments, points for participation in laboratories, and a written exam.

Bibliography

- Scientific papers used during lectures
- “The Datacenter as a Computer: Designing Warehouse-Scale Machines”, Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle
- “Fundamentals of Data Engineering”, Joe Reis and Matt Housley

Additional information

Information on the level of this course, the year of study and semester in which the course unit is delivered, and the types and number of class hours can be found in the course structure diagrams of the appropriate study programmes. This course is related to the following study programmes: