Large-scale machine learning (1000-319bBML)
- Hardware: from a GPU to a datacenter, and why the architecture matters at scale
- Parallel and distributed optimization: how to parallelize algorithms and how to reason about their performance
- Parallelizing classic ML algorithms
- LLMs introduction: motivation, transformers and scaling laws
- Parallelizing LLM training: parallelization types, bottlenecks, common memory optimizations
- Datasets and benchmarking LLMs
- Handling data: an introduction to data engineering
- ML in production: risks, rewards, common problems
- Case study: ML in computational infrastructure
Type of course
Requirements
Prerequisites (description)
Course coordinators
Term 2025Z: | Term 2024Z: |
Assessment criteria
The final score is based on programming assignments, points for participation in laboratories, and a written exam.
Bibliography
- Scientific papers used during lectures
- “The Datacenter as a Computer: Designing Warehouse-Scale Machines”, Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle
- “Fundamentals of Data Engineering”, Joe Reis and Matt Housley
Additional information
Information on the level of this course, the year of study and semester in which it is delivered, and the types and number of class hours can be found in the course structure diagrams of the appropriate study programmes. This course is related to the following study programmes:
- Bachelor's degree, first cycle programme, Computer Science
- Master's degree, second cycle programme, Computer Science
Additional information (registration calendar, class instructors, location and schedule of classes) might be available in the USOSweb system.