Introduction to Machine Learning 1600-SZD-SPEC-WML-EF
The aim of the course is to give participants the theoretical background needed to build and evaluate predictive models, together with intuitive explanations of different machine learning algorithms. Students will learn how to select, implement, assess and compare predictive models for regression and classification tasks, and how to apply machine learning tools in R to real data. Therefore, at least basic prior knowledge of R is expected.

We will NOT discuss all the technical details behind machine learning methods, such as optimization algorithms and theoretical properties. Most users do not need a deep understanding of these aspects to become informed users of ML algorithms. The aim is to focus on the intuitions, strengths and weaknesses of various algorithms and to present the methods most widely used in practical applications. We will describe the basic assumptions, intuition and trade-offs behind each approach, assuming that students are comfortable with basic mathematical concepts (e.g. statistical tests, matrix notation).

The course takes the form of a workshop. Each topic begins with a theoretical introduction (presentation) and is then illustrated with practical examples performed in R (RStudio) on real datasets.

Detailed list of topics:
(1) What is machine learning? Basic terminology
(2) Parametric benchmarks: linear regression and logistic regression
(3) Model assessment metrics and the issue of overfitting
(4) Model validation: why and how?
(5) Regularization methods: ridge regression, LASSO and elastic net
(6) K-nearest neighbours
(7) Decision and regression trees
(8) Tree extensions: random forests and boosting
(9) Sample rebalancing
(10) Basics of eXplainable Artificial Intelligence (XAI)
Type of course
Course coordinators
Learning outcomes
Knowledge | The graduate knows and understands:
WG_01 - to the extent necessary for existing paradigms to be revised - a worldwide body of work, covering theoretical foundations as well as general and selected specific issues - relevant to a particular discipline within the social sciences
WG_02 - the main development trends in the disciplines of the social sciences in which the education is provided
WG_03 - scientific research methodology in the field of the social sciences
WK_01 - fundamental dilemmas of modern civilisation from the perspective of the social sciences
Skills | The graduate is able to:
UK_05 - speak a foreign language at B2 level of the Common European Framework of Reference for Languages, using the professional terminology specific to a discipline within the social sciences, to an extent that enables participation in an international scientific and professional environment
Social competences | The graduate is ready to:
KO_01 - fulfil the social obligations of researchers and creators
KO_02 - fulfil social obligations and take actions in the public interest, in particular by initiating such actions
KO_03 - think and act in an entrepreneurial manner
Assessment criteria
Description of requirements related to participation in classes, including the permitted number of explained absences: Understanding of basic mathematical and statistical concepts (e.g. matrix notation, probability, statistical tests, significance, p-value) and at least basic prior knowledge of the R programming language and the RStudio interface. One explained absence is allowed.
Principles for passing the classes and the subject (including the resit session): Attendance at classes and preparation of an individual analytical project on a data set selected by the student in consultation with the instructor. The data set should be sufficiently large (number of columns × number of observations above 100,000; for example, a data set with 20 variables needs more than 5,000 observations) and contain at least 20 variables of various types (numerical and qualitative).
Methods for the verification of learning outcomes: Active participation in classes and evaluation of the final projects.
Evaluation criteria: Correctness of the data analysis, use of appropriate machine learning algorithms, proper selection of model hyperparameters, assessment of model performance on randomly selected training and test samples, discussion and interpretation of the results, and linguistic and formal correctness of the research report.
Practical placement
-
Bibliography
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (2021/2023), "An Introduction to Statistical Learning: with Applications in R/Python", Springer-Verlag, freely available online: https://www.statlearning.com/
Trevor Hastie, Robert Tibshirani and Jerome Friedman (2009), "The Elements of Statistical Learning", Springer-Verlag, freely available online: https://hastie.su.domains/ElemStatLearn/
Additional information
Additional information (registration calendar, class instructors, locations and class schedules) may be available in the USOSweb system.