Machine Learning 1: classification methods 2400-DS1ML1
1. Introduction to Machine Learning
a. What is and what is not machine learning
b. Differences between classification, regression and clustering
c. Introducing a cost function
d. Sample parametric methods - linear regression and logistic regression
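The two parametric methods named in 1d can be sketched in a few lines. Python with scikit-learn is assumed here purely for illustration (the course bibliography also includes R texts):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10, dtype=float).reshape(-1, 1)
y_reg = 2 * X.ravel() + 1            # continuous target -> regression
y_cls = (X.ravel() > 4).astype(int)  # binary target -> classification

# Linear regression recovers the slope and intercept of the line
lin = LinearRegression().fit(X, y_reg)
print(lin.coef_[0], lin.intercept_)  # ~2.0 and ~1.0

# Logistic regression models the probability of class 1
log = LogisticRegression().fit(X, y_cls)
print(log.predict([[0.0], [9.0]]))   # class 0 for x=0, class 1 for x=9
```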
2. Measuring performance, machine learning diagnostics
a. Performance measures of supervised learning algorithms (model performance, error, confusion matrix and ratios, ROC curve, AUC, RMSE)
b. Learning curves
c. Training set and test set
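The evaluation tools from 2a and 2c fit together in one workflow: split the data, fit a model on the training set, and compute a confusion matrix and AUC on the held-out test set. A minimal sketch (scikit-learn assumed for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
# Hold out 30% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Confusion matrix: rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
# AUC is computed from predicted probabilities, not hard labels
print(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```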
3. Testing the model
a. Extending model complexity to increase fit
b. The concept of bias and variance and their trade-off
c. Cross-validation, selection of number of folds
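The cross-validation procedure in 3c can be sketched as follows (Python/scikit-learn assumed for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
# 5-fold CV: the data is split into 5 folds and each fold
# serves exactly once as the validation set
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores.mean(), scores.std())
```

The spread of the fold scores is what informs the bias/variance discussion in 3b: a large standard deviation across folds signals an unstable (high-variance) model.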
4. Feature engineering
a. Feature transformation
b. Discretization of continuous features
c. Feature standardization/normalization
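Standardization (4c) can be illustrated with a two-feature example whose columns live on very different scales (scikit-learn assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Standardization: subtract each column's mean and divide by its
# standard deviation, so both features end up on the same scale
X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0))  # approximately 0 for each column
print(X_std.std(axis=0))   # 1 for each column
```

This matters for distance-based methods such as k-NN and for SVMs, where an unscaled feature would otherwise dominate.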
5. k-NN
a. Classification with k-nearest neighbours
b. Regression with k-nearest neighbours
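Both uses of k-nearest neighbours (5a, 5b) can be shown on the same toy data: majority vote for classification, neighbour averaging for regression (scikit-learn assumed for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_cls = np.array([0, 0, 0, 1, 1, 1])
y_reg = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])

# Classification: majority vote among the 3 nearest neighbours
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_cls)
print(clf.predict([[1.5], [10.5]]))  # -> [0 1]

# Regression: average of the 3 nearest neighbours' targets
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)
print(reg.predict([[1.5]]))          # mean of 0, 1, 2 -> [1.]
```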
6. Support Vector Machines
a. Optimization objective
b. Separating the data with a maximum margin
c. Kernel selection for more complex data
d. Modification of SVM algorithm for regression problems
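The effect of kernel selection (6c) is easiest to see on data that no straight line can separate (scikit-learn assumed for illustration):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not separable by any straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

# The RBF kernel implicitly maps the data to a space where a
# separating hyperplane exists; the linear kernel cannot
print(linear_svm.score(X, y))  # near chance level
print(rbf_svm.score(X, y))     # close to 1.0
```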
7. Feature selection methods
a. Wrapper methods including automated selection (forward, backward and stepwise)
b. Filter methods – applying scoring to features (e.g. chi-squared test, information gain and correlation coefficient scores)
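A filter method from 7b in miniature: score every feature independently of any model and keep the top-scoring ones (scikit-learn and the Iris data assumed for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Score every feature with the chi-squared statistic, keep the best 2
selector = SelectKBest(chi2, k=2).fit(X, y)
print(selector.scores_)        # one score per feature
print(selector.get_support())  # boolean mask of the selected features

X_selected = selector.transform(X)
print(X_selected.shape)        # (150, 2)
```

Unlike the wrapper methods in 7a, this never refits a model per candidate subset, which makes it cheap but blind to feature interactions.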
8. Regularization methods
a. Introducing a penalty for complexity
b. L1 regularization for additional sparsity in coefficients
c. L2 regularization for penalization of large coefficients
d. Regularized linear regression
e. Regularized logistic regression
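The contrast between 8b and 8c can be demonstrated on data where most features are pure noise; only L1 produces exact zeros (scikit-learn assumed for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features carry signal; the other eight are noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

# L1 drives irrelevant coefficients exactly to zero (sparsity) ...
print(np.sum(lasso.coef_ == 0))
# ... while L2 only shrinks them towards zero
print(np.sum(ridge.coef_ == 0))
```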
9. Lasso regression
10. Workshops on real data
11. Project presentations
Type of course
Course coordinators
Learning outcomes
After completing the course, the student will have reliable, structured knowledge of a wide range of supervised learning algorithms for regression and classification problems, such as linear and logistic regression, linear discriminant analysis, k-NN, ridge regression, LASSO and Support Vector Machines. They will know the theoretical foundations of these algorithms and have the programming skills needed to apply them in practice. They will be able to select the predictive modelling algorithms best suited to a specific research problem, perform reliable validation of models, select and transform variables, and carry out an independent research project using the methods learned.
K_U02, K_U05
Assessment criteria
Bibliography
Harrington, Peter. Machine learning in action. Vol. 5. Greenwich, CT: Manning, 2012.
Zumel, Nina, John Mount, and Jim Porzak. Practical data science with R. Manning, 2014.
Lantz, Brett. Machine learning with R. Packt Publishing Ltd, 2013.
Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, 2009.
Additional information
Additional information (registration calendar, class conductors, location and schedule of classes) may be available in the USOSweb system: