Machine Learning in Finance I 2400-QFU1MLF

The course is conducted in two sequential stages: 1) lectures and 2) case-study laboratories. The lecture stage ends with an exam that verifies the required theoretical knowledge. The laboratory stage ends with practical projects completed by each student.

Part I - Lectures

1. Introduction to Machine Learning
a. Types of Machine Learning
b. Introduction to Supervised Statistical Learning
c. Types of predictions, types of models, types of tabular data structures
d. Notations and general concepts - loss function, cost function, gradient descent
e. Simple Supervised Learning models - linear regression and logistic regression
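As an illustration of topics 1d and 1e, the sketch below fits a linear regression by batch gradient descent on a mean squared error cost. It is a minimal example and not the course's own material: the function name `fit_linear_regression_gd`, the learning rate, the iteration count and the synthetic data are all assumptions made for this illustration.

```python
import numpy as np


def fit_linear_regression_gd(X, y, learning_rate=0.1, n_iterations=2000):
    """Fit y ~ X @ w + b by batch gradient descent (hypothetical helper)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iterations):
        # Loss per observation is the squared residual; the cost is its mean / 2.
        residuals = X @ w + b - y
        grad_w = X.T @ residuals / n_samples  # dJ/dw
        grad_b = residuals.mean()             # dJ/db
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic, noise-free data (assumption for illustration): y = 3 * x + 1
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(200, 1))
y_demo = 3.0 * X_demo[:, 0] + 1.0

w_hat, b_hat = fit_linear_regression_gd(X_demo, y_demo)
print("slope:", w_hat[0], "intercept:", b_hat)
```

With noise-free data the estimates should approach the true slope and intercept; a learning rate that is too large would make the updates diverge instead.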

2. Assessing model accuracy, machine learning diagnostics
a. Evaluation metrics - regression and classification
b. Learning curves
c. Training, validation and testing sets
d. Cross-validation technique
e. The concept of bias and variance and their trade-off
f. Possible remedies for underfitting and overfitting
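The diagnostics in topic 2 can be sketched with scikit-learn's `cross_validate`, comparing training and validation accuracy for a very simple and a very flexible model. This is only an illustrative sketch: the synthetic dataset, the choice of decision trees and the helper name `train_and_validation_accuracy` are assumptions, not part of the syllabus.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)


def train_and_validation_accuracy(max_depth):
    """Return mean training and validation accuracy over 5 folds."""
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    scores = cross_validate(
        model, X, y, cv=5, scoring="accuracy", return_train_score=True
    )
    return scores["train_score"].mean(), scores["test_score"].mean()


# A depth-1 tree is a high-bias model; an unrestricted tree is high-variance.
shallow_train, shallow_val = train_and_validation_accuracy(max_depth=1)
deep_train, deep_val = train_and_validation_accuracy(max_depth=None)

print("shallow tree: train", shallow_train, "validation", shallow_val)
print("deep tree:    train", deep_train, "validation", deep_val)
```

A large gap between training and validation accuracy suggests overfitting, while low accuracy on both suggests underfitting.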

3. Basic Supervised Learning models
a. K-nearest neighbours
b. Support Vector Machines
c. Decision trees and Random Forest (bagging idea)

4. Crucial machine learning techniques
a. Dataset preparation steps
b. Initial feature selection methods
c. Feature engineering
d. Regularization
e. Rebalancing
f. Explainable Artificial Intelligence*

Exam, in the style of recruitment questions for the positions of Data Scientist, Quantitative Analyst and Machine Learning Engineer

Part II - Labs

5. Python - crash course
a. Preparation of the environment
b. Variables, data types, operators and control structures
c. Functions
d. Modules
e. Data science toolkit: NumPy, Pandas, Matplotlib, Sklearn

6. Case study - credit risk modeling
a. Construction of the first classification machine learning end-to-end pipeline
b. Handling an imbalanced dataset
c. Testing multiple machine learning models
d. Comparing the results of the models in the business context
e. Solving the problem of explainability of the solution*

7. Case study - medical insurance premium prediction
a. Construction of the first regression machine learning end-to-end pipeline
b. Testing multiple machine learning models
c. Comparing the results of the models in the business context
d. Solving the problem of explainability of the solution*

8. Case study - simple algorithmic trading
a. Construction of the first time series machine learning pipeline
b. Handling the specifics of data preparation for time series
c. Testing multiple machine learning models
d. Simulating a simple investment strategy based on the trained model
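The steps of case study 8 can be sketched as follows. This is a hedged illustration rather than the course's pipeline: the returns are synthetic random numbers, and the lagged features, the 70/30 chronological split, the logistic regression model and the long-or-flat rule are all assumptions chosen to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic daily returns (assumption: real labs would use market data).
rng = np.random.default_rng(42)
returns = pd.Series(rng.normal(0.0, 0.01, size=600), name="ret")

# Time series data preparation: features use only past information.
features = pd.concat({f"lag_{k}": returns.shift(k) for k in (1, 2, 3)}, axis=1)
data = features.assign(ret=returns, up=(returns > 0).astype(int)).dropna()

# Chronological split: no shuffling, so the test period follows the training period.
split = int(len(data) * 0.7)
train, test = data.iloc[:split], data.iloc[split:]
feature_cols = ["lag_1", "lag_2", "lag_3"]

model = LogisticRegression().fit(train[feature_cols], train["up"])

# Simple strategy: hold the asset (1) when an up day is predicted, else stay flat (0).
position = model.predict(test[feature_cols])
strategy_returns = position * test["ret"]
cumulative_return = float((1.0 + strategy_returns).prod() - 1.0)
print("cumulative strategy return on the test period:", cumulative_return)
```

Because the returns here are pure noise, no predictive edge should be expected; the point is the order of operations, in particular that features and the split never use future observations.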

Additional case study - life insurance assessment (multiclass classification)

9. Project presentations

Estimated student workload:
Type of activity K (contractual) S (independent)
lecture classes: 15h (K) 15h (S)
practical classes: 15h (K) 15h (S)
exam: 2h (K) 0h (S)
consultations with lecturer: 3h (K) 0h (S)
preparing for practical classes: 0h (K) 15h (S)
preparing for lectures: 0h (K) 5h (S)
preparing for test: 0h (K) 10h (S)
preparing for exam: 0h (K) 5h (S)
…: 0h (K) 0h (S)
Total: 35h (K) + 65h (S) = 100h

Type of course

obligatory courses

Course coordinators

Szymon Lis
Michał Woźniak

Learning outcomes

After completing the course, the student will have reliable, structured knowledge of a wide range of supervised learning algorithms for regression and classification problems, such as linear and logistic regression, linear discriminant analysis, kNN, ridge regression, LASSO, Support Vector Machines, decision trees, and random forests. They will know the theoretical foundations of these algorithms and will have the programming skills needed to apply them in finance. They will be able to select the predictive modeling algorithms best suited to a specific research problem, validate models reliably, select and transform variables, and carry out an independent research project using the methods learned.

K_U02, K_U05

Assessment criteria

The final grade consists of three elements. The first is the theoretical exam, which consists of 10 open-ended questions. The second is a set of individual machine learning projects, each documented in an extended report written as a Python notebook whose code blocks allow the teacher to fully reproduce the analysis. Each project should use a different dataset selected by the student and approved by the tutor (for example from https://www.kaggle.com): one reasonably small dataset and one large dataset. The third is a public presentation of the results.

The following weights are used to determine the final grade:

40% - Exam
20% - Presentation
40% - Extended report

The threshold to pass is equal to 60%.
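The weighting above can be worked through in a few lines of Python. This is a sketch under stated assumptions: each component is taken to be scored on a 0-100 scale, the 60% threshold is applied to the weighted total, and a total of exactly 60 is treated as a pass. None of these details are specified in the criteria, and the names `WEIGHTS_PERCENT`, `final_score` and `has_passed` are hypothetical.

```python
# Weights from the assessment criteria, expressed as whole percentages.
WEIGHTS_PERCENT = {"exam": 40, "presentation": 20, "report": 40}
PASS_THRESHOLD = 60.0  # assumption: applied to the weighted total


def final_score(scores):
    """Weighted total on a 0-100 scale (assumes each score is in 0-100)."""
    return sum(WEIGHTS_PERCENT[name] * scores[name] for name in WEIGHTS_PERCENT) / 100


def has_passed(scores):
    """Assumption: a total equal to the threshold counts as a pass."""
    return final_score(scores) >= PASS_THRESHOLD


example = {"exam": 50, "presentation": 80, "report": 70}
print(final_score(example), has_passed(example))
```

For the example scores the total is 0.4 * 50 + 0.2 * 80 + 0.4 * 70 = 64, so a weak exam result can be offset by the report and presentation under these assumptions.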

Additional information

Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system:

Description of 2400-QFU1MLF in USOSweb