Data Science in finance and other applications 2400-ENSM067A
The seminar is aimed at students who have a good understanding of traditional regression models (linear and logistic) and at least intermediate programming experience (preferably in R, though Python is also welcome), and who are interested in conducting advanced empirical research on current topics using modern data science and machine learning methods in finance or any other field.
The lecturer offers help in further developing the analytical (machine learning) and programming skills sought by future employers. A joint article in a scientific economics journal based on the thesis is also possible.
The seminar will begin with a discussion of the rules for writing the master's thesis, its structure and composition. Alternative tools for writing the thesis, as well as articles and effective presentations (LaTeX and RMarkdown), will be presented with practical examples. Students will get to know several available machine learning databases with real (big) data ready for use in the thesis. Model cross-validation techniques and model quality assessment measures will be discussed. Various machine learning algorithms and their application to predictive modelling in classification and regression tasks will be presented and explained, including k-Nearest Neighbours, Support Vector Machines / Support Vector Regression, regularised regression (elastic net, ridge and LASSO), decision and regression trees, bootstrap aggregation (bagging) of models, random forests, various tree boosting algorithms (including XGBoost), several types of neural networks, deep learning, and model ensembling and stacking. Finally, additional topics related to feature engineering, feature selection and resampling methods will also be covered. Students will present and discuss various financial and non-financial applications of predictive modelling based on real data and previous research. Seminar participants will also present their own research concepts: the selected topic, research framework, hypotheses, results of empirical analyses and a discussion of the conclusions.
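As an illustration of two of the topics listed above, the following minimal sketch combines k-fold cross-validation with a k-Nearest Neighbours classifier and reports accuracy as the quality measure. It is not course material: the function names (knn_predict, k_fold_indices, cross_validate, make_toy_data) and the synthetic two-class data are assumptions made purely for this example, and it uses only the Python standard library rather than any particular machine learning package.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, point, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Squared Euclidean distance is sufficient for ranking neighbours.
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, point)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and split them into n_folds disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(X, y, n_folds=5, k=3, seed=0):
    """Return the classification accuracy on each held-out fold."""
    scores = []
    for fold in k_fold_indices(len(X), n_folds, seed):
        held_out = set(fold)
        # Train on everything except the current fold.
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold
        )
        scores.append(correct / len(fold))
    return scores


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic two-class data: two Gaussian clouds (purely illustrative)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    scores = cross_validate(X, y, n_folds=5, k=3)
    print("fold accuracies:", [round(s, 2) for s in scores])
    print("mean accuracy:", round(sum(scores) / len(scores), 3))
```

Each observation is held out exactly once, so the mean of the fold accuracies estimates out-of-sample performance without a separate test set. In a thesis one would normally rely on an established library for this and consider measures beyond accuracy, especially for imbalanced classes.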
Type of course
Course coordinators
Learning outcomes
Master's thesis using selected machine learning algorithms
KW01, KW02, KW03, KU01, KU02, KU03, KK01, KK02, KK03
Assessment criteria
Students’ presentations during seminar meetings, participation in discussions of other participants’ presentations, and progress on the master’s thesis.
Bibliography
James Gareth, Witten Daniela, Hastie Trevor, Tibshirani Robert (2017), “An Introduction to Statistical Learning: with Applications in R”, Springer-Verlag.
Hastie Trevor, Tibshirani Robert, Friedman Jerome (2009), “The Elements of Statistical Learning”, Springer-Verlag.
Kuhn Max, Johnson Kjell (2013), “Applied Predictive Modeling”, Springer-Verlag.
Wickham Hadley, Grolemund Garrett (2017), “R for Data Science”, O'Reilly Media.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system.