Some topics in the analysis of large data bases 1000-1M19ADB
In the first part of this lecture we will use the classical setup of the multivariate normal distribution to discuss generic problems in the analysis of large data sets: the needle-in-a-haystack problem, limits of signal detectability, multiple testing, and regularization techniques. In the second part we will apply these notions to the analysis of high-dimensional generalized linear models. We will discuss the properties of information criteria for model selection and of regularization techniques such as ridge regression, LASSO, and SLOPE. If time permits, we will also discuss some unsupervised techniques for dimensionality reduction, such as PCA, sparse PCA, and sparse subspace clustering. Students will have a chance to verify the properties of the different methods through computer simulations and real data analysis.
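As an illustration of the kind of computer simulation mentioned above, here is a minimal sketch of the needle-in-a-haystack problem in the multivariate normal setup. It is not taken from the course materials: the function names, the sample sizes, and the use of the cutoff sqrt(2 log n) as an approximate Bonferroni-type threshold are all assumptions made for this example.

```python
import math

import numpy as np


def approx_threshold(n):
    """Cutoff sqrt(2 log n), an asymptotic approximation to a
    Bonferroni-type threshold for n standard normal test statistics.
    (Hypothetical helper, chosen for illustration.)"""
    return math.sqrt(2.0 * math.log(n))


def detection_rate(n, mu, reps=500, seed=0):
    """Fraction of simulated data sets in which max |y_i| exceeds the cutoff.

    Each data set is y ~ N(theta, I_n) with theta = (mu, 0, ..., 0),
    i.e. a single "needle" of size mu hidden among n - 1 pure-noise
    coordinates. With mu = 0 this estimates the false-alarm rate.
    """
    rng = np.random.default_rng(seed)
    cutoff = approx_threshold(n)
    y = rng.standard_normal((reps, n))
    y[:, 0] += mu  # plant the signal in the first coordinate
    return float(np.mean(np.max(np.abs(y), axis=1) > cutoff))


if __name__ == "__main__":
    n = 1000
    for mu in (0.0, 2.0, 4.0, 8.0):
        print(f"mu = {mu}: detection rate = {detection_rate(n, mu):.3f}")
```

Running the script for increasing values of mu shows how the detection rate moves from the false-alarm level toward one as the signal grows relative to sqrt(2 log n), which is one simple way to see the limits of signal detectability numerically.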
Type of course
Additional information
Information on the level of this course, the year of study and semester in which the course unit is delivered, and the types and number of class hours can be found in the course structure diagrams of the appropriate study programmes. This course is related to the following study programmes:
Additional information (registration calendar, class instructors, locations and schedules of classes) might be available in the USOSweb system: