Statistical data analysis 1000-714SAD
1. Basic notions of probability calculus and statistics: random variables, their distributions, expected value and variance, probability space.
2. Basic notions of statistics: statistical space, random experiment, statistic, statistical model, model evaluation methods.
3. Parameter estimation. Bias and efficiency, maximum likelihood estimators, confidence intervals.
4. Summary and visualisation of data. Quantile-quantile plots. Histograms, kernel density estimation, boxplot.
5. Hypothesis testing. The notion of a statistical hypothesis, the procedure of hypothesis testing, type I and type II errors, power of a test, Neyman-Pearson lemma, parametric statistical significance tests, significance tests for a mean, significance tests for a variance.
6. The notion of a p-value, its common misunderstandings and misuse, effect size, multiple hypothesis testing.
7. Useful statistical tests. Statistical significance test for two means, non-parametric tests for two medians, Pearson's chi-squared test, analysis of variance.
8. Linear regression: simple, multiple, and with extensions. Assumptions, parameter estimation, evaluation of goodness of fit.
9. Classification. Logistic regression, LDA, QDA, KNN.
10. Resampling methods. Cross-validation, bootstrap.
11. Model selection and regularisation. Feature selection, usage of a validation set, usage of cross-validation, analysis of high-dimensional data, lasso and ridge regression, partial least squares.
12. Tree-based models. Decision trees, bagging, random forests, boosting.
13. Support vector machines. Separating hyperplanes, maximum margin classifier, support vector machines.
14. Dimensionality reduction. PCA.
15. Unsupervised learning. The notion of clustering, methods of hierarchical clustering and k-means.
16. Nonlinear models. Polynomial regression, splines, generalized additive models.
Type of course
Prerequisites (description)
Course coordinators
Learning outcomes
Knowledge:
1. general knowledge of the problems of statistical data analysis.
2. basic knowledge of the statistical tools used in the modeling and analysis of data.
3. basic notions and methods of probability calculus and statistics, including parameter estimation and hypothesis testing methods.
Skills:
1. performing simple statistical analysis and statistical testing.
2. using modern statistical analysis tools.
Social skills:
1. ability to explain statistical inference in plain words.
Assessment criteria
Impact on the final grade: exam 40%, mid-term test 20%, assignment 10%, in-class activity 10%, in-lab activity 10%.
Bibliography
Lesław Gajek, Marek Kałuszka, Wnioskowanie statystyczne, modele i metody.
John A. Rice, Mathematical Statistics and Data Analysis.
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning: with Applications in R.
Additional information
Information on the level of this course, the year of study and semester in which the course unit is delivered, and the types and number of class hours can be found in the course structure diagrams of the appropriate study programmes. This course is related to the following study programmes:
- Inter-faculty Studies in Bioinformatics and Systems Biology
- Bachelor's degree, first cycle programme, Mathematics
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system: