*Conducted in terms:* 2020L, 2021L

*Erasmus code:* 11.3

*ISCED code:* 0612

*ECTS credits:* 6

*Language:* Polish

*Organized by:* Faculty of Mathematics, Informatics, and Mechanics

# Statistical data analysis 1000-714SAD

1. Basic notions of probability theory and statistics: probability space, random variables and their distributions, expected value and variance.

2. Basic notions of statistics: statistical space, random experiment, statistic, statistical model, model evaluation methods.

3. Parameter estimation. Bias and efficiency, maximum likelihood estimators, confidence intervals.

4. Data summaries and visualisation. Quantile-quantile plots, histograms, kernel density estimation, box plots.

5. Hypothesis testing. The notion of a statistical hypothesis, the hypothesis testing procedure, type I and type II errors, power of a test, the Neyman-Pearson lemma, parametric significance tests, significance tests for a mean, significance tests for a variance.

6. The notion of a p-value, its common misinterpretations and misuse; effect size; multiple hypothesis testing.

7. Useful statistical tests. Significance tests comparing two means, non-parametric tests comparing two medians, Pearson's chi-squared test, analysis of variance.

8. Linear regression (simple, multiple, and extensions): assumptions, parameter estimation, evaluation of goodness of fit.

9. Classification. Logistic regression, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), k-nearest neighbours (KNN).

10. Resampling methods. Cross-validation, bootstrap.

11. Model selection and regularisation. Feature selection, use of a validation set, use of cross-validation, analysis of high-dimensional data, lasso and ridge regression, partial least squares.

12. Tree-based models. Decision trees, bagging, random forests, boosting.

13. Support vector machines. Separating hyperplanes, the maximum margin classifier, support vector machines.

14. Dimensionality reduction. Principal component analysis (PCA).

15. Unsupervised learning. The notion of clustering, hierarchical clustering methods and k-means.

16. Nonlinear models. Polynomial regression, splines, generalized additive models.

## Type of course

## Course coordinators

## Learning outcomes

Knowledge:

1. General knowledge of the problems of statistical data analysis.

2. Basic knowledge of the statistical tools used in data modelling and analysis.

3. Basic notions and methods of probability theory and statistics, including parameter estimation and hypothesis testing methods.

Skills:

1. Performing simple statistical analyses and statistical tests.

2. Using modern statistical analysis tools.

Social skills:

1. Ability to explain statistical inference in plain words.

## Assessment criteria

Contribution to the final grade: exam 40%, mid-term test 20%, assignment 10%, in-class activity 10%, lab activity 10%.

## Bibliography

Lesław Gajek, Marek Kałuszka, *Wnioskowanie statystyczne, modele i metody*.

John A. Rice, *Mathematical Statistics and Data Analysis*.

Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, *An Introduction to Statistical Learning with Applications in R*.

## Additional information

Information on the *level* of this course, the *year of study* and semester in which the course
unit is delivered, and the types and number of *class hours* can be found in the course structure
diagrams of the appropriate study programmes. This course is related to
the following study programmes:

- Inter-faculty Studies in Bioinformatics and Systems Biology
- Bachelor's degree, first cycle programme, Computer Science
- Bachelor's degree, first cycle programme, Mathematics
- Master's degree, second cycle programme, Computer Science

Additional information (*registration* calendar, class instructors,
*locations and schedules* of classes) may be available in the USOSweb system: