Statistics and Exploratory Data Analysis 2400-DS1ST
Statistics and Exploratory Data Analysis topics:
Descriptive statistics
1. Random variables, PDF, CDF, etc.
2. Measures of location: mean, mode, median, trimean, mid-mean, trimmed mean, winsorised mean, mid-range, quartiles, deciles, percentiles
3. Measures of dispersion: range, variance, standard deviation, interquartile range, median absolute deviation
4. Higher order moments: skewness, kurtosis
5. Outliers: IQR approach, Z-score, modified Z-score
6. Distribution estimation (normal, Gamma, Beta, lognormal, Pareto etc.)
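As a rough illustration of the three outlier rules named in topic 5, the sketch below applies them to a small made-up sample using only the Python standard library. The sample, the function names, and the cut-offs (1.5 x IQR, |z| > 3, |modified z| > 3.5) are assumptions chosen for illustration; the course readings themselves use R.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

def z_scores(data):
    """Classical Z-scores based on the sample mean and standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [(x - mean) / sd for x in data]

def modified_z_scores(data):
    """Modified Z-scores based on the median and the median absolute deviation."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # under normality.
    return [0.6745 * (x - med) / mad for x in data]

# Hypothetical sample with one suspicious value.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 30]

print("IQR rule:", iqr_outliers(sample))
print("|z| > 3:", [x for x, z in zip(sample, z_scores(sample)) if abs(z) > 3])
print("|modified z| > 3.5:",
      [x for x, z in zip(sample, modified_z_scores(sample)) if abs(z) > 3.5])
```

On this sample the IQR rule and the modified Z-score both flag the value 30, while the classical Z-score does not: the extreme value inflates the standard deviation enough to keep its own score below 3.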
Graphical analysis of data
1. Histograms
2. Kernel densities
3. Scatter plots
4. Box plots
5. QQ plots, PP plots
6. Run charts and others
Statistical inference
1. One sample (t test, z test, binomial test, Chi-square test)
2. Two independent samples (t test, Wilcoxon test, Chi-square test, Fisher exact test)
3. More than two independent samples (ANOVA, Kruskal-Wallis, Chi-square test)
4. Two matched (paired) samples (paired t test, Wilcoxon signed-rank test, McNemar test/symmetry test)
5. More than two matched samples (repeated-measures ANOVA/mixed effects model, Cochran’s Q test/Friedman test)
Measures of dependence
1. Pearson, Spearman and Kendall correlation coefficients
2. Cramér's V measure
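For two categorical variables, Cramér's V rescales the Pearson Chi-square statistic to the interval [0, 1]. The following is a minimal Python sketch of the uncorrected version, V = sqrt(chi2 / (n * (min(r, c) - 1))). The function name and the two contingency tables are made up for illustration, and the code assumes that every row and column total is positive.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for an r x c table of observed counts.

    Assumes at least two rows and two columns, all with positive totals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson Chi-square statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical tables: perfect association and no association.
print(cramers_v([[10, 0], [0, 10]]))    # 1.0
print(cramers_v([[10, 10], [10, 10]]))  # 0.0
```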
Power & Sample Size analysis
1. Simple tests
2. Multiple hypothesis testing
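To make the two items above concrete, here is a small Python sketch of a sample-size calculation for a two-sided, two-sample comparison of means. It uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2 per group, where d is the standardized effect size. The function name, the default values, and the use of a Bonferroni adjustment for multiple tests are assumptions made for illustration. The adjustment targets the power of each individual test, which is simpler than the generalized family-wise criteria discussed in the bibliography.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80, n_tests=1):
    """Approximate sample size per group for a two-sided two-sample z test.

    effect_size is the standardized mean difference (Cohen's d).
    With n_tests > 1 the significance level is Bonferroni-adjusted.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    alpha_adj = alpha / n_tests              # Bonferroni correction
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha_adj / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Single test, medium effect (d = 0.5), 5% level, 80% power.
print(n_per_group(0.5))             # 63 under the normal approximation
# The same test as one of five hypotheses, Bonferroni-adjusted.
print(n_per_group(0.5, n_tests=5))
```

An exact calculation based on the t distribution gives slightly larger numbers, so the result is best read as a lower bound.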
Type of course
Course coordinators
Learning outcomes
Students will learn how to calculate and interpret descriptive statistics and a range of statistical charts. They will also learn how to carry out the main procedures of statistical inference, using both parametric and nonparametric tests, and how to give a statistically correct interpretation of the results.
K_W01, K_U01, K_U02, K_U03, K_U04, K_U05, KS_01
Assessment criteria
All students will be obliged to:
• attend classes (in accordance with the general University of Warsaw rules),
• pass a written, open-book final exam.
Bibliography
Peng R. D., Exploratory data analysis with R, 2016.
Mangiafico S. S., Summary and analysis of extension program evaluation in R, 2016.
Wagenmakers E.-J. & Gronau Q. F., A compendium of clean graphs in R, online access.
Delignette-Muller M. & Dutang C., fitdistrplus: an R package for fitting distributions, Journal of Statistical Software, 64(4), 2015.
Blomberg S. P., Power analysis using R, 2014.
Delorme P., De Micheaux P. L., Liquet B. & Riou J., Type-II generalized family-wise error rate formulas with application to sample size determination, Statistics in Medicine, 2016.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system: