Statistics and Exploratory Data Analysis 2400-M1ABSEA
1. Organizational matters; Introduction to Statistics
● Overview of the course syllabus, assessment rules, and recommended literature.
● The importance of statistics and data analysis in business.
● Types of data structures.
● Concepts of population and sample, random variable, probability distribution, probability density function, and cumulative distribution function.
● Selected statistical distributions: normal, Student's t, Snedecor's F, Poisson, Gamma, and Beta distributions.
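As a minimal sketch of the density and distribution-function concepts listed above, the following uses the `NormalDist` class from Python's standard library. The choice of the standard normal distribution and of the evaluation points is an assumption made for illustration only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability density function (PDF) evaluated at x = 0.
density_at_zero = std_normal.pdf(0.0)

# Cumulative distribution function (CDF): P(X <= 1.96).
prob_below = std_normal.cdf(1.96)

# The inverse CDF (quantile function) maps a probability back to a value.
quantile_975 = std_normal.inv_cdf(0.975)

print(round(density_at_zero, 4))  # 0.3989
print(round(prob_below, 4))       # 0.975
print(round(quantile_975, 2))     # 1.96
```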
2. Introduction to the Python programming language
● Loading data from various sources.
● Selecting observations and variables, filtering, sorting.
● Overview of core Python libraries: numpy, pandas, statsmodels.
● Data cleaning, detecting and removing duplicates, handling and imputing missing values.
● Data structure transformations, merging datasets.
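The data-preparation steps named in this topic could look roughly like the sketch below. The two small tables and their column names are hypothetical, and median imputation is only one of several possible ways to handle missing values.

```python
import pandas as pd

# Hypothetical toy data with one duplicated row and one missing value.
sales = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [100.0, 250.0, 250.0, None],
})
regions = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["North", "South", "East"],
})

# Detect and remove exact duplicate rows.
n_duplicates = int(sales.duplicated().sum())
sales_clean = sales.drop_duplicates().reset_index(drop=True)

# Impute the missing amount with the median of the observed values.
median_amount = sales_clean["amount"].median()
sales_clean["amount"] = sales_clean["amount"].fillna(median_amount)

# Filter rows, then sort them in descending order of amount.
large = sales_clean[sales_clean["amount"] > 150].sort_values(
    "amount", ascending=False
)

# Merge with a second dataset on the shared key.
merged = sales_clean.merge(regions, on="customer_id", how="left")
```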
3. Measures of central tendency, dispersion, and distribution shape
● Mean, median, mode, quartiles, deciles, quantiles, percentiles.
● Range, interquartile range, variance, standard deviation, coefficient of variation.
● Skewness and kurtosis.
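The measures in this topic can be computed with the standard library alone, as in the sketch below. The sample is hypothetical, and the skewness and kurtosis shown are the simple moment-based (population) versions; statistical packages may apply bias corrections or other quantile conventions and so return slightly different values.

```python
import statistics as st

# Hypothetical sample of seven observations.
data = [2, 4, 4, 5, 7, 9, 11]

# Central tendency.
mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)

# Quartiles (statistics.quantiles uses the "exclusive" method by default;
# other libraries may interpolate differently).
q1, q2, q3 = st.quantiles(data, n=4)

# Dispersion.
value_range = max(data) - min(data)
iqr = q3 - q1
variance = st.variance(data)      # sample variance (divides by n - 1)
std_dev = st.stdev(data)          # sample standard deviation
coef_variation = std_dev / mean   # coefficient of variation


def central_moment(values, k):
    """k-th population central moment: mean of (x - mean) ** k."""
    mu = st.mean(values)
    return sum((x - mu) ** k for x in values) / len(values)


# Distribution shape from central moments.
m2 = central_moment(data, 2)
skewness = central_moment(data, 3) / m2 ** 1.5
excess_kurtosis = central_moment(data, 4) / m2 ** 2 - 3
```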
4. Measures of dependence and association
● Pearson, Spearman, and Kendall correlation coefficients, Cramér’s V.
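To illustrate the difference between linear and rank-based correlation, the sketch below implements Pearson's and Spearman's coefficients by hand. The helper names and the data are hypothetical, and the ranking helper assumes there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# A monotonic but non-linear relationship (y = x squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
pearson_r = pearson(x, y)
spearman_rho = spearman(x, y)
```

On this data the Spearman coefficient equals 1 because the relationship is perfectly monotonic, while the Pearson coefficient is slightly below 1 because the relationship is not linear.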
5. Data visualization
● Outliers: boxplot, outlier detection, IQR, Z-score, modified Z-score.
● Variable transformations: binning of continuous variables, one-hot encoding, normalization, standardization, winsorization.
● Purpose and importance of data visualization.
● Good and bad practices in visualization.
● Types of charts and their suitability for different data types (continuous, categorical, time series).
● Overview of Python visualization libraries: matplotlib, seaborn, plotly, and pandas plotting tools.
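The outlier-detection and transformation items in this topic can be sketched without plotting. The data below are hypothetical, and the 1.5 × IQR fences follow the common boxplot convention, which is an assumption about the rule used in class.

```python
import statistics as st

# Hypothetical measurements with one suspiciously large value.
values = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]

# IQR rule: flag points beyond 1.5 * IQR from the quartiles.
q1, _, q3 = st.quantiles(values, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# Standardization (Z-scores): subtract the mean, divide by the standard deviation.
mean, std_dev = st.mean(values), st.stdev(values)
z_scores = [(v - mean) / std_dev for v in values]

# Min-max normalization: rescale to the interval [0, 1].
low, high = min(values), max(values)
normalized = [(v - low) / (high - low) for v in values]
```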
6. Hypothesis testing
● Hypotheses, test statistics, critical values, p-value, significance level, Type I and Type II errors.
● Test power, relationship between power and sample size.
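The relationship between power and sample size can be illustrated with the simplest textbook case: a one-sided, one-sample Z-test with known standard deviation. The function name, the effect size of 0.5, and the sample sizes below are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample Z-test.

    effect_size is the standardized difference (mu1 - mu0) / sigma.
    """
    z = NormalDist()
    critical_value = z.inv_cdf(1 - alpha)
    # Under the alternative, the test statistic is shifted by effect_size * sqrt(n).
    return 1 - z.cdf(critical_value - effect_size * sqrt(n))


# Power for the same effect size at increasing sample sizes.
powers = {n: z_test_power(0.5, n) for n in (10, 30, 50)}
for n, power in powers.items():
    print(n, round(power, 2))
```

For a fixed effect size and significance level, the computed power increases with the sample size.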
7. Selected statistical tests
● Tests for a single group: normality tests, t-test, Z-test, binomial test, chi-square test for a single group.
● Tests for two or more independent groups: t-test, Wilcoxon rank-sum test, chi-square test, Fisher’s exact test, ANOVA, Kruskal-Wallis test.
● Tests for two or more dependent groups: paired t-test, Wilcoxon signed-rank test, McNemar’s test, repeated measures ANOVA, Cochran’s Q test.
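As one small example of the single-group tests listed above, the sketch below computes an exact one-sided binomial test from first principles. The function name and the data (15 successes in 20 trials against a null proportion of 0.5) are hypothetical; in practice a statistics library would normally be used instead.

```python
from math import comb


def binomial_test_greater(successes, trials, p0):
    """Exact one-sided p-value: P(X >= successes) when X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# H0: p = 0.5 versus H1: p > 0.5, with 15 successes in 20 trials.
p_value = binomial_test_greater(15, 20, 0.5)

# Compare the p-value with the chosen significance level.
alpha = 0.05
reject_null = p_value < alpha

print(round(p_value, 4))  # 0.0207
```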
Type of course
Course coordinators
Learning outcomes
Upon completing the course, the student:
● understands fundamental statistical concepts,
● is able to load, clean, and prepare data for analysis,
● can draw conclusions based on measures of central tendency, dispersion, distribution shape, and measures of dependence and association,
● is capable of visualizing data clearly and effectively, and of interpreting insights from such visualizations,
● is able to conduct and interpret statistical tests.
Assessment criteria
The final grade is based on the following components: a written exam (50%), a student-prepared project (30%), and a project presentation (20%). To pass the course, students must obtain at least half of the possible points in each of these components.
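The weighting and the pass condition can be sketched as follows. The function and variable names are hypothetical, scores are assumed to be expressed as the fraction of points earned in each component, and how the weighted score maps to a final grade is not specified here.

```python
# Component weights as stated in the assessment criteria.
WEIGHTS = {"exam": 0.5, "project": 0.3, "presentation": 0.2}


def assess(scores):
    """Return (weighted score, passed) for per-component fractions in [0, 1]."""
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    # Passing requires at least half of the points in every component.
    passed = all(scores[name] >= 0.5 for name in WEIGHTS)
    return weighted, passed


# A strong exam does not compensate for a presentation below the threshold.
weighted, passed = assess({"exam": 0.8, "project": 0.6, "presentation": 0.4})
```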
Bibliography
Downey, A. B. (2014). Think stats: Exploratory data analysis in Python (Version 2.0.27).
Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business professionals. John Wiley & Sons, Inc.
Mangiafico, S. S. (2015). An R companion for the Handbook of Biological Statistics (Version 1.3.9, revised 2023) [Online handbook]. Rutgers Cooperative Extension. Retrieved from https://rcompanion.org/handbook.
Molin, S. (2021). Hands-on data analysis with pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python (2nd ed.). Packt Publishing.
Pant, D., & Mukhiya, S. K. (2025). Statistics for Data Scientists and Analysts: Statistical approach to data‑driven decision making using Python. BPB Publications.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system.