(in Polish) Data Summarization: Sketches, Coresets, and Geometric Summaries 1000-2M26DSS
i. The Philosophy of Summarization Trade-offs: Size vs. Error vs. Time. The concept of (1 ± epsilon)-approximations (1 lecture).
ii. Nets & Coverings Metric spaces, epsilon-nets, and epsilon-samples. The VC-dimension and its role in bounding sample size (2 lectures).
iii. Frequency & Cardinality Bloom Filters, Count-Min Sketch, and HyperLogLog (1 lecture).
iv. Dimensionality Reduction Johnson-Lindenstrauss Lemma, Random projections, LSH (1 lecture).
v. Randomized NLA Matrix sketching, fast approximate SVD, Nyström method, and matrix leverage scores (2 lectures).
vi. Geometry & Convexity Minimum Enclosing Balls, convex hulls, and Logistic Regression coresets (1 lecture).
vii. Clustering Basics k-means/k-median, the Sensitivity Sampling framework (2 lectures).
viii. Distributed & Dynamic Merge-and-Reduce framework, composable coresets, and sliding windows (1 lecture).
ix. Cost Stability Breaking worst-case Ω(k^2 /epsilon^2) bounds via dataset β-stability (Bansal et al., 2024) (1 lecture).
x. Tight VC-Dim Bounds Layered Group Sampling, overcoming the O(k^2) bottleneck for metrics like Planar Graphs (Cohen-Addad et al., 2025) (1 lecture).
xi. Fairness & Robustness Demographic parity in coresets, handling extreme outliers (1 lecture).
Course coordinators
Prerequisites (description)
Learning outcomes
Knowledge:
Knowledge of basic and some advanced techniques in designing and understanding data summaries such as coresets, sketches, etc.
A reasonably wide exposure to classic and modern problems.
Understanding the importance of mathematical proofs in providing guarantees on performance.
Skills:
Ability to apply mathematical concepts to problems in Data Science and more specifically data summarization.
Ability to select appropriate tools from Discrete Mathematics, Linear Algebra and Probability theory, or to tweak existing tools if required, to solve data reduction problems.
Competence:
Awareness of own limitations and the need for further study.
Development of the ability to read scientific articles, if necessary in a foreign language, to expand their knowledge.
Ability to precisely formulate questions to deepen their own understanding or to find gaps in reasoning.
Assessment criteria
Homeworks and paper presentation (approximately 50 percent each).
PhD students have the option to either (a) present a paper from a list of selected recent papers, OR (b) solve a challenging (starred) homework problem, preparing for research work.
Additional information
Information on level of this course, year of study and semester when the course unit is delivered, types and amount of class hours - can be found in course structure diagrams of apropriate study programmes. This course is related to the following study programmes: