Unsupervised Learning 2400-DS1UL

The main aim of the course is to familiarize students with the research opportunities offered by data mining algorithms (Knowledge Discovery in Databases, KDD) and with their use in business applications. The course covers three blocks of topics: clustering, dimension reduction and association rule learning.

Each block is divided into four stages: i) introduction to and construction of the basic algorithms, ii) familiarization with the functions available in R, including their comparison and evaluation, iii) work with the most recent literature, iv) a group project.

BLOCK 1: Clustering
Data are explored by means of clustering. Several methods are introduced: distance-based methods, k-means, Partitioning Around Medoids (PAM), Clustering Large Applications (CLARA), Clustering Large Applications based on RANdomized Search (CLARANS), nonparametric clustering, hierarchical clustering, dictionary learning, linkage methods and probabilistic methods. Different ways of identifying the optimal number of clusters (Calinski-Harabasz (CH) index, silhouette index) are presented, together with agreement indices.
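As an illustration of the ideas in this block, the sketch below runs a plain k-means (Lloyd's algorithm) on a toy data set and scores the result with the mean silhouette width. It is a minimal sketch, not course material: the course itself works in R (for example `kmeans` from stats and `silhouette` from cluster), Python is used here only as a language-neutral illustration, and the data points, the function names `kmeans` and `mean_silhouette`, and the choice of starting centroids are all assumptions made for this example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids (hypothetical helper)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the assignment can no longer change
        centroids = new_centroids
    return labels, centroids


def mean_silhouette(points, labels):
    """Mean silhouette width; assumes at least two clusters and distinct points."""
    scores = []
    for i, p in enumerate(points):
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        other_means = []
        for c in set(labels):
            if c == labels[i]:
                continue
            d = [math.dist(p, q) for q, lab in zip(points, labels) if lab == c]
            other_means.append(sum(d) / len(d))
        b = min(other_means)  # mean distance to the nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (made up for this sketch): two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
score = mean_silhouette(points, labels)
print(labels, round(score, 3))
```

Repeating the run for several values of k and keeping the k with the highest mean silhouette is one simple way to choose the number of clusters; the CH index can be used in the same manner.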

BLOCK 2: Dimension reduction
Principal component analysis (PCA), multidimensional scaling (classical and metric), and current non-linear dimension reduction methods are covered.
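To make the idea behind PCA concrete, the sketch below finds the first principal component of a small data set by power iteration on the sample covariance matrix and reports the share of variance it explains. This is only a sketch under stated assumptions: in the course the same task would normally be done in R (for example with `prcomp` from stats), the function name `first_principal_component` and the data are hypothetical, and power iteration is chosen here for brevity rather than as the method taught in class.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Leading eigenvector of the sample covariance matrix via power iteration.

    Assumes the data are not constant and that the start vector is not
    orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divisor n - 1).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # arbitrary start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along the component.
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Toy data (made up): two strongly correlated variables.
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
component, explained = first_principal_component(rows)
print([round(x, 3) for x in component], round(explained, 4))
```

Because the two variables move almost together, nearly all of the variance falls on a single direction, which is the situation in which reducing two dimensions to one loses little information.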

BLOCK 3: Association rule learning
The main algorithms for mining association rules are introduced (Apriori, Eclat, FP-growth, OPUS). They are applied, among other things, in market basket analysis to find common patterns among purchased goods. Key measures of rules and transactions (support, confidence, lift, difference of confidence) are described.
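The three most common rule measures named above can be computed directly from their definitions, as the sketch below shows for a handful of toy transactions. It is an illustration only: the course uses R packages such as arules for this, and the transactions and the helper names `support`, `confidence` and `lift` are assumptions made for the example.

```python
# Toy transactions (made up for this sketch, not course data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimated P(rhs | lhs): support of the whole rule over support of lhs."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence relative to the baseline frequency of rhs (1 means independence)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


# Measures for the rule {bread} -> {milk}.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
print(round(rule_support, 4), round(rule_confidence, 4), round(rule_lift, 4))
```

In this toy basket the lift of {bread} -> {milk} is below 1, so buying bread makes milk slightly less likely than its overall frequency would suggest, even though the rule's confidence looks high on its own.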

Models will be built on real data, including cleaning and transformation of the input data. The course also covers visualization methods (including interactive ones) for transactions, rules and clusters, as well as sampling as a tool for simplifying work with big data. Examples of packages used: arules, arulesViz, stats, cluster, pdfCluster, clues and others (see the CRAN Task View "Cluster": Cluster Analysis & Finite Mixture Models).

Estimated student workload:
Activity type                          K (contact)   S (self-study)
lecture (classes)                      30            0
tutorials (classes)                    0             0
exam                                   0             0
consultations                          0             0
preparation for tutorials              0             0
preparation for lectures               0             15
preparation for the mid-term test      0             0
preparation of 3 final projects        0             30
work with supplementary materials      0             0
Total                                  30            45            (30 + 45 = 75)

Course coordinators

Katarzyna Kopczewska
Jacek Lewkowicz

Type of course

obligatory courses

Learning outcomes

- Student has knowledge about unsupervised learning principles
- Student is familiar with unsupervised learning methodology
- Student is able to analyze data by using unsupervised learning approach
- Student is able to use knowledge about unsupervised learning to conduct his/her own research
- Student gains, processes and analyzes data independently
- Student is capable of working in groups and co-operating with others
- Student is able to formulate his/her point of view and express it in the discussion
- Student expresses his/her research curiosity and openness towards economic phenomena
K_W01, K_U01, K_U02, K_U03, K_U04, K_U05, KS_01

Assessment criteria

Evaluation of group projects

Bibliography

Papers provided by lecturers as well as:

Bousquet, O.; von Luxburg, U.; Raetsch, G., eds. (2004). Advanced Lectures on Machine Learning. Springer-Verlag.

Duda, Richard O.; Hart, Peter E.; Stork, David G. (2001). "Unsupervised Learning and Clustering". Pattern Classification (2nd ed.). Wiley.

Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.