Big Data Analytics 2400-DS2BDA

1. Introduction to the Linux environment
2. Introduction to Big Data concepts
• Hadoop ecosystem
• MapReduce paradigm
3. Data preparation and exploration using Apache Hive and Apache Spark
• Differences vs. RDBMS
• Optimization
• Common pitfalls
4. Introduction to ML with Apache Spark
• Transferring models built in R or Python to the big data world (possibilities and limitations)
5. Interactive analytics
6. Visualization in Big Data
7. Scheduling tools (Apache Airflow)

Course coordinators

Piotr Menclewicz

Type of course

obligatory courses

Learning outcomes

Students will learn how to use Hadoop ecosystem technologies for data preparation and analysis, and how to apply basic machine learning algorithms to big data sets using Apache Hive and Apache Spark.
K_U02, K_U05

Assessment criteria

All students are required to:
• attend classes (in accordance with general University of Warsaw rules),
• give a presentation on examples of applying the methods covered in the course (based on academic articles),
• complete a Big Data project.

Bibliography

Readings and up-to-date online resources will be provided during each laboratory session as preparation for the next one.