Big Data Analytics 2400-DS2BDA
1. Introduction to the Linux environment
2. Introduction to Big Data concepts
• Hadoop ecosystem
• MapReduce paradigm
3. Data preparation and exploration using Apache Hive and Apache Spark
• Differences from RDBMSs
• Optimization
• Pitfalls
4. Introduction to ML with Apache Spark
• Transferring models built in R or Python to the big data world (possibilities and limitations)
5. Interactive analytics
6. Visualization in Big Data
7. Scheduling tools (Apache Airflow)
Type of course
Course coordinators
Term 2024Z: | Term 2023Z: |
Learning outcomes
Students will learn how to use Hadoop ecosystem technologies for data preparation and analysis, and how to apply basic machine learning algorithms to big datasets using Apache Hive and Apache Spark.
K_U02, K_U05
Assessment criteria
All students are required to:
• attend classes (in accordance with the general University of Warsaw rules),
• give a presentation on examples of how the methods covered in the course are used (based on academic articles),
• complete a Big Data project.
Bibliography
Readings and up-to-date online resources are provided during each laboratory session as preparation for the next one.
Additional information
Additional information (registration calendar, class instructors, locations and class schedules) may be available in the USOSweb system: