Data Analysis in R 2500-EN-F-233
In psychology, as in other fields, technological advances provide
researchers with a growing quantity of collected data. Beyond theoretical
knowledge of their subject of study, researchers now also need to know
how to handle these data and extract information from them.
At the same time, we often witness spectacular scientific advances made
possible by creatively connecting data with the scientific questions we
are after. Simple, standard statistical knowledge and methods are often
no longer enough, and the empirically minded researcher needs a new
competence. This competence is taking the form of a new discipline in
the quantitative sciences called Data Science.
Data science involves theoretically informed data management decisions
and requires robust and customizable tools to perform these activities.
In this context, R is emerging as THE standard statistical software for the
next generation of analytically minded researchers. It is a flexible and
powerful programming language and environment focused on data
analysis. Its core set of functions already offers more than most users
will ever need, and because R is open source and freely available, anyone
can contribute by writing specialized packages that are made public. For
that reason, it is also one of the most comprehensive statistical software
environments, and many innovative statistical methods are already
available for specialized needs. In addition, R has advanced graphical
capabilities that can produce publication-quality data visualizations
with just a few lines of code.
The course takes an applied, hands-on approach: we will lead students
from their first simple operations on data to creating their own scripts
and functions in the R language. The course is meant to be the first in a
series of lectures specifically centered on R. Its focus will be on data
management and programming, the basis of data science. For that reason,
statistical modeling and data visualization will receive limited
attention, and only a few basic statistical analyses and plotting
functions will be covered. Advanced, specialized courses on these aspects
of data science will be offered as standalone classes.
Learning outcomes
Students will be able to perform many basic but important operations
on data within the R statistical environment: importing the data,
understanding the basic data types, and visualizing and summarizing the
data.
Students will be trained to go through all the steps needed to
organize, restructure and clean data for subsequent statistical analysis.
Students will learn how to write simple programs (scripts) in R in
order to automate repetitive tasks in data cleaning.
Assessment criteria
Most classes will start with a short quiz (3-4 questions) on the material
presented in the previous class. Short polls gauging students’ confidence
and understanding of the current material will also be administered and
used to fine-tune the presentation of materials. These quizzes will not
count toward the final grade.
Home assignments will count toward the final evaluation and document the
progress made. There will be 5 home assignments during the course
(approximately one every two or three classes; 30 points total). A final
exam is planned, during which students will solve one or two practical
problems in R using most of the concepts covered in class (70 points
total). For these reasons, attendance is deemed essential: students are
expected to attend ALL classes, be on time, and be prepared for
discussion and activities.
Overall, home assignments account for 30% of the final grade and the
final exam for the remaining 70%.
Grades will be assigned according to the following scale:
5 – 90-100% – outstanding performance
4+ – 79-89%
4 – 73-78% – good performance
3+ – 67-72%
3 – 60-66% – minimum passing performance
2 – 59% or less – performance not suitable for passing
Attendance rules
Attendance is a very important factor in passing the class. Up to two
unexcused absences are allowed. Additional absences must be documented
(e.g. sick leave). In exceptional and justified situations, please
contact me personally so that we can evaluate whether additional
assignments can make up for the missed classes.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system: