Reinforcement learning 1000-318bRL

1. Model-free methods
a) Reinforcement learning formalism: Markov decision processes (MDPs) and dynamic programming (DP)
b) Value methods
* SARSA and TD(1)
* Bias-variance trade-off and TD(lambda)
* Function approximators and corresponding challenges
c) Policy gradient methods
* Vanilla policy gradients
* Generalized Advantage Estimator (GAE)
* Problems with policy gradient methods
d) Actor-critic methods
* Trust Region Policy Optimization (TRPO)
* Proximal Policy Optimization (PPO)
* Soft Actor-Critic (SAC)
2. Model-based methods
a) Model estimation
b) Planning
* Continuous and discrete control problems
* Monte-Carlo Tree Search
* AlphaZero
3. Exploration
a) Multi-armed bandit model
b) Uncertainty-related exploration strategies
4. Research topics
5. Talks by practitioners

Type of course

elective monographs

Prerequisites

Deep neural networks

Course coordinators

Term 2024L:

Łukasz Kuciński
Piotr Miłoś

Term 2025L:

Piotr Miłoś

Learning outcomes

Knowledge: the student

* knows the properties of reinforcement learning algorithms, the scenarios in which they apply, and how to implement the most important ones, especially policy gradient, value-based, and actor-critic algorithms [K_W14].

Abilities: the graduate is able to

* appropriately apply methods to design a dedicated reinforcement learning algorithm or apply existing methods in their own research projects [K_U17];
* implement their own algorithms and use existing libraries with reinforcement learning procedures [K_U18].

Social competences: the graduate is ready to

* critically evaluate acquired knowledge and information [K_K01];
* recognize the significance of knowledge in solving cognitive and practical problems and the importance of consulting experts when difficulties arise in finding a self-devised solution [K_K02];
* think and act in an entrepreneurial way [K_K03].

Assessment criteria

Exam, project.

Bibliography

Sutton, R. S., Barto, A. G., Reinforcement Learning: An Introduction.
François-Lavet, V., Henderson, P., Islam, R., Bellemare, M. G., Pineau, J., An Introduction to Deep Reinforcement Learning.
Szepesvári, C., Algorithms for Reinforcement Learning.

Additional information

Information on the level of this course, the year of study and semester in which it is delivered, and the types and number of class hours can be found in the course structure diagrams of the appropriate study programmes. This course is related to the following study programmes:

Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system:

Description of 1000-318bRL in USOSweb