Reinforcement learning 1000-318bRL
1. Model-free methods
a) Reinforcement Learning formalism: Markov Decision Processes (MDPs) & Dynamic programming (DP)
b) Value methods
* SARSA and TD(1)
* Bias-variance trade-off and TD(lambda)
* Function approximators and corresponding challenges
c) Policy gradient methods
* Vanilla policy gradients
* Generalized Advantage Estimator (GAE)
* Problems with policy gradient methods
d) Actor-critic methods
* Trust Region Policy Optimization (TRPO)
* Proximal Policy Optimization (PPO)
* Soft Actor-Critic (SAC)
2. Model-based methods
a) Model estimation
b) Planning
* Continuous and discrete control problems
* Monte-Carlo Tree Search
* AlphaZero
3. Exploration
a) Multi-armed bandits model
b) Uncertainty related exploration strategies
4. Research topics
5. Talks by practitioners
Type of course
Prerequisites
Course coordinators
Learning outcomes
Knowledge: the student
* knows the properties of reinforcement learning algorithms, the scenarios in which they are applied, and how to implement the most important ones, especially those from the classes of policy gradient, value-based, and actor-critic algorithms [K_W14].
Abilities: the graduate is able to
* appropriately apply methods to design a dedicated reinforcement learning algorithm or apply existing methods in their own research projects [K_U17];
* implement their own algorithms and use existing libraries with reinforcement learning procedures [K_U18].
Social competences: the graduate is ready to
* critically evaluate acquired knowledge and information [K_K01];
* recognize the significance of knowledge in solving cognitive and practical problems and the importance of consulting experts when difficulties arise in finding a self-devised solution [K_K02];
* think and act in an entrepreneurial way [K_K03].
Assessment criteria
Exam, project.
Bibliography
Sutton R. S., Barto A. G., Reinforcement Learning: An Introduction.
Francois-Lavet V., Henderson P., Islam R., Bellemare M. G., Pineau J., An Introduction to Deep Reinforcement Learning.
Szepesvari C., Algorithms for Reinforcement Learning.
Additional information
Information on the level of this course, the year of study and semester in which the course unit is delivered, and the types and number of class hours can be found in the course structure diagrams of the appropriate study programmes. This course is related to the following study programmes:
- Bachelor's degree, first cycle programme, Computer Science
- Master's degree, second cycle programme, Computer Science
Additional information (registration calendar, class instructors, location and schedule of classes) might be available in the USOSweb system: