Lecture "Reinforcement Learning" WS2010/11


winter semester 2010/2011, Friday, 10.15 s.t., room ND 6/99

Content and description

The lecture provides an introduction to reinforcement learning (RL). It covers the basics of RL, including Markov decision processes, dynamic programming, and temporal-difference learning. Additionally, special topics such as policy gradient methods and the connection to evolutionary computation are presented.


Very basic knowledge of statistics and analysis is required; basic knowledge of neural networks is helpful but not necessary.

Written exam

The second written exam (Nachklausur, i.e. the resit exam) will take place in NA 2/99 on Mon 10.08.2011, 9.00-11.00.
The results of the exam are finally available. I apologize for the long delay.
(Correcting them will take a bit longer than usual because they will be sent to France for me to correct.)


The slides of each lecture unit will be available here.
Unit 1: Introduction
Unit 2: The reinforcement learning problem
Unit 3: Dynamic programming
Unit 4: Monte Carlo methods
Unit 5: Temporal difference learning
Unit 6: Eligibility traces
Mini-Unit: Actor-critic methods
Unit 7: Function approximation
Unit 8: Planning and learning
Unit 9: Least-squares temporal difference learning (LSTD)
Unit 10: Policy gradient methods
Unit 11: Evolutionary reinforcement learning
Outlook: Other reinforcement learning areas
The slides and topics will be updated after each lecture.


Exercises due on 29.10.2010
Exercises due on 12.11.2010
Exercises due on 03.12.2010
Exercises due on 14.01.2011
Exercises due on 04.02.2011
The last exercises, due on 04.02.2011, will be available on 28.01.2011.


Shark - The machine learning library Shark will be needed for programming exercises.

Additional files: LinearPolicy.h (Exercise 10)
SinglePole.h (Exercise 10)
cartPoleCMA-ES.cpp (Exercise 10)
RLTask.h (Exercises 5, 6, and 10)
Maze.h (Exercise 5)
onPolicyMonteCarlo.cpp (Exercise 5) arrived later but is available now. To compensate for the delay, the solutions for the programming exercises can be emailed until Friday evening (20.00).
Policy.h (Exercise 6)
dangerMaze.cpp (Exercise 6)
DangerousMaze.h (Exercise 6)
Makefile (for Linux, adapt the path to your local Shark installation!)
Makefile-Mac (for Mac, adapt the path to your local Shark installation and add the library path to the system variable DYLD_LIBRARY_PATH!)

Installation tips for Shark in Ubuntu:

Based on a fresh install of Ubuntu 10.10, the following packages are needed (install from "System -> Administration -> Synaptic Package Manager"):
- binary install (.deb): g++
- source install: cmake and g++

Additional information and further reading

R. Sutton and A. Barto. Reinforcement Learning: An Introduction, MIT Press, 1998. The first part of the lecture is based on this book.
V. Heidrich-Meisner, M. Lauer, C. Igel, and M. Riedmiller. Reinforcement Learning in a Nutshell. In M. Verleysen, editor, 15th European Symposium on Artificial Neural Networks (ESANN 2007), Belgium: d-side publications, pp. 277-288, 2007. A short overview of reinforcement learning.
J. Boyan. Technical Update: Least-Squares Temporal Difference Learning. Machine Learning 49(2-3), pp. 233-246, 2002.
R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in Neural Information Processing Systems 12, pp. 1057-1063, MIT Press, 2000.


Verena Heidrich-Meisner
room: NB 3/26 (from 01.11.2010)
phone: 0234-32 27974
email: verena.heidrich-meisner at rub.de