## Deep RL 12 Reinforcement Learning and Control as Probabilistic Inference

Please check out Professor Sergey Levine’s excellent tutorial: Levine 18’

In this section, we study how to learn policies that utilize the known (learned) dynamics. Why do we need to learn a policy? What’s wrong with MPC in the previous...

The previous lecture was mainly about how to plan actions when the dynamics are known. In this lecture, we study how to learn the dynamics. We will also in...

Let’s recall the reinforcement learning goal: we want to maximize the expected reward (or the expected discounted reward in the infinite-horizon case).
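
Written out, the goal above takes the following standard form. The notation is an assumption following common usage, not taken from the excerpt: $\theta$ for the policy parameters, $\tau$ for a trajectory sampled under the policy, $r(s_t, a_t)$ for the reward, $T$ for the horizon, and $\gamma \in [0, 1)$ for the discount factor.

$$
\theta^\star = \arg\max_\theta \; \mathbb{E}_{\tau \sim p_\theta(\tau)} \left[ \sum_{t=1}^{T} r(s_t, a_t) \right]
\qquad \text{(finite horizon)}
$$

$$
\theta^\star = \arg\max_\theta \; \mathbb{E}_{\tau \sim p_\theta(\tau)} \left[ \sum_{t=1}^{\infty} \gamma^{\,t-1} r(s_t, a_t) \right]
\qquad \text{(infinite horizon, discounted)}
$$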

At the end of the previous lecture, we talked about the issues with Q-learning. One of them is that it does not directly optimize the expected return and it can t...

In this section, we extend the online Q-iteration algorithm from the previous lecture by identifying its potential issues and introducing solutions. The improve...

Previously, we studied policy gradient methods, which propose a parametric policy and optimize it to achieve a better expected reward. Then we introduced actor-...

Actor-critic algorithms build on the policy gradient framework that we discussed in the previous lecture, but also augment it with learning value functions an...

In this lecture, we will study the classic policy gradient methods, which include the REINFORCE algorithm, the off-policy policy gradient method, and several co...

This is an introduction to reinforcement learning, including core concepts, the general goal, the general framework, introduction and comparison of different...

The framework of imitation learning tackles reinforcement learning as a supervised learning problem.

These are my notes for CS285 Deep Reinforcement Learning at UC Berkeley.
