Posts by Year

2021

Deep RL 11 Model-Based Policy Learning

9 minute read

In this section, we study how to learn policies that utilize the known (or learned) dynamics. Why do we need to learn a policy? What’s wrong with MPC in the previous...

Deep RL 9 Model-based Planning

12 minute read

Let’s recall the reinforcement learning goal: we want to maximize the expected reward (or the expected discounted reward in the infinite-horizon case).

Deep RL 8 Advanced Policy Gradient

9 minute read

At the end of the previous lecture, we talked about the issues with Q-learning; one of them is that it’s not directly optimizing the expected return and it can t...

Deep RL 7 Q-learning

10 minute read

In this section, we extend the online Q-iteration algorithm from the previous lecture by identifying its potential issues and introducing solutions. The improve...

Deep RL 6 Value Function Methods

9 minute read

Previously, we studied policy gradient methods, which propose a parametric policy and optimize it to achieve a better expected reward. Then we introduced actor-...

Deep RL 5 Actor Critic

11 minute read

Actor-critic algorithms build on the policy gradient framework that we discussed in the previous lecture, but also augment it with learning value functions an...

Deep RL 4 Policy Gradient

11 minute read

In this lecture, we will study the classic policy gradient methods, which include the REINFORCE algorithm, the off-policy policy gradient method, and several co...

Deep RL 3 Intro to RL

13 minute read

This is an introduction to reinforcement learning, including core concepts, the general goal, the general framework, introduction and comparison of different...

Deep RL 2 Imitation Learning

7 minute read

The framework of imitation learning tackles reinforcement learning as a supervised learning problem.

Deep RL 1 Introduction

less than 1 minute read

These are my notes for CS285 Deep Reinforcement Learning at UC Berkeley.
