Contents
Foundations of Deep RL - lecture series by Pieter Abbeel 🔗
- Lecture 1 - MDPs, Exact Solution Methods, Max-ent RL
- Lecture 2 - Deep Q-Learning
- Lecture 3 - Policy Gradients and Advantage Estimation
- Lecture 4 - TRPO and PPO
- Lecture 5 - DDPG and SAC
- Lecture 6 - Model-based RL
OpenAI Spinning Up 🔗
Miscellaneous
Useful links
- Generalized Advantage Estimate: Maths and Code 🔗
- RL — Trust Region Policy Optimization (TRPO) Explained 🔗
- RL — Trust Region Policy Optimization (TRPO) Part 2 🔗