John Schulman Deep RL I found these talks to be super straightforward and helpful. A breath of fresh air.
Part 1 A brief overview of applications, including robotics, inventory management, resource allocation (queuing), and routing problems (sequential decision-making problems).
Differentiates between policy optimization and dynamic programming. Policy optimization includes DFO/evolutionary algorithms (derivative-free) and policy gradients (which use gradients and tend to scale better as the number of parameters grows). Exact dynamic programming requires a finite, discrete state space, so for larger or continuous problems it must be approximated (for instance, by approximating the value function with neural nets).
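To make the dynamic programming side concrete, here is a minimal sketch of tabular value iteration. The four-state corridor, the reward of 1 for reaching the last state, the discount of 0.9, and all function names are my own made-up assumptions for illustration, not something taken from the talks. The point is that the method sweeps over every state explicitly, which is only possible when the state space is small and finite.

```python
# Hypothetical toy MDP (an assumption for illustration): a 4-state corridor.
N_STATES = 4          # states 0..3; state 3 is the terminal goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor


def step(state, action):
    """Deterministic transition model: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def value_iteration(tol=1e-8):
    """Repeat Bellman optimality backups over all states until values settle."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            backups = []
            for a in ACTIONS:
                s2, r = step(s, a)
                backups.append(r + GAMMA * values[s2])
            best = max(backups)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-terminal state, the action with the best one-step backup."""
    def backup(s, a):
        s2, r = step(s, a)
        return r + GAMMA * values[s2]

    return [max(ACTIONS, key=lambda a, s=s: backup(s, a))
            for s in range(N_STATES - 1)]


values = value_iteration()
policy = greedy_policy(values)
print(values, policy)
```

With a continuous or very large state space the `for s in range(...)` sweep is no longer feasible, which is where a function approximator such as a neural net would stand in for the `values` table.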