Some code-level performance optimization tricks for PPO

Intro
This blog post is my summary of "Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO" by Logan Engstrom et al.
The code-level optimizations examined there are listed below; rough illustrative sketches of each follow the list.

1. value function clipping
2. reward scaling
3. orthogonal initialization and layer scaling
4. Adam learning rate annealing
5. reward clipping
6. observation clipping
7. hyperbolic tangent (tanh) activations
8. global gradient clipping
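Value function clipping replaces the plain squared-error value loss with a PPO-style clipped term and takes the element-wise maximum of the two. A minimal PyTorch sketch; the function name and the choice of reusing the policy clipping coefficient as clip_eps are my own assumptions:

```python
import torch

def clipped_value_loss(values, old_values, returns, clip_eps=0.2):
    """PPO-style value loss: max of unclipped and clipped squared error."""
    # Unclipped squared error against the return targets.
    loss_unclipped = (values - returns) ** 2
    # Keep the new prediction within +/- clip_eps of the old prediction.
    values_clipped = old_values + torch.clamp(values - old_values, -clip_eps, clip_eps)
    loss_clipped = (values_clipped - returns) ** 2
    return 0.5 * torch.max(loss_unclipped, loss_clipped).mean()
```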
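Reward scaling divides each reward by the standard deviation of a running estimate of the discounted return (without centering), and reward clipping then bounds the result to a fixed interval. A rough sketch; the class name, the Welford-style running statistics, and the [-10, 10] clip range are illustrative choices on my part, not values quoted from the paper:

```python
import numpy as np

class RewardScaler:
    """Scale rewards by the std of a running discounted-return estimate, then clip."""

    def __init__(self, gamma=0.99, clip=10.0, eps=1e-8):
        self.gamma, self.clip, self.eps = gamma, clip, eps
        self.ret = 0.0   # running discounted return, reset at episode end
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0    # sum of squared deviations (Welford)

    def _update(self, x):
        # Online update of the mean/variance of the discounted return.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def __call__(self, reward, done):
        self.ret = self.gamma * self.ret + reward
        self._update(self.ret)
        if done:
            self.ret = 0.0
        std = np.sqrt(self.m2 / self.count) if self.count > 1 else 1.0
        scaled = reward / (std + self.eps)
        return float(np.clip(scaled, -self.clip, self.clip))
```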
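Observation clipping is typically paired with observation normalization: states are standardized with running mean/std statistics and then clipped to a fixed range. A minimal sketch; the [-10, 10] range and the class name are assumptions, not something stated in this post:

```python
import numpy as np

class ObservationFilter:
    """Standardize observations with running statistics, then clip each dimension."""

    def __init__(self, shape, clip=10.0, eps=1e-8):
        self.clip, self.eps = clip, eps
        self.count = 0
        self.mean = np.zeros(shape, dtype=np.float64)
        self.m2 = np.zeros(shape, dtype=np.float64)

    def __call__(self, obs):
        obs = np.asarray(obs, dtype=np.float64)
        # Per-dimension Welford update of the running mean and variance.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)
        var = self.m2 / self.count if self.count > 1 else np.ones_like(self.m2)
        normalized = (obs - self.mean) / (np.sqrt(var) + self.eps)
        return np.clip(normalized, -self.clip, self.clip)
```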
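Orthogonal initialization, layer scaling, and tanh activations all show up in how the policy and value networks are built. A PyTorch sketch; the gains (sqrt(2) for hidden layers, 0.01 for the policy head, 1.0 for the value head) and the hidden size of 64 are common choices I am assuming, not numbers taken from this post:

```python
import math
import torch.nn as nn

def ortho_layer(in_dim, out_dim, gain):
    """Linear layer with orthogonal weight init and zero bias."""
    layer = nn.Linear(in_dim, out_dim)
    nn.init.orthogonal_(layer.weight, gain=gain)
    nn.init.constant_(layer.bias, 0.0)
    return layer

def build_networks(obs_dim, act_dim, hidden=64):
    # Hidden layers: tanh activations, orthogonal init scaled by sqrt(2).
    # Policy head: small gain (0.01); value head: gain 1.0.
    policy = nn.Sequential(
        ortho_layer(obs_dim, hidden, math.sqrt(2)), nn.Tanh(),
        ortho_layer(hidden, hidden, math.sqrt(2)), nn.Tanh(),
        ortho_layer(hidden, act_dim, 0.01),
    )
    value = nn.Sequential(
        ortho_layer(obs_dim, hidden, math.sqrt(2)), nn.Tanh(),
        ortho_layer(hidden, hidden, math.sqrt(2)), nn.Tanh(),
        ortho_layer(hidden, 1, 1.0),
    )
    return policy, value
```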
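Adam learning-rate annealing and global gradient clipping both sit in the update loop: the learning rate is decayed over training (linear decay to zero is the usual scheme), and the global l2 norm of all gradients is clipped before each optimizer step. A PyTorch sketch with assumed defaults (base_lr=3e-4, max_grad_norm=0.5):

```python
import torch

def ppo_update_step(loss, optimizer, params, update_idx, total_updates,
                    base_lr=3e-4, max_grad_norm=0.5):
    # Linearly anneal the Adam learning rate from base_lr down to 0.
    frac = 1.0 - update_idx / total_updates
    for group in optimizer.param_groups:
        group["lr"] = base_lr * frac

    optimizer.zero_grad()
    loss.backward()
    # Clip the global l2 norm of all gradients before stepping.
    torch.nn.utils.clip_grad_norm_(params, max_grad_norm)
    optimizer.step()
```

Here optimizer would be something like torch.optim.Adam(params, lr=base_lr), and update_idx/total_updates count PPO update phases rather than individual minibatch steps.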
Original post: https://www.cnblogs.com/dynmi/p/14031724.html