首页 > 其他 > 详细

CS294-112 深度强化学习秋季学期（伯克利）NO.6 Value functions introduction NO.7 Advanced Q learning

时间：2018-05-26 21:10:52 阅读：309 评论：0 收藏：0 [点我收藏+]

技术分享图片

--------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------

技术分享图片

技术分享图片

技术分享图片

understand that correlated samples cause problem. and how paralled solve the problem

another solution is replay buffers, fully ultilizing the advantage of off policy in Q-learning.

技术分享图片

技术分享图片

技术分享图片

技术分享图片

there‘s still a problem: Q learning is not gradient descent

技术分享图片

技术分享图片

divide Q function into two parts: the target net and the evolving net.

sacrifice speed to get the convergence.

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

overestimation of Natural DQN

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

get trouble in left and right dilemma of avoiding bumping on a tree

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

技术分享图片

CS294-112 深度强化学习秋季学期（伯克利）NO.6 Value functions introduction NO.7 Advanced Q learning

原文：https://www.cnblogs.com/ecoflex/p/9094123.html

踩

(0)

赞

(0)

举报

评论一句话评论（0）

分享档案

更多>

2021年09月23日 (328)
2021年09月24日 (313)
2021年09月17日 (191)
2021年09月15日 (369)
2021年09月16日 (411)
2021年09月13日 (439)
2021年09月11日 (398)
2021年09月12日 (393)
2021年09月10日 (160)
2021年09月08日 (222)

最新文章

更多>

教程昨日排行

更多>

友情链接

汇智网 PHP教程插件网

关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com

© 2014 bubuko.com 版权所有

打开技术之扣，分享程序人生！