Learning Notes: Morvan - Reinforcement Learning, Part 4: Deep Q Network

Posted: 2017-01-18 07:58:07

Deep Q Network

Notes

Deep Q-learning Algorithm

This gives us the final deep Q-learning algorithm with experience replay:

[Figure: pseudocode of the deep Q-learning algorithm with experience replay; image not recovered]
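As a rough illustration of that loop, here is a minimal sketch in plain Python. It is not the code from the original post: the five-state corridor environment, the `step` and `train` helpers, and every hyperparameter value are assumptions chosen for illustration, and a small table of weights stands in for the deep network so the example runs without any library.

```python
import random
from collections import deque

# Hypothetical toy corridor: states 0..4, actions 0 = left, 1 = right.
N_STATES, N_ACTIONS = 5, 2
GAMMA, EPSILON, LR = 0.9, 0.3, 0.1           # assumed values, not from the post
BATCH_SIZE, MEMORY_SIZE, MAX_STEPS = 16, 500, 200


def step(state, action):
    """Toy environment: reward 1 for reaching the rightmost state, which is terminal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=300, seed=0):
    rng = random.Random(seed)
    # Stand-in for the network: one randomly initialised weight per (state, action).
    q = [[rng.uniform(-1, 1) for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    memory = deque(maxlen=MEMORY_SIZE)        # replay memory D
    for _ in range(episodes):
        s = 0
        for _ in range(MAX_STEPS):            # cap the episode length for safety
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: q[s][i])
            s2, r, done = step(s, a)
            memory.append((s, a, r, s2, done))   # store the experience
            # Sample a random minibatch of past transitions.
            batch = rng.sample(list(memory), min(BATCH_SIZE, len(memory)))
            for ss, aa, rr, ss2, dd in batch:
                # Target: r if terminal, otherwise r + gamma * max_a' Q(s', a').
                target = rr if dd else rr + GAMMA * max(q[ss2])
                # Gradient step on the squared error (target - Q(s, a))^2.
                q[ss][aa] += LR * (target - q[ss][aa])
            if done:
                break
            s = s2
    return q
```

Calling `train()` returns the learned table. On this toy corridor the greedy action should end up being "right" in every non-terminal state, since that is the only way to reach the reward.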

DeepMind used many more tricks to actually make it work, such as a target network, error clipping, and reward clipping, but these are out of scope for this introduction.
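For orientation only, the three tricks named here can each be sketched in a few lines of Python. The helper names `clip_reward`, `clipped_error`, and `sync_target` are hypothetical, the clipping range of [-1, 1] is an assumption, and the "weights" are a plain nested list rather than a real network.

```python
import copy


def clip_reward(r):
    """Reward clipping: squash every reward into [-1, 1] (assumed range)."""
    return max(-1.0, min(1.0, r))


def clipped_error(target, prediction, limit=1.0):
    """Error clipping: bound the error term (target - prediction) used in the update."""
    return max(-limit, min(limit, target - prediction))


def sync_target(online_weights):
    """Target network: a frozen copy of the online weights, refreshed only now and then.

    Targets are computed from this copy, so they do not move on every update.
    """
    return copy.deepcopy(online_weights)
```

In a training loop, `sync_target` would be called every so many steps, while `clip_reward` and `clipped_error` would be applied on every transition and every update respectively.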

The most amazing part of this algorithm is that it learns anything at all. Just think about it: because our Q-function is initialized randomly, it initially outputs complete garbage. And we are using this garbage (the maximum Q-value of the next state) as targets for the network, only occasionally folding in a tiny reward. That sounds insane. How could it learn anything meaningful at all? The fact is that it does.
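One way to build intuition for this is that every target multiplies the garbage by the discount factor, while the reward it folds in is exact. The sketch below is an illustration under stated assumptions, not part of the original post: it uses a hypothetical single-action corridor (so the max over actions is trivial), an assumed discount of 0.9, and full sweeps instead of sampled minibatches. It records how far the estimates are from the true values after each sweep.

```python
import random

GAMMA = 0.9                       # assumed discount factor
REWARDS = [0.0, 0.0, 0.0, 1.0]    # hypothetical corridor: only the last step pays off


def backup(values):
    """One sweep of targets: V(s) <- r(s) + gamma * V(s + 1), with V(terminal) = 0."""
    next_values = values[1:] + [0.0]
    return [r + GAMMA * v for r, v in zip(REWARDS, next_values)]


# True values of the corridor: the reward discounted by the distance to the goal.
true_values = [GAMMA ** (len(REWARDS) - 1 - s) for s in range(len(REWARDS))]

rng = random.Random(0)
values = [rng.uniform(-10, 10) for _ in REWARDS]   # "complete garbage" to start with

errors = []                        # worst-case error before each sweep
for _ in range(6):
    errors.append(max(abs(v - t) for v, t in zip(values, true_values)))
    values = backup(values)
```

Each entry of `errors` is at most `GAMMA` times the previous one, so the random initial values are squeezed out while the reward propagates backwards one state per sweep. The real algorithm is noisier because it samples transitions and uses a function approximator, so this shows the mechanism rather than a guarantee.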

Extension

 

Original post: http://www.cnblogs.com/casperwin/p/6295404.html
