
To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning

Posted: 2017-09-30 20:27:59

https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node26.html

【Average Reward vs. Discounted Reward】

Schwartz [106] examined the problem of adapting Q-learning to an average-reward framework. Although his R-learning algorithm seems to exhibit convergence problems for some MDPs, several researchers have found the average-reward criterion closer to the true problem they wish to solve than a discounted criterion and therefore prefer R-learning to Q-learning [69].
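The quoted passage does not spell out the update rules, so the following is only a minimal sketch of how the two criteria differ in a tabular setting. It follows one common textbook formulation of R-learning, and variants differ in details such as when the greedy check is made. The function names, the list-of-lists table `Q`, and the parameters `alpha`, `beta`, `gamma`, and `rho` are illustrative assumptions, not taken from the source.

```python
from typing import List

Table = List[List[float]]  # Q[state][action], a hypothetical tabular layout


def q_learning_update(Q: Table, s: int, a: int, r: float, s_next: int,
                      alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Discounted criterion: future value is scaled by gamma."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def r_learning_update(Q: Table, rho: float, s: int, a: int, r: float,
                      s_next: int, alpha: float = 0.1,
                      beta: float = 0.01) -> float:
    """Average-reward criterion: no discount; the running estimate rho of
    the average reward per step is subtracted from each reward.
    Returns the (possibly updated) rho."""
    # Read all table values before modifying Q (an assumption of this sketch).
    best_here = max(Q[s])
    best_next = max(Q[s_next])
    was_greedy = Q[s][a] == best_here

    # Relative action value: reward above average plus the best next value.
    Q[s][a] += alpha * (r - rho + best_next - Q[s][a])

    # rho is adjusted only on greedy steps, so that exploratory actions
    # do not distort the average-reward estimate.
    if was_greedy:
        rho += beta * (r - rho + best_next - best_here)
    return rho
```

The only structural differences are that R-learning drops `gamma` and carries the extra scalar `rho`, which the caller must thread through successive updates (for example `rho = r_learning_update(Q, rho, s, a, r, s_next)`).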

Original post: http://www.cnblogs.com/yuanjiangw/p/7615875.html
