The RL problem
Trading as an RL problem
Mapping trading to RL
Markov decision problems
Unknown transitions and rewards
What to optimize?
Learning Procedure
Update Rule
The formula for computing Q for any state-action pair <s, a>, given an experience tuple <s, a, s', r>, is:

Q'[s, a] = (1 - α) · Q[s, a] + α · (r + γ · Q[s', argmax_a'(Q[s', a'])])

Here:
r = R[s, a] is the immediate reward for taking action a in state s,
γ ∈ [0, 1] (gamma) is the discount factor used to progressively reduce the value of future rewards,
s' is the resulting next state,
argmax_a'(Q[s', a']) is the action that maximizes the Q-value among all possible actions a' from s', and
α ∈ [0, 1] (alpha) is the learning rate used to vary the weight given to new experiences compared with past Q-values.
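The update rule can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name q_update, the list-of-lists Q-table layout (Q[s][a]) and the default values of alpha and gamma are all assumptions chosen for the example.

```python
def q_update(Q, s, a, s_prime, r, alpha=0.2, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q[s][a].

    Q is assumed to be a list of rows, one per state, each row holding
    one Q-value per action (a hypothetical layout for this sketch).
    """
    # Q[s', argmax_a'(Q[s', a'])] is simply the largest Q-value in row s'
    best_next = max(Q[s_prime])
    # Blend the old estimate with the new one, weighted by the learning rate
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * best_next)
    return Q[s][a]


# Tiny example: 3 states, 2 actions, all Q-values start at zero
Q = [[0.0, 0.0] for _ in range(3)]
Q[1] = [1.0, 2.0]  # pretend state 1 already has some learned values
new_value = q_update(Q, s=0, a=1, s_prime=1, r=0.5, alpha=0.5, gamma=0.9)
print(new_value)
```

With alpha = 0.5 the new value is the midpoint between the old estimate (0) and the improved estimate r + γ · 2.0.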
Two Finer Points
The Trading Problem: Actions
A reward at each step allows the learning agent to get feedback on each individual action it takes (including doing nothing).
SMA: simple moving average => different stocks trade at different price levels, so the raw SMA is not comparable across stocks
=> adj close / SMA is a good normalized factor
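To see why the ratio is comparable across stocks, here is a small sketch in plain Python. The function name price_to_sma_ratio, the window length and the price lists are made up for illustration; in practice the input would be a series of adjusted close prices.

```python
def price_to_sma_ratio(prices, window):
    """Return adj close / SMA for each day; None until a full window exists."""
    ratios = []
    for i in range(len(prices)):
        if i + 1 < window:
            ratios.append(None)  # not enough history yet
            continue
        # Simple moving average over the last `window` prices, today included
        sma = sum(prices[i + 1 - window : i + 1]) / window
        ratios.append(prices[i] / sma)
    return ratios


# Two hypothetical stocks with the same relative moves but different price levels
cheap = [10.0, 11.0, 12.0]
expensive = [1000.0, 1100.0, 1200.0]
cheap_ratio = price_to_sma_ratio(cheap, window=3)
expensive_ratio = price_to_sma_ratio(expensive, window=3)
print(cheap_ratio[-1], expensive_ratio[-1])
```

Both stocks end up with the same ratio on the last day even though their prices differ by a factor of 100, which is what makes the ratio usable as a shared state factor.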
Creating the State
Discretizing
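Discretizing turns a continuous factor (such as adj close / SMA) into a small integer that can index a Q-table. One common approach, assumed here rather than taken from these notes, is to pick thresholds from the sorted data so that each bucket holds roughly the same number of samples. The names make_thresholds and discretize are hypothetical.

```python
from bisect import bisect_left


def make_thresholds(data, steps):
    """Pick steps - 1 thresholds so each bucket holds about len(data) / steps values."""
    ordered = sorted(data)
    step_size = len(ordered) // steps
    if step_size == 0:
        raise ValueError("need at least as many data points as steps")
    # The i-th threshold is the largest value that belongs to bucket i
    return [ordered[(i + 1) * step_size - 1] for i in range(steps - 1)]


def discretize(value, thresholds):
    """Map a value to a bucket index in 0 .. len(thresholds)."""
    # Values less than or equal to thresholds[i] fall into bucket i or lower
    return bisect_left(thresholds, value)


# Hypothetical factor values observed during the training period
data = [0.95, 1.10, 1.02, 0.90, 1.05, 0.98, 1.20, 1.00]
thresholds = make_thresholds(data, steps=4)
buckets = [discretize(v, thresholds) for v in data]
print(thresholds, buckets)
```

Several discretized factors can then be combined into one state number, for example by treating each bucket index as a digit.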
Q-Learning Recap
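Q-learning is model-free: it does not need to know or store the transitions T(s, a, s') or rewards R(s, a). The Q-value for any state-action pair takes future rewards into account, so the Q-table encodes both the best possible value of a state (max_a Q(s, a)) as well as the best policy in terms of the action that should be taken (argmax_a Q(s, a)).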
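Reading the value max_a Q(s, a) and the policy argmax_a Q(s, a) off a Q-table takes one line each. The sketch below assumes a list-of-lists table indexed as Q[s][a]; the helper names and the numbers are illustrative only.

```python
def best_value(Q, s):
    """max_a Q(s, a): the best value achievable from state s."""
    return max(Q[s])


def best_action(Q, s):
    """argmax_a Q(s, a): the action the greedy policy takes in state s."""
    return max(range(len(Q[s])), key=lambda a: Q[s][a])


# Hypothetical Q-table with 2 states and 3 actions
Q = [
    [0.1, 0.7, 0.3],
    [0.0, -0.2, 0.4],
]
policy = [best_action(Q, s) for s in range(len(Q))]
values = [best_value(Q, s) for s in range(len(Q))]
print(policy, values)
```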
Dyna-Q Big Picture (Dyna was invented by Richard Sutton)
Learning T
How to Evaluate T?
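Correction: The expression should be T[s, a, s'] = T_c[s, a, s'] / Σ_i T_c[s, a, i], where T_c[s, a, s'] is the number of times that taking action a in state s led to state s'. In the denominator shown in the video, T is missing the subscript c.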
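Estimating T from observed transition counts can be sketched as follows. The sketch assumes T_c is a nested list indexed as T_c[s][a][s'] holding how often each transition was observed; the function name and the counts are made up for illustration.

```python
def transition_probabilities(T_c, s, a):
    """Estimate T[s, a, s'] for every s' by normalizing the counts T_c[s][a]."""
    counts = T_c[s][a]
    total = sum(counts)  # the denominator: sum over i of T_c[s, a, i]
    if total == 0:
        raise ValueError("no observations yet for this state-action pair")
    return [c / total for c in counts]


# Hypothetical counts for 3 states and 1 action:
# from state 0, action 0 led to state 0 three times and to state 1 once
T_c = [
    [[3, 1, 0]],
    [[0, 0, 0]],
    [[0, 0, 0]],
]
probs = transition_probabilities(T_c, s=0, a=0)
print(probs)
```

A common way to avoid the zero-count case, offered here as an assumption rather than as part of these notes, is to initialize every count to a very small positive number.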
Learning R
Dyna Q Recap
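The Dyna architecture consists of a combination of: direct reinforcement learning from real experience tuples gathered by acting in the environment; updating a learned model (T and R) from that experience; and indirect reinforcement learning from simulated experience generated by the model.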
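A single Dyna-Q step can be sketched as one real Q-update, one model update, and a batch of simulated Q-updates. This is a minimal sketch under stated assumptions: the table layouts (Q[s][a], T_c[s][a][s'], R[s][a]), the function names, the running-average update for R and the default parameters are choices made for this example, not a definitive implementation.

```python
import random


def q_update(Q, s, a, s_prime, r, alpha, gamma):
    """Standard Q-learning update, applied in place."""
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * max(Q[s_prime]))


def dyna_q_step(Q, T_c, R, s, a, s_prime, r, rng,
                alpha=0.2, gamma=0.9, n_simulated=50):
    # 1. Direct RL: learn from the real experience tuple <s, a, s', r>
    q_update(Q, s, a, s_prime, r, alpha, gamma)

    # 2. Model learning: count the transition, keep a running average of reward
    T_c[s][a][s_prime] += 1
    R[s][a] = (1 - alpha) * R[s][a] + alpha * r

    # 3. Planning: replay simulated experience drawn from the learned model
    n_states, n_actions = len(Q), len(Q[0])
    for _ in range(n_simulated):
        sim_s = rng.randrange(n_states)
        sim_a = rng.randrange(n_actions)
        counts = T_c[sim_s][sim_a]
        if sum(counts) == 0:
            continue  # this pair was never tried, so the model knows nothing
        # Sample s' in proportion to how often it followed (sim_s, sim_a)
        sim_s_prime = rng.choices(range(n_states), weights=counts)[0]
        q_update(Q, sim_s, sim_a, sim_s_prime, R[sim_s][sim_a], alpha, gamma)


# Hypothetical world with 2 states and 2 actions, everything starts at zero
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
R = [[0.0] * n_actions for _ in range(n_states)]
T_c = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
rng = random.Random(0)

dyna_q_step(Q, T_c, R, s=1, a=0, s_prime=1, r=1.0, rng=rng)  # rewarding loop
dyna_q_step(Q, T_c, R, s=0, a=1, s_prime=1, r=0.0, rng=rng)  # path into it
print(Q)
```

After only two real experiences, the simulated updates have already pushed value back from state 1 into state 0, which is the point of Dyna-Q: each real interaction is reused many times.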
Sutton and Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
Tammer Kamel is the founder and CEO of Quandl - a data platform that makes financial and economic data available through easy-to-use APIs.
Listen to this two-part interview with him.
Source: https://www.cnblogs.com/ecoflex/p/10977470.html