The RL problem

Trading as an RL problem

Mapping trading to RL

Markov decision problems

Unknown transitions and rewards

What to optimize?




Learning Procedure

Update Rule

The formula for computing Q for any state-action pair <s, a>, given an experience tuple <s, a, s', r>, is:

Q'[s, a] = (1 - α) · Q[s, a] + α · (r + γ · Q[s', argmax_{a'}(Q[s', a'])])

Here:
r = R[s, a] is the immediate reward for taking action a in state s,
γ ∈ [0, 1] (gamma) is the discount factor used to progressively reduce the value of future rewards,
s' is the resulting next state,
argmax_{a'}(Q[s', a']) is the action that maximizes the Q-value among all possible actions a' from s', and
α ∈ [0, 1] (alpha) is the learning rate used to vary the weight given to new experiences compared with past Q-values.
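The update rule can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the table layout (a list of rows, one per state), the function name q_update, and the default values of alpha and gamma are all assumptions made for the example.

```python
def q_update(Q, s, a, s_prime, r, alpha=0.2, gamma=0.9):
    """Apply one Q-learning update in place for the experience tuple <s, a, s', r>.

    Q is assumed to be a list of rows, one row per state, one column per action.
    """
    # argmax over a' of Q[s', a']: the best-known action from the next state
    best_next_action = max(range(len(Q[s_prime])), key=lambda a2: Q[s_prime][a2])
    later_rewards = Q[s_prime][best_next_action]
    # Blend the old estimate with the new one, weighted by the learning rate
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * later_rewards)
    return Q[s][a]


# Toy table (hypothetical values): 3 states x 2 actions
Q = [[0.0, 0.0], [0.5, 2.0], [0.0, 0.0]]
new_value = q_update(Q, s=0, a=1, s_prime=1, r=1.0, alpha=0.5, gamma=0.9)
# new_value is approximately 0.5 * 0 + 0.5 * (1.0 + 0.9 * 2.0) = 1.4
```

With alpha = 0.5 the old value and the new estimate are weighted equally; a smaller alpha would make the table change more slowly.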
Two Finer Points

The Trading Problem: Actions


A reward at each step allows the learning agent to get feedback on each individual action it takes (including doing nothing).

SMA: simple moving average => different stocks trade at different price levels, so raw SMA values are not comparable across stocks
=> adj close / SMA is a good normalized factor
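To see why the ratio is comparable across stocks, here is a small sketch in plain Python. The function name price_to_sma_ratio, the window length, and the price series are made up for illustration; real code would more likely work on a pandas DataFrame of adjusted closes.

```python
def price_to_sma_ratio(prices, window):
    """Return adjusted close divided by its simple moving average.

    Entries before the first full window are None, since the SMA is undefined there.
    """
    ratios = []
    for i in range(len(prices)):
        if i + 1 < window:
            ratios.append(None)
            continue
        sma = sum(prices[i + 1 - window : i + 1]) / window
        ratios.append(prices[i] / sma)
    return ratios


# Two hypothetical stocks with the same relative moves at different price levels
cheap = [10.0, 11.0, 12.0]
expensive = [100.0, 110.0, 120.0]
cheap_ratios = price_to_sma_ratio(cheap, window=3)
expensive_ratios = price_to_sma_ratio(expensive, window=3)
# Both final ratios are about 1.09, even though the raw SMAs are 11 and 110
```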
Creating the State

Discretizing

Q-Learning Recap

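Q-learning is model-free: it does not use models of the transitions T(s, a, s') or rewards R(s, a). The learned Q-table gives the value of a state (max_a Q(s, a)) as well as the best policy in terms of the action that should be taken (argmax_a Q(s, a)).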
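Reading the value and the policy out of a Q-table takes one line each. The sketch below assumes the table is a list of rows (one per state) and uses hypothetical action labels and Q-values; none of these names come from the course material.

```python
def state_value(Q, s):
    """max over a of Q[s][a]: the value of being in state s."""
    return max(Q[s])


def greedy_action(Q, s):
    """argmax over a of Q[s][a]: the index of the action the policy takes in state s."""
    return max(range(len(Q[s])), key=lambda a: Q[s][a])


ACTIONS = ["BUY", "SELL", "NOTHING"]  # assumed labels for the three columns
Q = [
    [0.1, -0.3, 0.0],   # state 0 (made-up values)
    [-0.2, 0.4, 0.05],  # state 1 (made-up values)
]
best_in_state_1 = ACTIONS[greedy_action(Q, 1)]  # "SELL"
```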
Dyna-Q Big Picture <= invented by Richard Sutton


Learning T

How to Evaluate T?

Correction: The expression should be:
T[s, a, s'] = T_c[s, a, s'] / Σ_i T_c[s, a, i]
In the denominator shown in the video, T is missing the subscript c.
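Estimating T from counts amounts to dividing the number of times <s, a> led to s' by the number of times <s, a> was tried at all. A minimal sketch follows; the dictionary layout, the function names, and the choice to return 0 for unseen state-action pairs are assumptions for this example.

```python
from collections import defaultdict

# T_c[(s, a)][s_prime] counts how many times taking a in s led to s_prime
T_c = defaultdict(lambda: defaultdict(int))


def record_transition(s, a, s_prime):
    """Increment the count for one observed transition."""
    T_c[(s, a)][s_prime] += 1


def transition_probability(s, a, s_prime):
    """Estimate T[s, a, s'] = T_c[s, a, s'] / sum over i of T_c[s, a, i]."""
    if (s, a) not in T_c:
        return 0.0  # assumption: unseen (s, a) pairs get probability 0
    counts = T_c[(s, a)]
    return counts.get(s_prime, 0) / sum(counts.values())


# Hypothetical experience: action 1 in state 0 led to state 2 three times, state 5 once
for observed_next in [2, 2, 5, 2]:
    record_transition(0, 1, observed_next)

p_to_2 = transition_probability(0, 1, 2)  # 0.75
p_to_5 = transition_probability(0, 1, 5)  # 0.25
```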
Learning R

Dyna Q Recap

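The Dyna architecture consists of a combination of: direct reinforcement learning from real experience tuples gathered by acting in an environment, updating an internal model of the environment, and using the model to simulate experiences.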
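One iteration of Dyna-Q can be sketched as follows. This is an illustration under stated assumptions rather than the course's implementation: the function name dyna_q_step, the dictionary-based model, the running-average reward estimate, and the choice to replay only previously seen state-action pairs are all choices made for this example.

```python
import random


def dyna_q_step(Q, T_c, R, s, a, s_prime, r,
                alpha=0.2, gamma=0.9, n_simulated=10, rng=random):
    """One Dyna-Q iteration: real update, model update, then simulated updates."""

    def update_q(s, a, s_prime, r):
        # Standard Q-learning update; max(Q[s']) equals Q[s', argmax_a' Q[s', a']]
        Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * max(Q[s_prime]))

    # 1. Direct reinforcement learning from the real experience tuple
    update_q(s, a, s_prime, r)

    # 2. Model update: transition counts and a running reward estimate (assumed form)
    counts = T_c.setdefault((s, a), {})
    counts[s_prime] = counts.get(s_prime, 0) + 1
    R[(s, a)] = (1 - alpha) * R.get((s, a), 0.0) + alpha * r

    # 3. Replay simulated experiences drawn from the learned model
    seen = list(T_c.keys())
    for _ in range(n_simulated):
        sim_s, sim_a = rng.choice(seen)
        next_states = list(T_c[(sim_s, sim_a)].keys())
        weights = list(T_c[(sim_s, sim_a)].values())
        # Sample s' in proportion to how often it was observed
        sim_s_prime = rng.choices(next_states, weights=weights)[0]
        update_q(sim_s, sim_a, sim_s_prime, R[(sim_s, sim_a)])


# Hypothetical usage: 3 states x 2 actions, one real experience
Q = [[0.0, 0.0] for _ in range(3)]
T_c, R = {}, {}
dyna_q_step(Q, T_c, R, s=0, a=1, s_prime=2, r=1.0, rng=random.Random(0))
```

The simulated updates are cheap compared with gathering real experience, which is the motivation for running many of them per real step.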

Sutton and Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
Tammer Kamel is the founder and CEO of Quandl - a data platform that makes financial and economic data available through easy-to-use APIs.
Listen to this two-part interview with him.
Original source: https://www.cnblogs.com/ecoflex/p/10977470.html