The Holy Grail of RL
π: S → A   ("Policy")
π(s) = max( E[ Σ_t γ^t r(s, a, t) ] ) ∀ s ∈ S and ∀ a ∈ A.
In other words, learn the value function
Q: S × A → Reals
Such that:
π(s) = argmax_a Q(s, a)
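
As a rough illustration of the argmax rule above, the sketch below reads a greedy policy off a small tabular Q and computes a discounted reward sum for a finite reward sequence. The function names (discounted_return, greedy_policy), the dictionary representation of Q, the state and action labels, and the numeric values are all assumptions made for this example; the slide does not specify any of them.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

State = Hashable
Action = Hashable
# Hypothetical tabular representation: Q maps (state, action) pairs to real values.
QTable = Dict[Tuple[State, Action], float]


def discounted_return(rewards: Iterable[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over a finite sequence of rewards (t starts at 0)."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def greedy_policy(q: QTable, actions: Sequence[Action]) -> Dict[State, Action]:
    """Build pi with pi(s) = argmax_a Q(s, a) for every state that appears in q.

    Missing (state, action) entries are treated as -inf, and ties are broken
    in favor of the action listed first in `actions`.
    """
    states = {s for (s, _a) in q}
    return {
        s: max(actions, key=lambda a: q.get((s, a), float("-inf")))
        for s in states
    }


# Toy Q-table with made-up values, two states and two actions.
q_example: QTable = {
    ("s0", "left"): 0.2,
    ("s0", "right"): 1.5,
    ("s1", "left"): 0.9,
    ("s1", "right"): 0.4,
}

pi_example = greedy_policy(q_example, ["left", "right"])
```

With this toy table, pi_example maps "s0" to "right" and "s1" to "left". Learning Q itself is a separate problem that this sketch does not address; it only shows how a policy follows from Q once Q is known.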