Are There Convergence Proofs?
To give you an idea:
- For TD(λ), the learned value function Q() is guaranteed to converge to the optimal value function Q*() with probability one, provided each state is visited infinitely often over an infinite run [Watkins 89; Tsitsiklis 94; Jaakkola, Jordan, & Singh 94].
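
To make the convergence statement concrete, here is a minimal sketch using one-step tabular Q-learning (the algorithm from the Watkins reference, not the full eligibility-trace method). Everything in it is an illustrative assumption rather than something from the cited papers: the two-state deterministic toy problem, the discount factor, the uniformly random behaviour policy that keeps every state-action pair visited, and the decaying step size alpha = n^-0.7. It computes Q* by value iteration and then measures how far the learned table is from it.

```python
import random

# Hypothetical toy problem: 2 states, 2 actions, deterministic transitions.
GAMMA = 0.5
NEXT_STATE = [[0, 1], [0, 1]]      # NEXT_STATE[s][a] is the successor state
REWARD = [[0.0, 1.0], [0.0, 2.0]]  # REWARD[s][a] is the immediate reward


def optimal_q(sweeps=200):
    """Compute Q* by repeatedly applying the Bellman optimality backup."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(sweeps):
        q = [[REWARD[s][a] + GAMMA * max(q[NEXT_STATE[s][a]])
              for a in range(2)] for s in range(2)]
    return q


def q_learning(steps=20000, seed=0):
    """One-step tabular Q-learning along a single long trajectory."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)             # random exploration: all pairs keep being visited
        visits[s][a] += 1
        alpha = visits[s][a] ** -0.7     # assumed schedule: decays, but not too quickly
        s2 = NEXT_STATE[s][a]
        target = REWARD[s][a] + GAMMA * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])
        s = s2
    return q


def max_error(q, q_star):
    """Largest absolute difference between two Q tables."""
    return max(abs(q[s][a] - q_star[s][a]) for s in range(2) for a in range(2))


if __name__ == "__main__":
    q_star = optimal_q()
    for steps in (10, 100, 1000, 20000):
        print(steps, max_error(q_learning(steps), q_star))
```

If the exploration step is replaced by a policy that never tries some action, the corresponding entries are never updated and the guarantee no longer applies, which is the role of the "visited infinitely often" condition.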