A Question on Stochastic vs Deterministic Policies

Can a stochastic policy outperform a greedy one in terms of obtained reward, after learning has finished and the agent has converged to a fixed policy? For instance, is there any situation in which something like Boltzmann action selection performs better than greedy selection? This is not the case in an MDP, but what about a POMDP? I guess not! I am looking for a counterpart of game theory's mixed strategy in other fields. Some multi-player games have a mixed-strategy Nash equilibrium but no pure-strategy one. Have you seen something similar in other fields, specifically in cases where performance is the comparison criterion? I wonder what the benefit of acting randomly can be.
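To make the game-theory point concrete, here is a minimal sketch using matching pennies, a standard two-player zero-sum game picked here purely as an illustration. The payoff matrix and the helper name `worst_case_payoff` are assumptions for this example; it only compares what each strategy guarantees against an opponent who best-responds.

```python
# Matching pennies, payoff to the row player: +1 on a match, -1 otherwise.
# (Illustrative payoff matrix; rows are our actions, columns the opponent's.)
PAYOFF = [[1, -1],
          [-1, 1]]


def worst_case_payoff(p_first):
    """Expected row-player payoff when the opponent best-responds.

    p_first is the probability that the row player picks action 0.
    """
    expected_per_column = [
        p_first * PAYOFF[0][col] + (1 - p_first) * PAYOFF[1][col]
        for col in range(2)
    ]
    # The opponent picks the column that is worst for the row player.
    return min(expected_per_column)


pure_first = worst_case_payoff(1.0)   # deterministic: always action 0
pure_second = worst_case_payoff(0.0)  # deterministic: always action 1
mixed = worst_case_payoff(0.5)        # stochastic: 50/50 mix

print(pure_first, pure_second, mixed)
```

In this sketch both deterministic choices guarantee only -1, while the 50/50 mix guarantees 0, which is the sense in which randomizing helps in that game.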

2 Replies to “A Question on Stochastic vs Deterministic Policies”

Interesting questions you got here.

Boltzmann and Gibbs distributions are used for exploration; without an exploration policy (e.g. epsilon-greedy or softmax) there are no convergence guarantees.
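As a rough illustration of the selection rules mentioned here, the following is a minimal sketch, assuming the agent keeps a plain list of action-value estimates (`q_values`, a hypothetical name). The `temperature` and `epsilon` defaults are arbitrary placeholders, not recommended settings.

```python
import math
import random


def greedy(q_values):
    """Index of the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values, temperature=1.0):
    """Softmax (Boltzmann) distribution over actions."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_values, temperature=1.0, rng=random):
    """Sample an action in proportion to its Boltzmann probability."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a uniform random action, else be greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)
```

A low temperature makes the Boltzmann rule behave almost greedily, while a high temperature approaches uniform random selection.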

Hope this is useful 😉

Thanks very much! (: Yes, of course, a stochastic policy is needed to ensure convergence in problems that exhibit some kind of convergent phenomenon. However, I am now curious about performance in the sense of expected received reward: is there any problem in which a stochastic policy gains more reward than a deterministic one? It is not the case for MDPs, but what about POMDPs or Markov games?
