By: Amir massoud Farahmand

Amir massoud Farahmand — Fri, 04 Jun 2010 07:07:47 +0000

Thanks Michael for your clarification!
I have ignored this blog for a long time.

By: Michael Littman

Michael Littman — Wed, 09 Jul 2008 21:07:50 +0000

Hi,

I wanted to leave a comment to respond to your comment about the “rationality” of minimax-Q, but comments are closed at https://thesilog.sologen.net/?p=76, so I decided to leave my comment here. 🙂

The “non-rationality” of minimax-Q follows from Bowling and Veloso’s (idiosyncratic?) definition of rationality. Specifically, they define it as converging to a best response against a stationary strategy (even a suboptimal one). Minimax-Q ignores the opponent’s actual strategy and assumes a worst-case opponent, so, indeed, minimax-Q fails to satisfy their definition.

Your alternative definition is interesting. It says that a “rational” learning algorithm should adopt a best response to any Nash-equilibrium opponent. In zero-sum games (which is where minimax-Q makes the most sense), this definition is equivalent to saying that the learner should adopt a minimax policy. Of course, that’s exactly what minimax-Q does, so it passes your rationality test in this case.

-Michael
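
To make the distinction in the comment above concrete, here is a minimal sketch in Python. It is not minimax-Q itself (there is no learning or Q-update); it only compares a minimax policy with a best response in a single zero-sum matrix game. The choice of rock-paper-scissors, the "always rock" opponent, and all function names are assumptions made for illustration.

```python
# Learner's payoff matrix for rock-paper-scissors (zero-sum).
# Rows: learner's action, columns: opponent's action (rock, paper, scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
N = 3


def expected_payoff(row_policy, col_policy):
    """Expected payoff to the learner when both players use mixed policies."""
    return sum(
        row_policy[i] * col_policy[j] * PAYOFF[i][j]
        for i in range(N)
        for j in range(N)
    )


def best_response(col_policy):
    """Pure policy maximizing expected payoff against a fixed opponent policy."""
    values = [sum(col_policy[j] * PAYOFF[i][j] for j in range(N)) for i in range(N)]
    best = max(range(N), key=lambda i: values[i])
    return [1.0 if i == best else 0.0 for i in range(N)]


def worst_case_value(row_policy):
    """Payoff the learner is guaranteed against any opponent action."""
    return min(sum(row_policy[i] * PAYOFF[i][j] for i in range(N)) for j in range(N))


# The uniform policy is the minimax policy for rock-paper-scissors.
minimax_policy = [1 / 3, 1 / 3, 1 / 3]
uniform_opponent = [1 / 3, 1 / 3, 1 / 3]  # the Nash-equilibrium opponent
always_rock = [1.0, 0.0, 0.0]             # a suboptimal stationary opponent

exploiting_policy = best_response(always_rock)  # plays paper

if __name__ == "__main__":
    print("minimax vs always-rock:", expected_payoff(minimax_policy, always_rock))
    print("best response vs always-rock:", expected_payoff(exploiting_policy, always_rock))
    print("minimax vs Nash opponent:", expected_payoff(minimax_policy, uniform_opponent))
    print("worst case of minimax policy:", worst_case_value(minimax_policy))
    print("worst case of exploiting policy:", worst_case_value(exploiting_policy))
```

Against the suboptimal "always rock" opponent, the minimax policy earns less than the best response does, which is why a learner that settles on the minimax policy fails the Bowling and Veloso definition. Against the equilibrium opponent, the minimax policy is itself a best response, which is the sense in which it passes the alternative definition.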

Comments on: Reinforcement Learning blog
