Satinder Singh has started a new blog called the Reinforcement Learning blog. I can see that Michael Littman is one of its authors, though he hasn’t published a post yet. This is good news for RL. Good luck to them!
Hi,
I wanted to leave a comment to respond to your comment about the “rationality” of minimax-Q, but comments are closed at http://thesilog.sologen.net/?p=76, so I decided to leave my comment here.
The “non-rationality” of minimax-Q follows from Bowling and Veloso’s (idiosyncratic?) definition of rationality. Specifically, they define it as converging to a best response against a stationary strategy (even a suboptimal one). Minimax-Q ignores the opponent’s actual strategy and assumes a worst-case opponent, so it does indeed fail to satisfy their definition.
Your alternative definition is interesting. It says that a “rational” learning algorithm should adopt a best response to any Nash-equilibrium opponent. In zero-sum games (which is where minimax-Q makes the most sense), this definition is equivalent to saying that the learner should adopt a minimax policy. Of course, that’s exactly what minimax-Q does, so it passes your rationality test in this case.
-Michael
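To make the two points in Michael's comment concrete, here is a minimal sketch on matching pennies. The game, the function names, and the grid search are all assumptions chosen for illustration; this is not minimax-Q itself, only the minimax policy it would aim for in a single-state zero-sum game, compared against a best response.

```python
# Matching pennies payoffs for the row player (the learner). The column
# player (the opponent) receives the negation. Illustrative game only.
PAYOFF = [[1.0, -1.0],
          [-1.0, 1.0]]


def expected_payoff(p_heads, q_heads, payoff=PAYOFF):
    """Expected row-player payoff when both players mix over two actions."""
    p = [p_heads, 1.0 - p_heads]
    q = [q_heads, 1.0 - q_heads]
    return sum(p[i] * q[j] * payoff[i][j] for i in range(2) for j in range(2))


def minimax_policy(payoff=PAYOFF, steps=100):
    """Grid-search the row mix that maximizes the worst-case payoff.

    A hypothetical helper: a real solver would use a linear program.
    The worst case over opponent mixes is attained at a pure action,
    so only q = 0 and q = 1 need to be checked.
    """
    def worst_case(p_heads):
        return min(expected_payoff(p_heads, q, payoff) for q in (0.0, 1.0))

    return max((i / steps for i in range(steps + 1)), key=worst_case)


def best_response_value(q_heads, payoff=PAYOFF):
    """Payoff of the best pure reply to a fixed (stationary) opponent mix."""
    return max(expected_payoff(p, q_heads, payoff) for p in (1.0, 0.0))


if __name__ == "__main__":
    pi = minimax_policy()
    # Suboptimal stationary opponent: plays heads 80% of the time.
    print("minimax policy P(heads):", pi)
    print("minimax payoff vs biased opponent:", expected_payoff(pi, 0.8))
    print("best-response payoff vs biased opponent:", best_response_value(0.8))
    # Equilibrium opponent: plays heads 50% of the time.
    print("minimax payoff vs equilibrium opponent:", expected_payoff(pi, 0.5))
    print("best-response payoff vs equilibrium opponent:", best_response_value(0.5))
```

Against the biased opponent the minimax policy earns less than the best response, which is the Bowling and Veloso sense in which it is not "rational". Against the equilibrium opponent the two payoffs coincide, which is why the minimax policy passes the alternative definition in zero-sum games.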
Thanks Michael for your clarification!
I have ignored this blog for a long time.