<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Reinforcement Learning blog</title>
	<atom:link href="http://thesilog.sologen.net/?feed=rss2&#038;p=217" rel="self" type="application/rss+xml" />
	<link>http://thesilog.sologen.net/?p=217</link>
	<description>My thoughts as a computing science graduate student</description>
	<pubDate>Fri, 10 Sep 2010 16:52:19 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Amir massoud Farahmand</title>
		<link>http://thesilog.sologen.net/?p=217&#038;cpage=1#comment-61219</link>
		<dc:creator>Amir massoud Farahmand</dc:creator>
		<pubDate>Fri, 04 Jun 2010 07:07:47 +0000</pubDate>
		<guid isPermaLink="false">http://thesilog.sologen.net/?p=217#comment-61219</guid>
		<description>Thanks, Michael, for your clarification!
I have ignored this blog for a long time.</description>
		<content:encoded><![CDATA[<p>Thanks, Michael, for your clarification!<br />
I have ignored this blog for a long time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Littman</title>
		<link>http://thesilog.sologen.net/?p=217&#038;cpage=1#comment-44374</link>
		<dc:creator>Michael Littman</dc:creator>
		<pubDate>Wed, 09 Jul 2008 21:07:50 +0000</pubDate>
		<guid isPermaLink="false">http://thesilog.sologen.net/?p=217#comment-44374</guid>
		<description>Hi,

I wanted to respond to your comment about the "rationality" of minimax-Q, but comments are closed at http://thesilog.sologen.net/?p=76, so I decided to leave my comment here.  :-)

The "non-rationality" of minimax-Q follows from Bowling and Veloso's (idiosyncratic?) definition of rationality.  Specifically, they define it as converging to a best response against a stationary opponent strategy (even a suboptimal one).  Minimax-Q ignores the opponent's actual strategy and assumes a worst-case opponent, so it does indeed fail to satisfy their definition.

Your alternative definition is interesting.  It says that a "rational" learning algorithm should adopt a best response to any Nash-equilibrium opponent.  In zero-sum games (which is where minimax-Q makes the most sense), this definition is equivalent to saying that the learner should adopt a minimax policy.  Of course, that's exactly what minimax-Q does, so it passes your rationality test in this case.

-Michael</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I wanted to respond to your comment about the &#8220;rationality&#8221; of minimax-Q, but comments are closed at <a href="http://thesilog.sologen.net/?p=76" rel="nofollow">http://thesilog.sologen.net/?p=76</a>, so I decided to leave my comment here.  <img src='http://thesilog.sologen.net/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>The &#8220;non-rationality&#8221; of minimax-Q follows from Bowling and Veloso&#8217;s (idiosyncratic?) definition of rationality.  Specifically, they define it as converging to a best response against a stationary opponent strategy (even a suboptimal one).  Minimax-Q ignores the opponent&#8217;s actual strategy and assumes a worst-case opponent, so it does indeed fail to satisfy their definition.</p>
<p>Your alternative definition is interesting.  It says that a &#8220;rational&#8221; learning algorithm should adopt a best response to any Nash-equilibrium opponent.  In zero-sum games (which is where minimax-Q makes the most sense), this definition is equivalent to saying that the learner should adopt a minimax policy.  Of course, that&#8217;s exactly what minimax-Q does, so it passes your rationality test in this case.</p>
<p>-Michael</p>
]]></content:encoded>
	</item>
</channel>
</rss>
