I have started reading Vladimir Vapnik’s Statistical Learning Theory. It is a fascinating book which I enjoy every time I pick it up. In parallel, I am reading Vidyasagar’s Learning and Generalization (2nd edition). It is also about statistical learning theory, but its approach is somewhat different from Vapnik’s. It is too early to judge and compare these books, but from what I have read so far, I can say that Vapnik’s book is much easier to follow and more insightful than the other. Vidyasagar’s book, on the other hand, is more mathematically inclined and I cannot understand many parts of it easily (so I skip most proofs and …). A big problem (for me) with Vidyasagar’s book is that it does not try to explain the underlying phenomena intuitively.
Would you mind writing your thoughts and suggestions about these two books? Moreover, I want to become aware of the trends in theoretical ML. Which book do you suggest? (Kearns?! I don’t have it and I don’t know where I can find it.)
A Question on Stochastic vs Deterministic Policies
Is it possible for a stochastic strategy to be better than a greedy one, in the sense of obtained reward, after learning has converged to a fixed policy? For instance, is there any situation in which something like Boltzmann action selection performs better than greedy selection? It is not the case in an MDP, but what about a POMDP?! I guess not! I am looking for a counterpart of game theory’s mixed strategy in other fields. For some multi-player games there exists a mixed-strategy Nash equilibrium but no equilibrium in pure strategies. Have you seen something similar in other fields, and more specifically in cases where performance is the comparison criterion? I wonder what the benefit of acting randomly can be.
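To make the question concrete, here is a minimal sketch of the two selection rules I have in mind (the Q-values and the temperature are invented for illustration and are not tied to any particular algorithm):

    import math
    import random

    def greedy(q_values):
        # Deterministic: always take the action with the highest estimated value.
        return max(range(len(q_values)), key=lambda a: q_values[a])

    def boltzmann(q_values, temperature=1.0):
        # Stochastic: sample an action with probability proportional to exp(Q/T).
        # A high temperature gives near-uniform exploration; as T -> 0 it approaches greedy.
        prefs = [math.exp(q / temperature) for q in q_values]
        total = sum(prefs)
        return random.choices(range(len(q_values)), weights=[p / total for p in prefs])[0]

    # Hypothetical Q-values for three actions.
    q = [1.0, 0.8, 0.2]
    print(greedy(q), boltzmann(q, temperature=0.5))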
Michael Bowling and Manuela Veloso, Multiagent Learning using a Variable Learning Rate
Michael Bowling and Manuela Veloso, “Multiagent Learning using a Variable Learning Rate,” Artificial Intelligence, 2002.
This is the first paper I am writing about in my weblog in my series on multi-agent reinforcement learning papers. I had seen this paper about a year ago, but I did not read it, as I thought that changing the learning rate is not a real solution to the problem, and I supposed that the paper was a kind of ad-hoc method. At that time, I was not that concerned about learning in multi-agent systems from the game-theoretic perspective, so I was not that aware of this paper. Now, it is apparent that I was not quite right. The results of this paper are interesting, Bowling and Veloso tried to use as much mathematics as possible, and more importantly, their approach to the problem is really insightful.
I do not discuss the paper much, but I try to write about my concerns about it. Before going on, I should mention that I am new to MAS learning and game theory, so concepts such as the Nash equilibrium and its importance are not yet very clear to me.
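Still, to fix the central idea in my mind: as I understand it, the paper’s principle is WoLF (“Win or Learn Fast”), i.e. use a small step size for the policy when the agent is doing better than some baseline and a larger one when it is doing worse. A rough sketch of that principle only (my own simplification with invented names, not the authors’ actual algorithm):

    def wolf_step_size(current_policy_value, average_policy_value,
                       delta_win=0.01, delta_lose=0.04):
        # WoLF principle (simplified): learn cautiously while "winning"
        # (the current policy does better than the baseline/average policy)
        # and learn fast while "losing".
        if current_policy_value > average_policy_value:
            return delta_win
        return delta_lose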
The Effect of Reinforcement Signal Error in Reinforcement Learning
TITLE: The Effect of Reinforcement Signal Error in Reinforcement Learning (Translated)
ABSTRACT: Designing the reinforcement signal is a fundamental problem in reinforcement learning. The designer of an intelligent agent can guide the learning agent toward the desired behavior by selecting an appropriate reinforcement signal. However, there is no general methodology for designing that signal, and in many cases the designed signal differs from the unknown ideal one. In this paper, this difference is treated as a bounded-norm error in the reinforcement signal, and its effects on the value function and the policy of the agent are expressed as upper bounds. Finally, the mathematical results are tested in an experiment. (Translated)
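For context, a standard bound of this flavor (stated here from memory; not necessarily the exact bound derived in the paper): if the designed reward differs from the ideal one by at most $\varepsilon$ everywhere, then for any fixed policy $\pi$ and discount factor $\gamma$,

$$|r(s,a) - \hat{r}(s,a)| \le \varepsilon \;\; \forall s,a \quad\Longrightarrow\quad \|V^{\pi} - \hat{V}^{\pi}\|_{\infty} \le \sum_{t=0}^{\infty} \gamma^{t}\varepsilon = \frac{\varepsilon}{1-\gamma},$$

since the two value functions are discounted sums of rewards that differ by at most $\varepsilon$ at every step.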
This is my paper for the Computer Society of Iran Computer Conference (CSICC) 2005. As it is written in Persian, I have translated its abstract and put it in this weblog. I may revise it and submit it to an international conference or even a journal.
Reinforcement Learning FAQ
Addiction and Learning
A University of Minnesota researcher developed a computational model of addiction which can be used to make predictions about human behavior, animal behavior, and neurophysiology.
…
Natural increases in dopamine occur after unexpected natural rewards; however, with learning these increases shift from the time of reward delivery to cueing stimuli. In TDRL, once the value function predicts the reward, learning stops. Cocaine and other addictive drugs, however, produce a momentary increase in dopamine through neuropharmacological mechanisms, thereby continuing to drive learning, forcing the brain to over-select choices which lead to getting drugs … (read more)
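The mechanism is easy to see in terms of the TD error itself. Here is a tiny sketch of the idea (my own illustration of this kind of model, with an invented drug_boost parameter): normally the error shrinks to zero once the value function predicts the reward, but a pharmacological dopamine surge acts like a term that prediction can never cancel, so the error, and hence learning, never stops.

    GAMMA = 0.9

    def td_error(reward, v_next, v_curr, drug_boost=0.0):
        # Ordinary TD error: goes to zero once the value function predicts the reward.
        delta = reward + GAMMA * v_next - v_curr
        if drug_boost == 0.0:
            return delta
        # Illustrative drug effect: a dopamine surge that cannot be predicted away,
        # so the effective error never drops below drug_boost and learning continues.
        return max(delta + drug_boost, drug_boost)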
Reinforcement Learning Repository
The Reinforcement Learning Repository is a good place for reinforcement learning material, where you can find a categorized list of papers, e.g. applications in robotics, hierarchical RL, use of function approximation, and so on. You can find a lot of good papers about RL there. It is hosted at the University of Massachusetts, Amherst. (I want to download some papers from there: remember to do so!)
Evolution and Learning
I have been conducting some interesting research combining evolution and learning in a new way. I may discuss it later, as it is still in its early stages of development.
Considering Controlling Probabilities in Behavior Learning
Yesterday, I got stuck on a problem raised by Dr. Nili. The problem was quite simple: how do I update the new values in my subsumption architecture learning? What I did seemed reasonable but was not compatible with my theory. Actually, I updated each layer whenever it was controlling or it output NoAction. I did not consider the “controlling probabilities” of each layer, and that was inconsistent with my theory, in which those probabilities are very important. I changed the code and took those probabilities into account as well: if a state-action in a behavior does not receive a reinforcement signal for a while, its value decreases toward zero. This is natural, as its controlling probability is decreasing. Anyway, I implemented this code and it has worked. It is not very fascinating, as the previous code had worked too, maybe due to its intrinsic robustness. The interesting fact is that each behavior now predicts its structural value too, i.e. the sum of the values of each behavior is equal to its value in the structure. It is the first time that I have obtained this equality.
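For the record, the flavor of the change is something like the following (a rough sketch with invented names, not my actual code): each layer’s estimate is pulled toward the reinforcement it receives, weighted by the probability that the layer was actually in control, so the values of a behavior that rarely controls decay toward zero.

    def update_layer_value(value, reward, control_prob, alpha=0.1):
        # Hypothetical sketch: move the layer's estimate toward the received
        # reinforcement in proportion to how often the layer is in control;
        # if control_prob is near zero, the estimate decays toward zero.
        target = control_prob * reward
        return value + alpha * (target - value)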
What remains to be done is to apply these algorithms to the object-lifting problem (I have done this with the abstract one) and to check the other updating method, which is the standard one (not this averaging).
AI Links
It is somewhat disappointing that there is so much useful stuff on the Internet that you cannot even read all the titles, let alone the contents. Unfortunately, there is no simple way to read all of them. Emmm … let’s link to this site:
AI Links, which is maintained by Mark Humphrys, who has recently caught my attention due to his work on action selection and especially that interesting W-Learning idea.
Let me list its titles here so that it is easier for you (and especially myself) to remember what you (I) can find in it.