-I am going to submit my paper to the CDC-ECC 2005 conference. Hmmm … I want to blog it online to see what happens!
-I have just finished struggling with this idiotic MS Word XP that changes the appearance of my papers every time I load the document. Let's go to the site … I have not bookmarked the page, so I must use Google … yeap!! it is there! (12:02PM)
-Now, look for the page where I must submit my paper. This morning, I got my PaperPlaza PIN code, so I do not need to get it again. My Internet connection is a bit slow and I must wait a little for my weblog to update (12:04PM).
-I found it and I am going to PaperPlaza right now!
According to the site, the conference is this: 44th IEEE Conference on Decision and Control and European Control Conference ECC 2005 (CDC-ECC'05), December 12-15, 2005, Seville, Spain. Well, I may not attend it myself, but my co-author probably will. Click! (12:08PM – I am not that slow … blogging while reading instructions while … is not that easy! Try it!)
-My paper is a regular paper, so go to that part …
-It says that I can test my PDF before going on! It is a good idea. Let’s test it! Hope it works (PLZ!!!) (12:13PM)
-oops … it says that it cannot detect my paper size (A4 or Letter). Let's see what the problem is … 🙁
-I used the CDC-ECC 02 template. The margins and … were the same as the standard A4 paper. emmm …
-I used PDFFactory to generate my files and it causes the error. CutePDF does not have this problem, but the output is not that cute. (12:39PM)
-Well … I have uploaded the CutePDF-generated PDF. (1:09PM) So, it seems it is finished now!
This may be the first online blogging of a real paper submission! wow!
Red Queen Effect and Multi-agent Learning
Suppose that there are two populations A and B in the environment. Each population tries to evolve in order to increase its fitness, which is coupled to the behavior of the other population; e.g. the performance of a hunter is highly dependent on the performance of its prey. However, when A increases its fitness, B evolves in order to become fitter, and then the fitness of A is no longer what it was before, as its fitness landscape is highly dependent on the other population's policy. This (or something similar to this) is called the Red Queen effect in co-evolutionary systems. There are many arguments about its definition and even about the usefulness of considering it at all, which you may read here.
I am not an expert in co-evolution and I am not aware of existing analyses of the dynamics of co-evolutionary mechanisms (of course, there must be some!); however, when I thought about the Red Queen effect, I found this idea interesting:
We may analyze the co-evolutionary mechanism in a game-theoretic framework. The Red Queen effect is like failing to converge to a Nash equilibrium in game theory. As we do not know the model of the world (payoff matrices and stochastic-game transition probabilities) in a co-evolutionary setting, we may take a look at similar work in the multi-agent reinforcement learning literature and get some inspiration. If we define the rational and convergent properties as Bowling defined them, we may say that the common implementation of a co-evolutionary mechanism that leads to the Red Queen effect is like using simple Q-learning (rational but non-convergent) in multi-agent learning. Now let us use something like Minimax-Q or variable learning rate Q-learning for the co-evolutionary process in order to suppress the Red Queen effect.
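To make the analogy a bit more concrete, here is a toy sketch I put together (my own code, not taken from Bowling's paper; the game, learning rates, and number of rounds are just illustrative assumptions): policy hill-climbing with a WoLF-style variable learning rate, where two learners play matching pennies against each other.

```python
# A toy sketch (my own, hypothetical code, not the authors') of the WoLF idea:
# policy hill-climbing with a variable learning rate on matching pennies.
import random

ACTIONS = [0, 1]                      # heads, tails
def payoff(a, b):                     # row player's reward; the game is zero-sum
    return 1.0 if a == b else -1.0

class WoLFPHC:
    def __init__(self, alpha=0.1, d_win=0.01, d_lose=0.04):
        self.Q = [0.0, 0.0]           # action values
        self.pi = [0.5, 0.5]          # current mixed policy
        self.avg_pi = [0.5, 0.5]      # running average policy
        self.n = 0
        self.alpha, self.d_win, self.d_lose = alpha, d_win, d_lose

    def act(self):
        return 0 if random.random() < self.pi[0] else 1

    def update(self, a, r):
        self.Q[a] += self.alpha * (r - self.Q[a])
        self.n += 1
        for i in ACTIONS:             # update the running average policy
            self.avg_pi[i] += (self.pi[i] - self.avg_pi[i]) / self.n
        # "winning" = current policy does at least as well as the average policy
        winning = sum(p * q for p, q in zip(self.pi, self.Q)) >= \
                  sum(p * q for p, q in zip(self.avg_pi, self.Q))
        delta = self.d_win if winning else self.d_lose   # learn fast when losing
        best = max(ACTIONS, key=lambda i: self.Q[i])
        for i in ACTIONS:
            step = delta if i == best else -delta / (len(ACTIONS) - 1)
            self.pi[i] = min(1.0, max(0.0, self.pi[i] + step))
        s = sum(self.pi)
        self.pi = [p / s for p in self.pi]               # renormalize

p1, p2 = WoLFPHC(), WoLFPHC()
for _ in range(50000):
    a, b = p1.act(), p2.act()
    r = payoff(a, b)
    p1.update(a, r)
    p2.update(b, -r)
print(p1.pi, p2.pi)   # both policies should drift toward the mixed equilibrium (0.5, 0.5)
```

The variable learning rate is the whole trick: learn cautiously when doing better than your own average policy and quickly when doing worse, which is exactly the kind of damping one might hope would suppress the Red Queen cycling in a co-evolutionary loop.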
Moreover, looking from another perspective (again game-theoretic): a finite game may have no Nash equilibrium in pure strategies, but it always has one in mixed strategies (see the toy check below). An individual in a population is, in most cases, playing a pure strategy. What about evolving a set of strategies and choosing randomly among them (evolving the probability distribution too) to imitate evolving toward a mixed strategy?!
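As a tiny check of that claim (my own toy code, using rock-paper-scissors rather than anything from the co-evolution literature): no pure-strategy profile of the game is an equilibrium, while mixing uniformly is.

```python
# A tiny illustration: rock-paper-scissors has no pure-strategy Nash
# equilibrium, but the uniform mixed strategy is an equilibrium.
import itertools

A = ["R", "P", "S"]
def u(a, b):                                  # row player's payoff (zero-sum)
    wins = {("R", "S"), ("P", "R"), ("S", "P")}
    return 0 if a == b else (1 if (a, b) in wins else -1)

def is_pure_ne(a, b):
    # neither player can gain by a unilateral deviation
    return all(u(a, b) >= u(x, b) for x in A) and \
           all(-u(a, b) >= -u(a, y) for y in A)

print([(a, b) for a, b in itertools.product(A, A) if is_pure_ne(a, b)])  # -> []

# every pure action earns expected payoff 0 against the uniform mix, so no
# deviation beats mixing uniformly: (1/3, 1/3, 1/3) is a (mixed) equilibrium
print([sum(u(a, b) for b in A) / 3 for a in A])  # -> [0.0, 0.0, 0.0]
```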
Hmmm … don't kill me if it is far from reality! I use my weblog exactly for this kind of conversation! Any ideas about these points, or pointers to similar work? (I searched a little and found that R. Paul Wiegand has done some research in this area. He used game theory to analyze the problem. I must see what he did.)
A Question on Stochastic vs Deterministic Policies
Is it possible for a stochastic strategy to be better than a greedy one, in the sense of obtained reward, after learning has converged to a fixed policy? For instance, is there any situation in which something like Boltzmann action selection performs better than the greedy one? It is not the case in an MDP, but what about a POMDP?! I guess not! I am looking for a counterpart of game theory's mixed strategy in other fields. For some multi-player games there exists a mixed-strategy Nash equilibrium, but there is no such point among the pure strategies. Have you seen something similar in other fields, more specifically in cases where performance is the comparison criterion? I wonder what the benefit of acting randomly can be.
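Just to fix the terms of the question, here is a minimal sketch of the two action-selection rules I mean (the Q-values and action names are made up for illustration):

```python
# Greedy vs. Boltzmann (softmax) action selection over some learned Q-values.
import math, random

Q = {"left": 1.0, "right": 1.2, "stay": 0.3}   # illustrative values only

def greedy(Q):
    return max(Q, key=Q.get)

def boltzmann(Q, tau=0.5):
    # sample an action with probability proportional to exp(Q/tau);
    # tau -> 0 recovers the greedy rule, large tau approaches uniform random
    weights = {a: math.exp(q / tau) for a, q in Q.items()}
    z = sum(weights.values())
    r, acc = random.random() * z, 0.0
    for a, w in weights.items():
        acc += w
        if r <= acc:
            return a
    return a

print(greedy(Q))
print([boltzmann(Q) for _ in range(5)])
```

In an MDP, once the optimal Q-values are learned, acting greedily on them is optimal; the question is whether the extra randomness of the Boltzmann rule can ever pay off after convergence when the state is only partially observed.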
Two other presentations at MVIP 2005
Yesterday and today, the 3rd Iranian Conference on Machine Vision and Image Processing was held at our university: University of Tehran, Department of Electrical and Computer Engineering. I had not submitted any paper to it; however, I had two presentations!!!
Like last week, I presented Hossein Mobahi's papers, entitled "A Comparative Study on Geometric and Holistic Representations for Facial Expression Recognition" and "Vision Based Fruit Inspection Using Independent Component Analysis". I presented both of them well, but some people asked me a few strange questions about the imaging and camera conditions, even though I had told them beforehand that I had not done anything in this research. These three recent presentations were good practice for me, as I learned that not only is it unnecessary to have done the actual research in order to present it, but you can also present it rather well by reading its report just before the presentation. (: Nevertheless, I do not intend to repeat a similar experience in the near future. (;
P.S.: Yeap! I do not work on machine vision. But it was quite natural to understand their concerns, and even to devise a method in their field, once you know that "an image is data to be processed", "there are methods to process image data which are 2D extensions (or specializations) of general signal processing methods", "patterns can be recognized", and, more importantly, "pattern recognition is a kind of decision making". Everything else is similar.
Michael Bowling and Manuela Veloso, Multiagent Learning using a Variable Learning Rate
Michael Bowling and Manuela Veloso, "Multiagent Learning using a Variable Learning Rate," Artificial Intelligence, 2002.
This is the first paper I am writing about in my weblog, in my series on multi-agent reinforcement learning papers. I had seen this paper about a year ago, but I did not read it, as I thought that changing the learning rate is not a real solution to the problem and I supposed that the paper described a kind of ad-hoc method. At that time, I was not that concerned about multi-agent learning from the game-theoretic perspective, so I was not that aware of this paper. Now, it is apparent that I was not quite right. The results of this paper are interesting, Michael Bowling and Manuela Veloso tried to use as much mathematics as possible, and, more importantly, their approach to the problem is really insightful.
I will not discuss the paper at length, but I will try to write about my concerns about it. Before going on, I must mention that I am new to MAS learning and game theory, so concepts such as the Nash equilibrium and its importance are not yet quite clear to me.
Continue reading “Michael Bowling and Manuela Veloso, Multiagent Learning using a Variable Learning Rate”
The Effect of Reinforcement Signal Error in Reinforcement Learning
TITLE: The Effect of Reinforcement Signal Error in Reinforcement Learning (Translated)
ABSTRACT: Designing the reinforcement signal is a fundamental problem in reinforcement learning. The designer of an intelligent agent can guide the learner toward the desired behavior by selecting an appropriate reinforcement signal. However, there is no general methodology for designing that signal, and in many cases the designed signal differs from the unknown ideal one. In this paper, this difference is treated as a bounded-norm error in the reinforcement signal, and its effects on the value function and the policy of the agent are derived as upper bounds. Finally, the mathematical results are tested in an experiment. (Translated)
This is my paper at The Computer Society of Iran Computer Conference (CSICC) 2005. As it is written in Persian, I have translated its abstract and put it in this weblog. I will probably revise it and submit it to an international conference or even a journal.
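The paper's own bounds are not reproduced here, but a quick numerical sanity check of the classical flavor of such results (entirely my own toy code, with made-up transition probabilities and rewards) looks like this: perturb the reward by at most eps everywhere, and the optimal value function moves by at most eps / (1 - gamma).

```python
# Toy check: bounded reward error => bounded value-function error.
import random
random.seed(0)
nS, nA, gamma, eps = 4, 2, 0.9, 0.1

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

P = [[rand_dist(nS) for _ in range(nA)] for _ in range(nS)]    # P[s][a][s']
R = [[random.random() for _ in range(nA)] for _ in range(nS)]  # R[s][a]
# perturb the reinforcement signal by a bounded error
Rp = [[r + random.uniform(-eps, eps) for r in row] for row in R]

def value_iteration(R, iters=500):
    V = [0.0] * nS
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(nS))
                 for a in range(nA)) for s in range(nS)]
    return V

V, Vp = value_iteration(R), value_iteration(Rp)
gap = max(abs(v - vp) for v, vp in zip(V, Vp))
print(gap, "<=", eps / (1 - gamma))   # the bound should hold
```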
My Presentations at The Computer Society of Iran Computer Conference (CSICC) 2005
The Computer Society of Iran Computer Conference (CSICC) 2005 was held from Tuesday to Thursday. I had a paper in it, which was presented as a poster. Its acceptance as a poster made me quite angry at the time, as I became aware that the reason for this kind of acceptance was that one of the reviewers did not like my writing style!!! Yes! It is unbelievable that s/he did not evaluate my paper's scientific merit, but assessed its style. I can't explain the exact nature of the problem, as the paper is written in Persian, but it may help to note that my writing style follows a new methodology of writing in Persian which is used by some well-known and professional authors and poets. Anyway, I had the chance to present my poster to only two individuals: my girlfriend and another person. (:
It is not the end of the story … At 11PM on Wednesday, one of my friends, H.M., mailed me asking whether I could present his paper. He had submitted a paper to the conference; however, he is not in the country right now. I had told him well before that if he wanted me to present his paper, he should send its PowerPoint file a week in advance. But he did not do so, and I received his PPT file at 12PM on Thursday, while the session was scheduled to start at 2PM that day!!! Although no wise man would accept such a task, I preferred to present his paper as a very interesting and exciting experience. I read his paper very fast, then took a look at his PPT once. Then, I rehearsed the presentation by myself. It was 1:25PM when I left home. The situation became more exciting as the traffic was terrible. I arrived at the session at 2:25PM and started my presentation 5 minutes afterward. Fortunately, I presented very well (considering all the conditions, of course!) and I could even answer the questions of Dr. Badi-i and Dr. Meybodi.
Designing a Mobile Robot’s Mind: A New Job!
I have found a new and interesting job: designing a mobile robot's mind! I have started working on designing the decision-making system needed to control a small mobile robot. It ought to act as a vacuum cleaner or something similar. It is not yet clear to me what I should do exactly; however, I designed a simple two-layer subsumption architecture that wanders around while avoiding obstacles. If I am right about my position in the group, I intend to design three other modules in addition to what I have implemented: map building, self-localization, and a dirt-cleaning planner to cover the whole map efficiently. I have some ideas on all of them, which I will discuss gradually.
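Since I have not written about the architecture itself, here is a rough sketch of the two-layer idea (hypothetical code, not the robot's actual controller): the avoid-obstacles layer suppresses the wander layer whenever a sensor reports something close; otherwise the robot just drifts around.

```python
# Rough sketch of a two-layer subsumption controller: "avoid" subsumes "wander".
import random

def wander(sensors):
    # level 0: drift forward with small random turns
    return {"speed": 0.3, "turn": random.uniform(-0.2, 0.2)}

def avoid(sensors):
    # level 1: if something is close, stop advancing and turn toward open space
    if sensors["front_distance"] < 0.5:
        direction = 1.0 if sensors["left_distance"] > sensors["right_distance"] else -1.0
        return {"speed": 0.0, "turn": direction}
    return None   # no opinion: let the lower layer act

def control_step(sensors):
    # higher layers are consulted first and suppress lower ones when they fire
    for layer in (avoid, wander):
        command = layer(sensors)
        if command is not None:
            return command

print(control_step({"front_distance": 0.3, "left_distance": 1.0, "right_distance": 0.2}))
print(control_step({"front_distance": 2.0, "left_distance": 1.0, "right_distance": 1.0}))
```

Map building, self-localization, and the coverage planner would presumably sit above these as further layers, but that is for later posts.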
Cybernetical Semiology
Tonight, Ramin and I had a talk about the relation between the engineering aspects of context and its semiological implications. The result can be named Cybernetical Semiology.
What is your idea about it? Is there any good relation between these two fields? I believe that what people call "context" in AI (and related fields) is what semiologists call "text".