Dynamical Behavior of Wikis
(A naive idea!) Do wikis have a fixed point?! I guess that, depending on the content and the diversity of common sense on that subject, they may show a vast range of dynamical behaviors, e.g. stability at a fixed point (i.e. everyone agrees), a limit cycle (i.e. two different controversial beliefs about the subject), or even richer behaviors such as a time-varying limit cycle (i.e. a dialectical dialogue), or chaos.
It is an interesting subject to work on. If you have any ideas, please share them. If I find some interesting work on it, I will link to it.
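Here is a toy sketch of what I mean, under a made-up model of my own (it is not an established model of wiki dynamics, and every name in it is mine): an article's "stance" is a single number in [0, 1], and each edit nudges it toward the editing camp's preferred value. One camp gives a fixed point (consensus); two alternating camps settle into a two-cycle (an edit war).

```python
# Toy illustration only (my own made-up model, not a model of real wikis):
# an article's stance is a number in [0, 1]; each step an editor pulls it
# toward their preferred value.

def simulate(targets, steps=50, gain=0.5, x0=0.5):
    """targets: editor opinions applied cyclically; returns the stance history."""
    x, history = x0, []
    for t in range(steps):
        opinion = targets[t % len(targets)]
        x = x + gain * (opinion - x)   # this editor pulls the article toward their view
        history.append(round(x, 3))
    return history

print(simulate([0.8]))        # one camp: converges to 0.8 (fixed point / consensus)
print(simulate([0.2, 0.9]))   # two alternating camps: a two-cycle (controversy)
```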
Computing Chris Blog
Take a look at the Computing Chris weblog. It is written by Chris Leonard, a publishing editor at Elsevier for theoretical computer science journals. Welcome, Chris!
Welcome Reviewers!
I guess, though I'm not sure, that my papers' reviewers read this blog. When decision time comes near, I get a few visitors who search for my name on the web and end up on this site!
AAAI-05 Blog
I guess it is a little late, but you may enjoy reading this AAAI-05 blog. Its description says it all: Student blog for the 20th National Conference on Artificial Intelligence (AAAI-05) and the 17th Innovative Applications of Artificial Intelligence Conference (IAAI-05), 9-13 July 2005, Pittsburgh.
Marvin Minsky's fans can read these two posts: 1 and 2!
Thesis Writing
Today, I wrote 2800 words (15K characters) of my thesis, discussing the behavior evolution experiments. I am really tired of reporting and writing, but it is almost done. I must write a conclusion, then re-read everything I have written so far, and then put the figures in their places. After that, I can print my thesis.
Deep Impact: A Further Step
NASA's Deep Impact is a really astonishing project. You may know by now that the mission is about sending a spacecraft toward comet Tempel 1 to make an artificial impact with it. This way, one can probe the material beneath the comet's surface, which is as old as the solar system itself. This helps us find out more about the early ages of the solar system.
IMO, the mission would have been considered science fiction ten or twenty years ago.
On Performance Reporting in Reinforcement Learning
One common way to evaluate reinforcement learning methods is to compare learning curves, each usually an average of a few runs (trials) of a method applied to a single problem. If the final part of the learning curve is higher than that of the other methods, the author says that "our method handles uncertainty and partial observability much better than ...". If the problem is easy enough that reaching the optimal solution is not an astonishing feat (e.g. it is an MDP), then the comparison is made between the slopes of the learning curves, i.e. the learning speed. If the method reaches the optimum faster, it is said that it uses more information during learning / exploits the intrinsic hierarchy of the problem better / bootstraps better / ... . Of course, some papers are not written this way, but this is the typical case.
However, there are a few important points one must pay attention to:
1- What if the variance of the expected gained reward is too high? Averaging a few runs is not very bad, but it is a lossy way of showing results, as a few very good learning curves can pull the average curve up. This high variance is not due to the intrinsic nature of the problem, but to the fact that the method does not learn successfully in every run.
To remedy this problem, I usually plot the probability distribution of the performance instead of merely averaging. This results in a three-dimensional diagram (time, performance, probability of that performance), which is not easy to show in a paper; thus, I sample at a few different stages of learning and provide a few two-dimensional diagrams (a minimal sketch of this appears after point 2 below). I do not know whether anyone has used this kind of representation before, but believe me, a probability distribution is at least a very good-looking diagram! (; But remember to compute the "Cumulative Probability Distribution Function" instead of the "Probability Density Function". The latter is bumpy if you do not have many samples.
2- To the best of my knowledge, there is little mathematical work on the learning speed of RL methods (I must confess that I have seen two or three papers on its sample complexity, but not many more; there might be a few others). Most researchers show the benefit of their method by comparison on at most a few problems. Nevertheless, it is not usually said that getting this high performance requires a lot of parameter fine-tuning. So, how should we compare two methods when they are not equally optimized?! (Some people have used a GA or similar to find the best learning parameters for their method; I have not done such a thing yet. A rough sketch of comparing under a shared tuning budget also follows below.)
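Here is the minimal sketch of the representation described in point 1, assuming each run logs its performance at every learning step (the data layout and function names are mine, just for illustration): at a few chosen stages of learning, collect the performance of all runs and report the empirical cumulative distribution rather than the bare average.

```python
# Sketch: empirical CDF of performance across runs, sampled at a few
# learning stages, instead of a single averaged learning curve.

def empirical_cdf(samples):
    """Return sorted (performance, P[performance <= x]) pairs."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def cdf_at_stages(run_curves, stages):
    """run_curves: list of per-run performance curves (equal-length lists).
    stages: learning steps at which to sample. Returns one CDF per stage."""
    return {t: empirical_cdf([curve[t] for curve in run_curves]) for t in stages}

# Example: 3 runs, CDF of performance after learning steps 0 and 2.
curves = [[0.1, 0.4, 0.7], [0.0, 0.1, 0.2], [0.2, 0.5, 0.9]]
print(cdf_at_stages(curves, stages=[0, 2]))
```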
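And a rough sketch of the shared-tuning-budget point in 2 (run_method is a hypothetical stand-in for whatever RL method is being compared, and the grid search is just one possible budget; it is not something I have actually done): each method gets the same small search over its learning parameters before the comparison is made.

```python
# Sketch: give every compared method the same small parameter search.
from itertools import product

def tune(run_method, grid, runs_per_setting=10):
    """Try every combination in `grid` (a dict of parameter-name -> values)
    and return the best setting with its average performance."""
    best_setting, best_score = None, float("-inf")
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        setting = dict(zip(names, values))
        score = sum(run_method(**setting) for _ in range(runs_per_setting)) / runs_per_setting
        if score > best_score:
            best_setting, best_score = setting, score
    return best_setting, best_score

# Hypothetical usage (run_q_learning is an assumed function returning final performance):
# best, score = tune(run_q_learning, {"alpha": [0.1, 0.3, 0.5], "epsilon": [0.05, 0.1]})
```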
A Samba of Decision Makers
It was during half-time of the Brazil-Argentina final match that I got to thinking about the decision-making process and reinforcement learning. Suppose that one intends to write a program to play soccer. What kind of state representation should one use? The representation must encompass enough information to discriminate between two states with different best actions (seeing the problem from the classification side: separability of classes and so on), or it must be informative enough to let the agent separate two state-action pairs with different values (like the abstraction theorem in MaxQ and the like).
For instance, the player's position, the ball's relative position, and a few nearby players and their tags (opponent/teammate) would be a rational choice. However, one may ask how many nearby players must be selected. Two? Three? The more state information you use, the better the best achievable answer to this POMDP becomes. But note that the state space grows exponentially with the number of state variables, at least in the most common representations. What should we do?
I guess we must look for newer (richer) state representations. Here, I am not talking about using general function approximators or hierarchical RL, which are useful in their own right. I am talking about a wise selection of the state representation: a dynamic and automatic generation of states is crucial. As an example, suppose that you are a player in the middle of the field and the ball is with a very close opponent (a few meters away). The most important factors (read: state) for your decision making are your relative distance to that opponent and, if you are a good player, his limbs' movements. It is not that important to know the exact position of a teammate who is 20 meters away. However, when you are close to the opponent's penalty area, not only are your positions relative to your opponents important, but your teammates' positions may also be critical for good decision making, e.g. a pass to a teammate may result in a goal.
I believe that there must be a method for the automatic selection of important features for each state. Different states need not have the same kind of representation and dimension. In some situations the current sensory information might be sufficient; in others, predictions of other agents' sensory information might be necessary, and so on. An extended Markov property may apply here: having state variables S_1, ..., S_n (an n-dimensional state), I guess it is often possible to reduce the state transition model of the MDP environment in this way: P(S_i(t+1), ..., S_j(t+1), ..., S_k(t+1) | S_1(t), ..., S_n(t)) = P(S_i(t+1), ..., S_j(t+1), ..., S_k(t+1) | S_p(t), ..., S_q(t), ..., S_r(t)) for some subset of indices p, ..., q, ..., r, i.e. there is some conditional independence here (a rough sketch of finding such a subset from sampled transitions is below).
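A rough sketch of how such a subset might be detected from sampled transitions (my own illustration, not McCallum's method or anything published that I know of; it assumes discrete state variables, scores dependence with a crude marginal mutual-information measure rather than a proper conditional-independence test, and the threshold is arbitrary):

```python
# Sketch: from sampled (state, next_state) transitions of a factored state,
# keep only the current variables that appear informative about a chosen
# next-step variable.
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def select_parents(transitions, target_index, threshold=0.01):
    """transitions: list of (state, next_state) pairs of equal-length tuples.
    Returns the indices p, ..., r of current variables that the target
    next-state variable appears to depend on."""
    next_vals = [s_next[target_index] for _, s_next in transitions]
    n_vars = len(transitions[0][0])
    parents = []
    for i in range(n_vars):
        cur_vals = [s[i] for s, _ in transitions]
        if mutual_information(cur_vals, next_vals) > threshold:
            parents.append(i)
    return parents
```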
As far as I know, the most similar well-known research in this direction is McCallum's work on Utile Suffix Memory and Nearest Sequence Memory. Nevertheless, those methods consider only a single kind of state representation, which is simpler than what I am thinking about.
Well … Brazil won the game in tremendous fashion with those samba dancers! Congratulations to Marcelo!
Dreaming John Nash
Last night, I dreamt of John Nash!
It was at a conference in Japan or somewhere like that, and if I had to name the conference, it was something like IROS04 (maybe because that was my only international conference). My advisor and I were trying to find rooms in a hotel that was almost full. After placing our luggage in the hotel, we went down. During this room-seeking business, I saw many famous people whom I recognized then but cannot remember right now (I guess those people do not actually exist in the real world). Well … outside the hotel, my advisor became curious to see John Nash. He did not know him, but I did (again: I had not seen his real face until an hour before; emmm … not very similar). Someone introduced us to him. He greeted us. My advisor tried to speak to him. I don't know what happened, but I remember that the conversation continued between John Nash and me. I wanted to persuade him that interpreting a neural network as a multi-person game would be interesting (e.g. seeing learning as a cooperative game and so on), and I urged him to do some research on artificial intelligence. I felt that he became interested in the subject. Unfortunately, my dream broke off and we could not discuss it any more.
Isn't it interesting that I dream this way? Am I going mad?!