I guess, but I’m not sure, that my papers’ reviewers read this blog. When it comes near the decision time, I have a few visitors who are searching my name on the web and find this site!
AAAI-05 Blog
I guess it would be a little late, but you may enjoy reading this AAAI-05 blog. Its description reveals everything: Student blog for the 20th National Conference on Artificial Intelligence (AAAI-05) and 17th Innovative Applications of Artificial Intelligence Conference (IAAI-05) 9-13 July 2005, Pittsburgh.
Marvin Minsky’s fan can read these two posts: 1 and 2!
Thesis Writing
Today, I wrote 2800 words (15K characters) of my thesis discussing behavior evolution experiments. I am really tired of reporting and writing and …, but it is almost done. I must write a conclusion, then re-read all I’ve written till now, and then put figures in their place. After that, I can print my thesis.
Deep Impact: A Further Step
NASA’s Deep Impact is a really astonishing project. You may know by now that the mission is about sending a spacecraft toward comet Temple 1 to make an artificial impact with it. This way, one can seek the material beneath the surface of that comet which has the same age as the solar system. This helps us to find out more about the early ages of solar system.
IMO, the mission would be considered as science fiction ten or twenty years ago.
On Performance Reporting in Reinforcement Learning
One common way to evaluate reinforcement learning methods is comparing the learning curve that is usually an average of a few runs (trials) of the method applied to a single problem. If the ultimate part of the learning curve is higher than the other methods, the author says that “our method handles uncertainty and partial observability much better than …â€Â. If the problem is easy enough that reaching the optimal solution is not an astonishing job (e.g. it is MDP), then comparison will be made between slopes of the learning curves, i.e. learning speed. If the method reaches faster to the optimum, it is said that it uses more information during learning/it uses intrinsic hierarchy of the problem better/it bootstraps better/… . Of course, some papers are not written this way, but it is the case.
However, there are a few important points one must pay attention:
1- What if variance of expected gained reward is too high? Averaging a few runs is not very bad, but it is a lossy method of showing results as some very good learning curves can heighten the average curve. This high amount of variance is not due to the intrinsic nature of the problem, but it is due to the fact that the method cannot learn in every run.
To remedy this problem, I am used to plot the probability distribution of the performance instead of mere averaging. This results in a three dimensional diagram (time – performance – probability of the performance) which is not an easy to show on a paper; thus, I do some sampling at different stages of learning and provide a few two dimensional diagram. I do not know whether anyone have used this kind of representation before – but believe me that providing a probability distribution is at least a very good looking diagram! (; But remember to compute “Cummulative Probability Distribution Function†instead of “Probability Density Functionâ€Â. The latter one is bumpy if you have not many samples.
2- To my best knowledge, there is few mathematical work on the learning speed of RL (I must confess that I have seen two or three papers on the sample complexity of it – but not anymore. There might be a few more). Most researches show the benefit of their method comparing on at most a few problems. Nevertheless, it is not usually said that getting this high performance needs many parameter fine tunings. So, how should we compare two methods when they are not equally optimized?! (Some people used GA or … to find the best learning parameters for their method. I have not done such a thing yet.)