To PAC or not to PAC – This is the problem! (Actually, the problem is finding the VC or pseudo-dimension of an MDP.)
Chaos Control’s Seminar: The Last Part
The last part of my Chaos Control seminar presentation trilogy was presented yesterday. It was mostly dedicated to bifurcation control, which I am no expert in. Anyway, now I know more about it than 99.99% of people (even more)!! 😀 This is the good part of it.
But something strange happened yesterday. Oops!
Bifurcation surfing!
I’m busy with some readings, mostly about bifurcation. I’ll write later about the papers and … that I read, but it is worth mentioning some useful (or interesting) links that I encountered during my research. They may be useful later.
Chaos @ Maryland (you cannot find any paper here, but you can find every kind of chaos research!!)
Fredholm Alternative Theorem (It appears in one of bifurcation papers that I read.)
Dynamical System Theory (Seems to be a book, but I haven’t looked at it. I was searching for the Center Manifold Theorem when I found it.)
Invariant Subspaces (I was looking for the Popov-Belevitch-Hautus controllability/observability tests and I found this PDF. It is a good one.)
Bifurcation (seems to be a good introductory one!)
Pre-HRL Presentation era!
I’m working on my Hierarchical Reinforcement Learning presentation, which I will present in the Distributed AI class a few hours from now. It is 2:47 AM and … emmm … yeap! Life is too compressed!
IROS 2004: Paper acceptance
I woke up today, checked my e-mail, and suddenly found the mail announcing that my IROS 2004 paper has been accepted!! (: I have been waiting for this mail for a long time! (At least, for the past week I have been too curious to know the result!!) The paper, entitled “Behavior hierarchy learning in a behavior-based system using reinforcement learning”, is based on my work on structure learning of the Subsumption Architecture. Anyway, this was very good news! (:
These are the reviewers’ comments, which I must answer:
Comment #1
——————————————–
Interesting preliminary results.
Further work is required including real experiments.
——————————————–
Comment #2
——————————————–
Summary:
The paper describes a reinforcement learning approach to selecting behaviors in a subsumption architecture. From a given set of behaviors arranged in a hierarchy of layers, each layer learns to determine which behavior should be active. An appropriate (greedy, value-function based) reinforcement learning system is formulated for this problem, and evaluated in a simulated cooperative object lifting example with multiple robots.
General Comments:
Applying reinforcement learning (RL) to a subsumption architecture is not new, as cited correctly by the authors. What is finally developed in the paper looks like a standard value iteration RL method, i.e., a form of approximate dynamic programming. As the authors mention themselves, RL has seen a fair amount of work over the last years in learning with behaviors (the authors mention Options as future work). Thus, why did the authors not follow one of these established behavior-based RL approaches, or at least compare their results with related work? It will not be obvious to a reader where the originality and significance of the paper lies.
Detailed Comments:
– The use of English needs improvement in various places.
– Page 1: are the S_i parts of the state space for each behavior overlapping or not?
——————————————–
Comment #3
——————————————–
Paper: Evolution of a Subsumption Architecture Neurocontroller
Julian Togelius, “Evolution of a Subsumption Architecture Neurocontroller”, ?
I’ve read this paper. It was interesting, as it strengthened my idea of using (or the possibility of using) incremental ideas in learning. I have done some experiments with incremental learning, but I’m not yet in a position to draw conclusions.
Before quoting its abstract, let me copy this informative table:
1. One layer – one fitness: Monolithic evolution
2. One layer – many fitnesses: Incremental evolution
3. Many layers – one fitness: Modularized evolution
4. Many layers – many fitnesses: Layered evolution
(One may have other names for these; e.g., I used to call every incrementally built “many layers” system “incremental”.)
He found out that the fourth method of evolution indeed performs very well. It is the one I’m thinking about (see the sketch after the abstract below).
Here is the paper’s abstract:
Abstract. An approach to robotics called layered evolution and merging features from the subsumption architecture into evolutionary robotics is presented, and its advantages are discussed. This approach is used to construct a layered controller for a simulated robot that learns which light source to approach in an environment with obstacles. The evolvability and performance of layered evolution on this task is compared to (standard) monolithic evolution, incremental and modularised evolution. To corroborate the hypothesis that a layered controller performs at least as well as an integrated one, the evolved layers are merged back into a single network. On the grounds of the test results, it is argued that layered evolution provides a superior approach for many tasks, and it is suggested that this approach may be the key to scaling up evolutionary robotics.
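As a side note, here is a minimal toy sketch (in Python) of what “layered evolution” means as a recipe: evolve each layer separately, each against its own fitness function, and then stack the layers subsumption-style so that a higher layer can suppress the one below it. The (1+1) evolution strategy, the toy fitness functions, and all the names are my own assumptions for illustration, not Togelius’s actual setup:

    import random

    def evolve_layer(fitness, dim=4, generations=200):
        # (1+1) evolution strategy: keep one parent, mutate, keep the better one
        parent = [random.uniform(-1, 1) for _ in range(dim)]
        best = fitness(parent)
        for _ in range(generations):
            child = [w + random.gauss(0, 0.1) for w in parent]
            f = fitness(child)
            if f >= best:
                parent, best = child, f
        return parent

    # Hypothetical per-layer objectives (stand-ins for e.g. "avoid obstacles"
    # and "approach the right light source" in the paper's task).
    def fitness_layer0(w):
        return -sum((wi - 0.5) ** 2 for wi in w)

    def fitness_layer1(w):
        return -sum((wi + 0.5) ** 2 for wi in w)

    # Layered evolution (method 4): one evolutionary run per layer,
    # each run driven only by that layer's own fitness.
    layer0 = evolve_layer(fitness_layer0)
    layer1 = evolve_layer(fitness_layer1)

    def act(weights, obs):
        # trivial linear "network": weighted sum of the observation
        return sum(w * o for w, o in zip(weights, obs))

    def controller(obs):
        # subsumption-style stacking: the higher layer suppresses the lower
        # one whenever its (hypothetical) trigger condition fires
        if obs[0] > 0.5:
            return act(layer1, obs)
        return act(layer0, obs)

    print(controller([0.9, 0.1, 0.0, 0.2]))  # higher layer active
    print(controller([0.1, 0.1, 0.0, 0.2]))  # lower layer active

Monolithic evolution (method 1) would instead evolve a single controller against one combined fitness; the paper’s claim is that the layered recipe scales better.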
What is my thesis about?!
I have not written anything directly related to my project here. You may wonder whether this guy is a machine learning student or a philosophy student. (; Anyway, I may change my high-security-with-copyrighted-material situation if everything goes on this way. However, I will try to write something about my project – I hope it will be fun and encouraging!
Let’s briefly discuss what I have done up to now:
As you know, I am working on learning in behavior-based systems. I have chosen the Subsumption architecture as the base architecture due to its success in the design of many behavior-based systems. I decomposed the learning process into two different problems: 1) structure learning, and 2) behavior learning.
In the former case, I have supposed that the designer knows how each behavior works and wants the learning mechanism to place each behavior in its correct position. S/he guides this process by giving the agent a reinforcement signal that rewards or punishes its actions. In the latter case, the designer knows the correct structure of the architecture, but is not aware of the way each behavior must act. For instance, s/he knows that there must be an obstacle avoidance behavior superior to all other behaviors, but does not know what the appropriate action is in each case.
To learn a behavior-based system, one must solve both of these problems. What I have done so far is try to solve them in a special case. I have obtained some partial results, but the problem is not completely solved.
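To make the structure learning setting more concrete, here is a minimal tabular Q-learning sketch of it: the behaviors themselves are fixed and given, and the agent learns from a scalar reinforcement signal which behavior should be active in each state. The behavior names, the dummy environment, and the whole formulation are my own illustrative assumptions, not the actual thesis code:

    import random
    from collections import defaultdict

    BEHAVIORS = ["wander", "avoid_obstacle", "lift_object"]  # hypothetical set

    alpha, gamma, epsilon = 0.1, 0.9, 0.1
    Q = defaultdict(float)  # Q[(state, behavior)]: value of activating a behavior

    def select_behavior(state):
        # epsilon-greedy selection over behaviors instead of primitive actions
        if random.random() < epsilon:
            return random.choice(BEHAVIORS)
        return max(BEHAVIORS, key=lambda b: Q[(state, b)])

    def update(state, behavior, reward, next_state):
        # standard one-step Q-learning backup on the behavior-selection MDP
        best_next = max(Q[(next_state, b)] for b in BEHAVIORS)
        Q[(state, behavior)] += alpha * (reward + gamma * best_next
                                         - Q[(state, behavior)])

    def dummy_env(state, behavior):
        # placeholder dynamics: reward activating the right behavior near a wall
        reward = 1.0 if (state == "near_wall" and behavior == "avoid_obstacle") else 0.0
        return reward, random.choice(["near_wall", "open_space"])

    state = "open_space"
    for _ in range(10000):
        b = select_behavior(state)
        r, next_state = dummy_env(state, b)
        update(state, b, r, next_state)
        state = next_state

    print({s: max(BEHAVIORS, key=lambda b: Q[(s, b)])
           for s in ["near_wall", "open_space"]})

Behavior learning would be the dual problem: fix which layer is active when, and learn each behavior’s action mapping instead.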
A New Place: Control Lab
Now I’m in the Control Lab! A Pentium 4 (2.8 GHz) with 512 MB of RAM and an 80 GB hard disk has been made mine! From now on, I may work on my thesis at the university instead of at home. Let’s see if it actually happens!