PACness of MDP

I found some useful papers about PAC learning of MDPs and their VC (and …) dimensions. Read them later:

Rahul Jain and Pravin Varaiya, “PAC Learning for Markov Decision Processes and Dynamic Games,” ?, 2004.
Rahul Jain and Pravin Varaiya, “Extension of PAC Learning for Partially Observable Markov Decision Processes,” ?, 2004.
Yishay Mansour, “Reinforcement Learning and Mistake Bounded Algorithms,” ?, 1999.

Post-those-busy-days era: Chaos Control and Co-evolution

At last, I have finished the bulk of reporting work that I was engaged in during the last week. I had to write a technical report about Chaos Control and a paper on Evolutionary Robotics. These heavy tasks, with deadlines drawing near and too little time to do them in, were very stressful for me. Fortunately, I got them done!

The first one, which is written in Persian (Farsi), is a literature survey on different methods of chaos control. I have been fascinated by chaos for a long time (perhaps since I was 12. Yes?! What is the problem?!), but I could not find any opportunity to do some real scientific research on it, or at least some readings. Apart from a short, not-too-academic project that I did in the first year of my BSEE (it was about using a chaotic signal to solve an optimization problem), I did not get the chance to do a real one until I entered graduate school and began my MS studies. Since then, I have done two chaos control projects too.
Thus, this rather good literature survey was a very pleasant experience for me. Despite all those readings, though, I am not a chaos specialist anyway! 😀

The second one, which is entitled Behavior Evolution/Hierarchy Learning in a Behavior-based System using Reinforcement Learning and Co-evolutionary Mechanism, was the result of some experiments on evolutionary robotics. You may know that I believe in the evolutionary mechanism (be it natural or artificial), though many think of it as just an idiot (with IQ = 0.0001) given enough time to try every case. Nevertheless, I got some good results mixing co-evolution and learning, which was fascinating. I mainly did this research to satisfy the requirements of Dr. Lucas’ Biocomputing course, but that was only the ignition. Anyway, Dr. Nili and Dr. Araabi told me not to submit this paper anywhere until some other papers have been submitted first.

These busy days!

These days are very busy for me. Actually, I am writing a technical report about chaos control and a paper about evolutionary robotics. Both of them must be ready by Sunday. Haha …! In addition to all this writing, I must think about my thesis in my subconscious.

PAC or ~PAC

To PAC or not to PAC – this is the problem! (Actually, the problem is finding the VC or pseudo-dimension of an MDP.)
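For my own reference, the quantity I would need to bound is the pseudo-dimension of a real-valued function class (e.g., the class of value functions an MDP can induce). Here is my paraphrase of the standard definition in LaTeX; it is not copied from the papers above:

    % Pseudo-dimension of a real-valued function class F; it reduces to
    % the VC dimension when F is {0,1}-valued.
    \[
      \operatorname{Pdim}(\mathcal{F}) \;=\; \max\Bigl\{\, d \;:\;
        \exists\, x_1,\dots,x_d,\ \exists\, r_1,\dots,r_d \in \mathbb{R}
        \ \text{s.t.}\ \forall\, b \in \{0,1\}^d\ \exists\, f \in \mathcal{F}:\
        \mathbf{1}\bigl[f(x_i) > r_i\bigr] = b_i \ \ \forall\, i \,\Bigr\}
    \]

That is, a set of points is pseudo-shattered if some choice of thresholds lets the class realize every possible above/below pattern on them.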

Chaos Control’s Seminar: The Last Part

The last part of my Chaos Control seminar presentation trilogy was presented yesterday. It was mostly dedicated to bifurcation control, which I was no expert in. Anyway, now I know more about it than 99.99% of people (even more)!! 😀 This is the good part of it.
But something strange happened yesterday. Oops!

Bifurcation surfing!

I’m busy with some readings, mostly about bifurcation. I’ll write later about the papers and … that I read, but it is worth mentioning some useful (or interesting) links that I encountered during my research. They may be useful later.

Chaos @ Maryland (you cannot find any paper here, but you can find every kind of chaos research!!)

Fredholm Alternative Theorem (It appears in one of the bifurcation papers that I read.)

Implicit Function Theorem

Dynamical System Theory (Seems to be a book, but I haven’t looked at it. I was searching for the Center Manifold Theorem when I found it.)

Invariant Subspaces (I was looking for the Popov-Belevitch-Hautus controllability/observability tests and I found this PDF. It is a good one; see the sketch after this list.)

Bifurcation (seems to be a good introductory one!)
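Since the PBH test came up above, here is a minimal sketch of its rank condition in Python/NumPy. The function and the double-integrator example are my own illustration, not taken from any of the linked pages:

    import numpy as np

    def pbh_controllable(A, B, tol=1e-9):
        """PBH test: (A, B) is controllable iff rank([lam*I - A, B]) == n
        for every eigenvalue lam of A."""
        n = A.shape[0]
        for lam in np.linalg.eigvals(A):
            M = np.hstack([lam * np.eye(n) - A, B])
            if np.linalg.matrix_rank(M, tol) < n:
                return False  # lam is an uncontrollable mode
        return True

    # Example: double integrator with a force input (controllable).
    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    B = np.array([[0.0],
                  [1.0]])
    print(pbh_controllable(A, B))  # True

The observability version is just the dual test applied to (A^T, C^T).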

Pre-HRL Presentation era!

I’m working on my Hierarchical Reinforcement Learning presentation, which I will give in the Distributed AI class a few hours from now. It is 2:47 AM and … emmm … yeap! Life is too compressed!

IROS 2004: Paper acceptance

I woke up today, checked my e-mail, and suddenly found the mail announcing that my IROS 2004 paper has been accepted!! (: I have been waiting for this mail for a long time! (At least, for a week now I have been too curious to know the result!!) The paper, which is entitled “Behavior hierarchy learning in a behavior-based system using reinforcement learning”, is based on my work on structure learning of the Subsumption Architecture. Anyway, this was very good news! (:
These are the reviewers’ comments, which I must answer:

Comment #1
——————————————–
Interesting preliminary results.
Further work is required including real experiments.

——————————————–
Comment #2
——————————————–
Summary:
The paper describes a reinforcement learning approach to selecting behaviors in a subsumption architecture. From a given set of behaviors arranged in a hierarchy of layers, each layer learns to determine which behavior should be active. An appropriate (greedy, value-function based) reinforcement learning system is formulated for this problem, and evaluated in a simulated cooperative object lifting example with multiple robots.

General Comments:
Applying reinforcement learning (RL) to a subsumption architecture is not new, as cited correctly by the authors. What is finally developed in the paper looks like a standard value iteration RL method, i.e., a form of approximate dynamic programming. As the authors mention themselves, RL has seen a fair amount of work over the last year in learning with behaviors (the authors mention Options as future work). Thus, why did the authors not follow one of these established behavior-based RL approaches, or at least compare their results with related work? It will not be obvious for a reader where the originality and significance of the paper lies.

Detailed Comments:
– The use of English needs improvement in various places.
– Page 1: are the S i parts of the state space for each behavior overlapping or not?

——————————————–
Comment #3
——————————————–
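Reviewer #2’s summary is accurate, by the way: each layer learns, with a greedy value-function method, which of its behaviors should be active. Just to record the flavor of that selection mechanism, here is a minimal tabular sketch; the class name, the behavior labels, and the usage hook are hypothetical placeholders, not the paper’s actual formulation:

    import random
    from collections import defaultdict

    class BehaviorSelector:
        """One layer learns, by tabular Q-learning, which of its
        behaviors to activate in each (discretized) state."""

        def __init__(self, behaviors, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.behaviors = behaviors          # this layer's candidate behaviors
            self.q = defaultdict(float)         # (state, behavior) -> value
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def select(self, state):
            # epsilon-greedy over the layer's behaviors
            if random.random() < self.epsilon:
                return random.choice(self.behaviors)
            return max(self.behaviors, key=lambda b: self.q[(state, b)])

        def update(self, state, behavior, reward, next_state):
            best_next = max(self.q[(next_state, b)] for b in self.behaviors)
            target = reward + self.gamma * best_next
            self.q[(state, behavior)] += self.alpha * (target - self.q[(state, behavior)])

    # Hypothetical usage, one selector per layer of the hierarchy:
    # sel = BehaviorSelector(["wander", "approach", "lift"])
    # b = sel.select(s); r, s2 = env_step(b); sel.update(s, b, r, s2)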

Paper: Evolution of a Subsumption Architecture Neurocontroller

Julian Togelius, “Evolution of a Subsumption Architecture Neurocontroller,” ?.

I’ve read this paper. It was interesting, as it strengthened my idea of using (or the possibility of using) incremental ideas in learning. I have done some experiments with incremental learning, but I’m not yet in a position to draw conclusions.
Before quoting its abstract, let me copy this informative table:

1. One layer – one fitness: Monolithic evolution
2. One layer – many fitnesses: Incremental evolution
3. Many layers – one fitness: Modularized evolution
4. Many layers – many fitnesses: Layered evolution

(One may use other names for these; e.g., I used to call every incrementally built “many layers” system “incremental”.)
He found that the fourth method of evolution indeed performs very well. It is the one I’m thinking about.
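To make that fourth cell concrete, here is how I picture the layered-evolution loop. The GA details below (truncation selection, Gaussian mutation, the per-layer fitness signature) are my own stand-ins, not necessarily what Togelius used:

    import random

    def evolve_layer(fitness, lower_layers, pop_size=50, generations=100,
                     genome_len=32, mut_sigma=0.05):
        """Evolve one layer's controller against its own fitness function,
        with the already-evolved lower layers frozen underneath it."""
        pop = [[random.gauss(0, 1) for _ in range(genome_len)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda g: fitness(g, lower_layers), reverse=True)
            parents = pop[: pop_size // 2]      # truncation selection
            children = [[x + random.gauss(0, mut_sigma)
                         for x in random.choice(parents)]
                        for _ in range(pop_size - len(parents))]
            pop = parents + children
        return max(pop, key=lambda g: fitness(g, lower_layers))

    def layered_evolution(layer_fitnesses):
        """Many layers, many fitnesses: evolve bottom-up, freezing each
        layer before evolving the next one on top of it."""
        layers = []
        for fitness in layer_fitnesses:         # one fitness per layer
            layers.append(evolve_layer(fitness, list(layers)))
        return layers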

Here is the paper’s abstract:
Abstract. An approach to robotics called layered evolution and merging features from the subsumption architecture into evolutionary robotics is presented, and its advantages are discussed. This approach is used to construct a layered controller for a simulated robot that learns which light source to approach in an environment with obstacles. The evolvability and performance of layered evolution on this task is compared to (standard) monolithic evolution, incremental and modularised evolution. To corroborate the hypothesis that a layered controller performs at least as well as an integrated one, the evolved layers are merged back into a single network. On the grounds of the test results, it is argued that layered evolution provides a superior approach for many tasks, and it is suggested that this approach may be the key to scaling up evolutionary robotics.