Behavior Hierarchy Learning in a Behavior-based System using Reinforcement Learning (abstract)

This is the abstract of my IROS 2004 paper, entitled Behavior Hierarchy Learning in a Behavior-based System using Reinforcement Learning. It will be presented very soon at the International Conference on Intelligent Robots and Systems. [Update: it has been presented now! You can download the paper from my publication page or directly from here.]
I am not sure whether I will present it or my advisor, Dr. Nili, will, as I am having some problems getting a passport and visa. Anyway, it is very exciting to go to my first international conference and meet the people whose papers I read and whose work I admire, e.g. Maja Mataric of USC, Cynthia Breazeal of MIT, Leslie Pack Kaelbling of MIT, Manuela Veloso of CMU, Lynne Parker of the University of Tennessee, Sridhar Mahadevan of UMass, and many more whose names I haven't found yet. I wish I could meet others like Rodney Brooks, Marvin Minsky, Andrew Barto, Richard Sutton, Marco Dorigo, Floreano, and … and … but it seems that they are not participating in this conference. That is somewhat natural, as this is a robotics conference and not an all-the-topics-I-love one!!

And now, here is the abstract. I will post the paper as soon as I upgrade my hosting and set up a regularly updated section for my scientific work.

Behavior Hierarchy Learning in a Behavior-based System using Reinforcement Learning
Abstract: Hand-designing an intelligent agent's behaviors and their hierarchy is a very hard task. One of the most important steps toward creating intelligent agents is providing them with the capability to learn the required behaviors and their architecture. Architecture learning in a behavior-based agent with a Subsumption architecture is considered in this paper. The overall value function is decomposed into easily computable parts in order to learn the behavior hierarchy. Using probabilistic formulations, two different decomposition methods are discussed: storing the estimated value of each behavior in each layer, and storing the ordering of behaviors in the architecture. Using the defined decompositions, two appropriate credit assignment methods are designed. Finally, the proposed methods are tested in a multi-robot object-lifting task, where they achieve satisfactory performance.
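
To give a rough feel for the two ideas in the abstract, here is a small Python sketch. It is only an illustration and not the algorithm from the paper: the behavior names, the state fields, the numbers in the value table, and the greedy rule that turns per-layer values into an ordering are all assumptions I made up for the example. It shows (1) Subsumption-style arbitration, where the highest layer whose behavior is applicable takes control, and (2) reading a behavior ordering out of a table that stores an estimated value for each behavior in each layer.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, float]


class Behavior:
    """A behavior with an applicability test and a fixed action label (hypothetical)."""

    def __init__(self, name: str, applicable: Callable[[State], bool], action: str):
        self.name = name
        self.applicable = applicable
        self.action = action


def subsumption_select(ordering: List[Behavior], state: State) -> Optional[str]:
    """Return the action of the first applicable behavior, top layer first."""
    for behavior in ordering:
        if behavior.applicable(state):
            return behavior.action
    return None  # no behavior wants control in this state


def ordering_from_layer_values(layer_values: List[Dict[str, float]]) -> List[str]:
    """Greedy readout (an assumption, not the paper's rule): for each layer,
    top first, pick the highest-valued behavior that is not already placed."""
    chosen: List[str] = []
    for values in layer_values:
        candidates = {n: v for n, v in values.items() if n not in chosen}
        if not candidates:
            break
        chosen.append(max(candidates, key=candidates.get))
    return chosen


# Hypothetical behaviors for a toy lifting task.
behaviors = {
    "stop": Behavior("stop", lambda s: s["height"] >= 1.0, "halt"),
    "level": Behavior("level", lambda s: abs(s["tilt"]) > 0.1, "adjust"),
    "lift": Behavior("lift", lambda s: True, "raise"),
}

# Made-up value table: layer_values[layer][behavior] = estimated value
# of putting that behavior in that layer (layer 0 is the top layer).
layer_values = [
    {"stop": 0.9, "level": 0.5, "lift": 0.1},
    {"stop": 0.2, "level": 0.8, "lift": 0.3},
    {"stop": 0.1, "level": 0.9, "lift": 0.4},
]

names = ordering_from_layer_values(layer_values)
hierarchy = [behaviors[n] for n in names]
chosen_action = subsumption_select(hierarchy, {"height": 0.2, "tilt": 0.3})
```

With these made-up numbers the ordering comes out as stop, level, lift, and a tilted, not-yet-lifted object triggers the "adjust" action. In a learning setting the table entries would be updated from reward through some credit assignment scheme; that part is deliberately left out here because the abstract does not spell it out.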