I am working on SSA again. Behavior learning is possible, but it is not consistent in the object-lifting task, i.e. I cannot be sure it will work in every trial. I changed that abstract problem to include “NoAction” actions with different behaviors (in both the state and action spaces), and it seems fine. I must work more on it, but I believe the difficulty of the object-lifting task is inherent to it: 1) it is not a Markov problem, and 2) its reward function is not well-defined. Anyway, I am going to investigate my methods on it.
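Just to fix the idea, here is a toy sketch of the “NoAction” modification. The task, action names, and rewards below are all made up for illustration; they are not my actual lifting setup:

```python
# Toy illustration of adding a "no_action" action; everything here
# (state = object height, target, rewards) is hypothetical.
ACTIONS = ["lift", "lower", "no_action"]

def step(height, action, target=3):
    """'no_action' leaves the state unchanged but still counts as a
    decision step, so the agent can learn *when* to do nothing."""
    if action == "lift":
        height += 1
    elif action == "lower":
        height -= 1
    # "no_action" falls through: state stays the same
    reward = 1.0 if height == target else 0.0
    return height, reward

h, r = step(0, "lift")       # -> (1, 0.0)
h, r = step(3, "no_action")  # -> (3, 1.0): holding at the target pays off
```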
PACness of MDP
I found some useful papers about PAC learning for MDPs and their VC (and …) dimensions. I should read them later:
R. Jain and P. Varaiya, “PAC Learning for Markov Decision Processes and Dynamic Games,” ?, 2004.
R. Jain and P. Varaiya, “Extension of PAC Learning for Partially Observable Markov Decision Processes,” ?, 2004.
Y. Mansour, “Reinforcement Learning and Mistake Bounded Algorithms,” ?, 1999.
PAC or ~PAC
To PAC or not to PAC: that is the problem! (Actually, the problem is finding the VC or pseudo-dimension of an MDP.)
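For reference, the generic uniform-convergence bound (the standard textbook form, not an MDP-specific result from the papers above): if the relevant function class has pseudo-dimension $d$, then roughly this many samples suffice for $\epsilon$-accurate estimates with probability at least $1-\delta$:

```latex
% Generic PAC sample-complexity bound via the pseudo-dimension d of the
% function class (textbook form, not an MDP-specific result):
m = O\!\left( \frac{1}{\epsilon^{2}}
      \left( d \,\ln\frac{1}{\epsilon} + \ln\frac{1}{\delta} \right) \right)
```

So the whole question is whether the value-function classes induced by an MDP have a finite $d$ at all.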
What is my thesis about?!
I have not written anything directly related to my project here. You may wonder whether this guy is a machine learning student or a philosophy student. (; Anyway, I may change my high-security-with-copyrighted-material policy if everything keeps going this way. However, I will try to write something about my project; I hope it will be fun and encouraging!
Let’s briefly discuss what I have done up to now:
As you know, I am working on learning in behavior-based systems. I have chosen the Subsumption architecture as the base architecture due to its success in the design of many behavior-based systems. I decomposed the learning process into two different problems: 1) structure learning and 2) behavior learning.
In the former case, I suppose the designer knows how each behavior works and wants the learning mechanism to place each behavior in its correct position in the hierarchy. S/he guides this process by giving the agent a reinforcement signal that rewards or punishes its actions. In the latter case, the designer knows the correct structure of the architecture but is not aware of how each behavior must act. For instance, s/he knows that there must be an obstacle-avoidance behavior above all other behaviors, but does not know what the appropriate action is in each case.
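To make the two problems concrete, here is a toy sketch under my own simplified reading of the architecture; every name in it is illustrative. The *order* of the behavior list is what structure learning must find, and the per-behavior policies are what behavior learning must find:

```python
# Toy subsumption sketch; class and behavior names are hypothetical.

class Behavior:
    """Maps sensor readings to an action, or None if not applicable."""
    def __init__(self, name, policy):
        self.name = name
        self.policy = policy  # behavior learning: this mapping is learned

    def act(self, state):
        return self.policy(state)

def subsumption_act(behaviors, state):
    """Structure learning: the ordering of this list is learned.
    Earlier (higher-priority) behaviors suppress the ones below."""
    for b in behaviors:
        action = b.act(state)
        if action is not None:  # this behavior fires and subsumes the rest
            return action
    return None  # no behavior applicable

# Example with a fixed structure: obstacle avoidance above everything else.
avoid = Behavior("avoid", lambda s: "turn" if s.get("obstacle") else None)
wander = Behavior("wander", lambda s: "forward")
print(subsumption_act([avoid, wander], {"obstacle": True}))   # -> "turn"
print(subsumption_act([avoid, wander], {"obstacle": False}))  # -> "forward"
```

In this picture, the reinforcement signal in case 1 scores the chosen ordering of the list, while in case 2 it scores the action returned by each behavior inside a fixed hierarchy.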
To learn a behavior-based system, one must solve both of these problems. What I have done so far is try to solve them in a special case. I have obtained some partial results, but the problem is not completely solved.