I am working on SSA again. Behavior learning is possible, but it is not consistent in the object-lifting task, i.e. I cannot be sure it will work in every trial. I changed that abstract problem to include “NoAction” actions with different behaviors (in both the state and action spaces), and it seems fine. I must work more on it, but I believe the difficulty of the object-lifting task is inherent to it: 1) it is not a Markov problem, and 2) its reward function is not well-defined. Anyway, I am going to investigate my methods on it.
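Just to fix the idea, here is a toy sketch of the “NoAction” modification. The task, action names, and rewards below are all made up for illustration; they are not my actual lifting setup:

```python
# Toy illustration of adding a "no_action" action; everything here
# (state = object height, target, rewards) is hypothetical.
ACTIONS = ["lift", "lower", "no_action"]

def step(height, action, target=3):
    """'no_action' leaves the state unchanged but still counts as a
    decision step, so the agent can learn *when* to do nothing."""
    if action == "lift":
        height += 1
    elif action == "lower":
        height -= 1
    # "no_action" falls through: state stays the same
    reward = 1.0 if height == target else 0.0
    return height, reward

h, r = step(0, "lift")       # -> (1, 0.0)
h, r = step(3, "no_action")  # -> (3, 1.0): holding at the target pays off
```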
PACness of MDP
I found some useful papers about PAC learning for MDPs and their VC (and …) dimensions. I should read them later:
R. Jain and P. Varaiya, “PAC Learning for Markov Decision Processes and Dynamic Games,” ?, 2004.
R. Jain and P. Varaiya, “Extension of PAC Learning for Partially Observable Markov Decision Processes,” ?, 2004.
Y. Mansour, “Reinforcement Learning and Mistake Bounded Algorithms,” ?, 1999.
PAC or ~PAC
To PAC or not to PAC: that is the problem! (Actually, the problem is finding the VC or pseudo-dimension of an MDP.)
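For reference, the generic uniform-convergence bound (the standard textbook form, not an MDP-specific result from the papers above): if the relevant function class has pseudo-dimension $d$, then roughly this many samples suffice for $\epsilon$-accurate estimates with probability at least $1-\delta$:

```latex
% Generic PAC sample-complexity bound via the pseudo-dimension d of the
% function class (textbook form, not an MDP-specific result):
m = O\!\left( \frac{1}{\epsilon^{2}}
      \left( d \,\ln\frac{1}{\epsilon} + \ln\frac{1}{\delta} \right) \right)
```

So the whole question is whether the value-function classes induced by an MDP have a finite $d$ at all.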
What is my thesis about?!
I have not written anything directly related to my project here. You may wonder whether this guy is a machine learning student or a philosophy student. (; Anyway, I may change my high-security-with-copyrighted-material policy if everything keeps going this way. However, I will try to write something about my project; I hope it will be fun and encouraging!
Let’s briefly discuss what I have done up to now:
As you know, I am working on learning in behavior-based systems. I have chosen the Subsumption architecture as the base architecture due to its success in the design of many behavior-based systems. I decomposed the learning process into two different problems: 1) structure learning and 2) behavior learning.
In the former case, I suppose the designer knows how each behavior works and wants the learning mechanism to place each behavior in its correct position in the hierarchy. S/he guides this process by giving the agent a reinforcement signal that rewards or punishes its actions. In the latter case, the designer knows the correct structure of the architecture but is not aware of how each behavior must act. For instance, s/he knows that there must be an obstacle-avoidance behavior above all other behaviors, but does not know what the appropriate action is in each case.
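To make the two problems concrete, here is a toy sketch under my own simplified reading of the architecture; every name in it is illustrative. The *order* of the behavior list is what structure learning must find, and the per-behavior policies are what behavior learning must find:

```python
# Toy subsumption sketch; class and behavior names are hypothetical.

class Behavior:
    """Maps sensor readings to an action, or None if not applicable."""
    def __init__(self, name, policy):
        self.name = name
        self.policy = policy  # behavior learning: this mapping is learned

    def act(self, state):
        return self.policy(state)

def subsumption_act(behaviors, state):
    """Structure learning: the ordering of this list is learned.
    Earlier (higher-priority) behaviors suppress the ones below."""
    for b in behaviors:
        action = b.act(state)
        if action is not None:  # this behavior fires and subsumes the rest
            return action
    return None  # no behavior applicable

# Example with a fixed structure: obstacle avoidance above everything else.
avoid = Behavior("avoid", lambda s: "turn" if s.get("obstacle") else None)
wander = Behavior("wander", lambda s: "forward")
print(subsumption_act([avoid, wander], {"obstacle": True}))   # -> "turn"
print(subsumption_act([avoid, wander], {"obstacle": False}))  # -> "forward"
```

In this picture, the reinforcement signal in case 1 scores the chosen ordering of the list, while in case 2 it scores the action returned by each behavior inside a fixed hierarchy.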
To learn a behavior-based system, one must solve both of these problems. What I have done so far is try to solve them in a special case. I have obtained some partial results, but the problem is not completely solved.