Behavior learning in SSA: a mid-work report

I am working on SSA again. Behavior learning is possible, but it is not consistent in the object-lifting task: I cannot be sure that it works in every trial. I changed that abstract problem to include “NoAction” actions with different behaviors (in both the state and action spaces), and it seems fine. I must work on it more, but I believe the difficulty of the object-lifting task is inherent to the task itself: 1) it is not a Markov problem, and 2) its reward function is not well defined. In any case, I am going to investigate my methods on it.

