Considering Controlling Probabilities in Behavior Learning

Yesterday, I got stuck on a problem posed by Dr. Nili. The problem was simple enough: how do I update values in my subsumption-architecture learning? What I did seemed reasonable but was not compatible with my theory. I had been updating each layer whenever it was controlling or it output NoAction, without considering the "controlling probability" of each layer, which was inconsistent with my theory, in which those probabilities are very important. I changed the code to take those probabilities into account: if a state-action in a behavior does not receive a reinforcement signal for a while, its value decays toward zero. This is natural, since its controlling probability is decreasing. Anyway, I implemented this code and it works. That is not very fascinating in itself, as the previous code worked too, maybe due to its intrinsic robustness. The interesting fact is that each behavior now predicts its structural value as well, i.e. the sum of the values of each behavior equals its value in the structure. It is the first time I have obtained this equality.
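The update rule above can be sketched roughly as follows. This is a minimal, hypothetical reconstruction, not the actual code: the class and parameter names (`Behavior`, `p_control`, `alpha`, `decay`) are mine, and I assume tabular state-action values where the part of the update weighted by the controlling probability learns normally while the remainder decays toward zero.

```python
class Behavior:
    """One layer in a subsumption stack, holding tabular state-action values."""

    def __init__(self, alpha=0.1, decay=0.99):
        self.q = {}            # (state, action) -> value
        self.alpha = alpha     # learning rate
        self.decay = decay     # pull toward zero when the layer rarely controls

    def value(self, state, action):
        return self.q.get((state, action), 0.0)

    def update(self, state, action, target, p_control):
        """Weight the update by this layer's controlling probability.

        With p_control = 1 this is an ordinary incremental update toward
        `target`; with p_control = 0 the value just decays toward zero,
        mirroring a state-action that stops receiving reinforcement.
        """
        old = self.value(state, action)
        learned = old + self.alpha * (target - old)
        self.q[(state, action)] = (
            p_control * learned + (1.0 - p_control) * self.decay * old
        )
```

For example, a layer that controls fully moves its value toward the target, while a layer that never controls sees the same value shrink step by step:

```python
b = Behavior()
b.update("s0", "lift", target=1.0, p_control=1.0)   # value grows toward 1.0
b.update("s0", "lift", target=1.0, p_control=0.0)   # no control: value decays
```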
What remains to be done is to apply these algorithms to the object-lifting problem (I have done this with the abstract one) and to check the other, standard updating method rather than this averaging one.

This entry was posted in Reinforcement Learning.
