Today, I came to the Control Lab in order to write a technical report about approximate reward in RL. I wrote something, but my efficiency was not very good; for example, you may get pulled into a long conversation and cannot escape! 😀 Anyway …
While writing, I found that there might be a fallacy in agnostic learning: the policy would change after the agnostic reinforcement signal changes. I am not sure whether my result is correct or not.
If I can prove that the policy does not change the value function, everything would be OK! This is not correct in general, but it may hold in some situations, e.g. if every state-action pair is guaranteed to be visited infinitely often, then V -> V* and the behavior policy becomes irrelevant. Emmm … must think about it!
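That infinite-visitation intuition can be sketched in a toy experiment (my own minimal example, not from the report): tabular Q-learning on a tiny deterministic chain MDP, driven by a purely random behavior policy. Since random exploration keeps visiting every state-action pair, Q drifts toward Q* even though the behavior policy never improves, so the greedy policy recovered at the end does not depend on it.

```python
import random

# Toy chain MDP: states 0..3 (state 3 terminal), actions 0 = left, 1 = right,
# reward 1.0 on reaching state 3. All names/values here are illustrative.
N_STATES = 4
GAMMA = 0.9
ALPHA = 0.1

def step(s, a):
    """Deterministic chain dynamics."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

def train(episodes=5000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            a = rng.randrange(2)               # fully random behavior policy
            s2, r = step(s, a)
            target = r + GAMMA * max(Q[s2])    # off-policy (greedy) backup
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q

Q = train()
# Greedy policy extracted from Q: "right" (action 1) in every nonterminal state.
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

Here the behavior policy is as uninformative as possible, yet the recovered greedy policy is optimal; that is exactly the sense in which the policy becomes "irrelevant" under sufficient exploration.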
I am working on SSA again. Behavior learning is possible but not consistent in the object-lifting task, i.e. I cannot be sure whether it works in every trial or not. I changed that abstract problem to include "NoAction" actions with different behaviors (in both state and action space), and it seems fine. I must work more on it, but I believe the difficulty of the object-lifting task is inherent in it: 1) it is not a Markov problem, and 2) its reward function is not well-defined. Anyway, I am going to investigate my methods on it.
I found some useful papers about PAC learning for MDPs and their VC (and …) dimensions. Read them later:
Rahul Jain and Pravin Varaiya, "PAC Learning for Markov Decision Processes and Dynamic Games," ?, 2004.
R. Jain and P. Varaiya, "Extension of PAC Learning for Partially Observable Markov Decision Processes," ?, 2004.
Yishay Mansour, "Reinforcement Learning and Mistake Bounded Algorithms," ?, 1999.
At last, I finished that bulk of reporting stuff I was engaged in during the last week. I had to write a technical report about Chaos Control and a paper on Evolutionary Robotics. These heavy tasks, with approaching deadlines and too little time to do them, were too stressful for me. Fortunately, I did them!
The first one, which is written in Persian (Farsi), is a literature survey on different methods of chaos control. I have been fascinated by chaos for a long time (perhaps since I was 12. Yes?! What is the problem?!), but I could not find any opportunity to do real scientific research, or at least serious reading, on it. Apart from a short, not-too-academic project in the first year of my BSEE, I did not get a chance to do a real one until I entered graduate school and began my MS study (that first project was about using a chaotic signal to solve an optimization problem; after that, I did two chaos-control projects too).
Thus, this rather good literature survey was a very pleasant experience for me. Despite all those readings, I am not a chaos specialist anyway! 😀
The second one, which is entitled Behavior Evolution/Hierarchy Learning in a Behavior-based System using Reinforcement Learning and Co-evolutionary Mechanism, was the result of some experiments in evolutionary robotics. You may know that I believe in the evolutionary mechanism (be it natural or artificial), though many think it is just an idiot (with IQ = 0.0001) given enough time to try every case. Nevertheless, I got some good results mixing co-evolution and learning, which was fascinating. I mainly did this research to satisfy the requirement of getting a mark in Dr. Lucas' Biocomputing course, but that was only the ignition. Anyway, Dr. Nili and Dr. Araabi told me not to submit this paper anywhere before submitting some other papers first.
These days are very busy for me. Actually, I am writing a technical report about chaos control and a paper about evolutionary robotics. Both of them must be ready by Sunday. Haha … ! In addition to these writing tasks, I must think about my thesis in my subconscious.