It was during half time of the Brazil-Argentina final match that I got to thinking about decision making and reinforcement learning. Suppose one intends to write a program that plays soccer. What kind of state representation should it use? The representation must encompass enough information to discriminate between two states that call for different actions (seen from the classification side: separability of classes, and so on), or it must be informative enough to let the agent separate two state-action pairs with different values (like the abstraction theorem in MaxQ, and so on).
For instance, the player's position, the ball's relative position, and a few nearby players with their tags (opponent/teammate) would be a rational choice. However, one may ask how many nearby players must be selected. Two? Three? The more state information you use, the better the best achievable answer to this POMDP becomes. But note that the size of the state space grows exponentially with the number of state variables, at least in the most common representations. What should we do?
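To get a feel for the blow-up, here is a toy computation in Python (the grid resolution and player counts are invented purely for illustration):

```python
# Back-of-the-envelope illustration (all numbers made up): discretize the
# pitch into a 10x10 grid and track the grid cell of k nearby players.
# Each extra tracked player multiplies the tabular state space by 100.
cells = 10 * 10
for k in range(1, 5):
    print(f"{k} nearby players tracked -> {cells ** k:,} table entries")
```

Tracking just four nearby players on such a coarse grid already yields a hundred million table entries.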
I guess we must seek richer state representations. Here, I am not talking about using general function approximators or hierarchical RL, which are useful in their own right. I am talking about a wise selection of the state representation: a dynamic and automatic generation of states is crucial. As an example, suppose that you are a player in the middle of the field and the ball is with a very close opponent (a few meters away). The most important factors (read: state) for your decision making are your relative distance to the opponent and, if you are a good player, the movement of his limbs. It is not that important to know the exact position of a teammate who is 20 meters away. However, when you are close to the opponent's penalty area, not only are your positions relative to the opponents important, but your teammates' positions may also be critical for good decision making; e.g., passing to a teammate may result in a goal.
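To make this concrete, here is a minimal Python sketch of such a situation-dependent representation. Everything in it (the observation layout, the thresholds, the feature choices) is my own invention for illustration, not a proposed algorithm:

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def abstract_state(obs):
    """Return a situation-dependent feature subset of the raw observation.

    `obs` maps names to 2-D positions: "me", "ball", "their_goal", plus
    lists "opponents" and "teammates" (a hypothetical layout).
    """
    me = obs["me"]
    nearest_opp = min(obs["opponents"], key=lambda p: dist(me, p))

    if dist(me, nearest_opp) < 3.0:
        # Duel: only the closest opponent's relative position matters;
        # a teammate 20 meters away is ignored.
        return ("duel", nearest_opp[0] - me[0], nearest_opp[1] - me[1])

    if dist(me, obs["their_goal"]) < 20.0:
        # Near the opponent's penalty area: teammates' relative positions
        # become critical too (a pass may result in a goal).
        mates = tuple((p[0] - me[0], p[1] - me[1]) for p in obs["teammates"])
        return ("attack", nearest_opp[0] - me[0], nearest_opp[1] - me[1], mates)

    # Midfield default: a coarse representation of the ball's relative position.
    ball = obs["ball"]
    return ("midfield", ball[0] - me[0], ball[1] - me[1])
```

Note that the three branches return features of different kinds and even different dimensions, which is exactly the flexibility a fixed representation lacks.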
I believe there must be a method for automatically selecting the important features in each state. Different states need not have the same kind of representation or even the same dimension. In some situations the current sensory information might be sufficient; in others, predictions of other agents' sensory information might be necessary, and so on. An extended Markov property may apply here: given state variables S1, …, Sn (an n-dimensional state), I guess it is possible to factor the state transition of the MDP environment in this way:

P(Si(t+1), Sj(t+1), Sk(t+1) | S1(t), …, Sn(t)) = P(Si(t+1), Sj(t+1), Sk(t+1) | Sp(t), Sq(t), Sr(t))

for some subset of indices p, q, r, i.e. there are some conditional independencies here.
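In principle such an independence could be checked from experience. A rough sketch, assuming discrete state variables and a made-up data layout (nothing here is from an existing method):

```python
from collections import Counter, defaultdict

def empirical_conditional(samples, cond_idx, target_idx):
    """Estimate P(next[target_idx] | current[cond_idx]) from transitions.

    `samples` is a list of (state, next_state) pairs, each state a tuple
    of discrete variable values (S1, ..., Sn).
    """
    counts = defaultdict(Counter)
    for s, s_next in samples:
        key = tuple(s[i] for i in cond_idx)
        counts[key][s_next[target_idx]] += 1
    return {key: {v: n / sum(c.values()) for v, n in c.items()}
            for key, c in counts.items()}

# The factorization above holds for the target variable if conditioning on
# the subset (p, q, r) yields (approximately) the same distribution as
# conditioning on all n variables: full states that agree on (Sp, Sq, Sr)
# must then predict the target's next value identically.
```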
As far as I know, the best-known research similar to this idea is McCallum's work on Utile Suffix Memory and Nearest Sequence Memory. Nevertheless, those methods consider only a single kind of state, which is simpler than what I have in mind.
Well … Brazil won the game tremendously with those samba dancers! Congratulations to Marcelo!
Hello, Amir!! 😀
Oh, thank you for your congratulations to my country's soccer team. 😀 I think the Argentinians are still “crying” over the match. 😀
Your ideas about soccer and machine learning are very interesting, but I think that trying to use Markov chains/processes to describe the decisions could lead to a dangerous scenario, because that set of theories claims that the transition to the next state depends only on the current state (memorylessness). I even put a post on my blog about my current professor in Performance Analysis of Systems; he thinks that the whole world is Markovian.
Oh, I had left a comment on another post of yours about real-coded GAs; I think you did not notice it, so I am putting it here again 😀
“Hello, Amir!!
Oh, it's very interesting that you are a fan of Brazil; I'm happy that the Brazilian soccer team has awoken your interest. Tell your little brother to wait until the next World Cup (2006) and he will see who wins the Big Challenge. I bet all my faith on Brazil!! 😀
Hey, Amir, I would like to ask whether you have already read Goldberg's article about real-coded GAs, “Real-coded genetic algorithms, virtual alphabets, and blocking”. I was reading it and, gosh, I would really like to know if there is anybody who has understood all those formulas. I think Goldberg sinks himself in all that mathematical formality.
So please let me know what you think about that article; your opinion will be very useful to me.
See You!!
Marcelo”
See you later!!
Marcelo
(:
I do not agree with you about the Markov process. Making the next state depend only on the current state does not eliminate “memory”, since you may define the state to be the entire history of the agent.
I have a question: do you know of any physical system that is not Markovian?! Dynamical systems are Markovian. (Such a system may exist, but I cannot think of one.)
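A tiny made-up example of the first point: a process whose next observation depends on the last two observations is not Markov in the raw observation, but becomes Markov once one step of history is folded into the state:

```python
import random

def step(x_t, x_prev):
    # The next observation depends on the last TWO observations...
    return x_t + 0.5 * (x_t - x_prev) + random.gauss(0, 1)

def markov_step(state):
    # ...but with the state defined as S_t = (X_t, X_{t-1}), the next
    # state depends only on the current one: the process is Markov again.
    x_t, x_prev = state
    return (step(x_t, x_prev), x_t)
```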
About the second part: I guess I read that paper, but I cannot remember what it was about! 😀 I may try to find it on my hard disk again and see if it looks familiar then.
Hello, Amir!! 😀
Well, I once mentioned that I was involved in a project on weather forecasting with neural nets. We used several past steps to determine the next one (I think it was not a “memoryless” situation) and we got better results than those supplied by classical statistics and probability methods; I don't know whether Markov chains/processes are among those methods. But one thing that Markov chains/processes maybe do not incorporate is a very important one: learning. Neural networks have that characteristic of learning while dealing with the data; I do not know whether Markov chains/processes do the same. 😀
About a physical example that is not a Markov chain/process: well, someone could fit any physical system inside any set of theories. I think that nature is a very integrated chain, and trying to model some natural process with transition mathematics (Markov chains) might not be very good modelling.
No Free Lunch, Amir!! 😀
See You!!
Marcelo