I recently started reading and thinking about the Predictive State Representation (PSR) concept. I find it interesting, but I am not yet sure whether it is a “really” good thing or not. Anyway, I am investigating it.

I wrote this mail to my friend Mohsen. I guess putting it here may be useful for me and others:

Well … You may ask yourself why I became interested in SI (system identification). The reason is Predictive State Representation, a concept newly proposed by a few reinforcement learning researchers. Let me briefly introduce the concept of PSR.

A central problem in reinforcement learning, control engineering, and … is predicting a sequence of observations. You want to know what the next observation will be, based on what you have seen before. Once you can do that, you can design a controller or whatever you want. In order to do so, you need a model of the dynamical system.

There are two dominant approaches to modeling dynamical systems in the RL community. One of them is the Partially Observable MDP (POMDP) and the other is history-based methods. I think you know what an MDP is. An MDP is like a dynamical system in which you have access to the state information. In a POMDP, on the other hand, you do not have access to the state information; you only observe an output of the system. It is like a dynamical system whose state you observe only through a function of the state (y=h(x)).
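
To make the MDP/POMDP distinction concrete, here is a minimal Python sketch. The two-state system, its probabilities, and the names `TRANSITION`, `EMISSION`, `step` and `rollout` are all made up for illustration; actions are left out to keep it short, and the observation is a noisy function of the state rather than a deterministic y=h(x).

```python
import random

# Hypothetical two-state system; every number here is made up for illustration.
TRANSITION = {0: [0.9, 0.1], 1: [0.2, 0.8]}  # P(next state | current state)
EMISSION = {0: [0.7, 0.3], 1: [0.1, 0.9]}    # P(observation | state)


def step(state, rng):
    """Advance the hidden state, then emit an observation of the new state."""
    next_state = rng.choices([0, 1], weights=TRANSITION[state])[0]
    observation = rng.choices([0, 1], weights=EMISSION[next_state])[0]
    return next_state, observation


def rollout(n_steps, seed=0):
    """Simulate n_steps and return both the hidden states and the observations."""
    rng = random.Random(seed)
    state = 0
    states, observations = [], []
    for _ in range(n_steps):
        state, obs = step(state, rng)
        states.append(state)
        observations.append(obs)
    return states, observations


states, observations = rollout(10)
# MDP view: the agent is handed `states`.
# POMDP view: the agent is handed only `observations`.
print("hidden states:", states)
print("observations: ", observations)
```

In the MDP setting the learner would get `states` directly; in the POMDP setting it only ever sees `observations` and has to infer the rest.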

Well … the bad news, which I believe is common to both RL and control theory, is that it is not reasonable to assume that you have access to the internal model of the dynamics. You may have an approximation of it, but in general, it is not known exactly. The only thing you know for sure is the output (observation) of the system. The whole job of SI is estimating that model. Right now, I want to know whether estimating a state-space system is an easy job or not.

PSR is a new idea that tries to remedy this problem. How? It says that instead of modeling the internal dynamics of the system, let’s work only with observations and base our predictions on a set of previous predictions.

Suppose you want to predict (in a stochastic system) the probability of some future sequences based on your previous observations. In the language of control theory, it is like predicting the output of a system (in the form of a probability distribution) based on its previous responses to signals. The idea of PSR asserts that you can do so if you know the predictions for a set of tests called core tests. For instance, if you know that P{o1o2} = 0.3 and P{o2o3} = 0.5, you can find P{o1o2o3} (it is simplified!). Here oi is the observation (output) of the system at time instant ‘i’. You know, I used ‘y’ instead of ‘o’ when I was young! ;)

I have some problems with this concept. RL people do not agree that a similar idea exists in other fields. They believe that it is completely new. However, I am not so sure. I think the whole concept is not that new. It may use new names and metaphors, but the concept is like identification of input-output (IO) systems (identification in the form of a transfer function). I think it is somewhat like knowing the impulse response of a linear system. If your system is not linear, the impulse response is no longer sufficient. But the concept is similar.
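
To make the core-test idea a bit more concrete, here is a small Python sketch under strong assumptions: an uncontrolled system (no actions), a made-up two-state hidden Markov model as the "true" system, and a linear relation between the core-test predictions and the prediction for any other test. The names `outcome`, `weights`, `predict_from_core` and all the numbers are hypothetical. The weights are computed here from the known model, whereas in practice they would have to be estimated from data.

```python
# Hypothetical 2-state, 2-observation hidden Markov model (no actions).
T = [[0.9, 0.1],
     [0.2, 0.8]]   # T[i][j] = P(next state j | state i)
E = [[0.7, 0.3],
     [0.1, 0.9]]   # E[j][o] = P(observation o | state j)

CORE = [(0,), (1,)]  # assumed core tests: "next observation is 0" / "is 1"


def outcome(test):
    """Return u with u[i] = P(observing the sequence `test` | current state i)."""
    u = [1.0, 1.0]
    for o in reversed(test):
        u = [sum(T[i][j] * E[j][o] * u[j] for j in range(2)) for i in range(2)]
    return u


U_COLS = [outcome(q) for q in CORE]  # U_COLS[k][i] = P(core test k | state i)


def weights(test):
    """Solve U m = outcome(test) for m with Cramer's rule (U is 2x2 here)."""
    (a, c), (b, d) = U_COLS          # U = [[a, b], [c, d]]
    u = outcome(test)
    det = a * d - b * c              # nonzero for the numbers chosen above
    return [(u[0] * d - b * u[1]) / det, (a * u[1] - c * u[0]) / det]


def predict_from_core(core_preds, test):
    """Predict P(test | history) using only the core-test predictions."""
    return sum(p * w for p, w in zip(core_preds, weights(test)))


# A belief over hidden states stands in for "everything seen so far".
belief = [0.5, 0.5]
core_preds = [sum(belief[i] * u[i] for i in range(2)) for u in U_COLS]

test = (0, 1, 1)
psr_value = predict_from_core(core_preds, test)
direct_value = sum(belief[i] * outcome(test)[i] for i in range(2))
print(psr_value, direct_value)  # the two numbers should agree up to rounding
```

The point of the sketch is that `predict_from_core` never touches `belief`: once the core-test predictions are known, the hidden state is not needed to predict the longer sequence.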

Well … what do you think?! I would be happy if PSR people wrote to me about it.