I have recently started reading and thinking about this Predictive State Representation (PSR) concept. I find it interesting, but I am not yet sure whether it is a really good idea or not. Anyway, I am investigating it.

I wrote this e-mail to my friend Mohsen. I guess posting it here may be useful for me and for others:

Well … You may ask yourself why I have become interested in SI (system identification). The reason is Predictive State Representation, a concept newly proposed by a few reinforcement learning researchers. Let me briefly introduce the idea of PSR.

A central problem in reinforcement learning, control engineering, and related fields is predicting a sequence of observations. You want to know what the next observation will be, based on what you have seen before. With that, you can design a controller or whatever else you want. In order to do so, you need a model of the dynamical system.

There are two dominant approaches to modeling dynamical systems in the RL community. One uses Partially Observable MDPs (POMDPs), and the other uses history-based methods. I think you know what an MDP is: an MDP is like a dynamical system in which you have access to the state information. In a POMDP, on the other hand, you do not have access to the state; you only observe an output of the system. It is like a dynamical system whose state you observe only through a function of the state (y = h(x)).
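To make the y = h(x) picture concrete, here is a tiny toy sketch (the dynamics f and the observation map h are entirely made up for illustration): the state x evolves on its own, but the observer only ever receives h(x), which is not invertible, so different hidden states can produce identical observations.

```python
# Illustrative toy only: a hidden-state system observed through y = h(x).

def f(x):
    # State dynamics: the "internal model" the observer does NOT know.
    return (2 * x + 1) % 10

def h(x):
    # Observation function: lossy, so y does not determine x.
    return x % 3

x = 4                           # true (hidden) initial state
observations = []
for _ in range(5):
    observations.append(h(x))   # this is all the observer ever sees
    x = f(x)

print(observations)
# Several distinct states (e.g. x = 0, 3, 6) all produce the same output,
# which is exactly what makes the partially observable setting hard.
```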

Well … the bad news, which I believe is common to both RL and control theory, is that it is not reasonable to assume you have access to the internal model of the dynamics. You may have an approximation of it, but in general it is not known exactly. The only thing you know for sure is the output (observation) of the system. The whole job of SI is estimating that model. Right now, I want to know whether estimating a state-space model is an easy job or not.

PSR is a new idea that tries to remedy this problem. How? It says that instead of modeling the internal dynamics of the system, let's work only with observations and base our predictions on a set of predictions we maintain over time.

Suppose you want to predict, in a stochastic system, the probability of some sequences of future observations based on your previous observations. In the language of control theory, it is like predicting the output of a system (in the form of a probability distribution) based on its previous responses to input signals. The PSR idea asserts that you can do so if you maintain a set of known predictions called core tests. For instance, if you know that P{o1o2} = 0.3 and P{o2o3} = 0.5, you can find P{o1o2o3} (this is simplified!). (Here oi is the observation (output) of the system at time instant i. You know, I used 'y' instead of 'o' when I was young! (; )

I have some problems with this concept. RL people do not agree that a similar idea exists in other fields; they believe it is completely new. However, I am not so sure. I think the whole concept is not that new. It may use new names and metaphors, but the concept is like the identification of I/O systems (identification in the form of a transfer function). I think it is somewhat like knowing the impulse response of a linear system. If your system is not linear, the impulse response is not sufficient anymore, but the concept is similar.
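To make the core-test idea concrete, here is a minimal numerical sketch. Everything in it is made up for illustration: a toy 2-state HMM generates observations in {0, 1}, and the single core test q = "the next observation is 1" turns out to be a sufficient statistic, because the hidden belief can be recovered from that one prediction. In a real PSR the update rule would be learned from data; here I derive it from the (normally unknown) model, just to show that a prediction about observable events can carry the whole state.

```python
import numpy as np

# Toy 2-state HMM -- parameters invented for this sketch.
T = np.array([[0.9, 0.1],      # T[s, s'] = P(next state s' | state s)
              [0.2, 0.8]])
O = np.array([[0.8, 0.2],      # O[s, o] = P(observe o | state s)
              [0.3, 0.7]])

# Core test q: "the next observation is 1".
# Its prediction is p = P(q | history) = b @ O[:, 1] for belief b.
# With b1 + b2 = 1 this gives p = 0.7 - 0.5 * b1, which is invertible:
def belief_from_prediction(p):
    b1 = (0.7 - p) / 0.5
    return np.array([b1, 1.0 - b1])

def psr_update(p, o):
    """New core-test prediction after observing o.

    We route through the latent belief only to DERIVE the update;
    a real PSR would learn an equivalent update directly from data.
    """
    b = belief_from_prediction(p)
    b = b * O[:, o]            # condition on the observation
    b = b @ T                  # advance one step
    b = b / b.sum()
    return float(b @ O[:, 1])  # the updated prediction is the new state

p = float(np.array([2 / 3, 1 / 3]) @ O[:, 1])  # start at the stationary belief
for o in [1, 0, 1, 1, 0]:
    p = psr_update(p, o)       # one number tracks everything we need
print(p)
```

The point is that p, a prediction about an observable future event, plays exactly the role the belief state plays in a POMDP. Whether that counts as a genuinely new idea or as I/O-style identification in disguise is exactly the question of this post.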

Well … what is your idea?! I would be happy if PSR people wrote to me about it.

I went to Satinder Singh's PSR talk at MLSS, and from what I gathered, PSR works by attempting to learn a sufficient statistic for the future. This idea of compressing your input data (in this case, your history) through sufficiency comes up often in various applications of statistics. For instance, feature selection, feature induction, and algorithms that transform the feature space are also motivated by finding a sufficient representation of the data (or sufficient statistics).

BTW, this talk's video and slides will be posted at http://seminars.ijs.si/pascal/2006/mlss06%5Fcanberra/ within the next month or two.

Hey Sologen!

I disagree; I think RL people know that work has gone on in the SI field and that we are trying to solve a similar problem. Our claim is that the new idea is this: an estimate of the probabilities of some future events is a sufficient statistic. Instead of representing the state as a set of unobservable latent variables, or simply as the current observations, a predictive representation uses these predictions of future events to represent the state. If it is not a new idea, we would surely like to know! The reason to use such a representation is that, when solving problems in very complex domains (real life), it is silly to think you can describe how the world works with a small set of latent variables or with some finite history. We are looking to represent the world only in terms of things we can see/touch/taste.

I'd like to talk to you about your knowledge of SI, though; it would be good to understand that field better!

Hi Sologen,

Funnily enough, PSR has been my heaviest reading since last week as well. Since I am mostly concerned with such ideas, I can tell you at the outset that the idea has appeared elsewhere. The most prominent examples, to my knowledge, are Anticipatory Learning Classifiers (ALC/ACS2/XCS, etc.). But I do not personally care about who did what first. I also believe that previous views of RL were easily adaptable to anticipatory processes. Still, the current formalization can certainly provide some convergence guarantees and nice properties for applications. I am trying to implement it using Brian Tanner's TD network formalization for an improvisation system! (Yeah, you remember me, I hope.)

Cheers!