Predictive State Representation and System Identification

I have started reading and thinking about this Predictive State Representation (PSR) concept recently. I find it interesting, but I am not yet sure whether it is really a good idea or not. Anyway, I am investigating it.

I wrote the following mail to my friend Mohsen. I guess posting it here may be useful to me and others:

Well … You may ask yourself why I have become interested in System Identification (SI). The reason is Predictive State Representation, a concept newly proposed by a few reinforcement learning researchers. Let me briefly introduce PSR.

A central problem in reinforcement learning, control engineering, and … is predicting a sequence of observations. You want to know what the next observation will be, based on what you have seen so far. After that, you can design a controller or whatever else you want. To do so, you need a model of the dynamical system.
There are two dominant approaches to modeling dynamical systems in the RL community. One is the Partially Observable MDP (POMDP); the other is history-based methods. I think you know what an MDP is: it is like a dynamical system in which you have access to the state information. In a POMDP, on the other hand, you do not have access to the state; instead, you observe an output of the system. It is like a dynamical system in which you observe the state through a function of the state (y = h(x)).
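A minimal sketch of that "observing the state through y = h(x)" picture, with a made-up linear system and a linear observation map h(x) = Cx (all matrices here are my own toy choices, not from any particular source):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear system: x' = A x + noise, observed through y = h(x) = C x.
A = np.array([[0.95, 0.1],
              [0.0,  0.9]])
C = np.array([[1.0, 0.0]])   # we only ever see the first state component

x = np.array([1.0, 1.0])     # true internal state, hidden from the modeler
ys = []
for _ in range(5):
    ys.append(float(C @ x))  # the observation is a function of the state
    x = A @ x + 0.01 * rng.standard_normal(2)
print(ys)  # the modeler sees only this output sequence, never x itself
```

The point of the sketch is just the information asymmetry: the loop evolves the full state x, but everything outside the loop has access only to the scalar sequence ys.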
Well … the bad news, which I believe is common to both RL and control theory, is that it is not reasonable to assume you have access to the internal model of the dynamics. You may have an approximation of it, but in general it is not known exactly. The only thing you know for sure is the output (observation) of the system. The whole job of SI is estimating that model. Right now, I want to know whether estimating a state-space model is an easy job or not.

PSR is a new idea that tries to remedy this problem. How? It says that instead of modeling the internal dynamics of the system, let's work only with observations and base our predictions on a set of previous predictions.
Suppose you want to predict, in a stochastic system, the probability of some sequences in the future based on your previous observations. In the language of control theory, it is like predicting the output of a system (in the form of a probability distribution) based on its previous responses to signals. PSR asserts that you can do so if you maintain a set of known predictions called core tests. For instance, if you know that P{o1o2} = 0.3 and P{o2o3} = 0.5, you can find P{o1o2o3} (this is simplified!) (oi is the observation (output) of the system at time instant i. You know, I used 'y' instead of 'o' when I was young! (; ).

I have some problems with this concept. RL people do not agree that a similar idea exists in other fields; they believe it is completely new. However, I am not so sure. I think the whole concept is not that new. It may use new names and metaphors, but the concept is like identification of I/O systems (identification in the form of a transfer function). I think it is somehow like knowing the impulse response of a linear system. If your system is not linear, the impulse response is not sufficient anymore, but the concept is similar.

Well … what is your idea?! I would be happy if PSR people wrote to me about it.

SVM: Maximizing the Sum of Gaps

A few days ago, while sitting in the machine learning class, I was thinking about the following problem in classifier design:
AFAIK, in the traditional SVM the goal is maximizing the minimum gap (margin). To cope with noise, we may add slack variables to the constrained optimization problem.
BUT why don't we just maximize the total gap (the sum of gaps) instead of the maximum of the minimum?! This way, outliers act as negative gaps. The problem can be formulated as a linear program, so we have efficient solutions for it.
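For concreteness, here is a minimal sketch of the sum-of-gaps idea as an LP, using scipy.optimize.linprog on made-up toy data. The box bound on (w, b) is my own addition: the objective is linear, so without some norm constraint it is unbounded.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)

# Hypothetical toy data: two Gaussian blobs, labels y in {-1, +1}.
X = np.vstack([rng.normal(-1, 0.5, size=(20, 2)),
               rng.normal(+1, 0.5, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# Maximize  sum_i y_i (w . x_i + b)   subject to   -1 <= w_j, b <= 1.
# linprog minimizes, so negate the objective; the box keeps it bounded.
c = -np.concatenate([y @ X, [y.sum()]])
res = linprog(c, bounds=[(-1, 1)] * 3)
w, b = res.x[:2], res.x[2]
acc = np.mean(np.sign(X @ w + b) == y)
print("w =", w, "b =", b, "accuracy =", acc)
```

One thing this toy LP makes visible: because the objective is linear, the optimum always sits on a corner of the box, so here w is just the sign pattern of the summed class difference — the norm constraint, not the fine geometry of the data, dictates the solution. That may be a hint about why the sum of gaps alone is a weaker objective than the max-min margin.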

I just wonder whether this approach has been used before, or whether it is convertible to some other commonly-used objective function.

Busy Semester

Hi everybody, after a long time!
It is almost a shame that I do not update this weblog regularly. The worst part is that I do not have a consistent plan for what I want to write here, what to tell in my Persian weblog, and what to keep just in my notebook. Well! Let's not talk about administrative(!) stuff right here.

This semester is quite busy. I have taken the following courses:

Reinforcement Learning (Rich Sutton)
3D Vision (Martin Jagersand)
Operating System Concepts (Pawel Gburzynski)

Also, I am auditing Dale Schuurmans' Machine Learning course. All these courses have their own readings, assignments, and so on! In addition, I want to do my own reading, which certainly takes much time. The other thing that has been taking my time recently is preparing a few papers for a conference and a journal. I do not even want to mention all those seminars, meetings, and … that I should attend. Putting all these together results in my being completely busy!