On Adaptive Situated Agents

I recently came across this interesting paper by the NVIDIA autonomous driving team.

Bojarski, Del Testa, Dworakowski, et al., “End to End Learning for Self-Driving Cars,” 2016.

I wrote a summary and a few comments about it on my Twitter account, and I thought I could repost them here, with some additional discussion, to rekindle this dormant blog. So here you are. As always, your comments are appreciated.

SUMMARY

The NVIDIA group formulates the problem of learning how to drive as an imitation learning problem. Their system learns a mapping from the input image to the steering command by imitating how a human driver does it.

Their approach is essentially a modern (mid-2010s) version of ALVINN from the late 1980s: more data, deeper neural networks, and more computation power.
The function approximator is a convolutional neural network (a normalization layer, 5 convolutional layers, and 3 fully connected layers). They train the network on a lot of data collected from the behaviour of actual drivers (about 70 hours of real driving, which I believe corresponds to about 2.5M data samples, though this is not explicitly mentioned), plus some data augmentation. You can see the video of the self-driving car here. Cool, isn’t it?!
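One way to arrive at a figure of about 2.5M samples is sketched below. The sampling rate of 10 frames per second is an assumption made for this back-of-the-envelope check, and the function name is hypothetical.

```python
# Assumption: one image/steering pair is kept every 0.1 s (10 frames per second).
HOURS_OF_DRIVING = 70
FRAMES_PER_SECOND = 10  # assumed sampling rate, not taken from this post


def estimated_samples(hours, fps):
    """Number of image/steering pairs in `hours` of driving sampled at `fps`."""
    return int(hours * 3600 * fps)


print(estimated_samples(HOURS_OF_DRIVING, FRAMES_PER_SECOND))  # 2520000
```

Under that assumption, 70 hours gives roughly 2.5M samples before any augmentation.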

COMMENTS

It is exciting to see that a neural network trained end to end performs relatively well, and I congratulate them on this. But there are potential problems from a machine learning perspective: treating the imitation learning problem as a standard supervised learning problem may lead to lower performance than expected. This is due to the distribution mismatch caused by the dynamical nature of the agent-environment interaction: whenever an agent (e.g., a self-driving car) makes a mistake at a time step, the distribution of its future states changes slightly compared to the distribution induced by the expert agent (e.g., a human driver). This has a compounding effect, and the difference between the distributions can grow as the agent interacts longer with the environment. In the self-driving car example, it means that a series of small mistakes by the self-driving car moves it to situations that are farther and farther away from the usual situations of a car driven by a human, e.g., the car gradually gets dangerously close to the shoulder.

As a result, as time passes, the agent is more likely to be in regions of the state space for which it does not have much training data (generated by the expert agent). So the agent starts behaving in ways that are not predictable, even though it might perform well on the training distribution. This difference between the two distributions is called the distribution mismatch (or covariate shift) problem in the machine learning/statistics literature.
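The mismatch can be written in symbols. The notation is introduced here only for illustration and is not taken from the NVIDIA paper: \pi^* is the expert's policy, \pi is the learned policy, d_\pi is the distribution of states visited when following \pi, and \ell(s, \pi) is the error of \pi at state s relative to the expert.

```latex
\underbrace{\mathbb{E}_{s \sim d_{\pi^*}}\big[\ell(s, \pi)\big]}_{\text{what supervised training minimizes}}
\qquad \text{versus} \qquad
\underbrace{\mathbb{E}_{s \sim d_{\pi}}\big[\ell(s, \pi)\big]}_{\text{what matters when the agent drives}}
```

The two quantities agree only when d_\pi is close to d_{\pi^*}, and each mistake of \pi pushes d_\pi a little further away from d_{\pi^*}.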

A solution to this problem is to use DAGGER-like algorithms:

Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell, “A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning,” AISTATS, 2011.

The basic idea behind DAGGER is that instead of letting the agent learn only from a fixed training set coming from an expert agent (which is a human driver in this case), we should let it learn on the distribution that the agent itself actually encounters. So if it happens that the agent goes to regions of the state space which are not usually encountered by the expert (and so are not in the initial training data set), well, that’s OK, because we can ask the expert to tell us what to do there, and hopefully the expert also knows how to deal with those situations. By continuing to train on data from this distribution, the agent can learn a much better policy.
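The idea can be tried on a toy problem. The following is a minimal sketch under loud assumptions, not the algorithm as specified in the paper and not the NVIDIA system: the state is a single lateral offset x, the "expert" is the hypothetical rule `-x`, the learner is a lookup table over bins of x, and the loop simply runs the current learner after the first batch of expert data, without mixing in the expert's actions. All names and constants are made up for illustration.

```python
import random

BIN_WIDTH = 0.25   # resolution of the learner's lookup table (toy choice)
ROAD_EDGE = 2.0    # |x| is clipped here: the car is stuck on the shoulder
NOISE = 0.3        # size of the random lateral push at each step


def expert(x):
    """Toy stand-in for the human driver: steer straight back to the centre."""
    return -x


def step(x, action, rng):
    """Unstable toy dynamics: an uncorrected offset grows by 20% per step."""
    x_next = 1.2 * x + action + rng.uniform(-NOISE, NOISE)
    return max(-ROAD_EDGE, min(ROAD_EDGE, x_next))


def fit(data):
    """Supervised learner: average the expert's action within each bin of x."""
    sums = {}
    for x, a in data:
        key = round(x / BIN_WIDTH)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + a, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


def act(table, x):
    """Learned policy; it does nothing (0.0) in bins it has never seen."""
    return table.get(round(x / BIN_WIDTH), 0.0)


def rollout(policy, rng, horizon=100):
    """Drive from the lane centre under `policy`; return the visited states."""
    x, states = 0.0, []
    for _ in range(horizon):
        states.append(x)
        x = step(x, policy(x), rng)
    return states


def collect(policy, rng, episodes=5):
    """Run `policy`, then label every visited state with the expert's action."""
    return [(x, expert(x)) for _ in range(episodes) for x in rollout(policy, rng)]


def mean_offset(table, rng, episodes=20, horizon=200):
    """Average |x| when the learned policy drives on its own."""
    xs = [abs(x) for _ in range(episodes)
          for x in rollout(lambda s: act(table, s), rng, horizon)]
    return sum(xs) / len(xs)


rng = random.Random(0)

# Plain supervised imitation: train once on states visited by the expert.
data = collect(expert, rng)
bc_table = fit(data)

# DAGGER-style loop: run the current learner, ask the expert, aggregate, refit.
dagger_table = dict(bc_table)
for _ in range(10):
    data += collect(lambda s: act(dagger_table, s), rng)
    dagger_table = fit(data)

bc_cost = mean_offset(bc_table, random.Random(1))
dagger_cost = mean_offset(dagger_table, random.Random(1))
print(f"bins covered: supervised {len(bc_table)}, DAGGER-style {len(dagger_table)}")
print(f"mean |offset|: supervised {bc_cost:.2f}, DAGGER-style {dagger_cost:.2f}")
```

In this toy the expert never lets the offset reach 0.375, so its own demonstrations only ever fill the three bins around the lane centre, and collecting more expert-only data would not change that. The aggregated data set additionally contains the expert's corrections for the bins that the learner itself drifts into.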

Of course, if the “expert” itself is not a real expert for certain situations, we cannot really hope to learn a useful agent even if we use DAGGER. For example, most drivers know how to steer a car that is a bit over the line back to the centre of the lane, so they are experts in that situation and their expertise can be useful; but they may not be of much use in showing how to deal with a car that is in a ditch. Not being able to do better than the expert is a limitation of imitation learning. There are some solutions for that, but maybe that should be the topic of another post.

Aside from the aforementioned work, which analyzes the phenomenon in the imitation learning setting, the analysis of how the agent’s state distribution changes, in the context of reinforcement learning, has been done by several researchers, including myself. I only refer to three papers. See their references for further information.

Remi Munos, “Performance bounds in Lp norm for approximate value iteration,” SIAM Journal on Control and Optimization, 2007.
Amir-massoud Farahmand, Remi Munos, and Csaba Szepesvari, “Error Propagation for Approximate Policy and Value Iteration,” NIPS, 2010.
Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Matthieu Geist, “Approximate Modified Policy Iteration,” ICML, 2012.

Anyway, it is nice to see a self-driving car that is not based on a lot of manual design and engineering, but is heavily based on the principles of machine learning. Of course, there is a lot more to be done, and I am sure that the NVIDIA team will improve their system.

These are papers that I want to read (or can be considered for our reading group).

Dj. Mitrovic, S. Klanke, and S. Vijayakumar, “Adaptive Optimal Control for Redundantly Actuated Arms,” SAB 2008.
G. Chesi and Y. S. Hung, “Global Path-Planning for Constrained and Optimal Visual Servoing,” IEEE Trans. Robotics, 2007.

J.A. Ting, M. Mistry, J. Peters, S. Schaal, J. Nakanishi, “A Bayesian Approach to Nonlinear Parameter Identification for Rigid Body Dynamics,” RSS 2006.

J.A. Ting, A. D’Souza, S. Vijayakumar, and S. Schaal, “A Bayesian Approach to Empirical Local Linearization for Robotics,” ICRA 2008.

Seems to be relevant to our IROS 2007 paper (A. M. Farahmand, A. Shademan, and M. Jagersand, “Global Visual-Motor Estimation for Uncalibrated Visual Servoing,” IROS 2007). Check later.
