Reinforcement Learning Frequently Asked Questions (FAQ) by Richard Sutton
How are you?!
How is the emergent interaction of the world and your body?!
(an excerpt from a daily dialogue between a believer in embodied behavioral AI and an unknown passenger)
Deciphering Academese
“To the best of the author’s knowledge …” = “WE WERE TOO LAZY TO DO A REAL LITERATURE SEARCH.”
“Results were found through direct experimentation.” = “WE PLAYED AROUND WITH IT UNTIL IT WORKED.”
“The data agreed quite well with the predicted model.” = “IF YOU TURN THE PAGE UPSIDE DOWN AND SQUINT, IT DOESN’T LOOK TOO DIFFERENT.”
“It should be noted that …” = “OK, SO MY EXPERIMENTS WEREN’T PERFECT. ARE YOU HAPPY NOW??”
“These results suggest that …” = “IF WE TAKE A HUGE LEAP IN REASONING, WE CAN GET MORE MILEAGE OUT OF OUR DATA.”
“Future work will focus on …” = “YES, WE KNOW THERE IS A BIG FLAW, BUT WE PROMISE WE’LL GET TO IT SOMEDAY.”
“… remains an open question.” = “WE HAVE NO CLUE EITHER.”
(copied from a comic from www.phdcomics.com, drawn by Jorge Cham if I read it right!)
Long time paper writing
I looked at the date of BehStrLearning1.doc, the first document file of my current paper, and it dates back to 7th of September 2004! That means I have been working on this paper for about four months! Wow! I wouldn’t have believed it if someone had told me that writing a paper could take this long. Hopefully it is in its final stages, and I hope(!) it will be finished and submitted very soon.
University Rankings in AI
I’ve started ranking schools by their AI research. My emphasis is on new-age AI, e.g. situated embodied intelligent robots, neural networks, machine learning (especially reinforcement learning), fuzzy systems, evolutionary computation, pattern recognition, vision, and so on. This means that I do not give much credit to schools with a symbolic AI approach. In other words, I score universities according to my own preferences. I’ve divided schools into these 6 categories:
5: Everything is pleasant: good projects, good professors, good school reputation.
4: Very good place; maybe not too prestigious.
3: Somewhat good, e.g. one or two well-known professors but not a famous school.
2: Hey! They are trying to do something!
1: There is little AI there, i.e. just the name.
0: Nothing!
I will report the results after I finish applying and receive my admission decisions. Universities can increase their rank by admitting me with financial aid!
Kind people around the world!
Hey! Smile! There are a bunch of lovely, good people around the world, friends and strangers alike, who answer my questions very kindly and helpfully.
Yes! Yes!! It may be exactly you, dear reader, whom I am speaking to. Smile! (;
Hey you, accept me please!
Hey you!
Out there in your lab,
Getting lonely, getting old,
Can you feel me?
Hey you!
Standing in the aisle [of your department!],
With itchy feet and fading smile,
As you haven’t had a good student for a while,
Can you feel me?
Hey you!
Don’t help them to bury the strong AI,
Don’t give in without a fight!
Hey you!
Out there on your own,
Sitting naked by the phone,
Would you touch me? [It is not necessary to do so physically; it is sufficient to admit me!]
Hey you!
With your sonar against the wall,
Waiting for someone to call out [and implement your SLAM algorithm!]
Would you touch me?
Hey you!
Would you help me get admitted to a good Ph.D. program?
Open your group, I’m coming home!
[This is a disappointing part; you must not read it!!!]
[But it was only fantasy
The sonars were too noisy, as you can see.
No matter how he tried, he could not break free
And the bugs ate into his brain.]
Hey you!
Out there on the road [it refers to your outdoor robot],
Always doing what you’re not told [yes! It needs me to get it working],
Can you help me?
Hey you!
Out there beyond the wall,
Breaking bottles in the hall [actually, your mobile robot does so because its obstacle avoidance behavior does not work anymore],
Can you help me?
Hey you!
Don’t tell me there is no hope at all.
Together we research, divided we fall!
P.S.: The original lyrics are by Pink Floyd, and I’ve changed them a little!
Admission!
Hey! Is there anybody out there who wants a new student, a very good one, a creative one?! Come on!! You’ll gain a lot!
Addiction and Learning
A University of Minnesota researcher developed a computational model of addiction which can be used to make predictions about human behavior, animal behavior, and neurophysiology.
…
Natural increases in dopamine occur after unexpected natural rewards; however, with learning these increases shift from the time of reward delivery to cueing stimuli. In TDRL, once the value function predicts the reward, learning stops. Cocaine and other addictive drugs, however, produce a momentary increase in dopamine through neuropharmacological mechanisms, thereby continuing to drive learning, forcing the brain to over-select choices which lead to getting drugs … (read more)
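The core of the quoted argument, that a temporal-difference (TD) learner stops learning once its value prediction matches the reward but keeps learning when a drug adds dopamine that no prediction can cancel, can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: a single cue always followed by one reward, TD(0) updates, and a hypothetical `drug_boost` term that is added to the TD error and also acts as its floor. That is how I understand the drug effect in the Minnesota model; it is not code from the paper, and `td_error` and `train_cue_value` are illustrative names.

```python
def td_error(reward, v_next, v_curr, gamma, drug_boost=0.0):
    """TD(0) prediction error.

    Assumption: a drug adds drug_boost to the error and the result is floored at
    drug_boost, so no learned prediction can cancel it.
    """
    delta = reward + gamma * v_next - v_curr
    if drug_boost > 0.0:
        delta = max(delta + drug_boost, drug_boost)
    return delta


def train_cue_value(episodes, reward=1.0, drug_boost=0.0, alpha=0.1, gamma=0.9):
    """Learn V(cue) for a cue that is always followed by one reward, then the episode ends."""
    v_cue = 0.0
    deltas = []
    for _ in range(episodes):
        # The terminal state reached after the reward has value 0.
        delta = td_error(reward, 0.0, v_cue, gamma, drug_boost)
        v_cue += alpha * delta
        deltas.append(delta)
    return v_cue, deltas


if __name__ == "__main__":
    v_nat, d_nat = train_cue_value(200)
    v_drug, d_drug = train_cue_value(200, drug_boost=0.5)
    print(f"natural reward: V(cue) = {v_nat:.3f}, final TD error = {d_nat[-1]:.2e}")
    print(f"drug reward:    V(cue) = {v_drug:.3f}, final TD error = {d_drug[-1]:.2e}")
```

With the natural reward the TD error shrinks toward zero and V(cue) settles near the reward value, so learning stops; with the boost the error never drops below `drug_boost`, so V(cue) keeps climbing, which is the over-valuation the quote describes.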
Robust Algorithms: Some more thought
Ramin replied to me and wrote:
Every program could be considered an automaton that takes an input string and produces an output string of codes. I think the robustness of algorithms can be interpreted as “redundancy” in this messaging framework. Thus, as an alternative strategy, I recommend bringing robustness to computer algorithms from the communications viewpoint.
The idea of translating computer algorithms into dynamical systems is fascinating, but first we should reconsider the “dynamic” properties of our algorithms.
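Ramin’s “redundancy” idea has a textbook counterpart in coding theory. The sketch below is my own illustration, not his proposal: a repetition code that sends every bit three times and decodes by majority vote, so one corrupted copy per bit is tolerated. The function names `encode`, `decode`, and `flip` are hypothetical.

```python
def encode(bits, r=3):
    """Repetition code: send every bit r times (r should be odd)."""
    return [b for b in bits for _ in range(r)]


def decode(symbols, r=3):
    """Majority vote over each consecutive group of r received symbols."""
    return [int(sum(symbols[i:i + r]) > r // 2) for i in range(0, len(symbols), r)]


def flip(symbols, positions):
    """Model channel noise by flipping the symbols at the given positions."""
    noisy = list(symbols)
    for p in positions:
        noisy[p] ^= 1
    return noisy


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0]
    sent = encode(message)
    # One error inside each group of three is corrected by the majority vote.
    received = flip(sent, [0, 4, 8, 9, 14])
    print(decode(received) == message)
    # Two errors inside the same group defeat it.
    print(decode(flip(sent, [0, 1])))
```

The price of this robustness is a message three times longer, and two errors in the same group of three still flip the decoded bit.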
And I wrote him:
You raised a very good and interesting point. I wonder whether there is a relation between the dynamical properties of a system and the information content of the message. I suspect that the fractal dimension of the system’s output trajectory is related to the entropy of the output signal. Consider this: a stable system converges to a single point in the state space (or output space), whose fractal dimension is 0, and the entropy of the output signal converges to zero, since knowing that the system is stable is equivalent to knowing what its future will be. A limit cycle is a closed curve, so its dimension is 1 (non-integer values, e.g. between 1 and 2, belong to strange attractors instead), and its entropy is finite, depending on the length of its descriptor string, and so on.
Properties like robustness may be considered as the cross-entropy between the source (disturbances) and the destination (output signal). Hmmm… it is an interesting thought. Maybe we are reinventing cybernetics!
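The fractal-dimension side of this guess can be checked numerically with a box-counting estimate: cover the trajectory with square boxes of side ε, count the occupied boxes N(ε), and take the slope of log N(ε) against log(1/ε). The Python sketch below assumes we already have points on the attractor (transients discarded) and samples two attractors directly instead of simulating a particular system: a fixed point, and a unit circle standing in for a limit cycle. The scales and sample sizes are arbitrary choices, and `box_counting_dimension` is an illustrative name.

```python
import math


def box_count(points, eps):
    """Number of grid boxes of side eps that contain at least one point."""
    return len({(math.floor(x / eps), math.floor(y / eps)) for x, y in points})


def box_counting_dimension(points, scales=(0.1, 0.05, 0.025, 0.0125)):
    """Least-squares slope of log N(eps) versus log(1/eps) over the given scales."""
    xs = [math.log(1.0 / eps) for eps in scales]
    ys = [math.log(box_count(points, eps)) for eps in scales]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Stable system after transients: every output sample sits at the equilibrium.
    fixed_point = [(0.3, 0.3)] * 1000
    # Limit cycle after transients: a closed curve, sampled densely as a unit circle.
    samples = 20000
    limit_cycle = [(math.cos(2 * math.pi * k / samples), math.sin(2 * math.pi * k / samples))
                   for k in range(samples)]
    print("fixed point:", box_counting_dimension(fixed_point))
    print("limit cycle:", round(box_counting_dimension(limit_cycle), 2))
```

The fixed point gives exactly 0, since every scale sees a single occupied box, and the circle gives a value close to 1.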