Newsgroups: comp.ai.alife,comp.ai.genetic,comp.ai.neural-nets,comp.robotics,comp.robotics.misc
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!oitnews.harvard.edu!purdue!lerc.nasa.gov!magnus.acs.ohio-state.edu!math.ohio-state.edu!howland.reston.ans.net!spool.mu.edu!umn.edu!news
From: bultx001@maroon.tc.umn.edu (Steve Bult)
Subject: Re: Reinforcement learning - predicting reinforcement
Message-ID: <DFq1rs.4CB@news.cis.umn.edu>
Sender: news@news.cis.umn.edu (Usenet News Administration)
Nntp-Posting-Host: dialup-11-a-139.gw.umn.edu
Organization: BEST
X-Newsreader: Forte Free Agent 1.0.82
References: <811106121snz@whitestn.demon.co.uk> <43u2nm$i12@columba.udac.uu.se>
Date: Sat, 30 Sep 1995 14:14:01 GMT
Lines: 91
Xref: glinda.oz.cs.cmu.edu comp.ai.alife:4206 comp.ai.genetic:6888 comp.ai.neural-nets:27177 comp.robotics.misc:1244

Magnus Sandberg <sandberg@ibg.uu.se> wrote:

>Bob Mottram <Bob@whitestn.demon.co.uk> wrote:
>>
>>Hi there,
>>
>>I am currently experimenting with some reinforcement learning (RL)
>>techniques for a small mobile robot moving about in a cluttered 
>>environment.  My aim is to have the 'bot moving happily about,
>>not bumping into obstacles, whilst navigating towards a goal.
>>
(cut)

>>Ideally, I would like the learning of the prediction system to be 
>>continuous. I want to avoid having to store the state/reinforcement
>>of the system over the number of prediction time steps required 
>>(in order to retrospectively calculate the prediction error) if possible.

>I'm not sure if this can be done. In order to make a decent prediction of the
>future without knowing anything else than the present state and past success of
>the robot, the evaluation system would have to be far more intelligent than the
>decision making process itself. No simple function/extrapolation would do. So
>let's design another learning system whose task is to determine (without delay)
>the success of each of the robot's moves. Let's say we pick some kind of neural
>network. Then the network, too, needs to be trained, but it can't be trained
>continuously (since that would require us to know at once how successful the
>latest move was, which is what we are looking for, in order to evaluate the
>net's performance). We thus choose to train the network with a delay of ten
>(say) steps. Knowing the states of the robot between steps [t,t+10] will let us
>make a decent evaluation of the robot's move at step t and thus permit us to
>evaluate the net's output at step t. But in order to do this we have to have
>saved the complete state of the network at step t, and so nothing is gained.
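To make the bookkeeping in that scheme concrete, here's a rough Python sketch. Everything in it is invented for illustration -- the "network" is just a single delta-rule weight, and the hindsight scores are handed in as a list rather than computed from the saved states:

```python
from collections import deque

DELAY = 10  # the "ten (say) steps" from the discussion above

class TinyPredictor:
    """Stand-in for the evaluation network: one weight trained by a
    delta rule.  Purely illustrative, not a real neural net."""
    def __init__(self):
        self.w = 0.0

    def predict(self, state):
        return self.w * state

    def train(self, state, target, lr=0.1):
        # Nudge the weight toward the hindsight score for this state.
        self.w += lr * (target - self.predict(state)) * state

def run(states, hindsight_scores):
    """states[t] is the robot's state at step t; hindsight_scores[t]
    is the evaluation of the move at t, which only becomes computable
    once steps t..t+DELAY have been seen.  Note that only DELAY
    entries are ever buffered at a time -- this is exactly the
    "saved state" cost being discussed."""
    net = TinyPredictor()
    pending = deque()  # (t, state) pairs still awaiting evaluation
    for t, s in enumerate(states):
        pending.append((t, s))
        if len(pending) > DELAY:       # step t0+DELAY has now arrived
            t0, s0 = pending.popleft()
            net.train(s0, target=hindsight_scores[t0])
    return net
```

So the storage cost is bounded (DELAY entries), but it never goes away -- which is the point being made above.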
Hi guys, I think this is a great discussion (at least to the extent I
understand it). Here are my thoughts on the original problem (safe,
happy and useful little robots) and on the more general concept of
learning systems, using humans as an example of one that works (more
or less).

A human baby comes with certain "burned in" routines: "rooting" for a
nipple and suckling is an example of a fairly complex behaviour,
blinking is an example of a rather simple reflex, and digestion is an
example of a bio-chemical process. With the addition of a rather
"blank" but potentially useful (LOTS of potential connections) neural
network, we have a baby as it comes "from the manufacturer".

This baby will shortly be able to make learned decisions that involve
a very small number of steps. Example: if I need/want something and I
make a loud crying sound, my parent will appear, figure out what I
need/want and give it to me. Learned decisions that require many steps
or abstract concepts are beyond the baby at this stage, no matter how
beneficial they might be. Example: if I take the money grandma and
grandpa sent when I was born and invest it in a mutual fund that buys
hi-tech stocks, I won't need to worry about food when I retire. This
type of behaviour is only exhibited as the organism matures (if ever).

Sorry this is so long, but here's where I'm going with this. 

The little robots, when they start out, could use a similar structure:

An electro-chemical system - they don't need to "know" how to get
electricity from their batteries and use it; it just happens.

Simple reflexes - blink, jerk back from hazards etc.

Simple behaviours that ensure survival - hunger/food: how "hungry" am
I? What is the most direct route to the nearest food source? I have to
go to the nearest food source while I still have enough energy. This
could be coded in ROM and may or may not use a dedicated processor (a
sub-conscious/background task, if you will).
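That background task could be as simple as a threshold check that runs every step. A minimal sketch, with the energy model, the 20% safety margin and all names being arbitrary choices of mine:

```python
def must_seek_food(energy, distance_to_food, drain_per_step):
    """Background survival check: return True when the remaining
    energy only just covers the trip to the nearest food source.
    The 20% safety margin is an arbitrary illustrative choice."""
    steps_of_energy = energy / drain_per_step
    return steps_of_energy <= distance_to_food * 1.2
```

When it returns True, the "sub-conscious" layer would override whatever the learning layer wanted to do and head for the outlet.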

A neural network, possibly seeded with simple actions - obstacle
behaviours: go around it, push it, pick it up, go over it. At this
point, the number of steps from the action to the evaluation of the
"goodness" of the action should be very small, 1-2 steps. As the robot
becomes more proficient it could try to mix actions together and take
more steps to evaluate "goodness". Example of an intermediate stage:
pick up object A, place it next to object B as a ramp, go up the ramp,
check out the top of object B. Example of an advanced stage: push
object B next to a much taller object you can't push (a tabletop or
counter), pick up object A, build a ramp, get on top of object B, pick
up A again, build a second ramp, get on top of the counter, and find a
new food source (a power outlet).
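The early, 1-2 step stage of that could look something like the sketch below -- mostly pick the best-known obstacle action, occasionally try another, and nudge the tried action's value toward the immediate "goodness" signal. The action names, the table layout and the learning rate are all my own invention:

```python
import random

ACTIONS = ["go_around", "push", "pick_up", "go_over"]  # seeded repertoire

def choose_and_learn(q, state, goodness_fn, epsilon=0.1, lr=0.2):
    """One step of the seeded stage: usually take the best-known
    action for this obstacle, explore a random one 10% of the time,
    then move the chosen action's value toward the immediate
    (1-2 step) "goodness" evaluation."""
    values = q.setdefault(state, {a: 0.0 for a in ACTIONS})
    if random.random() < epsilon:
        action = random.choice(ACTIONS)          # explore
    else:
        action = max(values, key=values.get)     # exploit
    g = goodness_fn(state, action)               # quick evaluation
    values[action] += lr * (g - values[action])
    return action
```

Mixing actions together and stretching the evaluation horizon would then be the harder, later stage -- which is where the delayed-evaluation problem from the start of this thread comes back.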

I hope this made sense. I'm just a layman blessed with internet
access, so if I've misused any terms please forgive me. I'll go back
to lurking now.

Steve Bult

