Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!news.sprintlink.net!crash!pzcc.bitenet!news
From: Duane DeSieno <duaned@cts.com>
Subject: Re: NN and Games
Organization: /etc/organization
Date: Sat, 14 Jan 1995 08:13:00 GMT
Message-ID: <D2Dytp.Ixo@crash.cts.com>
References: <3ep29n$g1o@newsbf02.news.aol.com> <Rw-6JKJ.predictor@delphi.com> <D25rK4.BD9@crash.cts.com> <D2BtHA.7nJ@undergrad.math.uwaterloo.ca>
Sender: news@crash.cts.com (news subsystem)
Nntp-Posting-Host: loci.cts.com
Lines: 78

Ed,

> >I do not agree with what you have said about games.  The students in 
> >a course I teach were able to train a network to play optimal 
> >TicTacToe using the temporal difference algorithm. They were not 
> >allowed to use any expert knowledge about the game to train the 
> >networks (or rotation invariants).  Only self play or play against
> >a random player was allowed.
> 
> However, how did the neural network indicate the move to play?  Did
> an external algorithm generate all (legal) next states, and ask the
> neural net the value of each one, or did the neural network indicate
> the move to make?  If the former, this was the point being made (that
> other programming paradigms handle, for example, this simple
> constraint problem much more efficiently than neural networks).  If
> the latter, then what happens if the neural network indicates an
> illegal move (how do you first correct, and then ensure, that the
> network only ever plays legal moves).
> 
> While I think Will has used a not entirely correct example, the
> use of other programming methods allows for only legal moves to
> be evaluated/suggested, something that cannot be guaranteed to
> be accomplished in a neural network in a game of any complexity
> (to verify that the neural network does play only legal moves, you
> would have to either analyze the entire network, or compare every
> possible board configuration which may be exponentially explosive).
> 
> >Another student trained a network to play black-jack at the level of 
> >the "Beat the Dealer" card counting system.
> 
> This is much easier as there are essentially only two next states
> (get card, do not get card) both of which are always legal in
> any position the network has to make a choice.
> 
> >Gerald Tesauro trained a network to play backgammon at a world class 
> >level, using the TD(0) algorithm.  
> 
> From his papers, there is still another layer of the program that generates
> all legal next states, and the neural network only evaluates legal
> continuations.  The complexity of a neural network that could determine
> all legal states, and only play legal moves, would be extremely high, with
> no method to guarantee that the computer would not continue to insist on
> an illegal move.
> 
> >Duane DeSieno
> >Logical Designs Consulting, Inc.
> >2015 Olite Ct.
> >La Jolla, CA 92037
> >(619)459-6236
> 
> I do feel that neural network technology has a ways to go before we
> can do everything "within" the network.  In particular, analysis of
> even some parts of the network to verify absolutely correct operation
> (which _must_ occur for legal move generation) is still far away.
> Because of this, other methods will continue to be used
> (besides which, how do you get a neural network to generate "plans"?).
> 
> Ed
> 
You are correct that the neural network is only part of a game-playing
system.  External code simulated the game and checked for the end of
the game.  This code also encoded the rules of the game to avoid illegal
moves.  I believe that the network would have been capable of learning
the rules as well, but this would have made the learning task even harder.
The network was used to choose the move.  In most
cases, the net was trained to evaluate a board position.  The output
of the net was the expected "value" of the game.  For TicTacToe, 1 meant
"x" would win and -1 meant "o" would win (draw = 0).  Assigning a value
to each board position by hand would require expert information.  This
was not allowed except at the end of the game, where the outcome is
known.  The TD algorithm did the rest.
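
As a rough illustration of this arrangement (a sketch, not the students'
code): the example below uses a lookup table `V` as a stand-in for the
network, and the names `legal_moves`, `choose`, and `train`, along with the
step size, exploration rate, and episode count, are all assumptions made
for illustration.  The external code supplies the rules and the
end-of-game outcome; TD(0) self play fills in the values in between.

```python
import random

# The eight winning lines on a 3x3 board whose squares are indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'x' or 'o' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    """External rule code: only empty squares are ever offered."""
    return [i for i, square in enumerate(board) if square == ' ']

def play(board, move, player):
    """Return the board (a 9-character string) after `player` moves."""
    return board[:move] + player + board[move + 1:]

def outcome(board):
    """+1 if x has won, -1 if o has won, 0 for a draw, None if not over."""
    w = winner(board)
    if w == 'x':
        return 1.0
    if w == 'o':
        return -1.0
    return 0.0 if ' ' not in board else None

def value(V, board):
    """Estimated game value; finished games use the true outcome."""
    z = outcome(board)
    return z if z is not None else V.get(board, 0.0)

def choose(V, board, player, epsilon, rng):
    """Pick a legal move: random with probability epsilon, else greedy."""
    moves = legal_moves(board)
    if rng.random() < epsilon:
        return rng.choice(moves)
    sign = 1.0 if player == 'x' else -1.0  # x maximizes, o minimizes
    return max(moves, key=lambda m: sign * value(V, play(board, m, player)))

def train(episodes=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Self play with a tabular TD(0) update (hypothetical settings)."""
    rng = random.Random(seed)
    V = {}
    for _ in range(episodes):
        board, player = ' ' * 9, 'x'
        while outcome(board) is None:
            nxt = play(board, choose(V, board, player, epsilon, rng), player)
            # TD(0): move V(board) toward the value of the next board.
            # The only supplied "expert" signal is the final outcome.
            old = V.get(board, 0.0)
            V[board] = old + alpha * (value(V, nxt) - old)
            board = nxt
            player = 'o' if player == 'x' else 'x'
    return V
```

Because `choose` only ever sees squares returned by `legal_moves`, an
illegal move cannot be selected regardless of what the value estimates
look like.  Swapping the table for a network would change `value` and the
update step, not the rule-checking code.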

The point of the project was to show that a network could learn in an
autonomous manner.  Again, a class project is handled in a different way
than a commercial project.  In most of the projects I have worked on as a
consultant, the neural network was only a small part of the whole
system.

Duane
