Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!cornellcs!newsfeed.cit.cornell.edu!newsstand.cit.cornell.edu!news.kei.com!newsfeed.internetmci.com!in2.uu.net!daver!dlb!megatest!news
From: Dave Jones <djones>
Subject: Re: Q: Relative entropy or Kullback-Leibler divergence
Content-Type: text/plain; charset=us-ascii
Message-ID: <DHuJr3.EKx@Megatest.COM>
To: mglinws@aol.com
Sender: news@Megatest.COM (News Admin)
Nntp-Posting-Host: pluto
Content-Transfer-Encoding: 7bit
Organization: Megatest Corporation
References: <47igt6$7cs@newsbf02.news.aol.com>
Mime-Version: 1.0
Date: Fri, 10 Nov 1995 21:30:38 GMT
X-Mailer: Mozilla 1.1N (X11; I; SunOS 5.4 sun4m)
X-Url: news:47igt6$7cs@newsbf02.news.aol.com
Lines: 26

> I am not familiar with this measure and wondered if anyone could provide a
> simple explanation of how it works.

Not being a statistician, I may be able to give TOO simple an explanation!
I discovered the measure myself, and only later found out that it had a name.
I called it "average confusion", which I think is a pretty good intuition
helper. The idea is that the "accuracy" of an odds line (probability density
function) can be measured by averaging the amount of information that is
carried by the actual results. The lower the information content of the
results, the "smarter" you can say the odds line was.

When an event occurs that had been assigned a high probability, little 
information is carried in the result. ("Yeah, big deal. I expected that.")
When a low probability event occurs, it carries much information.
("No joke? That's a surprise.")

The amount of information in a result, given its prior probability P,
is log2(1/P). You average those up for a large number of events.
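A minimal sketch of that averaging step, in Python, using made-up predicted probabilities (the probabilities the odds line assigned to the outcomes that actually occurred are assumed data for illustration):

```python
import math

# Hypothetical "odds line" output: the probability that was assigned
# to each outcome that actually occurred (assumed data).
predicted_probs = [0.5, 0.9, 0.1, 0.7, 0.25]

# Information carried by each result: log2(1/P) bits.
info = [math.log2(1.0 / p) for p in predicted_probs]

# "Average confusion": mean information over the observed events.
average_confusion = sum(info) / len(info)
print(average_confusion)  # about 1.40 bits per event
```

A well-calibrated, confident odds line drives this average down; assigning low probabilities to things that keep happening drives it up.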

It may sometimes be useful to normalize the information content for the
number of possible events. I call the resulting measure "average normalized
confusion". Just take the average confusion and divide by log2(N), where
N is the number of possible events. (That divisor is the information carried
by a uniform guess over N outcomes, so a normalized value of 1 means the
odds line was no smarter than guessing uniformly.)
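Continuing the sketch above with the normalization, again on assumed data, and dividing by log2(N) (the information of a uniform guess over N outcomes; note the negative-valued log2(1/N) would not work as a divisor, so log2(N) is taken to be the intended quantity):

```python
import math

# Hypothetical setup: each observed event was one of N = 8 possible
# outcomes, and these are the probabilities the odds line assigned to
# the outcomes that actually occurred (assumed data).
N = 8
predicted_probs = [0.5, 0.125, 0.25, 0.0625]

avg_confusion = sum(math.log2(1.0 / p)
                    for p in predicted_probs) / len(predicted_probs)

# Divide by log2(N), the information of a uniform guess over N outcomes,
# so 1.0 means "no better than uniform" and 0.0 means perfect certainty.
avg_normalized_confusion = avg_confusion / math.log2(N)
print(avg_normalized_confusion)
```

With these numbers the average confusion is 2.5 bits against a 3-bit maximum, so the normalized value is about 0.83.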

                  Jive

