Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!gatech!newsfeed.internetmci.com!in1.uu.net!news3.ottawa.istar.net!istar.net!news2.toronto.istar.net!usenet.Hydro.ON.CA!ohrd!news
From: fischerd@rd.hydro.on.ca (Dan Fischer)
Subject: Re: How to classify with only 1 class?
Organization: Ontario Hydro Technologies
Date: Mon, 29 Apr 96 20:05:04 GMT
Message-ID: <fischerd.6.0@rd.hydro.on.ca>
Lines: 74
References: <4lhhvr$693@nntp5.u.washington.edu> <4lhvip$8pe@ss10.elvis.ru> <4ljgsm$cp9@news.sandia.gov> <4lpg63$7gi@dfw-ixnews8.ix.netcom.com> <4m2rid$51b@leopard.wmin.ac.uk>
Sender: news@rd.hydro.on.ca (News Account)
Nntp-Posting-Host: pc-fischerd.rd.hydro.on.ca

In article <4m2rid$51b@leopard.wmin.ac.uk> clemenr@westminster.ac.uk (Ross Clement) writes:
>From: clemenr@westminster.ac.uk (Ross Clement)
>Subject: Re: How to classify with only 1 class?
>Date: 29 Apr 1996 17:42:53 +0100
>In article <4lpg63$7gi@dfw-ixnews8.ix.netcom.com>,
>Jive Dadson  <jdadson@ix.netcom.com> wrote:
>>
>>I keep expecting some guru to say, "It's impossible." I mean, it is
>>isn't it? Impossible to "train" a neural network to classify data using
>>data from only one class as a training set, that is.
>>
>>What you could do is use a density estimator on the one-class training
>>set. Then if your unclassified datum came from a low-density area you
>>could speculate that the datum MIGHT not be from the class, but there
>>would be no way to put a confidence bound on that without at least
>>guessing about the density distribution of "others".
>
>Couldn't you just generate a whole lot of random inputs that don't
>exactly match any of the data in your 'one class' training set, and
>make these the 'second class', and train that way.
>
>I'm not too much up on Neural Networks, so don't know if this would
>work, but it sounds logical to me :-) Personally I'd use case-based
>reasoning for the above (cue flames).
>
>Cheers,
>
>Ross-c
There are several problems with this approach:
If the 'second class' data is generated randomly, it will mix with the 
'class one' data.  The network will have an easy time flagging an input as 
not 'class one' but, ironically, will have difficulty recognizing a genuine 
class one input.  Now, you say, OK, let's make sure that the random 'class 
two' inputs do not land on 'class one', and you start constructing such 
vectors.  To make my point, let's assume that we have 2-dimensional inputs 
that sit on a curve.  As long as an input vector lands on this curve, the 
input belongs to class 1 (good system).  The moment we get a 'different' 
input (one that does not land on the curve), the input belongs to class two 
(bad system).  For this problem, not only do we have to produce fake class 
2 inputs away from the curve (and not mixed with class 1 inputs from the 
curve), but we would also like the class 2 points to be close to the curve 
(for increased sensitivity in detecting 'not class one' inputs near the 
curve).
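A quick numerical sketch of the mixing problem (the sine curve, bounding
box, and 0.1 threshold below are my own made-up choices, not from any
system discussed here):

```python
import numpy as np

rng = np.random.default_rng(0)

# 'Class one' lives on a curve: x2 = sin(x1), x1 in [0, 2*pi].
x1 = rng.uniform(0.0, 2 * np.pi, 1000)
x2 = np.sin(x1)

# Naive 'class two': uniform random points in the same bounding box.
r1 = rng.uniform(0.0, 2 * np.pi, 1000)
r2 = rng.uniform(-1.0, 1.0, 1000)

# Vertical distance of each random point to the curve (a cheap stand-in
# for true distance).
dist = np.abs(r2 - np.sin(r1))

# A noticeable fraction of the random 'class two' points fall within 0.1
# of the curve -- right in the region the net must call 'class one'.
frac_near = np.mean(dist < 0.1)
print(f"fraction of random negatives within 0.1 of the curve: {frac_near:.2f}")
```

So a sizeable share of the "negatives" sit essentially on top of the
positives, which is exactly the mixing described above.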

The way I solved this problem is the following:  rather than take the whole 
input vector, apply it to the input of the NN, and declare the output to be 
the class one value, I took one input component (feature) and treated it as 
an output.  The problem is thereby changed from a classification problem 
into a system modelling problem.  The task is to determine the function 
that ties the N-1 inputs to the 1 'output' (e.g. the Nth input).  As long 
as the N inputs are consistent (belong to the same class), the function 
(hopefully an easy one to approximate) will produce the Nth value from the 
given N-1 ones.  Now compare the predicted value with the actual Nth input.  
If they are close, the input belongs to class one.  If they are not, the 
N-1 inputs and the Nth input DO NOT belong to the class the network was 
trained on, i.e. they are not consistent.
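A minimal sketch of this idea, with a least-squares line standing in for
the neural net, and data and tolerance that I made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Training data: 'class one' inputs (x1, x2) that sit on x2 = 0.5*x1 + 1.
x1 = rng.uniform(0.0, 10.0, 200)
x2 = 0.5 * x1 + 1.0

# Treat x2 as the output and model the function f: x1 -> x2.  Any
# regressor works; a least-squares polynomial fit replaces the NN here.
f = np.poly1d(np.polyfit(x1, x2, deg=1))

def is_class_one(v1, v2, tol=0.1):
    """Consistent (class one) iff the predicted x2 matches the actual x2."""
    return abs(f(v1) - v2) < tol

print(is_class_one(4.0, 3.0))   # on the curve: 0.5*4 + 1 = 3 -> True
print(is_class_one(4.0, 3.7))   # off the curve -> False
```

Only 'class one' data is ever needed for training; novelty shows up as a
large prediction residual.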

Now the question is which input should become an output to determine the 
function f.  I think we would like this function to be monotonic and with a 
small variation in its output.  Why?  Again consider the case of a 2-feature 
input.  We flip one input and make it an output, so the pair (x1, x2) -> 
class 1 becomes x1 -> x2.  If we get another input (x1, x2'), we would like 
the system to say that this pair does not belong to class 1, i.e. a small 
change in x2 moves (x1, x2) away from the curve.  The best curve x2 = f(x1) 
is a horizontal one; there x2 - x2' is exactly the distance to the curve.  
The worst is a vertical curve: both (x1, x2) and (x1, x2') land on the 
curve.  This is why I said f should have a small variation.  I wanted f to 
be monotonic so that one x1 results in one and only one x2; hence if 
x2' - x2 is not small, then (x1, x2') is not on the curve.
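The horizontal vs. vertical contrast can be shown numerically (again with
a least-squares fit in place of the NN, and curves I invented for the
illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Horizontal curve: x2 = 2 for all x1.  Predicting x2 from x1 is easy,
# and the residual |x2' - f(x1)| equals the distance to the curve.
h1 = rng.uniform(0.0, 10.0, 100)
h2 = np.full(100, 2.0)
f_h = np.poly1d(np.polyfit(h1, h2, deg=1))
print(abs(f_h(5.0) - 2.0))   # on-curve point: residual ~0
print(abs(f_h(5.0) - 2.5))   # off-curve point: residual ~0.5, detectable

# 'Vertical' curve: x1 = 5 for every x2 in [0, 10].  Here x2 is not a
# function of x1 at all; the best any single-valued predictor can do is
# roughly the mean of x2 (~5), so even on-curve points get large
# residuals and the consistency check breaks down.
v2 = rng.uniform(0.0, 10.0, 100)
pred = v2.mean()
print(abs(pred - 9.0))       # (5, 9) is on the curve, yet residual ~4
```

This is why the choice of which feature to flip into an output matters:
pick one whose value varies little (and monotonically) as a function of
the remaining features.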

I used the above method successfully on a 3-input vector.  I kept 2 inputs 
and treated the third as an output, letting the NN determine f.


Any comments?
