Newsgroups: comp.ai.fuzzy
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!news.alpha.net!uwm.edu!msunews!harbinger.cc.monash.edu.au!yarrina.connect.com.au!labtam!labtam!chris
From: chris@labtam.labtam.oz.au (Chris Taylor)
Subject: Re: Membership functions and non-existant data.
Message-ID: <chris.799481722@labtam>
Organization: Labtam Australia Pty. Ltd., Melbourne, Australia
References: <D7Bv35.DBn@news.hawaii.edu> <3n95dv$l29@nntp.crl.com> <dfuessD7FvB0.9wp@netcom.com> <D7JEx7.Ez9@news.hawaii.edu>
Date: Wed, 3 May 1995 06:15:22 GMT
Lines: 142

jamesw@uhunix.uhcc.Hawaii.Edu (James Williams) writes:

>Let's start from the real world.  Let's say that I have a set of data,
>say a profile of biochemical variables measured in the blood.  Let's 
>say I am implementing a fuzzy expert system to detect a disorder, i.e.
>a set of fuzzy rules layered on membership functions. 

>If the membership functions default to zero when the input to the 
>membership function is unknown (i.e. the variable was not measured)
>then the membership function is giving out blarney rather than saying
>I don't know.  The expert system will then give out blarney.  

>It seems to me that the membership function should have another dimension.
>That is how sure it is.
>....


>So, back to the non-available data input into the membership function.
>The membership function might output two values, the membership and the
>"confidence" of the membership, where the range of confidence is between
>1 and 0.  So given no input, the membership function would output a 
>default membership value with a "confidence" of 0 (shoulder shrug).  The
>"confidence" values would then just be shoved through the AND (MIN) and 
>OR (MAX) precepts to give "certainty" of the membership in the precept.  

>In real life we are not equally "sure" or "certain" about every precept.
>Some are half baked, others are stronger.
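That two-value idea can be sketched quickly. The (membership, confidence)
pair encoding below is hypothetical, my own invention rather than any
standard fuzzy-logic API, and pushing the confidence through the same
MIN/MAX operators is just one reading of the suggestion:

```python
# Hypothetical sketch: propagate (membership, confidence) pairs
# through fuzzy AND/OR, applying the same MIN/MAX to the confidences.

def fuzzy_and(a, b):
    """AND via MIN on both memberships and confidences."""
    (ma, ca), (mb, cb) = a, b
    return (min(ma, mb), min(ca, cb))

def fuzzy_or(a, b):
    """OR via MAX on both memberships and confidences."""
    (ma, ca), (mb, cb) = a, b
    return (max(ma, mb), max(ca, cb))

measured = (0.8, 1.0)   # measured input: membership 0.8, fully trusted
unknown  = (0.5, 0.0)   # unmeasured input: default value, confidence 0

print(fuzzy_and(measured, unknown))   # (0.5, 0.0) - no certainty survives
print(fuzzy_or(measured, unknown))    # (0.8, 1.0) - confident branch wins
```

Whether MAX is the right combiner for confidence under OR is debatable;
taking the MIN of the confidences would be the more conservative choice.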


I think the issue here is the difference between a simple 'fuzzy' approach 
and a more foolproof 'expert' approach.

The fuzzy approach is good for obtaining an approximation to a
system, while using as few details as possible,
i.e. try to get away with using a few data points to characterise
the system 'adequately', and a few rules in the characterisation.

Of course if you can supply ALL the possible data points and
obtain ALL the rules then you have modelled the system perfectly,
but often this is impractical and so you just go so far as to obtain
a model perceived to be accurate enough.

One assumes (hopes) that the fuzzy approximation will model the
system well enough to fill in the missing data points.
Of course there are no guarantees that the extrapolation it uses will
be valid, but a smoothish extrapolation ought to occur, which should
at least have a good chance of working.

This approach is suitable for efficient implementation of non-critical systems,
where the result of the fuzzy approximation being very wrong upon encountering
one of the missing data inputs in practice won't be catastrophic.

For a critical system, you want to leave as little to chance
as possible. Assuming that complete specification is impractical
then you need to augment under-specification with confidence levels
for the purposes of risk management.

(i.e. I am differentiating 'fuzzy' from 'expert' here on the basis
that experts deal with more critical decisions and experts ought to
have inbuilt knowledge about how accurate their decisions are - 
although either can be a system based on fuzzy logic principles) 


When an input is not available AT ALL, that is different to having an
input that wasn't included in the characterisation procedure. 
(Now you have the double risk of not knowing the input itself
along with the potential that the real input corresponds to an
unspecified input case)
You need to perform some action appropriate to the risks.

Either you catch it out completely, or transform it into a value that
produces a suitable response from the fuzzy inference system.
This is system dependent. It may be suitable to assign a value corresponding
to an 'average' data value, or a value that tends to trigger a rather 
average or safe response in general.
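As a sketch of that catch-or-transform choice (the function name, the
input name and the default value here are all invented for illustration):

```python
# Hypothetical guard for missing inputs: either refuse to infer
# (critical system) or substitute an 'average' default (non-critical).

SAFE_DEFAULTS = {"cholesterol": 5.0}  # assumed mid-range value

def prepare_input(name, value, critical=False):
    """Return a usable value for the fuzzy system, or refuse."""
    if value is not None:
        return value                  # a real measurement
    if critical:
        raise ValueError("cannot infer: no measurement for " + name)
    return SAFE_DEFAULTS[name]        # educated guess, non-critical use only

print(prepare_input("cholesterol", 6.2))    # measured: 6.2
print(prepare_input("cholesterol", None))   # guessed default: 5.0
# prepare_input("cholesterol", None, critical=True) raises ValueError
```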


>> 
>> IF Cholesterol HIGH and OtherTests LOW then
>>       Disease_X_is_present
>> 
>> IF Cholesterol LOW and OtherTests LOW then
>>      Disease_Y_is_present
>> 
>> IF Cholesterol HIGH and OtherTests HIGH then
>>     Disease_Z_is_present
>> 
>> IF Cholesterol LOW and OtherTests HIGH then
>>     Healthy_Patient
>> 
>> Just as examples.  Now, when you run a set of tests, usually the
>> result for Cholesterol is somewhere in the HIGH set or the LOW set,
>> and maybe in both sets if they overlap.  However, if the test is not
>> performed, then the result should be in neither HIGH nor LOW.  Then
>> (using max-min composition) none of the rules listed above would
>> "fire", and your final diagnosis would be based on the output of
>> other rules in the system.
>> 
>
>Take a closer look at the expected performance of the rules. If data are not 
>present for any of the rule inputs (i.e. you simply do not know and cannot 
>infer...), the rule should not fire.
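That point can be illustrated with a toy max-min version of the rule base
above. The ramp membership functions and their thresholds are made up for
the sketch; the only thing that matters is that an untested input has zero
membership in every set, so every rule strength collapses to zero:

```python
# Toy max-min rule firing: an unmeasured input (None) belongs to
# neither HIGH nor LOW, so no rule fires.

def chol_high(x):
    # Made-up ramp: 0 below 5.0, 1 above 7.0; 0 if not measured.
    return 0.0 if x is None else min(max((x - 5.0) / 2.0, 0.0), 1.0)

def chol_low(x):
    # Made-up ramp: 1 below 4.0, 0 above 6.0; 0 if not measured.
    return 0.0 if x is None else min(max((6.0 - x) / 2.0, 0.0), 1.0)

def fire_rules(chol, other_high, other_low):
    return {
        "Disease_X": min(chol_high(chol), other_low),
        "Disease_Y": min(chol_low(chol), other_low),
        "Disease_Z": min(chol_high(chol), other_high),
        "Healthy":   min(chol_low(chol), other_high),
    }

# Cholesterol not measured: every rule strength is 0.0.
print(fire_rules(None, other_high=0.9, other_low=0.1))
# Cholesterol measured at 6.5: Disease_Z fires at 0.75.
print(fire_rules(6.5, other_high=0.9, other_low=0.1))
```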


It depends how critical your system is.
If you are happy for a fuzzy approximation to make an educated guess
when it meets an unknown condition, then when cholesterol is unknown maybe
plug in a value corresponding to cholesterol_high=0.5 (which also implies
cholesterol_low=0.5) and see whether a suitable response occurs.
(Or at least a relatively safe response - e.g. incorrectly diagnosing
diseaseZ might be less serious than diagnosing healthy_patient if the treatment
for diseaseZ is harmless - or conversely if diseaseZ is trivial but its
treatment has side-effects...).

If the system is not allowed to make such guesses then the system
should catch the unknown and warn that it can't make a valid decision.

Alternatively you might associate a degree of confidence with
input conditions depending on whether they were included 
during specification (or how similar they are to conditions that
were specified).

Another approach on some systems may be to add a sanity checker that detects
guesses that appear to be catastrophic.


For a critical system like the medical diagnosis above, a system that
spat out one definite answer without any cholesterol input would probably
be unwise. Some figure of confidence should be supplied with the decision. 

With cholesterol unknown you might expect a response something like:

(with Othertests HIGH)  => Disease_Z_is_present = 50% chance
                           Healthy_patient = 50% chance

(with Othertests LOW)  => Disease_X_is_present = 50% chance
                          Disease_Y_is_present = 50% chance

or simply

Cannot diagnose - please enter cholesterol sample
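The 50/50 responses above drop out of a toy version of the rule base if the
unknown cholesterol is given a membership of 0.5 in both HIGH and LOW and
the rule strengths are normalised. All the values here are hypothetical,
and normalising is just one way of reading off a 'chance':

```python
# Toy max-min inference with cholesterol unknown, plugged in as
# membership 0.5 in both HIGH and LOW, rule strengths normalised.

def diagnose(chol_high, chol_low, other_high, other_low):
    strengths = {
        "Disease_X_is_present": min(chol_high, other_low),
        "Disease_Y_is_present": min(chol_low, other_low),
        "Disease_Z_is_present": min(chol_high, other_high),
        "Healthy_Patient":      min(chol_low, other_high),
    }
    total = sum(strengths.values())
    if total == 0:
        return {}  # nothing fired at all
    return {k: v / total for k, v in strengths.items() if v > 0}

# OtherTests HIGH: Disease_Z and Healthy_Patient split 50/50.
print(diagnose(0.5, 0.5, other_high=1.0, other_low=0.0))
# OtherTests LOW: Disease_X and Disease_Y split 50/50.
print(diagnose(0.5, 0.5, other_high=0.0, other_low=1.0))
```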

