Loading and saving data in JavaBayes

Data can be loaded and saved locally when you run JavaBayes as an application. Note that applets cannot load or save local data, because browsers forbid it.

Applications and applets can read Bayesian networks through the Internet; this opens the possibility of using JavaBayes to help process and organize the huge amounts of data and knowledge on the Internet.

This section contains a detailed description of the formats that JavaBayes can manipulate. If you have no interest in this kind of information (that is, if you are not reading or writing files for JavaBayes), you can skip this section entirely.

All the formats

There are three different formats, and all three are supported by JavaBayes in the sense that JavaBayes can read files written in them.

The Bayesian Interchange Format version 0.1 (BIF 0.1) is a simple format that has been successfully used to represent a variety of networks. But BIF 0.1 had certain problems and has been replaced by BIF version 0.15. BIF 0.15 is a more mature format and should work for most applications.

XMLBIF 0.3 is an experimental format, based on the new XML specification. The best way to understand it is to read about BIF 0.15, then read something about XML, then read the description of XMLBIF 0.3.

Because BIF 0.15 supersedes BIF 0.1, JavaBayes no longer saves files in BIF 0.1. You can choose between XMLBIF 0.3 and BIF 0.15 in the Options menu.

Note that no format supports Noisy functions (since JavaBayes does not support those functions yet). The BIF formats also use the general concept of a property; implementations of the BIF format can use specific properties. JavaBayes handles some properties, such as observed, explanation and credal-set, which are explained later on.

For files, any extension is possible, but the extension bif is recommended for BIF 0.15, and the extension xml is tentatively used for XMLBIF 0.3.

Representing probability values

It is important to understand how the JavaBayes formats handle the specification of probability values. All distributions are specified as arrays of real numbers, and the meaning of the numbers depends on the definition of the distribution. Note that the same representation is used in internal arrays to store and manipulate probability values.

The distribution p(f) in the example above can be specified as follows:

0.15, 0.85

Let's consider a more complicated example. The function p(d|f,b) is given by

0.99, 0.90, 0.97, 0.30, 0.01, 0.10, 0.03, 0.70

The logic is simple: proceed as if you were filling a table whose indices vary from right to left, so that the rightmost variable changes fastest (in the example above, it is like binary counting because all variables have only two values).

A more complicated example would be a function p(A|B,C) where A has 3 values, B has 2 values and C has 4 values. The function is represented as:

p(A1|B1 C1) p(A1|B1 C2) p(A1|B1 C3) p(A1|B1 C4)


p(A1|B2 C1) p(A1|B2 C2) p(A1|B2 C3) p(A1|B2 C4)


p(A2|B1 C1) p(A2|B1 C2) p(A2|B1 C3) p(A2|B1 C4)


p(A2|B2 C1) p(A2|B2 C2) p(A2|B2 C3) p(A2|B2 C4)


p(A3|B1 C1) p(A3|B1 C2) p(A3|B1 C3) p(A3|B1 C4)


p(A3|B2 C1) p(A3|B2 C2) p(A3|B2 C3) p(A3|B2 C4).
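
The ordering above can be sketched in Python as a mixed-radix count over the variables, in the order they are written. This is only an illustration: the function name flat_index and the zero-based value indices are assumptions, not part of JavaBayes.

```python
from typing import Sequence


def flat_index(values: Sequence[int], sizes: Sequence[int]) -> int:
    """Position of one probability value in the flat array.

    values[i] is the zero-based value index of the i-th variable and
    sizes[i] is its number of values. Variables are listed as written,
    e.g. (A, B, C) for p(A|B,C); the rightmost variable changes fastest.
    """
    index = 0
    for value, size in zip(values, sizes):
        if not 0 <= value < size:
            raise ValueError("value index out of range")
        index = index * size + value
    return index


sizes = (3, 2, 4)                      # A has 3 values, B has 2, C has 4
print(flat_index((0, 0, 1), sizes))    # p(A1|B1 C2) is at position 1
print(flat_index((1, 1, 0), sizes))    # p(A2|B2 C1) is at position 12
print(flat_index((2, 1, 3), sizes))    # p(A3|B2 C4) is at position 23
```

Positions are counted from zero, so the 24 values of p(A|B,C) occupy positions 0 to 23.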

IMPORTANT: Notice that there is some redundancy in the values, because every probability distribution must add up to one. Right now the BayesianNetworks package does not attempt to fill blanks or ensure consistency; the user has to provide the data in the correct format (the correct number of values, distributions that add up to one, and so on).
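
Since consistency is left to the user, a small check before writing a file may help. The following Python sketch assumes the array layout described above; the name check_table and the tolerance are illustrative choices, not something JavaBayes provides.

```python
import math
from typing import Sequence


def check_table(table: Sequence[float], sizes: Sequence[int],
                tol: float = 1e-9) -> None:
    """Raise ValueError if a flat table for p(X | parents) is malformed.

    sizes[0] is the number of values of X; the remaining entries are the
    sizes of the conditioning variables, in the order they are declared.
    """
    expected = math.prod(sizes)
    if len(table) != expected:
        raise ValueError(f"expected {expected} values, got {len(table)}")
    columns = expected // sizes[0]     # number of parent configurations
    for col in range(columns):
        # The values of X for one parent configuration are `columns` apart.
        total = sum(table[col + k * columns] for k in range(sizes[0]))
        if abs(total - 1.0) > tol:
            raise ValueError(f"parent configuration {col} sums to {total}")


check_table([0.15, 0.85], [2])                                   # p(f)
check_table([0.99, 0.90, 0.97, 0.30, 0.01, 0.10, 0.03, 0.70],    # p(d|f,b)
            [2, 2, 2])
```

Both calls return silently because the two example distributions are consistent.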

BIF version 0.15

White spaces, tabs and newlines are ignored; the C/C++ style of comments is adopted. The "," character is also ignored when it occurs between tokens.

The basic unit of information is a block: a piece of text which starts with a keyword and ends with the end of an attribute list (to be explained later). Arbitrary characters are allowed between blocks. This allows the user to insert arbitrarily long comments outside the blocks. It also allows user-specific blocks and commands to be placed outside the standard blocks.

Other than blocks, BIF 0.15 refers to three entities: words, non-negative integers and non-negative reals.

A word is a contiguous sequence of characters, with the restriction that the first character be a letter. Characters are letters plus numbers plus the underline symbol (_) plus the dash symbol (-).

A non-negative number is a sequence of numeric characters, containing a decimal point or an exponent or both.
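
The lexical rules above (ignored white space and commas, C/C++ comments, words and numbers) can be sketched as a small tokenizer. This is a minimal Python sketch and not the JavaBayes parser: the name tokenize and the regular expression are assumptions, letters are taken to be ASCII, and quoted strings are kept as single tokens because the BIF 0.15 examples in this section quote names and properties.

```python
import re

# Alternatives are tried in order at each position. Anything that matches
# none of them (white space, tabs, newlines, commas) is skipped.
TOKEN = re.compile(r'''
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"[^"]*")
  | (?P<number>(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)
  | (?P<word>[A-Za-z][A-Za-z0-9_-]*)
  | (?P<symbol>[{}()\[\];|])
''', re.VERBOSE | re.DOTALL)


def tokenize(text: str) -> list:
    """Return the tokens of a BIF 0.15 fragment, dropping comments."""
    return [m.group() for m in TOKEN.finditer(text)
            if m.lastgroup != "comment"]


fragment = '''probability ( "hear-bark" "dog-out" ) { //2 variable(s)
    table 0.7, 0.01, 0.3, 0.99 ;
}'''
print(tokenize(fragment))
```

The commas between the numbers and the trailing comment do not appear in the result.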

Blocks

A block is a unit of information. The general format of a block is:

     block-type block-name {
       attribute-name  attribute-value;
       attribute-name  attribute-value;
       attribute-name  attribute-value;
     }
with as many attributes as necessary. The closing semicolon is mandatory after each attribute.

There are three possible blocks: network, variable and probability blocks.

The blocks must be placed in the following order:

Attributes

Several attributes are defined at this point: property, type, table, default and entry attributes (the entry attribute is not associated with any keyword).

The attribute property can appear in all types of blocks. A property is just a string of arbitrary text to be associated with a block. Examples of properties:

     property "size 12";
     property "name Trial number ten";
Any text is valid in the string following keyword property. The idea is to store information that is specific to a particular system or network in the properties. Any number of property attributes can appear in a block.

The type attribute is specific to variable blocks. It lists the values of a discrete variable:

    type discrete[ number-of-values ] { list-of-values };
The number-of-values token is a non-negative integer which indicates how many different values this variable may assume (the size of the list-of-values). The list-of-values is a sequence of words, each one the name of a variable value.
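
As an illustration, a variable block with a type attribute can be produced by a few lines of Python. The helper name variable_block is hypothetical, and the quoting of names and values follows the dog-problem.bif listing in this section rather than any stated rule.

```python
def variable_block(name, values, properties=()):
    """Format one BIF 0.15 variable block as text."""
    quoted = " ".join(f'"{value}"' for value in values)
    lines = [f'variable "{name}" {{']
    # number-of-values is simply the length of the list of values
    lines.append(f"\ttype discrete[{len(values)}] {{ {quoted} }};")
    for prop in properties:
        lines.append(f'\tproperty "{prop}";')
    lines.append("}")
    return "\n".join(lines)


print(variable_block("light-on", ["true", "false"],
                     ["position = (218, 195)"]))
```

Deriving number-of-values from the list keeps the two from disagreeing.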

There are attributes that are specific to probability blocks (these attributes are discussed in the next section):

The JavaBayes properties

JavaBayes uses a number of properties to load and save information about Bayesian networks:

There are also properties that are related to robustness analysis in JavaBayes. Since robustness analysis is still an ongoing research project, the support for it is minimal. If you want to use robustness analysis now, please send me email. The properties related to robustness analysis always start with the keyword credal-set; if you are defining your own properties, please do not use this keyword.

Probability Blocks

Probability blocks are used to define the actual network topology and conditional probability tables.

An example of a standard probability block is:

probability("GasGauge" | "Gas", "BatteryPower") {
           ("yes", "high") 0.999 0.001;
           ("yes", "low") 0.850 0.150;
           ("yes", "medium") 0.000 1.000;
           ("no", "high") 0.000 1.000;
           ("no", "low") 0.000 1.000;
           ("no", "medium") 0.000 1.000;
}
As explained before, the symbol "," is ignored between tokens, so it does not affect the list of variables given after the keyword probability. The variables, however, must be enclosed in parentheses.

The example above uses the entry attribute, which is different from the other attributes in that it has no keyword. It simply starts with an opening parenthesis, and has a list of values for all the conditioning variables. After the closing parenthesis, a list of probability values for the first variable is given (the numbers should add to 1, but this is not enforced).

The probability vectors can be listed in any order, since the names in parentheses uniquely identify the parent instantiation.

In addition to the entry attribute, BIF 0.15 supports the concept of a default entry, which applies to every parent instantiation that has no entry of its own. So the above CPT could have been specified equivalently as:

probability("GasGauge" | "Gas", "BatteryPower") {
            default 0.000 1.000;
            ("yes", "high") 0.999 0.001;
            ("yes", "low")  0.850 0.150;
}
Note that each number is a separate token, so we can use "," between numbers.

Another way to define a probability distribution is through the table attribute. The body of this attribute is a sequence of non-negative real numbers, in the counting order of the declared variables (if all variables were binary, we would say binary counting with the least significant digit on the right). So, for the example above, we could simply say:

probability("GasGauge" | "Gas", "BatteryPower") {
           table 0.999 0.850 0.0 0.0 0.0 0.0 0.001 0.15 1.0 1.0 1.0 1.0;
}
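
The relation between entries, a default and the table attribute can be sketched as follows. This Python sketch is an illustration only: entries_to_table is a hypothetical helper, and the value order of BatteryPower (high, low, medium) is an assumption inferred from the order of the entries in the GasGauge example.

```python
from itertools import product


def entries_to_table(child_size, parent_values, entries, default=None):
    """Expand entry/default attributes into the flat table order.

    parent_values: one list of value names per conditioning variable,
                   in declared order.
    entries:       maps a tuple of parent value names to the list of
                   probabilities of the first variable.
    """
    rows = []
    # product() makes the rightmost parent change fastest.
    for config in product(*parent_values):
        row = entries.get(config, default)
        if row is None:
            raise ValueError(f"no entry or default for {config}")
        if len(row) != child_size:
            raise ValueError(f"wrong number of values for {config}")
        rows.append(row)
    # In the table, the first variable changes slowest.
    return [row[k] for k in range(child_size) for row in rows]


table = entries_to_table(
    2,
    [["yes", "no"], ["high", "low", "medium"]],
    {("yes", "high"): [0.999, 0.001], ("yes", "low"): [0.850, 0.150]},
    default=[0.0, 1.0],
)
print(table)
```

With these inputs the result has the same twelve numbers, in the same order, as the table attribute shown above.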

There are some subtle rules that regulate these declarations.

Examples

Here are some of the available examples:

Here is the dog-problem.bif network:

// Bayesian Network in the Interchange Format
// Produced by BayesianNetworks package in JavaBayes
// Output created Sun Nov 02 17:49:49 GMT+00:00 1997
// Bayesian network
network "Dog-Problem" { //5 variables and 5 probability distributions
	property "credal-set constant-density-bounded 1.1" ;
}
variable  "light-on" { //2 values
	type discrete[2] {  "true"  "false" };
	property "position = (218, 195)" ;
}
variable  "bowel-problem" { //2 values
	type discrete[2] {  "true"  "false" };
	property "position = (335, 99)" ;
}
variable  "dog-out" { //2 values
	type discrete[2] {  "true"  "false" };
	property "position = (300, 195)" ;
}
variable  "hear-bark" { //2 values
	type discrete[2] {  "true"  "false" };
	property "position = (296, 268)" ;
}
variable  "family-out" { //2 values
	type discrete[2] {  "true"  "false" };
	property "position = (257, 99)" ;
}
probability (  "light-on"  "family-out" ) { //2 variable(s) and 4 values
	table 0.6 0.05 0.4 0.95 ;
}
probability (  "bowel-problem" ) { //1 variable(s) and 2 values
	table 0.01 0.99 ;
}
probability (  "dog-out"  "bowel-problem"  "family-out" ) { //3 variable(s) and 8 values
	table 0.99 0.97 0.9 0.3 0.01 0.03 0.1 0.7 ;
}
probability (  "hear-bark"  "dog-out" ) { //2 variable(s) and 4 values
	table 0.7 0.01 0.3 0.99 ;
}
probability (  "family-out" ) { //1 variable(s) and 2 values
	table 0.15 0.85 ;
}

BIF version 0.1

White spaces, tabs and newlines are ignored; the C/C++ style of comments is adopted. Two other characters are also ignored when they occur between tokens: "," and "|". These characters can be used to separate variables in the definition of a probability distribution.

The basic unit of information is a block: a piece of text which starts with a keyword and ends with the end of an attribute list (to be explained later). Arbitrary characters are allowed between blocks. This allows the user to insert arbitrarily long comments outside the blocks. It also allows user-specific blocks and commands to be placed outside the standard blocks.

Other than blocks, BIF 0.1 refers to three entities: words, non-negative integers and non-negative reals.

A word is a contiguous sequence of characters, with the restriction that the first character be a letter. Characters are letters plus numbers plus the underline symbol (_) plus the dash symbol (-).

A non-negative number is a sequence of numeric characters, containing a decimal point or an exponent or both.

Blocks

A block is a unit of information. The general format of a block is:

     block-type block-name {
       attribute-name  attribute-value;
       attribute-name  attribute-value;
       attribute-name  attribute-value;
     }
with as many attributes as necessary. The closing semicolon is mandatory after each attribute.

There are three possible blocks: network, variable and probability blocks.

The blocks must be placed in the following order:

Attributes

Several attributes are defined at this point: property, type, table, default and entry attributes (the entry attribute is not associated with any keyword).

The attribute property can appear in all types of blocks. A property is just a string of arbitrary text to be associated with a block. Examples of properties:

     property size 12;
     property name "Trial number ten";
Any text is valid between the keyword property and the ending semicolon. The idea is to store information that is specific to a particular system or network in the properties. Any number of property attributes can appear in a block.

The type attribute is specific to variable blocks. It lists the values of a discrete variable:

type discrete[ number-of-values ] { list-of-values };
The number-of-values token is a non-negative integer which indicates how many different values this variable may assume (the size of the list-of-values). The list-of-values is a sequence of words, each one the name of a variable value.

There are attributes that are specific to probability blocks (these attributes are discussed in the next section):

The JavaBayes properties

JavaBayes uses a number of properties to load and save information about Bayesian networks:

There are also properties that are related to robustness analysis in JavaBayes. Since robustness analysis is still an ongoing research project, the support for it is minimal. If you want to use robustness analysis now, please send me email. The properties related to robustness analysis always start with the keyword credal-set; if you are defining your own properties, please do not use this keyword.

Probability Blocks

Probability blocks are used to define the actual network topology and conditional probability tables.

An example of a standard probability block is:

probability(GasGauge | Gas, BatteryPower) {
           (yes, high) 0.999 0.001;
           (yes, low) 0.850 0.150;
           (yes, medium) 0.000 1.000;
           (no, high) 0.000 1.000;
           (no, low) 0.000 1.000;
           (no, medium) 0.000 1.000;
}
As explained before, the symbols "|" and "," are ignored between tokens, so they do not affect the list of variables given after the keyword probability. The variables, however, must be enclosed in parentheses.

The example above uses the entry attribute, which is different from the other attributes in that it has no keyword. It simply starts with an opening parenthesis, and has a list of values for all the conditioning variables. After the closing parenthesis, a list of probability values for the first variable is given (the numbers should add to 1, but this is not enforced).

The probability vectors can be listed in any order, since the names in parentheses uniquely identify the parent instantiation.

In addition to the entry attribute, BIF 0.1 supports the concept of a default entry, which applies to every parent instantiation that has no entry of its own. So the above CPT could have been specified equivalently as:

probability(GasGauge | Gas, BatteryPower) {
            default 0.000 1.000;
            (yes, high) 0.999 0.001;
            (yes, low)  0.850 0.150;
}
Note that each number is a separate token, so we can use "," and "|" between numbers; these symbols are ignored.

Another way to define a probability distribution is through the table attribute. The body of this attribute is a sequence of non-negative real numbers, in the counting order of the declared variables (if all variables were binary, we would say binary counting with the least significant digit on the right). So, for the example above, we could simply say:

probability(GasGauge | Gas, BatteryPower) {
           table 0.999 0.850 0.0 0.0 0.0 0.0 0.001 0.15 1.0 1.0 1.0 1.0;
}

There are some subtle rules that regulate these declarations.

Example

Here is the dog-problem.bif network in BIF 0.1:

// Bayesian Network in the Interchange Format
// Produced by BayesianNetworks package in JavaBayes
// Output created Tue Feb 25 12:55:25  1997
// Bayesian network
network Internal-Network{ //5 variables and 5 probability distributions
}
variable light-on{//2 values
	type discrete[2] { true false };
	property  position = (218, 195) ;
}
variable bowel-problem{//2 values
	type discrete[2] { true false };
	property  position = (335, 99) ;
}
variable dog-out{//2 values
	type discrete[2] { true false };
	property  position = (300, 195) ;
}
variable hear-bark{//2 values
	type discrete[2] { true false };
	property  position = (296, 268) ;
}
variable family-out{//2 values
	type discrete[2] { true false };
	property  position = (257, 99) ;
}
probability ( light-on family-out ) { //2 variable(s) and 4 values
	table 0.6 0.05 0.4 0.95 ;
}
probability ( bowel-problem ) { //1 variable(s) and 2 values
	table 0.01 0.99 ;
}
probability ( dog-out bowel-problem family-out ) { //3 variable(s) and 8 values
	table 0.99 0.97 0.9 0.3 0.01 0.03 0.1 0.7 ;
}
probability ( hear-bark dog-out ) { //2 variable(s) and 4 values
	table 0.7 0.01 0.3 0.99 ;
}
probability ( family-out ) { //1 variable(s) and 2 values
	table 0.15 0.85 ;
}

XMLBIF version 0.3

The XMLBIF format provides a different perspective for the storage and manipulation of Bayesian networks. Instead of focusing on a readable and simplified description of Bayesian networks, the XMLBIF format emphasizes ease of distribution through wide area networks. The XMLBIF format is defined through XML, a dialect of SGML that is used to specify formats. The advantage of XML is that it has industry-wide support, and many software developers plan to introduce parsers, search-engines, and browsers for XML. The power of XML is that it is a standard language for editing formats, and XMLBIF attempts to use XML to reduce to a minimum the burden of distributing graphical models to a large audience.

The XMLBIF format is actually quite similar to BIF 0.15, but it is stated in a manner that is XML-compliant. Note the similarity of XMLBIF to HTML; this happens because both HTML and XML are dialects of SGML.

White spaces, tabs and newlines are ignored. The XML style of comments and declarations is used to detect text that should be ignored: any character between <! and > is ignored. Note that XML comments should be enclosed by <!-- and -->.

The XMLBIF format is defined by a set of XML-compliant tags. Other than XML tags, XMLBIF 0.3 refers to three entities: words, non-negative integers and non-negative reals.

A word is a contiguous sequence of characters, with the restriction that the first character be a letter. Characters are letters plus numbers plus the underline symbol (_) plus the dash symbol (-).

A non-negative number is a sequence of numeric characters, containing a decimal point or an exponent or both.

Note that every XML file starts with the expression <?xml version="1.0"?>, indicating the XML version. Other attributes and directives can be contained within this tag; for example, the tag <?xml version="1.0" encoding="US-ASCII"?> specifies the file encoding. This initial tag is followed by any XML definitions and statements that define the DTD for the document (the DTD is always optional in XML).

Networks, variables and probabilities

The first tag of an XMLBIF 0.3 file is the <BIF> tag; the last tag is the closing </BIF> tag. All the information about the model is contained between these tags. There are three basic units of information: networks, variables and probability densities.

A network is defined by its name, followed by a list of properties (optional), followed by a list of variables and probability densities. For example, a network may be defined as:

<BIF VERSION="0.3">
<NETWORK>
<NAME>Dog-Problem</NAME>
<PROPERTY>date Sunday, 19 July, 1998</PROPERTY>
<PROPERTY>author John</PROPERTY>

	variables and probabilities go here

</NETWORK>
</BIF>
The VERSION attribute in the BIF tag is mandatory.

Variables are defined by their names, types and properties:

<VARIABLE TYPE="chance">
	<NAME>light-on</NAME>
	<OUTCOME>true</OUTCOME>
	<OUTCOME>false</OUTCOME>
	<PROPERTY>position = (73, 165)</PROPERTY>
</VARIABLE>

Conditional probability densities can be specified in various ways inside the DEFINITION tag. One example is:

<DEFINITION>
	<FOR>hear-bark</FOR>
	<GIVEN>dog-out</GIVEN>
	<TABLE>0.7 0.01 0.3 0.99 </TABLE>
</DEFINITION>

There is no mandatory order of variable and probability blocks.

A property is just a string of arbitrary text to be associated with a block. Examples of properties:

    <PROPERTY>size 12</PROPERTY>
    <PROPERTY>comment Trial number ten</PROPERTY>
Any text is valid in the string inside the PROPERTY opening and closing tags. The idea is to store information that is specific to a particular system or network in the properties. Any number of property attributes can appear in a block.

A variable is defined by a VARIABLE tag (with the TYPE attribute) that contains a NAME tag and the variable's possible OUTCOME tags:

<VARIABLE TYPE="chance">
	<NAME>light-on</NAME>
	<OUTCOME>true</OUTCOME>
	<OUTCOME>false</OUTCOME>
	<PROPERTY>position = (73, 165)</PROPERTY>
</VARIABLE>
Currently the content of a TYPE attribute must be one of the keywords "chance", "decision" or "utility".
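
A VARIABLE element of this shape can be built with Python's standard xml.etree.ElementTree module, as in the sketch below. The helper name variable_element is hypothetical; the tag names, the TYPE attribute and its three allowed values come from the description above.

```python
import xml.etree.ElementTree as ET

ALLOWED_TYPES = {"chance", "decision", "utility"}


def variable_element(name, outcomes, var_type="chance", properties=()):
    """Build a VARIABLE element with NAME, OUTCOME and PROPERTY children."""
    if var_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown TYPE: {var_type}")
    variable = ET.Element("VARIABLE", {"TYPE": var_type})
    ET.SubElement(variable, "NAME").text = name
    for outcome in outcomes:
        ET.SubElement(variable, "OUTCOME").text = outcome
    for prop in properties:
        ET.SubElement(variable, "PROPERTY").text = prop
    return variable


element = variable_element("light-on", ["true", "false"],
                           properties=["position = (73, 165)"])
print(ET.tostring(element, encoding="unicode"))
```

The output is written on a single line; since white space between tags is ignored, it is equivalent to the indented form used in the listings.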

The TABLE tag is specific to the DEFINITION block (note that a definition can be a probability distribution, a set of decision values or a set of utility values, depending on the TYPE attributes of the referred variable). DEFINITION blocks are used to define the actual network topology, by specifying conditional probability tables.

An example of a standard probability block is:

<DEFINITION>
	<FOR>GasGauge</FOR>
	<GIVEN>BatteryPower</GIVEN>
	<GIVEN>GasInTank</GIVEN>
	<TABLE>1.0 0.0 0.2 0.0 0.0 1.0 0.8 1.0 </TABLE>
</DEFINITION>
for a variable GasGauge that is defined with TYPE equal to "chance". The body of the TABLE tag is a sequence of non-negative real numbers, in the counting order of the declared variables (if all variables were binary, we would say binary counting with the least significant digit on the right). If multiple table declarations exist, only the last one is valid.
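
Reading DEFINITION blocks can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration and not how JavaBayes itself reads files: the function name read_definitions and the returned dictionary layout are assumptions. The rule that the last TABLE wins is taken from the paragraph above.

```python
import xml.etree.ElementTree as ET

XMLBIF = """<BIF VERSION="0.3">
<NETWORK>
<NAME>Dog-Problem</NAME>
<DEFINITION>
    <FOR>hear-bark</FOR>
    <GIVEN>dog-out</GIVEN>
    <TABLE>0.7 0.01 0.3 0.99 </TABLE>
</DEFINITION>
</NETWORK>
</BIF>"""


def read_definitions(text):
    """Map each FOR variable to (list of GIVEN variables, table values)."""
    root = ET.fromstring(text)
    result = {}
    for definition in root.iter("DEFINITION"):
        child = definition.findtext("FOR", default="").strip()
        parents = [(given.text or "").strip()
                   for given in definition.findall("GIVEN")]
        tables = definition.findall("TABLE")
        if not child or not tables:
            raise ValueError("DEFINITION needs a FOR tag and a TABLE tag")
        # If several TABLE tags exist, only the last one is used.
        values = [float(x) for x in (tables[-1].text or "").split()]
        result[child] = (parents, values)
    return result


print(read_definitions(XMLBIF))
```

The parents are returned in document order, which is the order needed to interpret the table values.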

The JavaBayes properties

JavaBayes uses a number of properties to load and save information about Bayesian networks:

There are also properties that are related to robustness analysis in JavaBayes. Since robustness analysis is still an ongoing research project, the support for it is minimal. If you want to use robustness analysis now, please send me email. The properties related to robustness analysis always start with the keyword credal-set; if you are defining your own properties, please do not use this keyword.

Examples

Here are some of the available examples:

Here is the dog-problem.xml network:

<?xml version="1.0" encoding="US-ASCII"?>


<!--
	Bayesian network in XMLBIF v0.3 (BayesNet Interchange Format)
	Produced by JavaBayes (http://www.cs.cmu.edu/~javabayes/
	Output created Wed Aug 12 21:16:40 GMT+01:00 1998
-->



<!-- DTD for the XMLBIF 0.3 format -->
<!DOCTYPE BIF [
	<!ELEMENT BIF ( NETWORK )*>
	      <!ATTLIST BIF VERSION CDATA #REQUIRED>
	<!ELEMENT NETWORK ( NAME, ( PROPERTY | VARIABLE | DEFINITION )* )>
	<!ELEMENT NAME (#PCDATA)>
	<!ELEMENT VARIABLE ( NAME, ( OUTCOME |  PROPERTY )* ) >
	      <!ATTLIST VARIABLE TYPE (chance|decision|utility) "chance">
	<!ELEMENT OUTCOME (#PCDATA)>
	<!ELEMENT DEFINITION ( FOR | GIVEN | TABLE | PROPERTY )* >
	<!ELEMENT FOR (#PCDATA)>
	<!ELEMENT GIVEN (#PCDATA)>
	<!ELEMENT TABLE (#PCDATA)>
	<!ELEMENT PROPERTY (#PCDATA)>
]>


<BIF VERSION="0.3">
<NETWORK>
<NAME>Dog-Problem</NAME>

<!-- Variables -->
<VARIABLE TYPE="chance">
	<NAME>light-on</NAME>
	<OUTCOME>true</OUTCOME>
	<OUTCOME>false</OUTCOME>
	<PROPERTY>position = (73, 165)</PROPERTY>
</VARIABLE>

<VARIABLE TYPE="chance">
	<NAME>bowel-problem</NAME>
	<OUTCOME>true</OUTCOME>
	<OUTCOME>false</OUTCOME>
	<PROPERTY>position = (190, 69)</PROPERTY>
</VARIABLE>

<VARIABLE TYPE="chance">
	<NAME>dog-out</NAME>
	<OUTCOME>true</OUTCOME>
	<OUTCOME>false</OUTCOME>
	<PROPERTY>position = (155, 165)</PROPERTY>
</VARIABLE>

<VARIABLE TYPE="chance">
	<NAME>hear-bark</NAME>
	<OUTCOME>true</OUTCOME>
	<OUTCOME>false</OUTCOME>
	<PROPERTY>position = (154, 241)</PROPERTY>
</VARIABLE>

<VARIABLE TYPE="chance">
	<NAME>family-out</NAME>
	<OUTCOME>true</OUTCOME>
	<OUTCOME>false</OUTCOME>
	<PROPERTY>position = (112, 69)</PROPERTY>
</VARIABLE>


<!-- Probability distributions -->
<DEFINITION>
	<FOR>light-on</FOR>
	<GIVEN>family-out</GIVEN>
	<TABLE>0.6 0.05 0.4 0.95 </TABLE>
</DEFINITION>

<DEFINITION>
	<FOR>bowel-problem</FOR>
	<TABLE>0.01 0.99 </TABLE>
</DEFINITION>

<DEFINITION>
	<FOR>dog-out</FOR>
	<GIVEN>bowel-problem</GIVEN>
	<GIVEN>family-out</GIVEN>
	<TABLE>0.99 0.97 0.9 0.3 0.01 0.03 0.1 0.7 </TABLE>
</DEFINITION>

<DEFINITION>
	<FOR>hear-bark</FOR>
	<GIVEN>dog-out</GIVEN>
	<TABLE>0.7 0.01 0.3 0.99 </TABLE>
</DEFINITION>

<DEFINITION>
	<FOR>family-out</FOR>
	<TABLE>0.15 0.85 </TABLE>
</DEFINITION>


</NETWORK>
</BIF>


Fabio Gagliardi Cozman
2001-1-31