**Figure 2:** Global linear regression on some one-dimensional data sets

Linear regression is an old statistical method for determining relationships between variables. It finds the linear function (in the 1-d case, a straight line) that minimizes the sum of squared errors between the function and all the data points.
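To make the "sum of squared errors" criterion concrete, here is a minimal sketch of a 1-d least-squares fit in plain Python. It is not how Vizier computes its fit internally. The function names `fit_line` and `sum_squared_error` and the sample points are made up for illustration and are not taken from any Vizier data file.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the line and the data points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample points (assumption: not from j1.mbl or any other Vizier file).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)
best = sum_squared_error(xs, ys, slope, intercept)

# Any other line, for example one with a slightly steeper slope, has a larger error.
worse = sum_squared_error(xs, ys, slope + 0.1, intercept)
```

Comparing `best` with `worse` shows the defining property of the fit: moving the line away from the least-squares solution can only increase the summed squared error.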

We can use Vizier to see what happens when linear regression is applied to some sample data sets. All of the data sets in this tutorial can be found in the data subdirectory of the Vizier installation.

    File -> Open -> j1.mbl
    Edit -> Metacode -> Regression  L: Linear
                        Localness   9: Global
                        Neighbors   0: No Nearest Neighbors
    Model -> Graph -> Graph

In the previous operations, you did three separate things. First you loaded
the data file named *j1.mbl*. Second, you specified that you wanted
to do linear regression. Don't worry about the meaning of the various
fields in the Metacode editor yet. They'll be described in more detail
later. Finally, you drew a graph showing the data in the file and a fitted
line from the linear regression. Again, don't worry about all the options
available for graphing. They'll be described in more detail later. If you
are not using Vizier to draw the graph, you can see what it looks like in
fig. 2a. From the graph, it is evident that linear
regression has captured a significant trend in the data, but has not
accurately modeled the relationship.

Next, we look at another data set and linear regression applied to it.

    File -> Open -> k1.mbl
    Model -> Graph -> Graph

The resulting graph is shown in fig. 2b. Here there is no noise in the data, and linear regression fits better than in the previous example. Unfortunately, it is also obvious that the fitted line glosses over some of the relationship.

We'll look at one more data set and linear regression applied to it.

    File -> Open -> a1.mbl
    Model -> Graph -> Graph

The resulting graph is shown in fig. 2c. In this case linear regression appears to be a reasonable choice. There is significant noise in the data, but the underlying relationship seems mostly linear.

The models for the first two graphs suffer from undesirable *bias*.
Bias refers to the underlying assumption that a particular function
approximator makes about the form of the relationship. In these examples,
the assumption is that the relationship is a straight line, and any data
not matching that assumption is poorly represented.
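The effect of bias can be sketched with synthetic, noise-free data. This is an illustration only: the data below is generated in the script and is not the contents of j1.mbl or k1.mbl, and `fit_line` and `residuals` are hypothetical helper names. When the data really is a line, the straight-line fit leaves essentially no error. When the data is a curve, the error remains even without noise, and it has a systematic shape.

```python
def fit_line(xs, ys):
    """Least-squares straight line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Data value minus fitted line value at each point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# 21 evenly spaced inputs from -1.0 to 1.0 (synthetic, assumed for illustration).
xs = [i / 10 for i in range(-10, 11)]

# Case 1: noise-free data that matches the straight-line assumption.
straight = [2 * x + 1 for x in xs]
straight_res = residuals(xs, straight)

# Case 2: noise-free data that violates the assumption (a parabola).
curved = [x * x for x in xs]
curved_res = residuals(xs, curved)

print("largest error on straight data:", max(abs(r) for r in straight_res))
print("errors on curved data (left end, middle, right end):",
      curved_res[0], curved_res[10], curved_res[-1])
```

On the curved data the line sits below the points at both ends and above them in the middle. That pattern comes from the straight-line assumption itself, not from noise, which is why collecting more data of the same kind would not remove it.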

Fri Feb 7 18:00:08 EST 1997