Data Mining Assignment 2: Sample Input

Sample Input 1:
We observe the work of customer-service representatives and identify situations that lead to customer complaints; thus, we view complaints as positive examples. We describe each observation by six attributes.

    Service area: rural or urban
    Temperature: cold, warm, or hot
    Weather: sunny, cloudy, or rainy
    Number of representatives: understaffed, normal, or overstaffed
    Day of the week: weekday or weekend
    Number of customers: small, medium, or large

The training examples show that complaints arise in urban areas when an understaffed service has to face a large number of customers. The candidate-elimination algorithm converges to one hypothesis:

    urban  ?  ?  understaffed  ?  large

This hypothesis leads to the following classification of the test instances:
urban cold rainy understaffed weekday large positive
urban warm cloudy normal weekend large negative
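
The classification above follows from a simple matching rule: a hypothesis covers an instance when each of its attributes is either `?` or equal to the instance's value. The following is a minimal Python sketch of that rule. The function name `matches` and the tuple encoding are assumptions made for illustration; the assignment does not prescribe them.

```python
# Hypothetical helper: the name and the tuple encoding are illustrative assumptions.
def matches(hypothesis, instance):
    """Return True if the hypothesis covers the instance ('?' matches any value)."""
    return all(h == "?" or h == x for h, x in zip(hypothesis, instance))

# The single hypothesis the algorithm converges to in Sample Input 1.
hypothesis = ("urban", "?", "?", "understaffed", "?", "large")

# The two test instances listed above.
tests = [
    ("urban", "cold", "rainy", "understaffed", "weekday", "large"),
    ("urban", "warm", "cloudy", "normal", "weekend", "large"),
]

labels = ["positive" if matches(hypothesis, t) else "negative" for t in tests]
for t, label in zip(tests, labels):
    print(" ".join(t), label)
```

The second instance is negative because its number of representatives is normal, whereas the hypothesis requires understaffed.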

Sample Input 2:
If a grocery store reduces the price of some items, it may increase the volume of sales. We need to identify the combinations of reduced-price products that increase sales; thus, we view an increase in the sales volume as a positive example. We describe a combination of products by five attributes.

    Meat: steak, chicken, or beef
    Dairy: cheese or milk
    Produce: carrots or peas
    Cereal: healthy or junk
    Snacks: chips, cookies, or candy

The candidate-elimination algorithm cannot find any hypothesis consistent with the training examples, and it terminates with a failure.

Sample Input 3:
We next consider six factors that may affect the profits of a department store, and we view profitable stores as positive examples.

    Neighborhood: poor, average, or rich
    Surrounding buildings: houses or apartments
    Age of residents: young, middle, or old
    Waterfront: nowater, lake, or beach
    Area: rural or urban
    Housing costs: low, medium, or high

The candidate-elimination algorithm finds several hypotheses consistent with the training examples.

Most specific hypothesis:
rich ? middle beach ? low

Most general hypotheses:
rich ? ? beach ? ?
? ? middle beach ? ?
? ? ? ? ? low

These hypotheses lead to the following classification of the test instances:
rich houses middle beach rural low positive
rich apartments middle nowater urban medium negative
average houses middle nowater rural low unknown
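
When the version space contains several hypotheses, one common way to obtain these three labels is the following: an instance is positive if every most specific hypothesis covers it, negative if no most general hypothesis covers it, and unknown otherwise. The sketch below assumes that rule; the names `matches` and `classify` and the tuple encoding are illustrative assumptions, not part of the assignment.

```python
# Hypothetical helpers: names and encoding are illustrative assumptions.
def matches(hypothesis, instance):
    """Return True if the hypothesis covers the instance ('?' matches any value)."""
    return all(h == "?" or h == x for h, x in zip(hypothesis, instance))

def classify(specific, general, instance):
    """Label an instance using the boundary sets of the version space."""
    if all(matches(s, instance) for s in specific):
        return "positive"   # covered by every most specific hypothesis
    if not any(matches(g, instance) for g in general):
        return "negative"   # rejected by every most general hypothesis
    return "unknown"        # hypotheses in the version space disagree

# Boundary sets from Sample Input 3.
specific = [("rich", "?", "middle", "beach", "?", "low")]
general = [
    ("rich", "?", "?", "beach", "?", "?"),
    ("?", "?", "middle", "beach", "?", "?"),
    ("?", "?", "?", "?", "?", "low"),
]

tests = [
    ("rich", "houses", "middle", "beach", "rural", "low"),
    ("rich", "apartments", "middle", "nowater", "urban", "medium"),
    ("average", "houses", "middle", "nowater", "rural", "low"),
]

results = [classify(specific, general, t) for t in tests]
for t, label in zip(tests, results):
    print(" ".join(t), label)
```

The third instance is unknown because the general hypothesis `? ? ? ? ? low` covers it while the most specific hypothesis does not.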