CARNEGIE MELLON UNIVERSITY

15-826 - Multimedia databases and data mining

Spring 2006

Homework 3 - Due: April 25, 3:01pm, in class.

Important:

For your information:

Q1: 1D DFT, DWT and SWFT[15 pts]

1.1 Problem description

Apply DFT to following datasets. Now repeat this with DWT (using Haar wavelets) instead of DFT.

1.2 What to hand in:

For DS1 and DS2, hand in the following:

  1. [2 pt] A plot of the original dataset,
  2. [4 pt] the amplitude-frequency plot for DFT. Does the plots agree with what you expected? Explain briefly (Hint: You might want to create a log-log amplitude-freq plot for the Yahoo dataset).
  3. [4 pt] the scalogram for DWT. Do the full level decomposition of the signals.

Moreover, for the Wavy dataset:

    4. [5 pts] For the wavy dataset, plot the spectrogram generated using SWFT. Use a window size ~100. Try to deduce the generator function by inspection of the SWFT  spectrogram. It is of the form A sin(2π.(f0.t + X)), where you have to estimate the form of X(t).

1.3 Hints

Q2: Iterated Function Systems [20 pts]

2.1 Problem description

  Get the IFS code here. Now use it to generate and plot the following curves (i.e., 10000 points of each):
 

  1. The Koch curve


     
  2. The Hilbert curve: (Assume ε = 0.02)

2.2 What to hand in:

  1. [10 pts] The coefficients for both curves
  2. [10 pts] The plot of the curves

 

Q3: K-Means and K-Harmonic Means[40pts]

3.1 Problem description

Implement the K-Means and the K-Harmonic Means (Zhang et.al - some corrections in the paper) clustering  algorithms. You can use any programming language.

Input Data (ASCII files with windows end-of-line termination):

Input format:

Output format:

3.2 What to hand in:

  1. [10 pts] Hard copy of the code.
  2. [10 pts] K-Means: Output of the program on the given 2d Dataset
        i) Using a good initialization of centers: init_good
        ii) Using a bad initialization of centers: init_bad
  3. [5 pts] K-Means: Plot the cluster centers superimposed on the dataset for both initialization cases.
  4. [10 pts] K-Harmonic Means: Output of the program on the given 2d Dataset with the same initial configurations as in 2 above.
  5. [5 pts] K-Harmonic Means: Plot the cluster centers superimposed on the dataset.

 

Q4: Recursive Least Squares and Autoregression [25pts]

4.1 Problem description:

Implement the method of Recursive Least Squares, with forgetting, from the paper by [Yi et al, ICDE 2000], or from [Chen & Roussopoulos, SIGMOD 94].

Again, make sure that your algorithm is incremental, processing one line at a time.
 No points if you load the full dataset in memory!
 

You will use the above code to perform a Autoregression on the following time-series dataset: Wave Data.

                    xt  = w1*xt-1 + w2*xt-2  ,where xt denotes the value of x at time t.

4.2 What to hand in:

    1. [5 pts] Hard Copy of your code

For the first half of the signal (t ≤ 256):

    2. [1 pts] Regression coefficients at t=256 without forgetting.
    3. [3 pts] Regression coefficients at t=256 when the forgetting factor is lambda = 0.98.
    4. [2 pts] Estimate the frequency f, of the signal from the formula: w2 = -1, and 2πf = arctan ( sqrt(-w12 + 4) / w1). Assume sampling rate = 1 per time step.
    5. [3 pts] Verify the frequency you calculated using DFT. Give the amplitude-frequency plot.
    6. [3 pts] Assuming the 2 initial values of x (x1 and x2), forecast the values of x till t=256 using the coefficients from 1 above. Plot the original and forecasted curves for both cases.

For the full signal (t ≤ 512):

    7. [1 pts] Regression coefficients at t=512 without forgetting.
    8. [5 pts] Regression coefficients t=512 when the forgetting factor is lambda = 0.98
    9. [2 pts] To what frequency do the coefficients correspond?