### Spring 2006

Homework 3 - Due: April 25, 3:01pm, in class.

### Important:

• Please turn in a typed report -  handwritten material will not be graded.
• Due date: April 25, 3:01pm, in class
• No collaborations are allowed.

#### For your information:

• Expected time effort for this homework:
• Q1: 1h
• Q2: 3h
• Q3: 5-7h
• Q4: 3h

• Points are in [bold pts] and add to 100.

## Q1: 1D DFT, DWT and SWFT[15 pts]

1.1 Problem description

Apply DFT to following datasets. Now repeat this with DWT (using Haar wavelets) instead of DFT.

1.2 What to hand in:

For DS1 and DS2, hand in the following:

1. [2 pt] A plot of the original dataset,
2. [4 pt] the amplitude-frequency plot for DFT. Does the plots agree with what you expected? Explain briefly (Hint: You might want to create a log-log amplitude-freq plot for the Yahoo dataset).
3. [4 pt] the scalogram for DWT. Do the full level decomposition of the signals.

Moreover, for the Wavy dataset:

4. [5 pts] For the wavy dataset, plot the spectrogram generated using SWFT. Use a window size ~100. Try to deduce the generator function by inspection of the SWFT  spectrogram. It is of the form A sin(2π.(f0.t + X)), where you have to estimate the form of X(t).

1.3 Hints

• You can use the 'wavemenu' toolbox in MATLAB. For SWFT you can lookup the function 'specgram'.
• Or else you might consider using xwpl
• Feel free to use any software (eg. R) you wish.

## Q2: Iterated Function Systems [20 pts]

2.1 Problem description

Get the IFS code here. Now use it to generate and plot the following curves (i.e., 10000 points of each):

1. The Koch curve

2. The Hilbert curve: (Assume ε = 0.02)

2.2 What to hand in:

1. [10 pts] The coefficients for both curves
2. [10 pts] The plot of the curves

Q3: K-Means and K-Harmonic Means[40pts]

3.1 Problem description

Implement the K-Means and the K-Harmonic Means (Zhang et.al - some corrections in the paper) clustering  algorithms. You can use any programming language.

Input Data (ASCII files with windows end-of-line termination):

Input format:

• Input the set of points from the data file as specified above.
• Input the number of centers and the initial coordinates from the given initialization file. The first line in the initialization file indicates the number of centers k, to use. Then it gives the x y coordinate values of all the k centers. Sample file: init

Output format:

• The number of iterations n, taken to converge.
• The (x, y) coordinate values of the k-centers after convergence.

3.2 What to hand in:

1. [10 pts] Hard copy of the code.
2. [10 pts] K-Means: Output of the program on the given 2d Dataset
i) Using a good initialization of centers: init_good
ii) Using a bad initialization of centers: init_bad
3. [5 pts] K-Means: Plot the cluster centers superimposed on the dataset for both initialization cases.
4. [10 pts] K-Harmonic Means: Output of the program on the given 2d Dataset with the same initial configurations as in 2 above.
5. [5 pts] K-Harmonic Means: Plot the cluster centers superimposed on the dataset.

## Q4: Recursive Least Squares and Autoregression [25pts]

4.1 Problem description:

Implement the method of Recursive Least Squares, with forgetting, from the paper by [Yi et al, ICDE 2000], or from [Chen & Roussopoulos, SIGMOD 94].

• Language: You may choose among C/C++, perl, python, Java, R, Matlab. However, your algorithm must be incremental, processing one row of data at a time.
• Input format:  n+1 numbers per line, blank separated; the last column is the dependent variable
• Output format: n regression coefficients, w1 ... wn, so that
<last-column> ~ w1 * <column-1> + ... wn * <column-n>
For example, you could see this python code which (a) needs "Numerical Python", and (b) has no forgetting factor.
You can "tar xvf" and then "make" it.
Note: The code is provided to mainly show you the expected formats of the input and output. Make sure you test it thoroughly, or rewrite it from scratch.

Again, make sure that your algorithm is incremental, processing one line at a time.
No points if you load the full dataset in memory!

You will use the above code to perform a Autoregression on the following time-series dataset: Wave Data.

• The Autoregressive model AR(2) is:

xt  = w1*xt-1 + w2*xt-2  ,where xt denotes the value of x at time t.

• Preprocessed data: For your convenience we have preprocessed the data, so that each line is of the format: <xi-2 ,xi-1,xi>
You can use this as input for your program.

4.2 What to hand in:

1. [5 pts] Hard Copy of your code

For the first half of the signal (t ≤ 256):

2. [1 pts] Regression coefficients at t=256 without forgetting.
3. [3 pts] Regression coefficients at t=256 when the forgetting factor is lambda = 0.98.
4. [2 pts] Estimate the frequency f, of the signal from the formula: w2 = -1, and 2πf = arctan ( sqrt(-w12 + 4) / w1). Assume sampling rate = 1 per time step.
5. [3 pts] Verify the frequency you calculated using DFT. Give the amplitude-frequency plot.
6. [3 pts] Assuming the 2 initial values of x (x1 and x2), forecast the values of x till t=256 using the coefficients from 1 above. Plot the original and forecasted curves for both cases.

For the full signal (t ≤ 512):

7. [1 pts] Regression coefficients at t=512 without forgetting.
8. [5 pts] Regression coefficients t=512 when the forgetting factor is lambda = 0.98
9. [2 pts] To what frequency do the coefficients correspond?