**Homework 3 - Due: April 25,
3:01pm, in class.**

- Please turn in a typed report - handwritten material will not be graded.
- Due date: April 25, 3:01pm, in class
- No collaborations are allowed.

- Expected time effort for this homework:
- Q1: 1h
- Q2: 3h
- Q3: 5-7h
- Q4: 3h

- Points are in
**[bold pts]**and add to 100.

1.1 Problem description

Apply DFT to following datasets. Now repeat this with DWT (using Haar wavelets) instead of DFT.

- DS1: A noisy sine wave.
- DS2: Stock price fluctuations for Yahoo (Brown Noise).
- DS3: Wavy

1.2 What to hand in:

For DS1 and DS2, hand in the following:

- [2 pt] A plot of the original dataset,
- [4 pt] the amplitude-frequency plot for DFT. Does the plots agree with what you expected? Explain briefly (Hint: You might want to create a log-log amplitude-freq plot for the Yahoo dataset).
- [4 pt] the scalogram for DWT. Do the full level decomposition of the signals.

Moreover, for the Wavy dataset:

4. [5 pts] For the wavy dataset, plot the spectrogram generated using SWFT. Use a window size ~100. Try to deduce the generator function by inspection of the SWFT spectrogram. It is of the form A sin(2π.(f1.3 Hints

- You can use the 'wavemenu' toolbox in MATLAB. For SWFT you can lookup the function 'specgram'.
- Or else you might consider using xwpl
- Feel free to use any software (eg. R) you wish.

2.1 Problem description

Get the IFS code here. Now use it to generate and plot the following curves (i.e.,
10000
points of each):

- The Koch curve

- The Hilbert curve: (Assume ε = 0.02)

2.2 What to hand in:

**[10 pts]**The coefficients for both curves**[10 pts]**The plot of the curves

**Q3: K-Means and K-Harmonic Means[40pts]**

3.1 Problem description

Implement the K-Means and the K-Harmonic Means (Zhang et.al - some corrections in the paper) clustering algorithms. You can use any programming language.

* Input Data* (ASCII files with windows end-of-line termination):

- 2D data: (www.cs.cmu.edu/~kaustav/15826/hw3/q3points.txt).
- Each line specifies a point. The (blank separated) format is: x y.

*Input format: *

- Input the set of points from the data file as specified above.
- Input the number of centers and the initial coordinates from the given initialization file. The first line in the initialization file indicates the number of centers k, to use. Then it gives the x y coordinate values of all the k centers. Sample file: init

*Output format: *

- The number of iterations n, taken to converge.
- The (x, y) coordinate values of the k-centers after convergence.

3.2 What to hand in:

- [10 pts] Hard copy of the code.
- [10 pts] K-Means: Output of the
program on the given 2d Dataset

i) Using a good initialization of centers: init_good

ii) Using a bad initialization of centers: init_bad - [5 pts] K-Means: Plot the cluster centers superimposed on the dataset for both initialization cases.
- [10 pts] K-Harmonic Means: Output of the program on the given 2d Dataset with the same initial configurations as in 2 above.
- [5 pts] K-Harmonic Means: Plot the cluster centers superimposed on the dataset.

4.1 Problem description:

Implement the method of Recursive Least Squares, with forgetting, from the paper by [Yi et al, ICDE 2000], or from [Chen & Roussopoulos, SIGMOD 94].

- Language: You may choose among C/C++, perl, python, Java, R, Matlab. However, your algorithm must be incremental, processing one row of data at a time.
- Input format: n+1 numbers per line, blank separated; the last column is the dependent variable
- Output format:
n regression coefficients, w
_{1}... w_{n}, so that

<last-column> ~ w_{1}* <column-1> + ... w_{n}* <column-n>

For example, you could see this python code which (a) needs "Numerical Python", and (b) has no forgetting factor.

You can "tar xvf" and then "make" it.

**Note:**The code is provided to mainly show you the expected formats of the input and output. Make sure you test it thoroughly, or rewrite it from scratch.

Again, make sure that your algorithm is incremental, processing one line at a time.

No points if you load the full dataset in
memory!

You will use the above code to perform a Autoregression on the following time-series dataset: Wave Data.

- The Autoregressive model AR(2) is:

x_{t } = w_{1}*x_{t-1} + w_{2}*x_{t-2}
,where x_{t} denotes the value of x at time t.

- Preprocessed data: For your convenience we have
preprocessed the data, so that each line is of the format: <x
_{i-2},x_{i-1},x_{i}>

You can use this as input for your program.

4.2 What to hand in:

1. **[5 pts]** Hard Copy of your code

For the first half of the signal (t ≤ 256):

2. [1 pts] Regression coefficients at t=256 without forgetting.3.

4.

5.

6.

For the full signal (t ≤ 512):

7. [1 pts] Regression coefficients at t=512 without forgetting.8. [5 pts] Regression coefficients t=512 when the forgetting factor is lambda = 0.98