Homework 2  Due: Tuesday Mar. 18,
Reminders
Generate a 3D dataset with N=50,000 points, so that
it has the correlation integral of the Figure 1. Notice that there is more than
one correct solution – we’ll accept any one of them. Then, plot the correlation
integral of your dataset. You may use the fractaldimension code from www.cs.cmu.edu/~christos/SRC/fdnq.tar
(feel free to modify it); or some other package from the web, or write your own
code.
Figure 1: Expected Correlation Integral
What to hand in:
(1) [3 pts] Briefly describe the mechanism used to generate your dataset.
(2) [2 pts] The scatterplot of the dataset you generate (In matlab, you can use plot3 to plot a 3d dataset; in gnuplot, you can use splot; in R, you can use cloud)
(3) [10 pts] The fractal dimensions as well as the corresponding range. (It is fine to have small deviations from the specified numbers)
(4) [5 pts] The hard copy of your code that calculates such fractal dimensions. If you use the provided ‘fdnq’ package, highlight your changes, if any.
We want to generate 'fat fractals', as shown in
Figures 2 below: We start with a square of side 1; each side is divided into n
equal pieces (n=7 in the Figure), and each side gives 'birth' to an
'island' of side 1/n, above the mth piece (m=3 there), at
distance d * side length (side
length=1 for iteration 1). Notice that, at the second iteration, every
1/n piece also gives birth to a little island.
Write
a program that will read the parameters n, m, d and k
(= number of iterations) and it will generate points on the periphery of the kth
iteration of such a fat fractal. Specifically, just print the endpoints of
each segment of side (1/n)**k. That is, for k=0, just
print the four corners of the unit square; for k=1, also print the
corners of each new little island, as well as the endpoints of each 1/n
segment on the original unit square; and so on. Hand in your code (again, in any major language).




Fat fractal – iter.#0 
Iteration 1 
Recursive rule 
Iteration 2 
What
to hand in:
(1)
[20 pts] Run your program with n=7, m=3,
d=1/n and k=4 iterations, and handin the scatterplot of these points.
(2)
[10 pts] Give
the correlation fractal dimension of these points.
(3)
[10 pts] Email a tarfile of your source code to
the TA (htong at cs). Reminder: exclude binaries and other redundant files.
In this exercise, we will learn how to generate
power law by Random Walk.
Problem
Description
In this model, assume that you have zero balance at
the very beginning. Then at each step, you toss a coin; you win 1 dollar if it
is head, and you lose 1 dollar if it is the tail. Consider the zerocrossings:
the distribution of these intervals follows a power law (See Lecture 13, page
1920, and the paper [Newman,
2005]). Write a program in your favorite language to verify this claim.
There is only one parameter (e.g., the number of steps/iterations Tt) in your program.
What
to hand in:
(1)
[5 pts] Simulate a random walk with T=1,000,000 steps with your program. Plot
the balance (e.g. the number of dollars you have at that iteration) vs. the
iteration step t, for the last T_{last}=10,000
steps.
(2) [15 pts] Plot the length of interval vs. its rank (i.e., Zipf plot), in loglog scales. Fit a line and plot (in the same figure) the fitted line. What is the slope of your fitted line?
Hints: When fitting the line in loglog scales, you might want to delete the tail part of the curve. We accept any reasonable heuristic to define the tail. For example, you can use the same strategy used in the fdnq package that you have already seen in Q1.
(3) [5 pts] Email a tarfile of your source code to the TA (htong at cs). Reminder: Again, please exclude binaries etc.
In this
exercise, we will learn how to generate a power law distribution, using the
``Chinese Restaurant Process’’.
Problem
Description
In this model, imagine there is a new customer
entering a Chinese restaurant at each time step. He/she has two choices: (1)
joining in an existing table with probability that is proportional to the size
(e.g. the number of customers that are already in that table); (2) starting a
new table with probability c. For
simplicity, we assume that initially, there is only one table with one
customer. The tail distribution of the size of the tables follows a power law
(see Lecture 13, and [Newman,
2005]). Write a program in your favorite language to verify this claim.
Your program should input two parameters: t
and c (the number of steps/iterations
t and the probability to start a new
table c)
What to hand in:
(1)
[5 pts] Simulate a Chinese restaurant process
with c=0.1 and t=10,000 steps with your program. Plot the PDF for size of the
tables in loglog scale. Fit a line and plot (in the same figure) the fitted
line. What is the slope of your fitted line?
(2) [5 pts] Repeat, for c=0.5 (and same duration t=10,000). That is, show the PDF of the table sizes, fit a line in loglog scales, plot it, and report its slope.
(3)
[5 pts] Email a tarfile of your source code to
the TA (htong at cs). Please include just the source code and not the
binaries