- Problem 9.3 from the textbook
- Problem 9.4 from the textbook
- The next problems involve applying genetic algorithms and
hillclimbing to currency data. Five files are necessary:
- yen_returns.dat
- yen_series.dat
- dm_returns.dat
- dm_series.dat
- ga.m
which you should download from here.
Put these files in a directory and start up matlab. First, load in
the .dat files (type 'load yen_returns.dat' etc. at the matlab prompt). These
files contain 4 variables:
- 'dm_series', which contains mark/dollar exchange rate
daily closes and precomputed moving averages
- 'dm_returns', which contains daily returns for holding
marks (the change in the exchange rate plus the interest rate
differential).
- 'yen_series': like dm_series, but for yen/dollar
- 'yen_returns': like dm_returns, but for yen/dollar
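For example, the whole loading step might look like this at the
matlab prompt (a minimal sketch; matlab names each loaded variable
after its file):

    load yen_returns.dat
    load yen_series.dat
    load dm_returns.dat
    load dm_series.dat
    % Sanity check: list the loaded variables and their sizes.
    whos dm_series dm_returns yen_series yen_returns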
ga.m is the genetic algorithm program, and is run by the
following command at the matlab prompt:
[train_returns, test_returns] = ga(series, returns, training_proportion,
tree_size, population_size, generations)
Where:
- train_returns: array of values containing returns over
training set, one for each generation
- test_returns: array of values containing returns over
testing set, one for each generation
- series: either dm_series, or yen_series
- returns: either dm_returns, or yen_returns
- training_proportion: number in [0,1] giving the
proportion of the data to use as the training set
- tree_size: integer (for this assignment, either 1 or
2), giving the max depth of the tree
- population_size: integer giving the population size
(if you use a population size of 1, the program runs a
hillclimbing algorithm instead)
- generations: number of generations for ga to run
When run, ga plots train_returns in blue and
test_returns in red.
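For instance, a single run and a look at its final test fitness might
look like this (a sketch; ga.m produces the plot itself):

    % One GA run: mark/dollar data, half the data for training,
    % max tree depth 2, population 10, 50 generations.
    [train_returns, test_returns] = ga(dm_series, dm_returns, .5, 2, 10, 50);
    % The last entry is the test-set return after the final generation.
    final_test_fitness = test_returns(end)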
Questions:
- Run ga.m ten times for each of the following parameter sets:
- [train_returns, test_returns] = ga(dm_series,
dm_returns, .5, 2, 10, 50)
- [train_returns, test_returns] = ga(dm_series, dm_returns, .5,
2, 1, 500)
(Note that matlab plots will be generated automatically as the
code executes.)
The first is a genetic algorithm with a max tree size of
2, population of 10, and 50 generations. The second is a
hillclimbing algorithm run for 500 generations.
- Using 50 generations for the GA and 500 generations for
the hillclimbing algorithm makes this a fair comparison. Why?
- Record the mean and variance of the final test fitness
produced by each approach over the ten trials (see the
sketch after this question for one way to collect these
values). Can you explain the difference in variance? Which
do you prefer?
- (Optional) Do the results lead you to suggest a
modification to the hillclimbing algorithm? If so, what?
- Would there be an advantage to having a population size of
100? A disadvantage? Run the algorithm as above with a
population size of 100, and see if your predictions are correct.
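One way to collect the ten trials for the mean/variance question
above (a sketch, assuming the final test fitness is the last entry of
test_returns):

    % Ten GA trials; record the final test fitness from each run.
    finals = zeros(1, 10);
    for trial = 1:10
      [train_returns, test_returns] = ga(dm_series, dm_returns, .5, 2, 10, 50);
      finals(trial) = test_returns(end);
    end
    % Mean and variance over the ten trials.
    mean(finals)
    var(finals)

For the hillclimbing condition, swap in the
ga(dm_series, dm_returns, .5, 2, 1, 500) call inside the loop.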
- Run ga.m ten times with the following parameter set:
- [train_returns, test_returns] = ga(dm_series,
dm_returns, .1, 2, 10, 50)
This is the same as the genetic algorithm of the previous
question, except the proportion of data used for training is
only 10%.
- Record the mean and variance of the final
test fitness produced over the ten trials. Compare to the
results from the GA in the previous question, and propose an
explanation for what is causing the differences in terms of
overfitting. (hint: what would happen if you set
training_proportion to .01?)
- There is a potential confound in comparing results of
the algorithm run on different chunks of data -- what is it?
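One way to look for this confound is to compare simple statistics of
the chunks that serve as test sets under the two training proportions
(a sketch, assuming dm_returns is a vector of daily returns and that
ga.m uses the first training_proportion of the data for training and
the rest for testing):

    n = length(dm_returns);
    % Test chunks under training_proportion = .5 and .1 respectively.
    test_half   = dm_returns(floor(.5*n)+1:n);
    test_ninety = dm_returns(floor(.1*n)+1:n);
    % If these statistics differ much, the chunks themselves differ.
    [mean(test_half) var(test_half); mean(test_ninety) var(test_ninety)]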
- Run ga.m ten times for each of the following parameter
sets (note the change from dm to yen):
- [train_returns, test_returns] = ga(yen_series,
yen_returns, .1, 1, 1, 200)
- [train_returns, test_returns] = ga(yen_series,
yen_returns, .1, 2, 1, 200)
Examine the test_returns curves as a function of the number of
generations. Do you notice a qualitative difference between
the curves from the algorithm with tree_size 2 as opposed to
the algorithm with tree_size 1? (hint: for each trial compare
the maximum test fitness values with the final test fitness
value). Print out one representative plot for each condition
that demonstrates this, and speculate about how the difference
in tree size causes the differences in curve shapes (hint:
think about pruning in decision trees).
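To make the hinted comparison concrete, for each trial you might
compare the best and final test fitness values like this (a sketch
for a single tree_size 2 run):

    % Hillclimbing run on the yen data with max tree depth 2.
    [train_returns, test_returns] = ga(yen_series, yen_returns, .1, 2, 1, 200);
    % Best test fitness seen during the run versus the final value;
    % a large gap suggests later generations overfit the training set.
    [best, gen] = max(test_returns);
    fprintf('best %g at generation %d, final %g\n', best, gen, test_returns(end));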
Created by James Thomas, maintained by Rosie Jones
Last modified: Mon Apr 12 10:34:54 EDT 1999