This assignment has two parts. The first explores applying function optimization to both a known function and a learned function. The second explores optimizing a policy using function optimization.
The Rosenbrock banana function has a long history as a test function for numerical function optimization. Google "Rosenbrock banana function" to find out more.
The 4 dimensional Rosenbrock banana function is (in Matlab)
function cost = banana4(x)
% Rosenbrock banana function in N dimensions
N = 4;
cost = 0;
for i = 1:(N-1)
    cost = cost + 100*(x(i+1) - x(i)^2)^2 + (1 - x(i))^2;
end
end
This function has a global minimum of 0 at (1,1,1,1), and a local minimum with a value of about 4 near (-1,1,1,1).
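A quick way to confirm these values (and to see how banana4 is called) is to evaluate it at the two points:
banana4([1 1 1 1])    % returns 0
banana4([-1 1 1 1])   % returns 4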
Optimizing this function in Matlab (from a random start) with a non-gradient method:
for i = 1:4
    x0(i) = 2*(rand - 0.5);
end
[x, fval, eflag, output] = fminsearch(@banana4, x0)
and with a gradient method:
for i = 1:4
    x0(i) = 2*(rand - 0.5);
end
[x, fval, eflag, output] = fminunc(@banana4, x0)
The initial point x0 = [ -0.1565 0.8315 0.5844 0.9190 ] sends it to the local minimum. This page shows many Matlab optimizers applied to the 2-dimensional Rosenbrock banana function.
Note that it is easy to analytically take derivatives of this function, including 2nd derivatives, so you can use both first and second order gradient methods that require analytic derivatives.
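For example, a minimal sketch of the analytic gradient (the name banana4_grad is just an illustration, not code you are given) is:
function g = banana4_grad(x)
% analytic gradient of the N dimensional Rosenbrock banana function
N = 4;
g = zeros(1,N);
for i = 1:(N-1)
    % each term of the cost contributes to the derivatives with
    % respect to x(i) and x(i+1)
    g(i)   = g(i)   - 400*x(i)*(x(i+1) - x(i)^2) - 2*(1 - x(i));
    g(i+1) = g(i+1) + 200*(x(i+1) - x(i)^2);
end
end
To have fminunc use an analytic gradient, write an objective that returns [cost, gradient] and turn on the 'SpecifyObjectiveGradient' option (called 'GradObj' in older Matlab releases).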
I encourage you to try out a non-gradient function optimization method such as CMAES. Google "matlab cmaes" to get CMAES code for Matlab.
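If you use the commonly distributed cmaes.m, the basic call looks roughly like the sketch below; check the help text of the version you download, since the interface can differ:
x0 = 2*(rand(4,1) - 0.5);   % random start in [-1,1]^4
sigma0 = 0.5;               % initial sampling standard deviation
[xmin, fmin] = cmaes('banana4', x0, sigma0)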
What to turn in: Nothing to turn in for this part.
Now consider optimizing a learned version of this function. First, generate training data by evaluating banana4 at random points in [-1,1]^4:
N = 4;
N_data = 100;
for i = 1:N_data
    for j = 1:N
        x(j) = 2*(rand - 0.5);
    end
    in(1:N,i) = x;
    out(i) = banana4(x);
end
Here is an example of fitting the training data with a simple neural network
in Matlab.
% design the net architecture
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);
% train it
[net, tr] = train(net, in, out);
% test it
net([1 1 1 1]')
net([-1 1 1 1]')
When I did this I got
net([1 1 1 1]') = 137.3236
net([-1 1 1 1]') = 18.9859
Clearly, this fit is terrible, and any attempt to optimize using this learned function will fail badly. Nevertheless, we will continue with this example ...
To access the data structure "net" Matlab uses funky notation with @:
fun = @(x)(net([x(1) x(2) x(3) x(4)]'));
fun([1,1,1,1]) % an example evaluation
Otherwise the variable net has to be declared global to be included in an external function. To minimize this function:
for i = 1:4
    x0(i) = 2*(rand - 0.5);
end
[x,fval,eflag,output] = fminsearch(fun,x0)
What to turn in: The writeup is what matters for this section. What I want you to write up is all the things that go wrong and go right with trying to optimize learned functions. What is different from optimizing analytic functions? What is your advice for folks using machine learning to optimize robot performance on various tasks where we don't have analytic functions that score task performance?
A hint: you may find it useful to constrain the function optimizer to the training data distribution. Here is how you do that in Matlab, using a constrained function optimizer:
A = []; b = []; Aeq = []; beq = [];
lb = [-1,-1,-1,-1]; ub = [1,1,1,1];   % bounds matching the [-1,1]^4 training data
[x,fval,eflag,output] = fmincon(fun,x0,A,b,Aeq,beq,lb,ub)
Another hint: Assuming you can fit this function well, how about taking derivatives? Do the analytic derivatives of your approximate function match the analytic derivatives of the true function as well as the match to the function values? What is the summed error of the learned derivatives across all the training points?
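One possible sketch (reusing the N, N_data, in, and fun variables above, plus the banana4_grad example from Part 1; central differences stand in here for differentiating the network analytically) accumulates the gradient error over the training set like this:
h = 1e-4;          % finite difference step
total_err = 0;
for k = 1:N_data
    xk = in(:,k)';
    g_learned = zeros(1,N);
    for j = 1:N
        dx = zeros(1,N);
        dx(j) = h;
        % central difference estimate of the learned function's derivative
        g_learned(j) = (fun(xk + dx) - fun(xk - dx)) / (2*h);
    end
    total_err = total_err + sum(abs(g_learned - banana4_grad(xk)));
end
total_err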
Use this two-wheel inverted pendulum (TWIP) simulator to find feedback gains to balance a simulated version of our lab robot. The gains needed are the wheel angle gain, the wheel angular velocity gain, the body angle gain, and the body angular velocity gain. These four gains are components of the vector Kc in the simulator code.
You can use any method you like to do this. These Matlab examples show how to use optimization to find gains that work. We will later talk about Linear Quadratic Regulator (LQR) approaches, as well as reinforcement learning (RL) approaches that use neural networks. You can google LQR and reinforcement learning to find out how to do these. The TWIP is similar to an inverted pendulum mounted on a cart (Google "cart-pole"), so solution methods that work for the cart-pole should also work for our TWIP. The cart-pole mechanism is often used in machine learning studies.
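To give a flavor of the optimization approach, here is a sketch only: the plant below is a linearized cart-pole-like stand-in with made-up parameters and a made-up cost, and gain_cost is a hypothetical name, not the provided TWIP simulator. With the real simulator you would replace the dynamics with a call to it and score the resulting trajectory the same way. Put something like this in gain_cost.m:
function cost = gain_cost(Kc)
% score one simulated balancing trial with feedback gains Kc
% NOTE: stand-in linearized cart-pole dynamics with assumed parameters,
% not the actual TWIP simulator
M = 1.0; m = 0.1; l = 0.5; g = 9.8;     % assumed masses, length, gravity
dt = 0.01; T = 5;                        % time step and trial length (s)
s = [0; 0; 0.1; 0];                      % [position; velocity; angle; angular velocity]
cost = 0;
for t = 0:dt:T
    u = -Kc * s;                                  % linear feedback control
    acc   = (u - m*g*s(3)) / M;                   % linearized translational acceleration
    alpha = ((M + m)*g*s(3) - u) / (M*l);         % linearized angular acceleration
    s = s + dt * [s(2); acc; s(4); alpha];        % Euler integration
    cost = cost + dt * (s'*s + 0.001*u^2);        % quadratic state and control penalty
end
end
and then search over the four gains (the starting guess is arbitrary, and fminsearch may need restarts from different guesses):
Kc0 = [1 1 20 2];
[Kc, fval] = fminsearch(@gain_cost, Kc0)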
What to turn in for Parts 2a and 2b: Your writeup of what you did, how you did it, and what you learned is what is important here. If you just manually searched for 4 numbers that worked, describe how many tries that took and how you decided what set of numbers to try next after each test. If you used some other method to automate the search, describe that method. You are encouraged to tackle this problem in several different ways, but that is not a requirement. What changes when you use a neural network for the policy representation?