This assignment has two parts. The first explores applying function optimization to both a known function and a learned function. The second explores optimizing a policy using function optimization.
The Rosenbrock banana function has a long history as a test function for numerical function optimization. Google "Rosenbrock banana function" to find out more.
The 4 dimensional Rosenbrock banana function is (in Matlab)
function cost = banana4(x)
% Rosenbrock banana function in N dimensions
N = 4;
cost = 0;
for i = 1:(N-1)
    cost = cost + 100*(x(i+1) - x(i)^2)^2 + (1 - x(i))^2;
end
end
This function has a global minimum of 0 at (1,1,1,1), and a local minimum with a value of about 4 near (-1,1,1,1).
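A quick way to confirm these values (and to see how banana4 is called) is to evaluate it at the two points:
banana4([1 1 1 1])    % returns 0
banana4([-1 1 1 1])   % returns 4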
Optimizing this function in Matlab (from a random start) with a non-gradient method:
for i = 1:4
    x0(i) = 2*(rand - 0.5);
end
[x, fval, eflag, output] = fminsearch(@banana4, x0)
and with a gradient method:
for i = 1:4
    x0(i) = 2*(rand - 0.5);
end
[x, fval, eflag, output] = fminunc(@banana4, x0)
The initial point x0 = [ -0.1565 0.8315 0.5844 0.9190 ] sends it to the local minimum. This page shows many Matlab optimizers applied to the 2-dimensional Rosenbrock banana function.
Note that it is easy to analytically take derivatives of this function, including 2nd derivatives, so you can use both first and second order gradient methods that require analytic derivatives.
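For example, a minimal sketch of the analytic gradient (the name banana4_grad is just an illustration, not code you are given) is:
function g = banana4_grad(x)
% analytic gradient of the N dimensional Rosenbrock banana function
N = 4;
g = zeros(1,N);
for i = 1:(N-1)
    % each term of the cost contributes to the derivatives with
    % respect to x(i) and x(i+1)
    g(i)   = g(i)   - 400*x(i)*(x(i+1) - x(i)^2) - 2*(1 - x(i));
    g(i+1) = g(i+1) + 200*(x(i+1) - x(i)^2);
end
end
To have fminunc use an analytic gradient, write an objective that returns [cost, gradient] and turn on the 'SpecifyObjectiveGradient' option (called 'GradObj' in older Matlab releases).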
I encourage you to try out a non-gradient function optimization method such as CMAES. Google "matlab cmaes" to get CMAES code for Matlab.
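If you use the commonly distributed cmaes.m, the basic call looks roughly like the sketch below; check the help text of the version you download, since the interface can differ:
x0 = 2*(rand(4,1) - 0.5);   % random start in [-1,1]^4
sigma0 = 0.5;               % initial sampling standard deviation
[xmin, fmin] = cmaes('banana4', x0, sigma0)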
What to turn in: Nothing to turn in for this part.
Now consider optimizing a learned version of this function. First, generate training data by evaluating banana4 at random points in [-1,1]^4:
N = 4;
N_data = 100;
for i = 1:N_data
    for j = 1:N
        x(j) = 2*(rand - 0.5);
    end
    in(1:N,i) = x;
    out(i) = banana4(x);
end
Here is an example of fitting the training data with a simple neural network
in Matlab.
% design the net architecture
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);
% train it
[net, tr] = train(net, in, out);
% test it
net([1 1 1 1]')
net([-1 1 1 1]')
When I did this I got
net([1 1 1 1]') = 137.3236
net([-1 1 1 1]') = 18.9859
Clearly, this fit is terrible, and any attempt to optimize using this learned function will fail badly. Nevertheless, we will continue with this example ...
To access the data structure "net" Matlab uses funky notation with @:
fun = @(x)(net([x(1) x(2) x(3) x(4)]'));
fun([1,1,1,1]) % an example evaluation
Otherwise the variable net has to be declared global to be included in an external function. To minimize this function:
for i = 1:4
    x0(i) = 2*(rand - 0.5);
end
[x,fval,eflag,output] = fminsearch(fun,x0)
What to turn in: The writeup is what matters for this section. What I want you to write up is all the things that go wrong and go right with trying to optimize learned functions. What is different from optimizing analytic functions? What is your advice for folks using machine learning to optimize robot performance on various tasks where we don't have analytic functions that score task performance?
A hint: you may find it useful to constrain the function optimizer to the training data distribution. Here is how you do that in Matlab, using a constrained function optimizer:
A = []; b = []; Aeq = []; beq = [];
lb = [-1,-1,-1,-1]; ub = [1,1,1,1];   % bounds matching the [-1,1]^4 training data
[x,fval,eflag,output] = fmincon(fun,x0,A,b,Aeq,beq,lb,ub)
Another hint: Assuming you can fit this function well, how about taking derivatives? Do the analytic derivatives of your approximate function match the analytic derivatives of the true function as well as the match to the function values? What is the summed error of the learned derivatives across all the training points?
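One possible sketch (reusing the N, N_data, in, and fun variables above, plus the banana4_grad example from Part 1; central differences stand in here for differentiating the network analytically) accumulates the gradient error over the training set like this:
h = 1e-4;          % finite difference step
total_err = 0;
for k = 1:N_data
    xk = in(:,k)';
    g_learned = zeros(1,N);
    for j = 1:N
        dx = zeros(1,N);
        dx(j) = h;
        % central difference estimate of the learned function's derivative
        g_learned(j) = (fun(xk + dx) - fun(xk - dx)) / (2*h);
    end
    total_err = total_err + sum(abs(g_learned - banana4_grad(xk)));
end
total_err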
Use this two-wheel inverted pendulum (TWIP) simulator to find feedback gains to balance a simulated version of our lab robot. The gains needed are the wheel angle gain, the wheel angular velocity gain, the body angle gain, and the body angular velocity gain. These four gains are components of the vector Kc in the simulator code.
You can use any method you like to do this. These Matlab examples show how to use optimization to find gains that work. We will later talk about Linear Quadratic Regulator (LQR) approaches, as well as reinforcement learning (RL) approaches that use neural networks. You can google LQR and reinforcement learning to find out how to do these. The TWIP is similar to an inverted pendulum mounted on a cart (Google "cart-pole"), so solution methods that work for the cart-pole should also work for our TWIP. The cart-pole mechanism is often used in machine learning studies.
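To give a flavor of the optimization approach, here is a sketch only: the plant below is a linearized cart-pole-like stand-in with made-up parameters and a made-up cost, and gain_cost is a hypothetical name, not the provided TWIP simulator. With the real simulator you would replace the dynamics with a call to it and score the resulting trajectory the same way. Put something like this in gain_cost.m:
function cost = gain_cost(Kc)
% score one simulated balancing trial with feedback gains Kc
% NOTE: stand-in linearized cart-pole dynamics with assumed parameters,
% not the actual TWIP simulator
M = 1.0; m = 0.1; l = 0.5; g = 9.8;     % assumed masses, length, gravity
dt = 0.01; T = 5;                        % time step and trial length (s)
s = [0; 0; 0.1; 0];                      % [position; velocity; angle; angular velocity]
cost = 0;
for t = 0:dt:T
    u = -Kc * s;                                  % linear feedback control
    acc   = (u - m*g*s(3)) / M;                   % linearized translational acceleration
    alpha = ((M + m)*g*s(3) - u) / (M*l);         % linearized angular acceleration
    s = s + dt * [s(2); acc; s(4); alpha];        % Euler integration
    cost = cost + dt * (s'*s + 0.001*u^2);        % quadratic state and control penalty
end
end
and then search over the four gains (the starting guess is arbitrary, and fminsearch may need restarts from different guesses):
Kc0 = [1 1 20 2];
[Kc, fval] = fminsearch(@gain_cost, Kc0)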
What to turn in for Parts 2a and 2b: Your writeup of what you did, how you did it, and what you learned is what is important here. If you just manually searched for 4 numbers that worked, describe how many tries that took and how you decided what set of numbers to try next after each test. If you used some other method to automate the search, describe that method. You are encouraged to tackle this problem in several different ways, but that is not a requirement. What changes when you use a neural network for the policy representation?