Here is Matlab code for a simple dual control example. There is no measurement noise, so the x(i) are known perfectly. The control gain b is not known perfectly but with variance var_b(i). There is process noise with variance var_w = 1. The first section below is the top level code, and the second file is a subroutine that computes the cost of an initial action.
Matlab code to plot cost of various initial actions on a 2 step problem:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% x(k+1) = x(k) + b(k)*u(k) + w
% b(k+1) = b(k)
% w is zero mean noise with variance var_w = 1
% x0 is first state
x0 = -1;
% nominal estimate of b remains the same throughout
b = 1;
% initial b parameter variance
var_b0 = 1;
% var_w: variance of process noise w
var_w = 1
% u0 is first action
N = 100;
u0s(N) = 0;
cs(N) = 0;
for i = 1:N
u0 = (i-1)/(N-1);
c = cost( u0, x0, b, var_b0, var_w );
u0s(i) = u0;
cs(i) = c;
end
plot( u0s, cs );
xlabel('u0')
ylabel('cost')
title('x0=-1,b=1,var_w=1')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Matlab subroutine to compute cost of initial action
function c = cost( u0, x0, b, var_b0, var_w ) % c = cost( u0, x0, b, var_b0, var_w ): compute cost of initially taking % action u0 % x0 is the starting state, % b is the prior on the value of the b parameter, % var_b0 is the initial variance of b % x(k+1) = x(k) + b(k)*u(k) + w % b(k+1) = b(k) % w is zero mean noise with variance var_w % step indices are appended to variable names: % x1 = x(1), x2 = x(2), var_b1 = var_b(1), u1 = u(1), ... % variance of b after first step var_b1 = var_b0*var_w/(u0*u0*var_b0 + var_w); % expectation of x1 x1 = x0 + b*u0; % use cautious control on last step since further learning is not useful % since no further decisions will be made u1 = -b*x1/(var_b1 + b*b); % expectation of x2 x2 = x1 + b*u1; % Var(x1) var_x1 = var_b0*u0*u0 + var_w; % Var(x2) var_x2 = var_b1*u1*u1 + var_w; % E(x1^2) ex1_2 = var_x1 + x1*x1; % E(x2^2) ex2_2 = var_x2 + x2*x2; % cost which is sum of E(x1^2 + x2^2) c = ex1_2 + ex2_2;
In the figure, notice that the optimum is around 0.6. If this had been just a one step problem, the optimum u0 would have been 0.5 (using cautious control). Having more than one step makes it advantagous to use a bigger u0 to reduce the uncertainty about the parameter b.
Matlab code to plot optimal u versus other variables:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear uarray
x0 = -1;
b = 1;
var_b0 = 1;
var_w = 1;
i = 1;
for x0 = 0.0:-0.1:-2.0
j = 1;
for var_b0 = 0.0: 0.1: 5.0
u = findbestu( x0, b, var_b0, var_w );
uarray(i,j) = u;
j = j + 1;
end
i = i + 1;
end
figure(1)
y = 0.0:-0.1:-2.0;
x = 0.0: 0.1: 5.0;
surf(x,y,uarray)
view(-37.5+180,30)
title('Optimal u0 when varying x0 and Var_b0')
ylabel('x0')
xlabel('Var(b)')
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Matlab subroutine to find best u
function u = findbestu( x0, b, var_b0, var_w ) % x0 is first state % b: nominal estimate of b remains the same throughout % var_b0: initial b parameter variance % var_w: variance of process noise % x(k+1) = x(k) + b(k)*u(k) + w % b(k+1) = b(k) % v is zero mean noise with variance var_w % u0 is first action % c = cost( u0, x0, b, var_b0 ) N = 1001; u0s(N) = 0; cs(N) = 0; u_best = 0; c_best = 1e30; for i = 1:N u0 = 10*(i-1)/(N-1); c = cost( u0, x0, b, var_b0, var_w ); if ( c < c_best ) u_best = u0; c_best = c; end u0s(i) = u0; cs(i) = c; end u = u_best;
Matlab code to plot u0 vs b and var_b0:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear uarray
x0 = -1;
b = 1;
var_b0 = 1;
var_w = 1;
i = 1;
for b = 0.1:0.1:2.0
j = 1;
for var_b0 = 0: 0.1: 5.0
u = findbestu( x0, b, var_b0, var_w );
uarray(i,j) = u;
j = j + 1;
end
i = i + 1;
end
figure(2)
y = 0.1:0.1:2.0;
x = 0: 0.1: 5.0;
surf(x,y,uarray)
title('Optimal u0 when varying b and Var(b)')
ylabel('b')
xlabel('Var(b)')
view(-37.5+180,30)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Matlab code to plot u0 vs var_w and var_b0:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear uarray
x0 = -1;
b = 1;
var_b0 = 1;
var_w = 1;
i = 1;
for var_w = 0.1:0.1:2.5
j = 1;
for var_b0 = 0: 0.1: 5.0
u = findbestu( x0, b, var_b0, var_w );
uarray(i,j) = u;
j = j + 1;
end
i = i + 1;
end
figure(3)
y = 0.1:0.1:2.5;
x = 0: 0.1: 5.0;
surf(x,y,uarray)
title('Optimal u0 when varying var_w and Var(b)')
ylabel('var_w')
xlabel('Var(b)')
view(-37.5+180,30)
% var_w has no effect
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%