Dual Control Example


Here is Matlab code for a simple dual control example. There is no measurement noise, so the x(i) are known perfectly. The control gain b is not known perfectly but with variance var_b(i). There is process noise with variance var_w = 1. The first section below is the top level code, and the second file is a subroutine that computes the cost of an initial action.


Matlab code to plot cost of various initial actions on a 2 step problem:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% x(k+1) = x(k) + b(k)*u(k) + w
% b(k+1) = b(k)
% w is zero mean noise with variance var_w = 1

% x0 is first state
x0 = -1;

% nominal estimate of b remains the same throughout
b = 1;

% initial b parameter variance
var_b0 = 1;

% var_w: variance of process noise w
var_w = 1

% u0 is first action

N = 100;
u0s(N) = 0;
cs(N) = 0;

for i = 1:N
 u0 = (i-1)/(N-1);
 c = cost( u0, x0, b, var_b0, var_w );
 u0s(i) = u0;
 cs(i) = c;
end

plot( u0s, cs );
xlabel('u0')
ylabel('cost')
title('x0=-1,b=1,var_w=1')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Matlab subroutine to compute cost of initial action

function c = cost( u0, x0, b, var_b0, var_w )
% c = cost( u0, x0, b, var_b0, var_w ): compute cost of initially taking
% action u0
% x0 is the starting state,
% b is the prior on the value of the b parameter,
% var_b0 is the initial variance of b
% x(k+1) = x(k) + b(k)*u(k) + w
% b(k+1) = b(k)
% w is zero mean noise with variance var_w

% step indices are appended to variable names:
% x1 = x(1), x2 = x(2), var_b1 = var_b(1), u1 = u(1), ...

% variance of b after first step
var_b1 = var_b0*var_w/(u0*u0*var_b0 + var_w);

% expectation of x1
x1 = x0 + b*u0;

% use cautious control on last step since further learning is not useful
% since no further decisions will be made
u1 = -b*x1/(var_b1 + b*b);

% expectation of x2
x2 = x1 + b*u1;

% Var(x1)
var_x1 = var_b0*u0*u0 + var_w;

% Var(x2)
var_x2 = var_b1*u1*u1 + var_w;

% E(x1^2)
ex1_2 = var_x1 + x1*x1;

% E(x2^2)
ex2_2 = var_x2 + x2*x2;

% cost which is sum of E(x1^2 + x2^2)
c = ex1_2 + ex2_2;


In the figure, notice that the optimum is around 0.6. If this had been just a one step problem, the optimum u0 would have been 0.5 (using cautious control). Having more than one step makes it advantagous to use a bigger u0 to reduce the uncertainty about the parameter b.


Matlab code to plot optimal u versus other variables:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

clear uarray

x0 = -1;
b = 1;
var_b0 = 1;
var_w = 1;

i = 1;

for x0 = 0.0:-0.1:-2.0
 j = 1;
 for var_b0 = 0.0: 0.1: 5.0
 u = findbestu( x0, b, var_b0, var_w );
 uarray(i,j) = u;
 j = j + 1;
 end
i = i + 1;
end

figure(1)
y = 0.0:-0.1:-2.0;
x = 0.0: 0.1: 5.0;
surf(x,y,uarray)
view(-37.5+180,30)
title('Optimal u0 when varying x0 and Var_b0')
ylabel('x0')
xlabel('Var(b)')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%



Matlab subroutine to find best u

function u = findbestu( x0, b, var_b0, var_w )
% x0 is first state
% b: nominal estimate of b remains the same throughout
% var_b0: initial b parameter variance
% var_w: variance of process noise

% x(k+1) = x(k) + b(k)*u(k) + w
% b(k+1) = b(k)
% v is zero mean noise with variance var_w

% u0 is first action
% c = cost( u0, x0, b, var_b0 )

N = 1001;
u0s(N) = 0;
cs(N) = 0;

u_best = 0;
c_best = 1e30;

for i = 1:N
 u0 = 10*(i-1)/(N-1);
 c = cost( u0, x0, b, var_b0, var_w );
 if ( c < c_best )
  u_best = u0;
  c_best = c;
 end
 u0s(i) = u0;
 cs(i) = c;
end

u = u_best;


Matlab code to plot u0 vs b and var_b0:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

clear uarray

x0 = -1;
b = 1;
var_b0 = 1;
var_w = 1;

i = 1;

for b = 0.1:0.1:2.0
 j = 1;
 for var_b0 = 0: 0.1: 5.0
  u = findbestu( x0, b, var_b0, var_w );
  uarray(i,j) = u;
  j = j + 1;
 end
i = i + 1;
end

figure(2)
y = 0.1:0.1:2.0;
x = 0: 0.1: 5.0;
surf(x,y,uarray)
title('Optimal u0 when varying b and Var(b)')
ylabel('b')
xlabel('Var(b)')
view(-37.5+180,30)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%



Matlab code to plot u0 vs var_w and var_b0:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

clear uarray

x0 = -1;
b = 1;
var_b0 = 1;
var_w = 1;

i = 1;

for var_w = 0.1:0.1:2.5
 j = 1;
 for var_b0 = 0: 0.1: 5.0
  u = findbestu( x0, b, var_b0, var_w );
  uarray(i,j) = u;
  j = j + 1;
 end
i = i + 1;
end

figure(3)
y = 0.1:0.1:2.5;
x = 0: 0.1: 5.0;
surf(x,y,uarray)
title('Optimal u0 when varying var_w and Var(b)')
ylabel('var_w')
xlabel('Var(b)')
view(-37.5+180,30)

% var_w has no effect

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%