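Gradient descent moves toward smaller values of an error function E. For a vector u and a small step delta_u, the first-order (linear) approximation of E is:

E(u+delta_u) ≈ E(u) + Sum(i, dE/du_i * delta_u_i)

Under the constraint |delta_u| = C (a constant), the sum is the dot product of the gradient with delta_u, so it is most negative when delta_u points directly opposite the gradient:

=> delta_u_i = -C * (dE/du_i) / |dE/du|

where |dE/du| = sqrt(Sum(j, (dE/du_j)^2)) is the norm of the gradient. The minus sign gives the decrease in E, and dividing by the gradient norm is what makes |delta_u| equal to C.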
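
A small numerical check of the constrained step may help here. The sketch below is only an illustration. The quadratic `error` function, the point `u`, the step length `C`, and all helper names are assumptions made for the example and are not part of the original derivation. It compares the first-order change Sum(i, dE/du_i * delta_u_i) for the step taken opposite the gradient against random steps of the same length C.

```python
import math
import random

def error(u):
    # Hypothetical error function (an assumption): a simple quadratic bowl.
    return sum((k + 1) * x * x for k, x in enumerate(u))

def gradient(u):
    # Analytic gradient of the quadratic above: dE/du_k = 2 * (k + 1) * u_k.
    return [2 * (k + 1) * x for k, x in enumerate(u)]

def steepest_step(grad, C):
    # delta_u_i = -C * (dE/du_i) / |dE/du|; assumes the gradient is non-zero.
    norm = math.sqrt(sum(g * g for g in grad))
    return [-C * g / norm for g in grad]

def linear_change(grad, delta):
    # First-order change in E: Sum(i, dE/du_i * delta_u_i).
    return sum(g * d for g, d in zip(grad, delta))

def random_step(n, C, rng):
    # A random direction rescaled so that its length is exactly C.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [C * x / norm for x in v]

u = [1.0, -2.0, 0.5]
C = 0.01
grad = gradient(u)

best = steepest_step(grad, C)
best_change = linear_change(grad, best)

rng = random.Random(0)
random_changes = [
    linear_change(grad, random_step(len(u), C, rng)) for _ in range(1000)
]

# Apply the step and compare the true error before and after.
u_new = [x + d for x, d in zip(u, best)]
print("first-order change, steepest step:", best_change)
print("most negative change among random steps:", min(random_changes))
print("E before:", error(u), "E after:", error(u_new))
```

In this setup no random step of length C should produce a more negative first-order change than the step opposite the gradient, which is the Cauchy-Schwarz inequality applied to the dot product in the linear approximation.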