Optimization for machine learning

To minimize or maximize a function $F$, there are a few choices of method:

Gradient descent:
• needs only the first derivative (gradient) of $F$, and is simple to implement
• each iteration is cheap
• has an extra parameter to tune (the learning rate)
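
As a rough illustration of the first-derivative method above, here is a minimal sketch that assumes the bullets describe plain gradient descent on a one-dimensional function. The function name `gradient_descent`, the test function $F(x) = (x - 3)^2$, and the values chosen for the learning rate and step count are all illustrative assumptions, not taken from the notes.

```python
def gradient_descent(grad, x0, learning_rate=0.1, num_steps=100):
    """Minimize a 1-D function given its first derivative `grad`.

    Each step moves against the derivative, scaled by the learning rate.
    (To maximize instead, one would step along the derivative.)
    """
    x = x0
    for _ in range(num_steps):
        x = x - learning_rate * grad(x)  # one cheap update per iteration
    return x


# Hypothetical example: F(x) = (x - 3)^2, so F'(x) = 2 (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # close to the minimizer x = 3
```

The learning rate is the extra parameter mentioned above: too small and progress is slow, too large and the iterates can overshoot or diverge (for this particular $F$, any value of 1.0 or more fails to converge).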

Newton's method:
• needs both the first and second derivatives of $F$ (the gradient and the Hessian matrix)
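
For comparison, the second-derivative method can be sketched as follows, assuming it refers to Newton's method. To stay dependency-free the sketch is one-dimensional, where the Hessian reduces to the ordinary second derivative; in several dimensions the division would become a linear solve against the Hessian matrix. The name `newton_method` and the test function $F(x) = e^x - 2x$ are illustrative assumptions.

```python
import math


def newton_method(grad, hess, x0, num_steps=20):
    """Find a stationary point of a 1-D function.

    `grad` is the first derivative and `hess` the second derivative
    (the 1-D analogue of the Hessian). No learning rate is needed:
    the step size comes from the curvature.
    """
    x = x0
    for _ in range(num_steps):
        x = x - grad(x) / hess(x)  # Newton update
    return x


# Hypothetical example: F(x) = exp(x) - 2x, minimized at x = ln 2.
x_star = newton_method(
    grad=lambda x: math.exp(x) - 2.0,
    hess=lambda x: math.exp(x),
    x0=0.0,
)
print(x_star, math.log(2.0))
```

Note that this update only seeks a point where the first derivative is zero; whether that point is a minimum or a maximum depends on the sign of the second derivative there.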