3.6.5 The 1D Search
In this section, we detail how to compute an appropriate step size for a typical optimization step. This is referred to as a 1D line search, since we are searching for the minimum along the search direction. One common approach is polynomial interpolation: we first construct a polynomial approximation of the objective function along the search direction, and then minimize that polynomial.
Consider that at some point, \(x^0\), we have \(\nabla J = [1\ 2]^T\). The steepest descent direction is then \(S = -\nabla J = [-1\ {-2}]^T\). To compute the step size \(\alpha\), we sample at three points along the direction \(S\), i.e., we compute the objective function value \(J(x^0+\alpha S)\) for three values of \(\alpha\). We find:
\(\alpha =0\), \(J=10\)
\(\alpha =1\), \(J=6\)
\(\alpha =2\), \(J=8\)
Also, we know that for \(\alpha =0\), the slope of \(J\) along \(S\) is \(\frac{dJ}{d\alpha } = S^T \nabla J = (-1)(1) + (-2)(2) = -5\).
Using these four pieces of data, we can fit a cubic polynomial to describe \(J(\alpha )\): \(J(\alpha ) = c_1 + c_2 \alpha + c_3 \alpha ^2 + c_4 \alpha ^3\).
We solve for the coefficients by setting up the system
\(c_1 = 10\) (from \(J(0)=10\)),
\(c_2 = -5\) (from \(\frac{dJ}{d\alpha }(0)=-5\)),
\(c_1 + c_2 + c_3 + c_4 = 6\) (from \(J(1)=6\)),
\(c_1 + 2c_2 + 4c_3 + 8c_4 = 8\) (from \(J(2)=8\)),
giving solution \(c_1 = 10\), \(c_2 = -5\), \(c_3=0\), \(c_4=1\).
So we approximate \(J\) as \(J(\alpha ) = 10-5\alpha + \alpha ^3\).
Now we want to find the \(\alpha\) that minimizes \(J(\alpha )\), i.e., set \(\frac{dJ}{d\alpha } = -5+3\alpha ^2=0\), giving \(\alpha =\sqrt {\frac{5}{3}}=1.291\) (we take the positive root, since there \(\frac{d^2J}{d\alpha ^2} = 6\alpha > 0\), so it is a minimum).
This is the value of \(\alpha\) that minimizes \(J(x+\alpha S)\) in the given direction \(S\).
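The worked example above can be checked numerically. A minimal sketch (using NumPy; the variable names are ad hoc) that solves the \(4\times 4\) system for the coefficients and then minimizes the fitted cubic:

```python
import numpy as np

# Fit J(alpha) = c1 + c2*alpha + c3*alpha^2 + c4*alpha^3 to the four
# pieces of data from the example: J(0)=10, J'(0)=-5, J(1)=6, J(2)=8.
A = np.array([
    [1.0, 0.0, 0.0, 0.0],  # J(0)  = c1
    [0.0, 1.0, 0.0, 0.0],  # J'(0) = c2
    [1.0, 1.0, 1.0, 1.0],  # J(1)  = c1 + c2 + c3 + c4
    [1.0, 2.0, 4.0, 8.0],  # J(2)  = c1 + 2*c2 + 4*c3 + 8*c4
])
b = np.array([10.0, -5.0, 6.0, 8.0])
c1, c2, c3, c4 = np.linalg.solve(A, b)  # -> 10, -5, 0, 1

# Stationary points of the cubic satisfy c2 + 2*c3*alpha + 3*c4*alpha^2 = 0.
# Keep the root where the second derivative 2*c3 + 6*c4*alpha is positive.
roots = np.roots([3.0 * c4, 2.0 * c3, c2])
alpha = float(next(r for r in roots if 2.0 * c3 + 6.0 * c4 * r > 0))
print(round(alpha, 3))  # 1.291
```

The second-derivative filter is what distinguishes the minimizer from the other stationary point of the cubic.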
Consider that at some point, \(x^0\), we have \(\nabla J=[1\ 3]^T\). The steepest descent direction is then \(S=-\nabla J = [-1\ {-3}]^T\). To compute a good step size \(\alpha\), we sample at three points along the direction \(S\), i.e., we compute the objective function value \(J(x^0 + \alpha S)\) for three different values of \(\alpha\). Let's say that we find:
\(\alpha =0, J=8\)
\(\alpha =1, J=6\)
\(\alpha =2, J=8\)
Fit a cubic equation to approximate \(J(\alpha )\), using the slope \(\frac{dJ}{d\alpha }\) at \(\alpha =0\) as the fourth piece of data, and find the value of \(\alpha\) that minimizes this cubic equation.
The resulting value of \(\alpha\) (to three decimals) that approximately minimizes \(J(x+\alpha S)\) in the given direction \(S\) is:
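The same procedure can be wrapped in a small helper to check your answer (the function name and signature below are my own, not from the text; it assumes the fitted cubic has a real local minimizer). It is verified here against the earlier worked example, and the exercise data can be plugged in the same way:

```python
import numpy as np

def cubic_line_search(j0, dj0, j1, j2):
    """Fit J(a) = c1 + c2*a + c3*a^2 + c4*a^3 to the data
    J(0)=j0, J'(0)=dj0, J(1)=j1, J(2)=j2, then return the
    stationary point where the second derivative is positive."""
    A = np.array([
        [1.0, 0.0, 0.0, 0.0],  # J(0)
        [0.0, 1.0, 0.0, 0.0],  # J'(0)
        [1.0, 1.0, 1.0, 1.0],  # J(1)
        [1.0, 2.0, 4.0, 8.0],  # J(2)
    ])
    c1, c2, c3, c4 = np.linalg.solve(A, np.array([j0, dj0, j1, j2], float))
    roots = np.roots([3.0 * c4, 2.0 * c3, c2])
    return float(next(r for r in roots if 2.0 * c3 + 6.0 * c4 * r > 0))

# Sanity check against the worked example: alpha = sqrt(5/3)
print(round(cubic_line_search(10.0, -5.0, 6.0, 8.0), 3))  # 1.291
```

For the exercise, the fourth piece of data is the slope at \(\alpha =0\), which is \(S^T \nabla J = (-1)(1) + (-3)(3) = -10\).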