Line search and BFGS method


riccardolongoni
10-30-2007, 06:46 AM
Dear all,

I have a question about the BFGS method (to be used in a large-scale unconstrained optimization problem).

The algorithm presented in Section 10.7 (Numerical Recipes in C++, Second Edition) is the standard BFGS method, but wouldn't it be better to use the limited-memory version of Liu-Nocedal (L-BFGS)? What are the differences/advantages of the two versions?

I also have a technical question regarding the implementation of the line search (which in turn is used in the BFGS method): I am referring to the C++ function lnsrch of Sect. 9.7. At some point in the algorithm one has to find the minimum of the degree-three polynomial of eq. (9.7.12), and this minimum is given by eq. (9.7.14). However, in the code on page 390 a different formula is used (in particular for the case b > 0).
What is the reason for this difference?

Thank you very much
Riccardo Longoni

Saul Teukolsky
10-30-2007, 12:41 PM
Dear Riccardo,

The limited memory version is designed for very large problems where storage of intermediate results could cause difficulty. It is a more complicated algorithm that is beyond the scope of NR.

The reason for the different formula for the minimum when b > 0 is to minimize roundoff. This is discussed at the beginning of Section 5.6 on the roots of quadratic equations.

Best wishes,
Saul Teukolsky

riccardolongoni
11-07-2007, 11:54 AM
Thank you for your kind answer.

I have another question regarding line search (Section 9.7, Second Edition) and in particular regarding formula (9.7.10).

The book says that one always tries the full Newton step lambda=1 first, and, if the step is not acceptable, then one approximates the function with a degree-two polynomial whose minimum is given by eq. (9.7.11).

There is, however, a problem when g(1) - g(0) - g'(0) < 0.

In this case the critical point of eq. (9.7.11) is in fact a maximum, not a minimum, and instead of setting lambda equal to eq. (9.7.11) I would leave lambda = 1 (which is the minimum of the degree-two approximation on [0,1]).

At the next iteration, using the book's code, one would try with a degree-three approximation on the interval [0,0.1], while with my suggestion one would try in [0,1].

It seems to me that an "acceptable minimum" could well lie in [0.1,1], or is there any reason to exclude the search in this interval?

Do you think this modification could speed up the algorithm?

Thank you very much in advance.

Riccardo Longoni

Saul Teukolsky
11-07-2007, 12:39 PM
Dear Riccardo,

The inequality will in practice hardly ever have the sign you are worried about, since we are starting with a descent direction. However, if it does give an unacceptable value of lambda, you cannot choose lambda = 1: you got to this point by having already tried lambda = 1 and found it unacceptable. So you must reduce lambda, which we take to be 0.1 for this step. On the next iteration, hopefully, we are out of the "difficult" region and lambda = 1 will work.

The purpose of this modification is to make the algorithm robust. Most of the time it will not be necessary.

Best wishes,
Saul Teukolsky

riccardolongoni
11-08-2007, 02:16 AM
Thanks again for the reply. Indeed if the step lambda=1 is not acceptable then

g(1) - g(0) - alpha*g'(0) > 0

and if we are searching along a descent direction (g'(0) < 0), then

g(1) - g(0) - g'(0) > g(1) - g(0) - alpha*g'(0) > 0.

Sorry, I missed this point.

Best regards
Riccardo Longoni