Problems with gradient convergence criteria for dfpmin() (10.9)


nils
11-06-2007, 05:24 PM
Two problems with the convergence criteria in dfpmin

1. The denominator for the convergence on the gradient is currently computed as

den=MAX(fret,1.0);

but for negative fret values this is always 1.0!
I think this line should be

den=MAX(abs(fret), 1.0);

because the function to minimize does not need to be positive and the minimum can be a negative value.

2. The components of the gradient (partial derivatives) are multiplied with the absolute value of the components of the current point:

temp = abs(g[i])*MAX(abs(p[i]),1.0)/den

however, this should use the *change* of the component rather than its absolute value (the criterion should be invariant to translations of the minimization problem). In addition, the MAX(...,1.0) magic seems useless, if not harmful, here (I guess it is a copy+paste bug from the denominator calculation of the Delta(x) convergence criterion a few lines above in the same routine: in a denominator, MAX(...,1.0) prevents division by zero and switches from relative to absolute differences, but here it is a multiplier, not a denominator).

So I think this line should read

temp = abs(g[i])*abs(xi[i])/den

(The vector xi is already computed earlier as the difference between the current point and the previous point.)

Both problems might have gone unnoticed for a long time, since problem 1 simply means that absolute instead of relative differences are used, and problem 2 does not have much impact if all components of the solution are of order of magnitude one, as assumed in the last paragraph of section 9.7.2.

Saul Teukolsky
11-07-2007, 12:29 PM
Dear Nils,

1. Yes, this should be abs(fret). This bug has been unnoticed for a very long time! In practice, it probably doesn't cause serious problems, but we'll fix it in the next reprinting.

2. The test for a small gradient is actually subtle. It is described in the Dennis and Schnabel reference we give in the book in their section 7.2. The basic idea is to define the relative gradient along the i-direction at the current point x as

relative rate of change in f / relative rate of change in x
= (delta f / f) / (delta x / x)

Approximating delta f by (grad f) (delta x) gives the formula we use. (You should treat x as a vector here, so it's really the ith component we're talking about.) You must safeguard against zero values of both f and x, as is clear from the above expression.

The test you propose would actually work OK on many problems, but it's not the test we use.

Best wishes,
Saul Teukolsky

nils
11-08-2007, 02:53 AM
Dear Saul,

thanks for your clarification on item 2. Now everything is clear to me, including this MAX(...,1.0) spell - it is the safeguard against zero x components that you mentioned, but here it appears in the numerator.

I think it would be worth a comment in the code, since it is not obvious. When I was reading that code "line by line", as you recommend in section 1.0.1 of the book, I simply choked on that line. A comment like "relative change in f divided by relative change in x" or simply "(df/f) / (dx/x)" would help a lot with understanding.

And it points to another important "feature" of the routine: the code is not invariant to translations in x - something you already mention in the text as "badly scaled".

Regarding problem 1, I found the same issue in the routines
- minimize() (10.8), and potentially also in
- newt() (9.7.2; I am not sure whether f is guaranteed to be positive here)