Saul Teukolsky
09-01-2003, 10:52 AM
Recent compiler advances may cause the routines svdcmp, jacobi, tqli and hqr to fail to converge for certain problems. So far we have seen the problem only with the gcc/g++ compiler, version 3.0 or later, on pentium machines only, and only for a single example involving svdcmp. We would like to hear reports involving other compilers/hardware.
The problem arises in convergence tests for negligible x of the form
if (x+y == y)
which can fail when values are kept in 80-bit registers but would be satisfied with 64-bit values for x and y. The fix is to force the compiler to make 64-bit tests by writing the intermediate value to memory:
temp=x+y;
if (temp == y)
As an added precaution, declare temp to be of type "volatile". This prevents the compiler from optimizing away the storage of temp (although the gcc compiler isn't yet smart enough to do this.)
Here is a complete list of changes that need to be implemented:
svdcmp.cpp has 4 changes:
1) After the 7th line of code, insert the declaration
DP volatile temp;
2) Change
if (fabs(rv1[l])+anorm == anorm) {
to
temp=fabs(rv1[l])+anorm;
if (temp == anorm) {
3) Change
if (fabs(w[nm])+anorm == anorm) break;
to
temp=fabs(w[nm])+anorm;
if (temp == anorm) break;
4) Change
if (fabs(f)+anorm == anorm) break;
to
temp=fabs(f)+anorm;
if (temp == anorm) break;
jacobi.cpp has 3 changes:
1) After the 4th line of code from the start of the jacobi routine, insert
DP volatile temp1,temp2;
2) Change
if (i > 4 && (fabs(d[ip])+g) == fabs(d[ip])
&& (fabs(d[iq])+g) == fabs(d[iq]))
to
temp1=fabs(d[ip])+g;
temp2=fabs(d[iq])+g;
if (i > 4 && temp1 == fabs(d[ip]) && temp2 == fabs(d[iq]))
3) Change
if ((fabs(h)+g) == fabs(h))
to
temp1=fabs(h)+g;
if (temp1 == fabs(h))
tqli.cpp has 2 changes:
1) After the 7th line of code, insert
DP volatile temp;
2) Change
if (fabs(e[m])+dd == dd) break;
to
temp=fabs(e[m])+dd;
if (temp == dd) break;
hqr.cpp has 3 changes:
1) After the 8th line of code, insert
DP volatile temp;
2) Change
if (fabs(a[l][l-1]) + s == s) {
to
temp=fabs(a[l][l-1]) + s;
if (temp == s) {
3) Change
if (u+v == v) break;
to
temp=u+v;
if (temp == v) break;
The problem arises in convergence tests for negligible x of the form
if (x+y == y)
which can fail when values are kept in 80-bit registers but would be satisfied with 64-bit values for x and y. The fix is to force the compiler to make 64-bit tests by writing the intermediate value to memory:
temp=x+y;
if (temp == y)
As an added precaution, declare temp to be of type "volatile". This prevents the compiler from optimizing away the storage of temp (although the gcc compiler isn't yet smart enough to do this.)
Here is a complete list of changes that need to be implemented:
svdcmp.cpp has 4 changes:
1) After the 7th line of code, insert the declaration
DP volatile temp;
2) Change
if (fabs(rv1[l])+anorm == anorm) {
to
temp=fabs(rv1[l])+anorm;
if (temp == anorm) {
3) Change
if (fabs(w[nm])+anorm == anorm) break;
to
temp=fabs(w[nm])+anorm;
if (temp == anorm) break;
4) Change
if (fabs(f)+anorm == anorm) break;
to
temp=fabs(f)+anorm;
if (temp == anorm) break;
jacobi.cpp has 3 changes:
1) After the 4th line of code from the start of the jacobi routine, insert
DP volatile temp1,temp2;
2) Change
if (i > 4 && (fabs(d[ip])+g) == fabs(d[ip])
&& (fabs(d[iq])+g) == fabs(d[iq]))
to
temp1=fabs(d[ip])+g;
temp2=fabs(d[iq])+g;
if (i > 4 && temp1 == fabs(d[ip]) && temp2 == fabs(d[iq]))
3) Change
if ((fabs(h)+g) == fabs(h))
to
temp1=fabs(h)+g;
if (temp1 == fabs(h))
tqli.cpp has 2 changes:
1) After the 7th line of code, insert
DP volatile temp;
2) Change
if (fabs(e[m])+dd == dd) break;
to
temp=fabs(e[m])+dd;
if (temp == dd) break;
hqr.cpp has 3 changes:
1) After the 8th line of code, insert
DP volatile temp;
2) Change
if (fabs(a[l][l-1]) + s == s) {
to
temp=fabs(a[l][l-1]) + s;
if (temp == s) {
3) Change
if (u+v == v) break;
to
temp=u+v;
if (temp == v) break;