Speed of VecDoub vs. double array
snaqvi
08-01-2010, 06:08 PM
In trying to optimize the speed of my programs, I saw a dramatic difference in speeds when I used a simple double array vs. the NRvector<double> class. 
#include "nr3.h"
int main()
{
    VecDoub A(3);
    cout << 0 << endl;
    double A1[3];
    // Loop 1: plain array
    for (int i = 0; i < 999999999; i++)
        for (int j = 0; j < 100; j++)
        {
            A1[0] = 43;
            A1[1] = A1[0] * 2.142;
            A1[2] = A1[1] / 5.31;
        }
    cout << "end loop 1" << endl;
    // Loop 2: VecDoub
    for (int i = 0; i < 999999999; i++)
        for (int j = 0; j < 100; j++)
        {
            A[0] = 43;
            A[1] = A[0] * 2.142;
            A[2] = A[1] / 5.31;
        }
    cout << "end loop 2" << endl;
}
compiled on linux/fedora12 using g++ main.c -O3 [optimization level 3]
The difference in times for the 1st and 2nd loop was more than a factor of 10, i.e. the VecDoub version seems to run painfully slow compared to the simple array version. Is there something I am doing wrong? Thank you.
davekw7x
08-01-2010, 08:15 PM
"In trying to optimize the speed ..."
Since you don't do anything with the results of the calculations, perhaps the compiler is optimizing away some stuff.
Maybe try something like the following:
#include <sys/time.h>
#include "nr3.h"
int main()
{
    // If you don't declare the array to be volatile, the
    // compiler may optimize away some of the access statements
    volatile double A1[3];
    //
    // The compiler is less likely to optimize away access to a
    // VecDoub: access goes through the overloaded [] operator
    // and writes to heap storage, which the optimizer may not
    // be able to prove unused.
    // 
    VecDoub A(3);
    //std::vector<double> A(3);
    struct timeval tv1;
    struct timeval tv2;
    unsigned long elapsed;
    double sum;
    cout << "Loop1 uses an array." << endl;
    sum = 0.0;
    gettimeofday(&tv1, NULL);
    for (int i = 0; i < 999999999; i++) {
        A1[0] = 43;
        A1[1] = A1[0] * 2.142;
        A1[2] = A1[1] / 5.31;
        sum += A1[2] + A1[1] + A1[0];
    }
    gettimeofday(&tv2, NULL);
    elapsed = (tv2.tv_sec - tv1.tv_sec) * 1000000 + (tv2.tv_usec - tv1.tv_usec);
    //Print out the sum so that the compiler won't optimize away all
    // of the calculations
    cout << "End of loop 1: sum = " << sum << endl;
    cout << "Elapsed time = " << elapsed/1000000.0 << " seconds" << endl << endl;
    cout << "Loop 2 uses the VecDoub" << endl;
    sum = 0.0;
    gettimeofday(&tv1, NULL);
    for (int i = 0; i < 999999999; i++) {
        A[0] = 43;
        A[1] = A[0] * 2.142;
        A[2] = A[1] / 5.31;
        sum += A[2] + A[1] + A[0];
    }
    gettimeofday(&tv2, NULL);
    elapsed = (tv2.tv_sec - tv1.tv_sec) * 1000000 + (tv2.tv_usec - tv1.tv_usec);
    cout << "End of loop 2: sum = " << sum << endl;
    cout << "Elapsed time = " << elapsed/1000000.0 << " seconds" << endl;
    return 0;
}
Output on my modest CentOS system (compiled with g++ version 4.1.2 and the -O3 command-line switch):
Loop1 uses an array.
End of loop 1: sum = 1.52452e+11
Elapsed time = 11.2584 seconds
Loop 2 uses the VecDoub
End of loop 2: sum = 1.52452e+11
Elapsed time = 13.2858 seconds
There is always some overhead when you access a vector class object with the overloaded [] operator.  (Try it with a std::vector<double> instead of a VecDoub and see how that compares.)
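For anyone who wants to try the std::vector<double> comparison without nr3.h, here is a minimal sketch. It keeps the same three-element calculation as above but wraps it in a template so that one function body serves both a plain array and a std::vector<double>. The names run_kernel and time_kernel are made up for this illustration, and the timing uses std::chrono, which assumes a C++11 compiler (newer than the g++ 4.1.2 used above).

```cpp
#include <chrono>
#include <vector>

// Runs the three-element update from this thread `iterations` times on any
// container that supports operator[] and returns the accumulated sum.
// (run_kernel is a hypothetical helper name, not part of NR3.)
template <typename Container>
double run_kernel(Container& a, long iterations) {
    double sum = 0.0;
    for (long i = 0; i < iterations; ++i) {
        a[0] = 43;
        a[1] = a[0] * 2.142;
        a[2] = a[1] / 5.31;
        sum += a[2] + a[1] + a[0];
    }
    return sum;
}

// Times one call to run_kernel and returns the elapsed seconds.
// The sum is handed back through `sum_out` so the caller can print it,
// which keeps the compiler from discarding the calculation.
template <typename Container>
double time_kernel(Container& a, long iterations, double& sum_out) {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    sum_out = run_kernel(a, iterations);
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

To use it, declare double raw[3]; and std::vector<double> vec(3); in main, call time_kernel on each with the same iteration count, and print both the sums and the elapsed times. The timings will depend on the compiler and its flags, so treat any single run as a rough indication only.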
That's the price you pay for having the convenience of automatic dynamic resizing, copying with assignment statements, etc.  If you don't need these, then, by all means, stick with arrays (if you have a choice).  Maybe you can rewrite the NR functions of interest to be more optimal.  Just remember, you will also have to rewrite all of the functions that those functions call if they have parameters that are some kind of NRvector.
You can also investigate other libraries, such as the GNU Scientific Library.  Some of its functions (not just ones with vectors) are significantly faster than the corresponding Numerical Recipes routines.  Now if they only had a text that elucidates them...
Regards,
Dave
snaqvi
08-03-2010, 11:59 AM
Thanks for the analysis. I am glad that the difference is not as big for more realistic problems than the one I showed.
davekw7x
08-03-2010, 01:30 PM
"...difference is not as big..."
My simple example still had a place where the compiler optimized stuff differently in one loop than in the other.
In practice (with real-world calculations that don't get optimized away) there is little, if any, difference in performance.
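As a rough illustration of a calculation that is harder for the compiler to throw away, here is a sketch of a dot product written two ways: once through the overloaded [] operator of a vector class and once through raw pointers. It uses std::vector<double> as a stand-in for VecDoub so that it compiles without nr3.h; the function names make_data, dot_indexed and dot_pointer are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Fills a vector with values that vary with the index: scale * (i + 1).
std::vector<double> make_data(std::size_t n, double scale) {
    std::vector<double> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = scale * static_cast<double>(i + 1);
    }
    return v;
}

// Dot product through the vector's overloaded operator[].
// Assumes y has at least as many elements as x.
double dot_indexed(const std::vector<double>& x, const std::vector<double>& y) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// The same dot product through raw pointers, as with a plain array.
double dot_pointer(const double* x, const double* y, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

For a timing test, take the vector length from run-time input (a command-line argument, say), call dot_pointer with &x[0] and &y[0], and print both results. Every element is read and the sum is used, so there is little left for the optimizer to remove in either version.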
Regards,
Dave