Faster sine and cosine integrals, 6.8.2
Maxim Yurkin
03-08-2008, 05:44 AM
In my program I calculate a lot of (up to 10^8 per run) sine and cosine integrals. Currently I am basically using the routine from NR3 for that. For certain input parameters my program spends significant amount of time on these integrals.
So the question is whether it is possible to accelerate these calculations, and what factor of improvement can be expected. For example, I have taken a quick look at GSL implementation of the same routine. It uses Chebyshev series with 20 precomputed coefficients instead of power series with coefficients computed on the way. But I haven't tried it yet (it uses a lot of GSL-specific code, hence it is not easy to take a single routine out), so the question is can I improve significantly or is NR implementation is close to optimal?
More generally, what are the possibilities for acceleration of multiple function evaluations (e.g. tabulation)? And when this acceleration methods can be beneficial?
davekw7x
03-08-2008, 10:50 AM
...
So the question is whether it is possible to accelerate these calculations
It is (almost) always possible, depending on the actual conditions and requirements.
...what factor of improvement can be expected.
That's an easy question to answer. The answer is: It Depends.
The "optimal" strategy might depend on the range of input values and the error specification. For example, a series that calculates terms "on the fly" and terminates when the results are "close enough" may be more efficient for small values than a routine that does a fixed number of steps with pre-calculated Chebychev coefficients.
As for your specific question, you could make a simple program with nr3 cisi() that calculates elapsed time for a number of iterations. Then write a simple program using gsl_sf_Ci() and/or gsl_sf_Si() and compare results. Use a mix of input values that are typical (or worst-case or whatever is important) for your application.
Note that nr3 cisi actually calculates both the Ci value and the Si value but they are separate functions for gsl.
Note also, that the Numerical Recipes narrative doesn't make any claims for ultimate optimization, but shows a very straightforward and clear implementation that derives from fundamentals. See Footnote.
Here are the results of a couple of simple programs on my Linux system:
Cosine and Sine Integrals with gsl_sf_Ci and gsl_sf_Si
x Ci(x) Si(x)
0.1 -1.727868 0.099944
0.2 -1.042206 0.199556
0.6 -0.022271 0.588129
0.7 0.100515 0.681222
1.8 0.456811 1.505817
1.9 0.441940 1.557775
2.0 0.422981 1.605413
2.1 0.400512 1.648699
2.2 0.375075 1.687625
3.3 0.024678 1.848081
6.3 -0.019888 1.418174
6.4 -0.004181 1.419223
6.5 0.011102 1.421794
6.6 0.025823 1.425816
10.0 -0.045456 1.658348
12.5 -0.011408 1.492337
15.0 0.046279 1.618194
Elapsed time for 1000000 calculations = 14.094999 seconds.
Cosine and Sine Integrals with NR3 cisi
x Ci(x) Si(x)
0.1 -1.727868 0.099944
0.2 -1.042206 0.199556
0.6 -0.022271 0.588129
0.7 0.100515 0.681222
1.8 0.456811 1.505817
1.9 0.441940 1.557775
2.0 0.422981 1.605413
2.1 0.400512 1.648699
2.2 0.375075 1.687625
3.3 0.024678 1.848081
6.3 -0.019888 1.418174
6.4 -0.004181 1.419223
6.5 0.011102 1.421794
6.6 0.025823 1.425816
10.0 -0.045456 1.658348
12.5 -0.011408 1.492337
15.0 0.046279 1.618194
Elapsed time for 100000 calculations = 13.993086 seconds.
Since the number of iterations for the gsl functions is one million and the number for nr3 functions is one hundred thousand and the elapsed times on my not-very-busy Linux system are comparable, one might conclude that, for GNU compilers and for this set of input values, the gsl stuff is faster by a factor of about 10.
...when this acceleration methods can be beneficial?
Huh? What the heck do you mean by that? Faster run time is beneficial when it's important to you and your application. Clear, portable code (a short function that doesn't require any other "stuff") may be more important for other people and their applications. See Footnote (again).
Note that I don't necessarily think the gsl code is unclear or nonportable, but, for some people, it may be a little bothersome to extract just the part that you need for your non-gsl application. Whether it's worth it is your call. I mean, it depends. (But I said that already.)
Regards
Dave
Footnote:
"The purpose of computing is insight not numbers."
---Richard W. Hamming.
Maxim Yurkin
03-09-2008, 04:17 AM
Thanks a lot, Dave. Now I see that gsl implementation is definitely worth trying.