Random number generators
pizza
04-16-2009, 04:29 AM
Hi there,
I was wondering why the book does not even mention generators like the Mersenne Twister and Subtract-with-Carry, which made it into important standards like upcoming C++0x.
Conversely, I'm wondering if the authors' "Ran" has been peer-reviewed, as I could find no reference to it outside of the book itself.
Thank you
Bill Press
04-16-2009, 08:16 PM
Very reasonable question. Mersenne twistor is not included because it has just too many operations per random value generated.  Except perhaps for cryptographic applications, this is completely unnecessary and wasteful.
Add-with-carry (or subtract-with-borrow) are good algorithms, but (see discussion on p. 350 or NR3) we'd still only use them in a combined generator, and they didn't quite fit the scheme adopted in Section 7.1 of NR3.
While the generators in NR3 have been extensively tested by the authors, and are based on principles from the peer-reviewed literature, they have not been independently reviewed.
Note that we no longer recommend using any of the generators in earlier editions of NR. They come from an previous age where it was normal to generate only "millions" of deviates, not "billions or trillions" as today.  They were good enough then, but not now.
Cheers,
Bill P.
pizza
04-17-2009, 04:27 AM
Thanks Bill.
One thing about Mersenne Twister for the benefit of the casual reader
Mersenne twistor is not included because it has just too many operations per random value generated.  Except perhaps for cryptographic applications, this is completely unnecessary and wasteful.
Looking on the web, it would seem that actually MT is not suitable for cryptographic applications. It is suggested that one should use even more expensive algorithms like Blum Blum Shub instead.
http://en.wikipedia.org/wiki/Mersenne_Twister
http://en.wikipedia.org/wiki/Blum_Blum_Shub
Unfortunately I cannot support this with any more references than those found on Wikipedia.
stephan80
04-11-2011, 04:03 AM
Hi,
indeed the Mersenne Twister seems to need more lines of code and possibly more operations per random number, but the performance still beats the NR3-generators. Here I compared Ran and Ranq1 (from NR3) and the Mersenne Twister mt_19937 from the Gnu Scientific Library (GSL) on an iMac with a 2.8 GHz Intel Core 2 Duo. I used the clock() function from <ctime> to evaluate the time needed for one Billion random 32bit - integers:
ran: 20.2135 sec
ranq1: 12.0369 sec
gsl_mt_19937: 9.52048 sec
The difference is not huge and I believe I can totally live with ranq1, but given this great performance I still wonder why it isn't even mentioned in the NR3 book?
Thanks,
Stephan
davekw7x
04-11-2011, 06:57 PM
.....
ranq1: 12.0369 sec
gsl_mt_19937: 9.52048 sec
I am very interested in the test programs you are using.
Here are my simple programs (get 1000000000 variates and print max and min values):
/*
   1000000000 numbers from GSL mt19937
   davekw7x
*/
#include <stdio.h>
#include <gsl/gsl_rng.h>
int main()
{
    const gsl_rng_type *T;
    gsl_rng *r;
    unsigned long rmin;
    unsigned long rmax;
    unsigned long long i;
    const unsigned long long NumRans = 1000000000ULL;
/*
   1000000000 calls to get 32-bit numbers
   from Ranq1
   davekw7x
*/
#include <nr3.h>
#include <ran.h>
int main()
{
    Ranq1 ran(time(0));
    Ullong num = 1000000000uLL;
    Uint rmin = 0xffffffff;
    Uint rmax = 0;
    for (Ullong n = 0; n < num; n++) {
        Uint r = ran.int32();
        if (r < rmin) {
            rmin = r;
        }
        else if (r > rmax) {
            rmax = r;
        }
    }
    cout << endl;
    cout << "With Ranq1, for " << num  << " calls of int32() " << endl
         << "(rmin,rmax) = (" << rmin
         << "," << rmax << ")"
         << endl;
    return 0;
}
    gsl_rng_env_setup();
    T = gsl_rng_default;
    
    r = gsl_rng_alloc(T);
    rmin = gsl_rng_max(r);
    rmax = gsl_rng_min(r);
    printf ("GNU Scientific Library generator type: %s\n", gsl_rng_name (r));
    for (i = 0; i < NumRans; i++) {
        unsigned long u = gsl_rng_get(r);
        if (u < rmin) {
            rmin = u;
        }
        else if (u > rmax) {
            rmax = u;
        }
    }
    printf("For %llu random numbers:\n", NumRans);
    printf("rmin     = %10lu\n", rmin);
    printf("rmax     = %10lu\n", rmax);
    printf("RAND_MAX = %10lu\n", gsl_rng_max(r));
    gsl_rng_free(r);
    return 0;
}
Using the Linux utility "time" to get the output on my creaky (but not cranky) old Linux workstation running Centos 5.6 Linux:
______________________________________________
GNU Scientific Library generator type: mt19937
For 1000000000 random numbers:
rmin     =          0
rmax     = 4294967286
RAND_MAX = 4294967295
real    0m24.080s
user    0m23.666s
sys     0m0.031s
_____________________________________________
Now, for the NR test program:
/*
   1000000000 calls to get 32-bit numbers
   from Ranq1
   davekw7x
*/
#include <nr3.h>
#include <ran.h>
int main()
{
    Ranq1 ran(time(0));
    Ullong num = 1000000000LL;
    Uint rmin = 0xffffffff;
    Uint rmax = 0;
    for (Ullong n = 0; n < num; n++) {
        Uint r = ran.int32();
        if (r < rmin) {
            rmin = r;
        }
        else if (r > rmax) {
            rmax = r;
        }
    }
    cout << endl;
    cout << "With Ranq1, for " << num  << " calls of int32() " << endl
         << "(rmin,rmax) = (" << rmin
         << "," << rmax << ")"
         << endl;
    return 0;
}
Output:
______________________________________________
With Ranq1, for 1000000000 calls of int32() 
(rmin,rmax) = (6,4294967295)
real    0m12.069s
user    0m11.975s
sys     0m0.016s
_____________________________________________
Bottom line: As far as I'm concerned it's not a competition, and I'm not particularly "expert" at (or interested in, at this point in my voyage on spaceship Earth) comparing the virtues and vices of the different schemes.
I hate to repeat myself, but I am genuinely interested in how you got the results that are different from mine.
Regards,
Dave
Footnote:
gsl version 1.13
GNU gcc version 4.1.2
I didn't record the times for NR3 Ran, but they were about 23 seconds (Roughly 2 times the Ranq1 time and 95% of the gsl mt19937 time).
I also modified the programs to use clock() and gettimeofday() to estimate the time in the loop.  Results were essentially the same as those reported by the time utility for "user" time and "real" time, respectively, on my not-very-busy workstation.  (When other stuff is going on, the "real" time can get noticeably higher than "user" time.)
stephan80
04-12-2011, 02:51 AM
Here is my program. Did I do a trivial mistake? Anyway, I agree there is no "competition", I was just interested in how the mersenne twister performs yesterday and thought I post the results.
#include "nr3.h"
#include "ran.h"
#include <iostream>
#include <gsl/gsl_rng.h>
#include <ctime>
int main() {
    
    Ranq1 ranq(123);
    Ran ran(123);
    gsl_rng *rng = gsl_rng_alloc(gsl_rng_mt19937);
    gsl_rng_set(rng, 123);
    
    clock_t t = clock();
    
    const int n = 1000000000;
	
    std::cerr << "ran: ";
    for(int i = 0; i < n; ++i) {
        ran.int32();
    }
    clock_t new_t = clock();
    std::cerr << (new_t - t) / double(CLOCKS_PER_SEC) << " sec\n";
    t = new_t;
    std::cerr << "ranq: ";
    for(int i = 0; i < n; ++i) {
        ranq.int32();
    }
    new_t = clock();
    std::cerr << (new_t - t) / double(CLOCKS_PER_SEC) << " sec\n";
    t = new_t;
    
    
    std::cerr << "gsl: ";
    for(int i = 0; i < n; ++i) {
        gsl_rng_get(rng);
    }
    new_t = clock();
    std::cerr << (new_t - t) / double(CLOCKS_PER_SEC) << " sec\n";
    t = new_t;
    
}
stephan80
04-12-2011, 03:26 AM
OK guys, I just realized that I did not use any compilerwise optimization. When I compile the test-program with -O3, the NR3 generators outcompete the gsl's ones (I also put in min-max evaluations for comparison with Dave's program): 
ran: 6.1351 sec, (rmin, rmax) = (0, 4294967287)
ranq: 4.74101 sec, (rmin, rmax) = (1, 4294967290)
gsl: 9.80734 sec, (rmin, rmax) = (4, 4294967292)
Note however, that the gsl is linked as a dynamic library that has been compiled by my sysadmin with whatever optimization he used, so maybe this comparison is not fair at this point.
I checked the website of mersenne-twister. It seems they have a new "fast" version. I will not do a comparison with that, but if anyone is interested please let us know.
Cheers,
Stephan
davekw7x
04-12-2011, 08:35 AM
OK guys, I just realized...
Here's what I got with your program:
ran: 0 sec
ranq: 0 sec
gsl: 16.04 sec
The NR3 stuff is infinitely faster!  Or would it be more precise to say that the gsl stuff is infinitely slower, since the execution time factor from zero to 13.67 seconds is infinite?
Any guesses as to why the NR3 function loops were optimized out and the gsl function loop was not?
Well, here's my guess:  I think that the gsl function was expanded in-line and the NR3 functions were not.  But it's only a guess.  The interesting thing is that, in general, function calls are not optimized away.  However if a loop only calculates the value of a variable that isn't used anywhere, the entire loop may (or may not) be optimized away.
Bottom line: When benchmarking, you have to take everything into consideration.  Metrics for programs with do-nothing loops are particularly unreliable when comparing performance.  Compiler (and library) versions are very important when comparing results with the same code on different systems.  See Footnote.
if anyone is interested please let us know
Heck, my problem is that I am interested in just about everything.  (It's a curse).  I can't speak for the owners of this forum, but I always enjoy stuff like this.  I say, "Show the results."
Regards,
Dave
Footnote:
One thing that I didn't mention in my previous post is that I routinely compile with the gcc/g++ -O3 flag.  Without any optimization flags, the NR loops were not optimized away, and the run times were between 2.5 and 3 times as long as with my original post. (That's why I usually specify -O3)
The gsl loop was actually faster for your program(!)  
(Remember that in my programs, each time through the loop the result from the random number generator was tested to see whether there was a new max or min value. so a gsl do-nothing loop is actually faster than a gsl do-something loop.) 
(Since the loops don't actually do anything with the results of the function calls, in my opinion the results still don't count for much, but they do show the importance of trying different optimization settings and reporting them along with the conclusions.)
stephan80
04-15-2011, 02:55 AM
I also noticed that the NR3-Function calls get optimized away. You have to use the random number. That's why I said I actually did this min- max- Analysis that you suggested. If you do that you should get about the result I got in my last post. I also don't understand why it optimizes away function calls. But if the function values are never used it makes sense, right :-)
Stephan
ichbin
04-29-2011, 03:27 AM
Bill gives a good reason for not including the Mersenne twister. George Marsglia has also pointed out that there are simpler alternatives that we have every reason to believe are just as good. But...
Mersenne twister has become the IBM of random number generators: no one every got fired for using it. It has become practically de rigueur in fields like financial mathematics. If only for the sake of not giving the impression that the authors are unaware of it, it would be good for the text to discuss it.
On page 343 of the NR3 book, the authors say: "If you need a random integer between 1 and n (inclusive), say, then the expression 1 + myran.int64() % (n-1) is perfectly OK (though there are faster idioms than the use of %)."
Can someone tell me how to generate integers over a bounded interval faster than using the modulus operator?  Here is what I tried:
LBOUND + static_cast<int> ((UBOUND - LBOUND + 1.0) * NRstream.doub());
But, turns out, the modulus operator method is faster:
LBOUND + NRstream.int64()%(UBOUND+1-LBOUND);
So, does anyone know what the authors have in mind when they say: "though there are faster idioms than the use of %"?
It's worth noting that there's an error in the book at this point: it should be %n, not %(n-1).
As for what's quicker than %, it may depend on the range you're using and whether you care about average speed or worst-case speed. For instance, suppose you want numbers between 0 and 65534 inclusive; you may do best to generate 16 bits (e.g., by taking rng()&65535, which will in fact be equivalent to rng()%65536 unless your compiler is really stupid and then retry if you are so unfortunate as to get 65535.
you may do best to generate 16 bits (e.g., by taking rng()&65535, which will in fact be equivalent to rng()%65536 unless your compiler is really stupid and then retry if you are so unfortunate as to get 65535.
And "rng()" is a random integer, right?  That's a very interesting idea.  I will try this.
Yup, rng() here is anything that returns good random integers.
Important note: If you happen to be using a *bad* random number generator (e.g., a linear congruential generator) then using the bottom 16 bits (or, generally, the value modulo some fixed number) is likely to make it worse. In that situation you'd do better to take 16 high bits rather than 16 low bits.
The RNGs in the latest NR shouldn't have that problem: their low bits should be about as good as their high bits.
I'm using the following code:
randomInt =  LB + (int64()&(UB-LB))
where int64() is Ranq1::int64() from NR 3rd Ed.
However, when I examine a sample of 5000 variates (i.e. 5000 values of randomInt drawn by consecutive calls to Ranq1::int64() from a single stream, the distribution looks odd.  The integers are supposed to be uniformly randomly distributed over the interval [20,50].  They are within that interval, but look at the "gaps" in the following histogram of those variates:
http://www7.pic-upload.de/thumb/07.09.13/jz4t324x4ctz.jpg (http://www.pic-upload.de/view-20646449/new.jpg.html)
It seems that this method (using bitwise &) is not sampling all the possible outcomes of the random variable.  That is, some integers in the interval [20,50] are apparently never drawn.
By comparison, the following histogram shows 5000 variates from the same uniform distribution but sampled using the mod (%) idiom:
http://www7.pic-upload.de/thumb/07.09.13/bw5rkwu6ul8.jpg (http://www.pic-upload.de/view-20646461/current.jpg.html)
It looks to me like the & idiom you suggest does not result in a valid uniform distribution.  But, maybe I'm implementing your suggestion incorrectly.  Am I?
How can I modify my method of using the bitwise & operator so that it samples integers from a valid uniform distribution in the interval [LB,UB]?
Note:  The % method is almost 3 times slower than the & method in my program, so I'm really hopeful that I can get the & method to work correctly.
My last post was responded to on Stackoverflow:
Using bitwise & instead of modulus operator to randomly sample integers from a range (http://stackoverflow.com/questions/18674977/using-bitwise-instead-of-modulus-operator-to-randomly-sample-integers-from-a-r)