Binomialdev Bug?


stephan80
09-28-2009, 04:18 AM
Hi,

If I sample many times from the binomial deviate generator, I sometimes get a result _larger_ than the number of samples I put it. Am I doing something wrong or is there a bug?

I have the following little program:

#include "nr3.h"
#include "ran.h"
#include "gamma.h"
#include "deviates.h"

int main (int argc, char * const argv[]) {
if(argc != 2) {
fprintf(stderr, "Params: seed\n");
return 1;
}
Ullong seed = atoi(argv[1]);

Binomialdev rng(100, 0.3, seed);
for(int i = 0; i < 100000; ++i) {
int sample = rng.dev();
printf("%d %d\n", i, sample);
if(sample > 100 || sample < 0) {
fprintf(stderr, "Ouch... this should never happen!\n");
return 1;
}
}
return 0;
}


This program just tries to sample 100.000 times and give an alarm output if the result is smaller than 0 or bigger than 100, where 100 is the parameter "n" in the binomial generator.
If I compile this program with gcc 4.0.1 on Darwin (UNIX) and run it with any seed, I indeed get too high Ns. I tried different seeds, they all fail. For example with seed "1234" I get after 39457 loops the value 257!

Any help?

Thanks,
Stephan

davekw7x
09-28-2009, 08:59 AM
...I sometimes get a result _larger_ than the number of samples I put it... I could not duplicate your problem using GNU g++ version 4.1.2 on Centos Linux with your code.

I also tried the following and never got a bad value:

#include "nr3.h"
#include "ran.h"
#include "gamma.h"
#include "deviates.h"

int main () {
Ullong goodcount;
Ullong badcount;
Ullong numvalues = 1000000;

Ullong seed;

//
// Make sure the following shows sizeof(Ullong) = 8
//
cout << "sizeof(Ullong) = " << sizeof(Ullong) << endl << endl;
if (sizeof(Ullong) != 8) {
cout << "Sorry, we can't do business" << endl;
exit(EXIT_FAILURE);
}

cout << "Enter a seed value: ";
while (cin >> seed) {
cout << " seed = " << seed << endl;

Binomialdev rng(100, 0.3, seed);

goodcount = 0;
badcount = 0;

for(Ullong i = 0; i < numvalues; ++i) {
int sample = rng.dev();
if(sample > 100 || sample < 0) {
++badcount;
cerr << "With i = " << i
<< ", sample value = " << sample << endl;
}
else {
++goodcount;
}
}
cout << "Number of samples = " << numvalues << endl
<< "Number of good values = " << goodcount << endl
<< "Number of bad values = " << badcount << endl << endl;

cout << "Enter another seed: ";
}
cout << "Goodbye for now." << endl;
return 0;
}


Output:

sizeof(Ullong) = 8

Enter a seed value: 1234
seed = 1234
Number of samples = 1000000
Number of good values = 1000000
Number of bad values = 0

Enter another seed: 6666666666666666
seed = 6666666666666666
Number of samples = 1000000
Number of good values = 1000000
Number of bad values = 0

Enter another seed: quit
Goodbye for now.




Regards,

Dave

stephan80
09-28-2009, 11:05 AM
The mystery deepens. If I compile and run your program I get:

sizeof(Ullong) = 8

Enter a seed value: 1234
seed = 1234
With i = 39457, sample value = 257
With i = 40750, sample value = 104
With i = 50265, sample value = 104
With i = 85307, sample value = 106
With i = 85666, sample value = 107
With i = 95904, sample value = 109
With i = 99752, sample value = 111
Number of samples = 100000
Number of good values = 99993
Number of bad values = 7

Enter another seed: 879
seed = 879
With i = 18057, sample value = 123
With i = 27590, sample value = 104
With i = 40504, sample value = 128
With i = 41966, sample value = 108
With i = 69226, sample value = 105
With i = 79723, sample value = 133
With i = 94748, sample value = 139
With i = 95337, sample value = 110
Number of samples = 100000
Number of good values = 99992
Number of bad values = 8

Enter another seed:

no clue yet... working on it.
Thanks for this reply, at least it can't be a general bug then...

Stephan

davekw7x
09-28-2009, 02:23 PM
The mystery deepens...

(If you don't like long-winded explanations, see the Footnote now.)

On the other hand, if you like a good detective story (I'm not exactly Hercule Poirot, but some might find it interesting):

Hmmmmm...

I notice the following in the initializer for Binomialdev in deviates.h:


if (n <= 64) {
//
// Some stuff
//

swch = 0;
} else if (n*p < 30.) {
//
// Some more stuff
//
swch = 1;
} else {
//
// Yet more stuff
//
swch = 2;
}


So, I tried to see if I could find the "corner cases," which would incorporate values of the parameters that would make the floating point inequality "almost-but-not-quite hold" and sets of parameters that would make the inequality "just-barely hold." This might give us a clue as to why my results are different from yours.

Here's what I discovered.

I made the following changes in my test program:

Int n = 100;
Doub p = 0.30000000000000002;

cout << setprecision(20) << fixed
<< "p = " << p << ", n = " << n
<< ", n*p = " << n*p << endl;

.
.
.

Binomialdev rng(n, p, seed);
.
.
.


Here's the result (looks familiar):


p = 0.30000000000000004441, n = 100, n*p = 30.00000000000000355271
sizeof(Ullong) = 8

Enter a seed value: 1234
seed = 1234
With i = 39457, sample value = 257
With i = 40750, sample value = 104
With i = 50265, sample value = 104
With i = 85307, sample value = 106
With i = 85666, sample value = 107
With i = 95904, sample value = 109
With i = 99752, sample value = 111
Number of samples = 100000
Number of good values = 99993
Number of bad values = 7


Note that if I use the following

Doub p = 0.30000000000000001;


Here's the output:

p = 0.29999999999999998890, n = 100, n*p = 30.00000000000000000000
sizeof(Ullong) = 8

Enter a seed value: 1234
seed = 1234
Number of samples = 100000
Number of good values = 100000
Number of bad values = 0




Conclusion: Different handling of floating point numbers (guard bits or whatever) results in slightly different roundoff errors in the internal workings of code with different compilers. You "just happened" to pick a "corner case" for which a bug in the function showed up for my compiler and not for yours.

In fact, further testing with n = 100 showed that for values of p between 0.3 and 0.7 the function can give invalid values. (Actually this can happen in any case for which n> 64 and p is between 0.3 and 0.7)

So: Unless I'm missing something (wouldn't be the first time) it's a bug in the routine after all, and not so subtle at that. See Footnote.

I mean, the fact that it didn't show up with my compiler and it did for yours for your particular case requires a little head-scratching, but the bug itself is real.



Regards,

Dave

Footnote: In my copy of deviates.h, inside the dev() function for Binomaldev, line 219 is

if (k < 0) continue;


Shouldn't it be something like

if ((k < 0) || (k >n)) continue;


This is in the branch where swch == 2. The variable swch was set to 2 in the initializer for n > 64 and n*p > .3 and n*p < .7

stephan80
09-28-2009, 04:51 PM
Hi Dave,

I exactly agree with your point about p=0.3. I also learned how these switches change between different methods of getting the deviate. I do not get a single error, if np<30 (switch 1). So picking p=0.3 was just very unfortunate from me to pick since for the reasons you pointed out it might not be reproducible. The routine seems to work with switches 1 and 0. Only if N>64 AND Np>30, i.e. in switch 2, there seems to be a bug.

So as I understand you can confirm that N=100 and p=0.5 gives invalid values, right? If this is the case, there should be a bug. I will try out your idea about this last if-statement later. I am currently reading part of chapter 7 in the book that explains the ratio-of-uniform method. This is the method connected with switch 2. But maybe this little if-statement is exactly the problem.

Thanks a lot already for your detective work. I'll post whether the if-statement solves it for me.

Best,
Stephan

davekw7x
09-28-2009, 06:10 PM
...So picking p=0.3 was just very unfortunate I think that it was very fortunate to have found the bug at a preliminary stage of your project. In fact, for your value of n, any value between 0.3 and 0.7 can give invalid values.

you can confirm that N=100 and p=0.5 gives invalid values, right?Yes. Since seeing your posts, I have tested a little more and I observed invalid outputs for various values of n > 64 (including n = 100) and for various values of p between 0.3 and 0.7 (including p = 0.5), as I expected after looking at the details of the code.

Changing the conditional to reject values greater than n as well as rejecting values less than zero is the fix, as far as I am concerned. I will leave it the the authors to confirm the bug and approve the fix. (I don't work here, you know; I just come here to learn.)


Regards,

Dave

stephan80
09-29-2009, 01:37 AM
In fact, for your value of n, any value between 0.3 and 0.7 can give invalid values.

I agree. Thanks for confirming that.


Changing the conditional to reject values greater than n as well as rejecting values less than zero is the fix, as far as I am concerned.

This works for me as well... although I want to double check whether the resulting histogram of deviates is still binomial :-)


(I don't work here, you know; I just come here to learn.)


I know I know, so do I...
should I do anything more about this bug, are the authors reading this?

Regards,

Stephan

Bill Press
09-29-2009, 04:14 PM
Yes, the authors do read the Forum!
Yes, you are both absolutely correct as to the bug and the fix. In Binomialdev::dev(), the line
if (k < 0) continue;
should be replaced by
if (k < 0 || k > n) continue;
Mathematically, this fix changes the returned distribution very slightly. I've verified that in 10^9 calls on the worst case (that I can find) the deviation from the true distribution is statistically undetectable. This is a serious bug, because it returns values that are not only wrong, but unexpected. Thanks for finding it!

Cheers,
Bill P.

stephan80
09-30-2009, 04:38 AM
OK, thanks for reacting quickly.

Regards,
Stephan