Bug in Contingency Table Analysis Code


dvavra
10-24-2007, 04:57 PM
Well, maybe BUG is too strong a word. The code I have (2.10) doesn't properly deal with extreme data entry and can divide by zero, etc. I don't have the 3.0 code but, since the 2.10 problems have been around so long, I'll bet they weren't fixed in version 3. If these have been fixed -- GREAT! and forget I said anything.

For example:

cntab2 (contingency association) fails if the contingency table has complete correlation. Take the following 2x2 table:

15000 0
0 15000

This fails because two of the computed probabilities come out to p=0.0 and cntab2 tries to compute ln(0).

Both cntab1 and cntab2 can fail if the table is completely empty, that is, the total sum is 0.0, because each routine then attempts to divide by 0.0.

cntab1 also fails if chisq=0.0.

One solution is to add TINY inside every log (like log(p+TINY)) and to every division, for example x/(sum+TINY). A slightly more accurate answer is to never add TINY and instead skip the log or the division entirely when the operand is zero (e.g., when sum==0).


DAV

dvavra
10-24-2007, 06:00 PM
I take back what I posted earlier. I was looking at code that someone else had previously modified for whatever reason. I finally compared it against the book, and the cntab2 code from the book looks correct: it has guard statements around the log(p) calls and guards against division by zero.

If df is zero in cntab1, the gammaq function calls nrerror, which prevents the sqrt(chisq/(chisq+sum)) code from ever executing; chisq==0 is likewise caught in gser. The cntab1 code I had was apparently modified to avoid the nrerror call. That's somewhat understandable: a chisq of zero is highly unusual, but it shouldn't result in an error.