Goodness of fit meaning in 15.2 - repost


sda
06-17-2010, 08:23 AM
Having now obtained a third edition version of the book and finding my same difficulty in understanding, I am reposting (with the hope that greater visibility here may lead to a response) questions posed two months ago in the obsolete editions section of this forum.
----
I am trying to understand the meaning of the goodness-of-fit estimate in section 15.2, fitting data to a straight line. We compute a and b to minimize the chi-square in 15.2.2. Then I read the following: "The probability Q that a value of chi-square as poor as the value (15.2.2) should occur by chance is Q = ...."

I interpret "as poor as" to mean "as large as" or "as bad as". Since the chi-square is to be minimized, I would think this should say "as good as" since a random pair of a and b would likely result in a much larger chi-square value. However, the authors' emphasis placed on the word "poor" makes me think I'm missing something.

The text goes on to say that Q > 0.1 indicates a good fit, that Q < 0.001 should call the validity of the model into question. This also seems backwards to me since a high probability (Q near 1) of the chi-square value occurring by chance indicates that randomly selected a and b would have served just as well. Similarly, a low probability of this chi-square occurring by chance sounds like a good thing, yet with smaller Q we are to have less, rather than more, confidence in the model(?).

I am hopeful that someone can explain this, perhaps in slightly different terms, and help me understand.

ichbin
06-26-2010, 05:21 PM
The text is correct, although the authors' exposition and choice of critical Q values is a bit non-standard. Here is my attempt to explain.

Under the "null hypothesis" that the data arise from the model, there is a known distribution of chi-squared values. The area below (to the left of) any particular comparison value is called P; the area above (to the right of) any particular value is called Q. Thus having an unusually high chi-squared means having a low Q -- very little area remains to the right of your comparison point.
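To make the "area to the right" concrete, here is a minimal sketch of how Q can be computed. It assumes the standard identity that Q is the regularized upper incomplete gamma function evaluated at (nu/2, chi2/2), where nu is the number of degrees of freedom (N - 2 for a straight-line fit). The function names gammq and chi2_q, and the tolerance and iteration limits, are my own choices for illustration and not the book's code.

```python
import math

def gammq(a, x, eps=1e-12, max_iter=1000):
    """Regularized upper incomplete gamma function Q(a, x) = 1 - P(a, x)."""
    if a <= 0 or x < 0:
        raise ValueError("need a > 0 and x >= 0")
    if x == 0:
        return 1.0
    # log of exp(-x) * x**a / Gamma(a), shared by both branches
    log_pref = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Power series for P(a, x); converges quickly in this regime.
        ap = a
        term = 1.0 / a
        total = term
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_pref)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_pref) * h

def chi2_q(chi2, nu):
    """Area to the right of chi2 under the chi-squared distribution with nu d.o.f."""
    return gammq(0.5 * nu, 0.5 * chi2)

if __name__ == "__main__":
    # Larger chi-squared leaves less area to its right, so Q falls.
    for chi2 in (5.0, 10.0, 20.0, 40.0):
        print(chi2, chi2_q(chi2, nu=10))
```

A quick sanity check: for nu = 2 the result reduces to exp(-chi2/2), and for nu = 1 it equals erfc(sqrt(chi2/2)).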

The most widely accepted critical value is Q=0.05 (i.e. P=0.95). If, under the model, you would expect a lower chi-squared 95% of the time, you reject the model. This criterion is most widely accepted in social and biomedical circles, where a "positive result" (the drug results are better than random) typically means rejecting the null hypothesis. In the physical sciences, where a "positive result" (the data follows my equation) typically means accepting the full hypothesis, I would still tend to call the model "rejected" for Q < 0.05, but I would only call it "confirmed" for Q >~ 0.1-0.2; for values in between I would call the experiment "inconclusive".

The NR authors' claim that Q~0.001 might be acceptable is a little strange, but I do get what they mean. Remember that, from the statistical viewpoint, "the model" isn't just the physical model, it's the physical model plus the error model (independent measurements, Gaussian errors). What the NR authors mean when they say that Q~0.001 might still be acceptable is that you should consider the possibility that your error model is wrong (e.g. you have underestimated your error bars, or the errors are Laplacian rather than Gaussian, so outliers are more common than you expected) but your physical model is right. If, on more careful examination of your data and your experiment, you are convinced that your error model is right, then you should definitely reject your physical model for Q~0.001.
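The "right physical model, wrong error model" case can be illustrated with a small simulation. The sketch below generates data that really do follow a straight line with Gaussian noise, then fits them twice: once with the true error bars and once with error bars claimed to be half their true size. Everything specific here is an assumption chosen for illustration: the line y = 1 + 0.5x, 22 points, the seed, and the helper names fit_line and chi2_q_even. To keep it short, Q is computed with the closed form that holds only for an even number of degrees of freedom.

```python
import math
import random

def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x; returns (a, b, chi2)."""
    w = [1.0 / s ** 2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    chi2 = sum(((yi - a - b * xi) / si) ** 2 for xi, yi, si in zip(x, y, sigma))
    return a, b, chi2

def chi2_q_even(chi2, nu):
    """Q for even nu only: exp(-x) * sum_{k < nu/2} x**k / k!, with x = chi2/2."""
    if nu <= 0 or nu % 2:
        raise ValueError("this shortcut needs a positive, even nu")
    x = 0.5 * chi2
    term = 1.0
    total = 1.0
    for k in range(1, nu // 2):
        term *= x / k
        total += term
    return math.exp(-x) * total

rng = random.Random(0)           # fixed seed so the run is repeatable
n = 22                           # nu = n - 2 = 20, which is even
true_sigma = 1.0
x = [float(i) for i in range(n)]
y = [1.0 + 0.5 * xi + rng.gauss(0.0, true_sigma) for xi in x]

# Same data, same physical model; only the claimed error bars differ.
_, _, chi2_ok = fit_line(x, y, [true_sigma] * n)
_, _, chi2_under = fit_line(x, y, [0.5 * true_sigma] * n)
q_ok = chi2_q_even(chi2_ok, n - 2)
q_under = chi2_q_even(chi2_under, n - 2)

print("correct error bars:       chi2 =", chi2_ok, " Q =", q_ok)
print("underestimated by half:   chi2 =", chi2_under, " Q =", q_under)
```

Because the error bars are uniform, halving them leaves the fitted a and b unchanged and multiplies chi-squared by exactly four, so Q drops sharply even though the straight-line model is correct.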

It's also worth considering the meaning of unusually large Q values, i.e. unusually small P values. If Q > 0.95 (i.e. P < 0.05), it may indicate a problem with your error model (e.g. you have over-estimated your error bars).
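The rules of thumb in this post can be collected into one small helper. This is only a sketch of my own reading, not a standard procedure: the function name interpret_q is hypothetical, and the "confirmed" cutoff is a parameter because I gave it loosely as 0.1-0.2.

```python
def interpret_q(q, reject_below=0.05, confirm_at=0.1, too_good_above=0.95):
    """Map a goodness-of-fit probability Q to a rough verdict.

    The default thresholds are the rules of thumb from this post;
    confirm_at may reasonably be set anywhere in 0.1-0.2.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("Q is a probability and must lie in [0, 1]")
    if q < reject_below:
        # Either the physical model or the error model is wrong.
        return "rejected"
    if q > too_good_above:
        # Fit is better than chance allows; error bars may be over-estimated.
        return "suspiciously good"
    if q >= confirm_at:
        return "confirmed"
    return "inconclusive"

if __name__ == "__main__":
    for q in (0.001, 0.07, 0.4, 0.99):
        print(q, interpret_q(q))
```

A "rejected" verdict still leaves open which half of the model failed, so it is a prompt to re-examine the error bars before discarding the physical model.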