Chapter 16: Bug in HMM forwardbackward?


SimonG
10-17-2013, 01:04 PM
Dear NR,

Here's what looks to me like a serious bug in the HMM forwardbackward() function. Sorry if it has been reported or if I'm missing something obvious. I didn't try the NR code, I just recoded following the text in python, but it seems like the bug is in the textbook code and text. It's all fixed in my code, but I thought I'd report it in case it's useful to folks : )

Expected behavior: forwardbackward() returns the posterior probability of state i given observations.

Observed behavior: The posterior of the most likely states is lower than expected near the beginning of the sequence.

Interpretation:
In the conventional expression of the forward-backward algorithm, the forward quantities (\alpha) and the backward quantities (\beta) have a slightly different interpretation: usually, one of them is P(y_{left}|state), the other is P(y_{right}, state), the difference in conditioning provides us with an extra Bayes factor: the product gives us

P(y_{left}|state) P(y_{right}, state) = P(y_{left}|state) P(y_{right}|state) P(state) =P(y|state)P(state) \propto P(state|y),

which is what we want

In the text, both alpha and beta are defined as conditional probabilities:

\alpha=P(y_{left}|state),
\beta=P(y_{right}| state),
so there should be a missing Bayes factor in 16.3.18. That should have a large effect, so why did it not come up before?

The recursion given in the text (16.3.14) is actually for the alternate definition \alpha=P(y_{left}, state), rather than that for the alpha defined in the text. To see this, notice how the recursions in 16.3.14 and 16.3.16 are not symmetrical in the indices of the A if we swap alpha and beta and the direction of time.

So it looks like the errors canceled out, but there's still one issue: the initialization of the \alpha is done as for \alpha=P(y_{left}|state), rather than \alpha=P(y_{left}, state) (i.e., we're missing a P(state) in the initialization).

The effect will be important if you have states that differ strongly in their equilibrium probabilities, and short sequences (since the initialization will only be important along the boundary). Unfortunately, I can attest that such situations do occur in practical situations...

Cheers,

Simon Gravel