Abstract
This is the supplementary R file for the Hidden Markov Model example in the lecture note “HMM”.
Suppose we have a dishonest casino that may use two dice. One of them is a fair die, and the other is a loaded die that has a higher chance of rolling a 6. We use an “emission probability matrix” \(\mathbf{B}\) to describe these probabilities.
\[ \mathbf{B} = \begin{pmatrix} 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 0.1 & 0.1 & 0.1 & 0.1 & 0.1 & 0.5 \end{pmatrix} \] When a particular die is used, the dealer has a certain probability of switching to the other die, or keeping the current one. Suppose the probability of staying on the fair die is 0.98 and the probability of staying on the loaded die is 0.95. We then have the following “transition probability matrix” \(\mathbf{A}\):
\[ \mathbf{A} = \begin{pmatrix} 0.98 & 0.02 \\ 0.05 & 0.95 \end{pmatrix} \] Based on this mechanism, we observe the outcomes of the rolls, denoted as \((X_1, X_2, \ldots, X_n)\). However, the underlying states (which die was used) are not observed; denote them as \((Z_1, Z_2, \ldots, Z_n)\). Of course, we do not know \(\mathbf{A}\) and \(\mathbf{B}\) either. We assume the memoryless (Markov) property \[p(Z_t | Z_{t-1}, \ldots, Z_1) = p(Z_t | Z_{t-1}),\] and the stationarity property \[p(Z_t | Z_{t-1}) = p(Z_2 | Z_1),\] for all \(t = 2, \ldots, n\). This describes a typical hidden Markov model. The goal is to infer the unobserved hidden states once we observe the rolls. A simple statistical approach is to maximize the log-likelihood
\[\log \big[ p( \mathbf{x} | \boldsymbol \theta ) \big] = \log \Big[ \sum_\mathbf{z} p( \mathbf{z} | \boldsymbol \theta ) p( \mathbf{x} | \mathbf{z}, \boldsymbol \theta ) \Big] \] where \(\mathbf{x}\) is the observed vector of rolls, and \(\boldsymbol \theta\) is the collection of all unknown parameters, including \(\mathbf{A}\), \(\mathbf{B}\), and the probability of using either die at the first stage, denoted as \(w\). However, this likelihood is very difficult to maximize directly because we do not observe \(\mathbf{z}\), and the summation over all possible state sequences is expensive. Instead, we use the EM algorithm introduced previously to iteratively update the estimator of \(\boldsymbol \theta\). Furthermore, we will incorporate the Baum-Welch algorithm and the Viterbi algorithm. Both will be introduced during the lecture.
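To see why these algorithms help, note that under the memoryless and stationarity assumptions each term inside the sum factorizes as \[ p( \mathbf{z} | \boldsymbol \theta ) \, p( \mathbf{x} | \mathbf{z}, \boldsymbol \theta ) = p(z_1) \prod_{t=2}^{n} p(z_t | z_{t-1}) \prod_{t=1}^{n} p(x_t | z_t), \] where the first term is governed by \(w\), the transition terms are entries of \(\mathbf{A}\), and the emission terms are entries of \(\mathbf{B}\). The forward and backward recursions exploit this factorization to evaluate the required sums in a number of steps linear in \(n\), instead of enumerating all \(2^n\) possible state sequences.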
This example is taken from an online source here. We will first generate 2000 observations, following the underlying model described above:
require(HMM)
set.seed(2)
nSim = 2000
# hidden states and observable symbols (die faces)
States = c("Fair", "Unfair")
Symbols = 1:6
# transition probability matrix A (rows: current state, columns: next state)
transProbs = matrix(c(0.98, 0.02, 0.05, 0.95), nrow = length(States), ncol = length(States), byrow = TRUE)
# emission probability matrix B (rows: states, columns: die faces 1-6)
emissionProbs = matrix(c(rep(1/6, 6), c(rep(0.1, 5), 0.5)), nrow = length(States), ncol = length(Symbols), byrow = TRUE)
# define the HMM and simulate nSim rolls together with the hidden states
hmm = initHMM(States, Symbols, transProbs = transProbs, emissionProbs = emissionProbs)
sim = simHMM(hmm, nSim)
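As mentioned above, the unknown parameters can be estimated with the Baum-Welch (EM) algorithm. The following is a minimal sketch using the baumWelch() function from the HMM package; it is not part of the original example, and the initial guesses for the transition and emission matrices are arbitrary values chosen only to break the symmetry between the two states.
# a rough initial guess of A and B (arbitrary starting values)
initTrans = matrix(c(0.9, 0.1, 0.1, 0.9), nrow = 2, byrow = TRUE)
initEmiss = matrix(c(rep(1/6, 6), c(rep(0.12, 5), 0.4)), nrow = 2, byrow = TRUE)
hmmInit = initHMM(States, Symbols, transProbs = initTrans, emissionProbs = initEmiss)
# run the EM iterations on the simulated observations
bw = baumWelch(hmmInit, sim$observation, maxIterations = 100)
bw$hmm$transProbs     # compare with the true A
bw$hmm$emissionProbs  # compare with the true B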
The following code computes the forward and backward (log-)probabilities, combines them to obtain the posterior probability of each hidden state, and uses the Viterbi algorithm to infer the most probable state sequence.
# Viterbi algorithm: the most probable hidden state sequence
vit = viterbi(hmm, sim$observation)
# forward and backward probabilities, returned on the log scale
f = forward(hmm, sim$observation)
b = backward(hmm, sim$observation)
# log-likelihood of the observed sequence, log p(x), via the log-sum-exp trick
i <- f[1, nSim]
j <- f[2, nSim]
probObservations = (i + log(1 + exp(j - i)))
# posterior probability of each hidden state at each time point
posterior = exp((f + b) - probObservations)
x = list(hmm = hmm, sim = sim, vit = vit, posterior = posterior)
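As a quick numerical check (not part of the original example), we can compare both decodings with the simulated truth. Row 1 of posterior corresponds to the first state, "Fair", which is also how the plotting code below indexes it.
# fraction of rolls where each decoding recovers the true die
vitAccuracy = mean(vit == sim$states)
postStates = ifelse(posterior[1, ] > 0.5, "Fair", "Unfair")
postAccuracy = mean(postStates == sim$states)
c(viterbi = vitAccuracy, posterior = postAccuracy)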
We plot the results:
# plot the observed rolls
plot(x$sim$observation, ylim = c(-7.5, 6), pch = 3, main = "Fair and unfair die",
xlab = "", ylab = "Throw nr.", bty = "n", yaxt = "n")
axis(2, at = 1:6)
#######Simulated, which die was used (truth)####################
text(0, -1.2, adj = 0, cex = 1.2, col = "black", "Truth: green = fair die; red = loaded die")
for (i in 1:nSim) {
if (x$sim$states[i] == "Fair")
rect(i, -1, i + 1, 0, col = "green", border = NA)
else rect(i, -1, i + 1, 0, col = "red", border = NA)
}
######## Most probable path (viterbi)#######################
text(0, -3.2, adj = 0, cex = 1.2, col = "black", "Most probable path")
for (i in 1:nSim) {
if (x$vit[i] == "Fair")
rect(i, -3, i + 1, -2, col = "green", border = NA)
else rect(i, -3, i + 1, -2, col = "red", border = NA)
}
##################Differences:
text(0, -5.2, adj = 0, cex = 1.2, col = "black", "Difference")
differing = !(x$sim$states == x$vit)
for (i in 1:nSim) {
if (differing[i])
rect(i, -5, i + 1, -4, col = rgb(0.3, 0.3, 0.3),
border = NA)
else rect(i, -5, i + 1, -4, col = rgb(0.9, 0.9, 0.9),
border = NA)
}
#################Posterior-probability:#########################
points(x$posterior[2, ] - 3, type = "l")
###############Difference with classification by posterior-probability:############
text(0, -7.2, adj = 0, cex = 1.2, col = "black", "Difference by posterior-probability")
for (i in 1:nSim) {
if (posterior[1, i] > 0.5) {
if (x$sim$states[i] == "Fair")
rect(i, -7, i + 1, -6, col = rgb(0.9, 0.9, 0.9),
border = NA)
else rect(i, -7, i + 1, -6, col = rgb(0.3, 0.3, 0.3),
border = NA)
}
else {
if (x$sim$states[i] == "Unfair")
rect(i, -7, i + 1, -6, col = rgb(0.9, 0.9, 0.9),
border = NA)
else rect(i, -7, i + 1, -6, col = rgb(0.3, 0.3, 0.3),
border = NA)
}
}