The Hidden Markov Model (HMM) has a wide range of applications, including communication, speech recognition, reinforcement learning, and bioinformatics. Wherever a problem involves a discrete state plus a random process, a Markov model or hidden Markov model can be found.
An HMM consists of two parts, hidden and Markov model, defined as follows:
Let $latex X_t$ and $latex Y_t$ be discrete-time stochastic processes. The pair $latex (X_t, Y_t)$ is a hidden Markov model if
- $latex X_t$ is a Markov process and is NOT directly observable ("hidden")
- $latex Y_t$ is observable and $latex P(Y_t \in A \mid X_1 = x_1, \ldots, X_t = x_t) = P(Y_t \in A \mid X_t = x_t)$.
Markov Process (Chain)
First, $latex X_t$ is a Markov process. The defining property is

$latex P(X_{t+1} = x \mid X_1 = x_1, \ldots, X_t = x_t) = P(X_{t+1} = x \mid X_t = x_t)$

An intuitive example: to predict tomorrow's weather, knowing today's weather already gives enough information. Telling me yesterday's weather as well, or the weather of the past five days, provides no extra information for predicting tomorrow.
Note that this does NOT say tomorrow's weather depends only on today and is unrelated to yesterday or earlier days; that is, it does not claim $latex P(X_{t+1} \mid X_{t-1}) = P(X_{t+1})$.
The correct reading is

$latex P(X_{t+1} \mid X_t, X_{t-1}) = P(X_{t+1} \mid X_t), \quad\text{while in general } P(X_{t+1} \mid X_{t-1}) \ne P(X_{t+1})$

That is, tomorrow's weather is correlated with both today and yesterday, but once today's weather is known, yesterday's weather is no longer needed.
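The two statements above can be checked numerically. A minimal sketch, assuming a hypothetical 2-state weather chain (the transition values here are illustrative, not from the text): build the exact joint distribution of three consecutive days from the chain, then compare the conditionals.

```python
import numpy as np

# Hypothetical 2-state weather chain: state 0 = rainy, 1 = sunny.
P = np.array([[0.6, 0.4],
              [0.2, 0.8]])
pi0 = np.array([0.5, 0.5])

# Joint distribution of (X1, X2, X3) implied by the chain definition.
joint = np.einsum('i,ij,jk->ijk', pi0, P, P)

# Markov property: P(X3 | X2, X1) = P(X3 | X2); yesterday adds nothing
# once today is known.
for x1 in range(2):
    for x2 in range(2):
        cond = joint[x1, x2] / joint[x1, x2].sum()
        assert np.allclose(cond, P[x2])

# ...but X3 is NOT independent of X1: P(X3 | X1) still changes with X1.
p13 = joint.sum(axis=1)                          # P(X1, X3)
cond_13 = p13 / p13.sum(axis=1, keepdims=True)   # P(X3 | X1) = rows of P^2
assert not np.allclose(cond_13[0], cond_13[1])
```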
Transition Probability Matrix
One usually defines $latex P_{i,j} = P(X_{t+1} = j \mid X_t = i)$, the probability of moving from state i to state j, which gives the transition probability matrix
$latex P=\left[\begin{array}{cccccc}
P_{1,1} & P_{1,2} & \ldots & P_{1, j} & \ldots & P_{1, S} \\
P_{2,1} & P_{2,2} & \ldots & P_{2, j} & \ldots & P_{2, S} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
P_{i, 1} & P_{i, 2} & \ldots & P_{i, j} & \ldots & P_{i, S} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
P_{S, 1} & P_{S, 2} & \ldots & P_{S, j} & \ldots & P_{S, S}
\end{array}\right] $
and $latex \sum_{j=1}^{S} P_{i,j} = 1$ for every row i (each row is a probability distribution).
[@wikiMarkovChain2020]
Time-Homogeneous Markov Chain (or Stationary Markov Chain)
$latex P(X_{t+1} = j \mid X_t = i) = P(X_t = j \mid X_{t-1} = i)$ for all t.
- That is, $latex P_{i,j}$ is independent of t.
Combining the two properties above, we can derive:
- $latex P(X_{t+2} = j \mid X_t = i) = \sum_k P_{i,k} P_{k,j} = (P^2)_{i,j}$
- … and so on.
In general, $latex P(X_{t+n} = j \mid X_t = i) = (P^n)_{i,j}$.
Define the row vector $latex \pi_t = [P(X_t = 1), \ldots, P(X_t = S)]$, with $latex \sum_i \pi_t(i) = 1$ and $latex \pi_0$ the initial distribution. From $latex \pi_{t+1} = \pi_t P$ we obtain

$latex \pi_t = \pi_0 P^t$

- Assuming $latex P$ is diagonalizable, we can do the eigendecomposition $latex P = U \Lambda U^{-1}$, and the eigenvalues satisfy $latex |\lambda_i| \le 1$.
- Let $latex \pi_0 = \sum_i c_i v_i$, where $latex v_i$ is the left eigenvector corresponding to $latex \lambda_i$, i.e. $latex v_i P = \lambda_i v_i$. Then

$latex \pi_t = \pi_0 P^t = \sum_i c_i \lambda_i^t v_i \to c_1 v_1 \quad\text{as } t \to \infty$

when $latex \lambda_1 = 1$ and $latex |\lambda_i| < 1$ for $latex i \ne 1$; $latex c_1$ is the normalization constant.
Stationary Distribution and Eigenvectors
Markov chains fall into two broad classes:
- Markov chains with an absorbing state. The focus is on transient behavior: how many steps the chain takes from the initial state before entering the absorbing state. Commonly used in reinforcement learning.
- Assuming state i is the absorbing state (sink), then $latex P_{i,i} = 1$ and $latex P_{i,j} = 0$ for $latex j \ne i$. This makes $latex \lim_{t\to\infty} P^t$ = [ 0, .., 0, 1, 0, .., 0 ], where only the i-th column is the all-ones column vector and the rest are zero column vectors. The corresponding eigenvector with eigenvalue = 1 is the indicator vector of the absorbing state.
- How about a source state, i.e. a state that no other state transitions into? It simply becomes the INIT state (see the example below). If it is not the INIT state, the state can be ignored.
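The absorbing case is easy to see numerically. A sketch with a hypothetical 3-state chain (the values are illustrative): powers of P converge so that only the absorbing column survives.

```python
import numpy as np

# Hypothetical 3-state chain; state 2 is absorbing (last row = e_2).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.0, 0.0, 1.0]])

Pt = np.linalg.matrix_power(P, 100)

# From any initial state the chain ends up absorbed: every row of P^t
# tends to [0, 0, 1], i.e. only the absorbing column is nonzero.
assert np.allclose(Pt, [[0, 0, 1]] * 3, atol=1e-9)
```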
- Markov chains without an absorbing state. The focus is that the initial-state effect dies out and the chain enters a stationary distribution (not a steady state, since the MC keeps wandering among all states). The Markov chain stationary distribution is widely used in queueing theory, where the number of people in the queue corresponds to the state of $latex X_t$ and can be observed directly. This is the biggest difference from the Hidden Markov Model below, whose state is invisible or hidden and must be inferred through the observable $latex Y_t$.

$latex \pi_\infty = \pi_\infty P$, where $latex \pi_\infty$ is the normalized left eigenvector of $latex P$ corresponding to eigenvalue = 1.
Hidden Markov Model
The Hidden Markov Model lifts the Markov chain a whole level. Mathematically it is simple: $latex X_t$ becomes hidden or invisible, but through the observable $latex Y_t$ we can obtain information about $latex X_t$, i.e.

$latex b_j(k) = P(Y_t = o_k \mid X_t = j)$

assuming time homogeneity. This probability is called the emission probability and is usually represented by the B matrix, as in the figure below, where $latex B_{j,k} = b_j(k)$. We directly obtain $latex P(Y_t = o_k) = \sum_j \pi_t(j)\, b_j(k)$; in matrix form, $latex \pi_t B$ is the probability distribution of $latex Y_t$. [@wikiHiddenMarkov2020]
Another commonly used backward probability uses the Bayesian formula:

$latex P(X_t = j \mid Y_t = o_k) = \dfrac{P(Y_t = o_k \mid X_t = j)\, P(X_t = j)}{P(Y_t = o_k)}$

Let us look at a concrete example: [@burlandoHiddenMarkov2019]

It can be described in Python code:
observations = ('PAINT', 'CLEAN', 'SHOP', 'BIKE')
start_probability = {'Rainy': 0.4, 'Sunny': 0.6}
transition_probability = {
    'Rainy': {'Rainy': 0.6, 'Sunny': 0.4},
    'Sunny': {'Rainy': 0.2, 'Sunny': 0.8},
}
emission_probability = {
    'Rainy': {'PAINT': 0.3, 'CLEAN': 0.45, 'SHOP': 0.2, 'BIKE': 0.05},
    'Sunny': {'PAINT': 0.4, 'CLEAN': 0.1, 'SHOP': 0.2, 'BIKE': 0.3},
}
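These dictionaries define a complete generative process: draw a start state, then alternately emit an observation and transition to the next state. A minimal sampling sketch (repeating the tables so the snippet runs standalone; `sample_hmm` is a helper name introduced here):

```python
import random

start_probability = {'Rainy': 0.4, 'Sunny': 0.6}
transition_probability = {
    'Rainy': {'Rainy': 0.6, 'Sunny': 0.4},
    'Sunny': {'Rainy': 0.2, 'Sunny': 0.8},
}
emission_probability = {
    'Rainy': {'PAINT': 0.3, 'CLEAN': 0.45, 'SHOP': 0.2, 'BIKE': 0.05},
    'Sunny': {'PAINT': 0.4, 'CLEAN': 0.1, 'SHOP': 0.2, 'BIKE': 0.3},
}

def sample_hmm(T, seed=0):
    """Draw a hidden state path and its observations of length T."""
    rng = random.Random(seed)
    state = rng.choices(list(start_probability),
                        weights=start_probability.values())[0]
    states, obs = [], []
    for _ in range(T):
        states.append(state)                     # hidden: not seen in practice
        emit = emission_probability[state]
        obs.append(rng.choices(list(emit), weights=emit.values())[0])
        trans = transition_probability[state]
        state = rng.choices(list(trans), weights=trans.values())[0]
    return states, obs

states, obs = sample_hmm(4)
```

In a real HMM setting only `obs` is visible; `states` is exactly the hidden part the later algorithms must reason about.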
$latex P=\left[\begin{array}{cccccc}
0.6 & 0.4 \\
0.2 & 0.8
\end{array}\right]\quad $ and $latex \quad B=\left[\begin{array}{cccccc}
0.3 & 0.45 & 0.2 & 0.05 \\
0.4 & 0.1 & 0.2 & 0.3
\end{array}\right] $
First look at the Markov chain part. From P we can obtain the stationary distribution from the eigenvectors.
$latex P=\left[\begin{array}{cccccc}
-0.894 & -0.707 \\
0.447 & -0.707
\end{array}\right]
\left[\begin{array}{cccccc}
0.4 & 0 \\
0 & 1
\end{array}\right]
\left[\begin{array}{cccccc}
-0.745 & 0.745 \\
-0.471 & -0.943
\end{array}\right] $
then
$latex \left[\begin{array}{cccccc}
-0.745 & 0.745 \\
-0.471 & -0.943
\end{array}\right]
P =
\left[\begin{array}{cccccc}
0.4 & 0 \\
0 & 1
\end{array}\right]
\left[\begin{array}{cccccc}
-0.745 & 0.745 \\
-0.471 & -0.943
\end{array}\right] $
then
$latex \left[\begin{array}{cccccc}
-0.745 & 0.745
\end{array}\right]
P = 0.4
\left[\begin{array}{cccccc}
-0.745 & 0.745
\end{array}\right] $ eigenvector with eigenvalue = 0.4 (controls how fast the initial-state effect dies out)
$latex \left[\begin{array}{cccccc}
-0.471 & -0.943
\end{array}\right]
P =
\left[\begin{array}{cccccc}
-0.471 & -0.943
\end{array}\right] $ eigenvector with eigenvalue = 1 (used for the stationary distribution)
After normalization
$latex \left[\begin{array}{cccccc}
1/3 & 2/3
\end{array}\right]
P =
\left[\begin{array}{cccccc}
1/3 & 2/3
\end{array}\right] $
The initial distribution: $latex \pi_0 = [0.4, 0.6]$.
The stationary distribution after a long time: $latex \pi_\infty = [1/3, 2/3]$, independent of the initial distribution.
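The eigen-computation above can be reproduced in a few lines of NumPy (a sketch; `np.linalg.eig` on the transpose gives the left eigenvectors):

```python
import numpy as np

P = np.array([[0.6, 0.4],
              [0.2, 0.8]])

# Left eigenvectors of P = right eigenvectors of P.T.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))    # locate eigenvalue 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()                      # normalize into a distribution

assert np.allclose(sorted(np.real(eigvals)), [0.4, 1.0])
assert np.allclose(pi, [1/3, 2/3])      # stationary distribution
assert np.allclose(pi @ P, pi)          # pi P = pi
```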
Next, look at how to infer the state from the output. This matters because an HMM can only infer the state from the output.
The key is how this connects to real problems; there are three application scenarios: [@jurafskyHIddenMarkov2019]
- The Likelihood problem (forward and backward algorithm)
- The Decoding problem (Viterbi decoding algorithm)
- The Learning problem (EM algorithm)
Likelihood Problem
Suppose the observation sequence is O = { PAINT, CLEAN, SHOP, BIKE }. What is the likelihood that this observation sequence O is generated by the above HMM $latex \lambda$, i.e. $latex P(O \mid \lambda)$? What exactly is $latex \lambda$: an initial state, a known state sequence, or a partial state, e.g. the first day is Sunny?
What is the difference between likelihood and (conditional) probability? $latex \lambda$ is not a random event but the parameter in the likelihood problem; here it refers to the $latex (\pi_0, P, B)$ matrices.
Wrong Forward algorithm
In any case, we can use the forward algorithm to compute the likelihood.
The following is the wrong interpretation! One should use recursion instead of this direct calculation!
$latex \pi_t = \pi_0 P^{t-1}$, and $latex \pi_t B$ is the observable probability distribution at time t.
Taking O = { PAINT, CLEAN, SHOP, BIKE } as an example: first compute $latex \pi_1, \pi_2, \pi_3, \pi_4$. From $latex \pi_t B$ we can obtain the marginal probabilities $latex P(Y_1 = \text{PAINT}), \ldots, P(Y_4 = \text{BIKE})$ and multiply them together.
But this is not what we want!
We want the joint probability

$latex P(Y_1 = o_1, Y_2 = o_2, Y_3 = o_3, Y_4 = o_4)$

where $latex o_1$: PAINT, $latex o_2$: CLEAN, $latex o_3$: SHOP, $latex o_4$: BIKE. Since the $latex Y_t$ are not independent, the product of marginals is not the joint probability.
The correct method must use recursion!!
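The wrong direct calculation can be made concrete. This sketch multiplies the four per-day marginals for the example's matrices; the product (about 0.00351) is not the joint likelihood of the sequence, precisely because consecutive observations are correlated through the hidden state:

```python
import numpy as np

P = np.array([[0.6, 0.4],    # transitions, rows/cols = (Rainy, Sunny)
              [0.2, 0.8]])
B = np.array([[0.3, 0.45, 0.2, 0.05],   # emissions, rows = states,
              [0.4, 0.10, 0.2, 0.30]])  # cols = (PAINT, CLEAN, SHOP, BIKE)
pi = np.array([0.4, 0.6])    # initial distribution
O = [0, 1, 2, 3]             # PAINT, CLEAN, SHOP, BIKE

wrong = 1.0
for o in O:
    wrong *= pi @ B[:, o]    # marginal P(Y_t = o_t)
    pi = pi @ P              # advance the state distribution
print(round(wrong, 6))       # -> 0.003508
```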
Correct Forward algorithm
The overall logic consists of three steps [@burlandoHiddenMarkov2019]
- Initialization: $latex \alpha_1(i) = \pi_0(i)\, b_i(o_1)$
- Recursion: $latex \alpha_{t+1}(j) = \big[\sum_i \alpha_t(i)\, P_{i,j}\big]\, b_j(o_{t+1})$
- Termination: $latex P(O \mid \lambda) = \sum_i \alpha_T(i)$
Initialization
As in the figure below:
That is, $latex \alpha_1(i) = P(Y_1 = o_1, X_1 = i)$, where $latex i \in \{\text{sunny}, \text{rainy}\}$.
The expression can be expanded with conditional probability:
$latex \alpha_1(i) = P(Y_1 = o_1 \mid X_1 = i)\, P(X_1 = i) = b_i(o_1)\, \pi_0(i)$
Let us look at the concrete example ($latex o_1$ = PAINT):
- i = "sunny": $latex \alpha_1(\text{sunny}) = 0.4 \times 0.6 = 0.24$
- i = "rainy": $latex \alpha_1(\text{rainy}) = 0.3 \times 0.4 = 0.12$
Why break up $latex P(Y_1 = o_1, X_1 = i)$ this way? Because it can be made recursive!

Recursion
This is completely different from what I originally thought. The correct approach starts from $latex \alpha_1$.
From $latex \alpha_1$ to $latex \alpha_2$:
$latex \alpha_2(j) = P(Y_1 = o_1, Y_2 = o_2, X_2 = j) = \sum_i P(Y_1 = o_1, Y_2 = o_2, X_2 = j, X_1 = i)$
By the chain rule, each term is
$latex P(Y_2 = o_2 \mid X_2 = j, X_1 = i, Y_1 = o_1)\, P(X_2 = j \mid X_1 = i, Y_1 = o_1)\, P(Y_1 = o_1, X_1 = i)$
In the first factor, the conditioning on $latex X_1$ and $latex Y_1$ can be dropped: $latex P(Y_2 = o_2 \mid X_2 = j) = b_j(o_2)$. In the middle factor, the conditioning on $latex Y_1$ can be dropped: $latex P(X_2 = j \mid X_1 = i) = P_{i,j}$. This is how the recursion appears!
This is the result we want:
$latex \alpha_2(j) = b_j(o_2) \sum_i P_{i,j}\, \alpha_1(i)$
Can we go one step further and also drop the sum over states? NO!!! It cannot be simplified to a product of marginal probabilities.
Let us look at the concrete example again ($latex o_2$ = CLEAN). Note that $latex \alpha_2$ contains both $latex P$ and $latex B$ information.

$latex \alpha_2(\text{sunny}) = 0.1 \times (0.24 \times 0.8 + 0.12 \times 0.4) = 0.024$

The other path is:

$latex \alpha_2(\text{rainy}) = 0.45 \times (0.24 \times 0.2 + 0.12 \times 0.6) = 0.054$

From $latex \alpha_2$ to $latex \alpha_3$, and in general from $latex \alpha_t$ to $latex \alpha_{t+1}$, the same derivation applies:
$latex \alpha_{t+1}(j) = \sum_i P(Y_{t+1} = o_{t+1} \mid X_{t+1} = j)\, P(X_{t+1} = j \mid X_t = i)\, \alpha_t(i)$
The extra conditioning in the first factor can be ignored (the emission depends only on the current state); the extra conditioning in the middle factor can be ignored (Markov property). This is what makes the recursion work.
The recursion is summarized below:
$latex \alpha_{t+1}(j) = b_j(o_{t+1}) \sum_i P_{i,j}\, \alpha_t(i)$, where $latex \alpha_t(j) = P(o_1, \ldots, o_t, X_t = j)$.
Why decompose into $latex P(o_1, \ldots, o_t, X_t = j)$?
- Because $latex P(o_1, \ldots, o_t)$ alone cannot be made recursive, but $latex P(o_1, \ldots, o_t, X_t = j)$ can!
- Forward algorithm: case closed! So

$latex P(O \mid \lambda) = \sum_i \alpha_T(i)$

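The whole forward pass fits in a few lines. A sketch for the example above (states ordered Rainy, Sunny; the three steps map directly onto initialization, recursion, and termination):

```python
import numpy as np

P = np.array([[0.6, 0.4],               # transitions (Rainy, Sunny)
              [0.2, 0.8]])
B = np.array([[0.3, 0.45, 0.2, 0.05],   # emissions, cols = (PAINT,
              [0.4, 0.10, 0.2, 0.30]])  # CLEAN, SHOP, BIKE)
pi0 = np.array([0.4, 0.6])
O = [0, 1, 2, 3]                        # PAINT, CLEAN, SHOP, BIKE

alpha = pi0 * B[:, O[0]]                # initialization: alpha_1(i)
for o in O[1:]:
    alpha = (alpha @ P) * B[:, o]       # recursion: alpha_{t+1}(j)

likelihood = alpha.sum()                # termination: P(O | lambda)
print(round(likelihood, 6))             # -> 0.003156
```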
Backward algorithm
Usually, to solve the likelihood problem we do not really need the backward algorithm. However, explaining and solving it is a good litmus test that the forward algorithm works, and by understanding it now we will be ready when the time comes to use it for the third problem, learning.
The backward algorithm is similar to the forward algorithm, but as the name suggests it goes backward in time. There is again an initialization, a recursion, and a termination.
Initialization
$latex \beta_T(i) = 1$. What this simple equation tells us is that at time T (the end of the observation sequence) the backward variable of every state is equal to 1. Simple as that.

As in the figure below:
That is, the backward variable is defined as $latex \beta_t(i) = P(Y_{t+1} = o_{t+1}, \ldots, Y_T = o_T \mid X_t = i)$, where $latex i \in \{\text{sunny}, \text{rainy}\}$, with the convention $latex \beta_T(i) = 1$.
In the concrete example (T = 4):
- i = "sunny": $latex \beta_4(\text{sunny}) = 1$
- i = "rainy": $latex \beta_4(\text{rainy}) = 1$
Recursion
From $latex \beta_T$ to $latex \beta_{T-1}$:
$latex \beta_{T-1}(i) = P(Y_T = o_T \mid X_{T-1} = i) = \sum_j P(Y_T = o_T \mid X_T = j)\, P(X_T = j \mid X_{T-1} = i) = \sum_j P_{i,j}\, b_j(o_T)\, \beta_T(j)$
Given the Markov structure, the conditioning on $latex X_{T-1}$ can be ignored w.r.t. the emission $latex P(Y_T = o_T \mid X_T = j)$.
In the concrete example ($latex o_4$ = BIKE):

$latex \beta_3(\text{sunny}) = 0.8 \times 0.3 + 0.2 \times 0.05 = 0.25$

$latex \beta_3(\text{rainy}) = 0.4 \times 0.3 + 0.6 \times 0.05 = 0.15$

From $latex \beta_{T-1}$ to $latex \beta_{T-2}$, and in general:
$latex \beta_t(i) = \sum_j P_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j)$
In the emission factor the conditioning on past states is ignored; in the transition factor the conditioning on future observations is ignored. Both follow from the Markov property.
Termination
$latex \beta_t(i)$ is a conditional probability:
$latex \beta_t(i) = P(o_{t+1}, \ldots, o_T \mid X_t = i)$.
Folding in the first emission and the initial distribution does not change the logic; the termination step is
$latex P(O \mid \lambda) = \sum_i \pi_0(i)\, b_i(o_1)\, \beta_1(i)$
For example, continuing the recursion gives $latex \beta_2 = [0.046, 0.038]$ and $latex \beta_1 = [0.0071, 0.0121]$ for (sunny, rainy), so
$latex P(O \mid \lambda) = 0.6 \times 0.4 \times 0.0071 + 0.4 \times 0.3 \times 0.0121 = 0.003156$
consistent with the previous result.
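The backward pass is just as short. A sketch for the same example (note the termination folds in the initial distribution and the first emission, unlike the forward pass):

```python
import numpy as np

P = np.array([[0.6, 0.4],               # transitions (Rainy, Sunny)
              [0.2, 0.8]])
B = np.array([[0.3, 0.45, 0.2, 0.05],   # emissions, cols = (PAINT,
              [0.4, 0.10, 0.2, 0.30]])  # CLEAN, SHOP, BIKE)
pi0 = np.array([0.4, 0.6])
O = [0, 1, 2, 3]

beta = np.ones(2)                        # initialization: beta_T(i) = 1
for o in reversed(O[1:]):
    beta = P @ (B[:, o] * beta)          # recursion: beta_t(i)

likelihood = (pi0 * B[:, O[0]]) @ beta   # termination
print(round(likelihood, 6))              # -> 0.003156
```

It reproduces the forward result, which is exactly the litmus test mentioned above.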

Conclusion
Comparing forward and backward:
$latex P(O \mid \lambda) = \sum_i \alpha_T(i) = \sum_i \pi_0(i)\, b_i(o_1)\, \beta_1(i)$
There are a few differences:
- Both $latex \alpha_t(i)$ and $latex \beta_t(i)$ have recursive formulas.
- $latex \alpha_t(i)$ is a joint probability: $latex \alpha_t(i) = P(o_1, \ldots, o_t, X_t = i \mid \lambda)$
- $latex \beta_t(i)$ is a conditional probability: $latex \beta_t(i) = P(o_{t+1}, \ldots, o_T \mid X_t = i, \lambda)$. Note that this only applies for $latex t < T$, i.e. at most down to $latex \beta_1$.
- The last step of the backward algorithm is more roundabout: the likelihood is not simply $latex \sum_i \beta_1(i)$; the initial distribution and the first emission must be folded in as $latex \sum_i \pi_0(i)\, b_i(o_1)\, \beta_1(i)$.
- For example, $latex P(O \mid \lambda) = 0.003156$, consistent with the previously computed results below:
- $latex \sum_i \alpha_4(i) = 0.0028512 + 0.0003048 = 0.003156$
- $latex \sum_i \pi_0(i)\, b_i(o_1)\, \beta_1(i) = 0.001704 + 0.001452 = 0.003156$
One More Thing
Both $latex \alpha$ and $latex \beta$ above can be computed by recursion, which is an excellent result. Each additional observable $latex o_t$ only requires adding an incremental extra term; previous results can be reused instead of recomputing from scratch, saving both computation and storage.
This is just the appetizer. More important is the later discussion of Viterbi decoding, a special case of dynamic programming, where recursion dramatically reduces the decoding complexity. This is the absolute key to the practicality of error-correcting-code decoding, speech recognition, etc.
If your thinking stops here, it only shows a lack of imaginative courage. The following should be a surprise!
The usefulness of the formulas above is divide-and-conquer! The computation of $latex P(O \mid \lambda)$ can be cut into two segments: one computed from the head, the other from the tail. The two segments can be computed independently in parallel, unconstrained by the recursion, and merged at the end using

$latex P(O \mid \lambda) = \sum_i \alpha_t(i)\, \beta_t(i) \quad\text{for any } t$

How to prove it? Use:
$latex \alpha_t(i)\, \beta_t(i) = P(o_1, \ldots, o_t, X_t = i)\, P(o_{t+1}, \ldots, o_T \mid X_t = i) = P(o_1, \ldots, o_T, X_t = i)$
and sum over i to marginalize out $latex X_t$.

Back to the previous example, at t = 2:

$latex \sum_i \alpha_2(i)\, \beta_2(i) = 0.024 \times 0.046 + 0.054 \times 0.038 = 0.003156$

In practice, $latex \alpha_t$ and $latex \beta_t$ can be computed at the same time from both ends; combining $latex \alpha_t$ and $latex \beta_t$ yields the complete HMM likelihood fastest.
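The meet-in-the-middle identity can be verified for every split point t at once (a sketch reusing the same matrices):

```python
import numpy as np

P = np.array([[0.6, 0.4], [0.2, 0.8]])
B = np.array([[0.3, 0.45, 0.2, 0.05],
              [0.4, 0.10, 0.2, 0.30]])
pi0 = np.array([0.4, 0.6])
O = [0, 1, 2, 3]   # PAINT, CLEAN, SHOP, BIKE

# Forward variables alpha_1 .. alpha_T.
alphas = [pi0 * B[:, O[0]]]
for o in O[1:]:
    alphas.append((alphas[-1] @ P) * B[:, o])

# Backward variables beta_1 .. beta_T (beta_T = all ones).
betas = [np.ones(2)]
for o in reversed(O[1:]):
    betas.insert(0, P @ (B[:, o] * betas[0]))

# sum_i alpha_t(i) beta_t(i) gives the same likelihood at every t.
for a, b in zip(alphas, betas):
    assert abs(a @ b - 0.003156) < 1e-9
```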
New Exploration (To Be Completed)
Q1: Can the HMM likelihood be split into more than two segments? e.g. …
A1: Let … and … where …
Q2: How is a middle segment of the HMM likelihood computed?
A2:
This is the forward path: …
This is the backward path: …
The forward path … where vector … from … to …
Q3: Is radix-4 better than multiple segments when the number of states is small?
Q4: Kalman filter
Q5: Dynamic programming!!
References
Burlando, Maria. n.d. “Hidden Markov Models 1: The Likelihood Problem.”
Medium. Accessed September 17, 2020.
https://medium.com/@Ayra_Lux/hidden-markov-models-part-1-the-likelihood-problem-8dd1066a784e.
Jurafsky, Daniel, and James H. Martin. 2019. “Hidden Markov Models.”
Appendix A in Speech and Language Processing, 3rd ed. draft.
https://web.stanford.edu/~jurafsky/slp3/.
Wiki. 2020a. “Hidden Markov Model.” Wikipedia.
https://en.wikipedia.org/w/index.php?title=Hidden_Markov_model&oldid=977869964.
———. 2020b. “Markov Chain.” Wikipedia.
https://en.wikipedia.org/w/index.php?title=Markov_chain&oldid=978157011.