homolo.gy

The uncertainty function H(X)

August 28, 2010

The uncertainty function is an important function in both Information Theory and Coding Theory that determines the composition of codes for efficiently transmitting data with a low probability of error. “Information” in Information Theory does not mean the content of the transmitted message; rather, it means the certainty you gain when the message is actually transmitted. For example, if you know what the message is before it’s transmitted, you gain no information. The uncertainty function is a mathematical measure of the information gained by a given message. Remember that in this context, \(X\) represents a set of possibilities, not a single value. There are four axioms for an uncertainty function, and they are as follows:

\[ H(\frac{1}{M}, \cdots, \frac{1}{M}) = f(M) \]

where \(f(M)\) is a monotonically increasing function of \(M\) (\(M = 1, 2, \cdots\)). This means that as the domain of possible values increases, the uncertainty of the value should also increase.
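A quick numerical sketch of this axiom, using the entropy formula given at the end of the post (with \(C = 1\) and \(b = 2\), both choices of mine for illustration; the function name is also mine):

```python
import math

def uniform_uncertainty(M):
    """H(1/M, ..., 1/M) with C = 1 and b = 2: this works out to log2(M)."""
    p = 1.0 / M
    return -sum(p * math.log2(p) for _ in range(M))

# f(M) should grow as M grows: more possibilities, more uncertainty.
values = [uniform_uncertainty(M) for M in range(1, 6)]
assert all(a < b for a, b in zip(values, values[1:]))
```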

\[ f(ML) = f(M) + f(L) \]

This means that the uncertainty of two independent values taken together should be equal to the sum of the uncertainties of each value separately.
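This additivity can be checked numerically with the formula derived at the end of the post (again taking \(C = 1\), \(b = 2\) as my own illustrative choices): two independent uniform choices, one among \(M\) options and one among \(L\), give \(ML\) equally likely pairs.

```python
import math

def f(M):
    """Uncertainty of M equally likely outcomes; with C = 1, b = 2 this is log2(M)."""
    return -sum((1.0 / M) * math.log2(1.0 / M) for _ in range(M))

# The uncertainty of the pair equals the sum of the individual uncertainties.
M, L = 4, 8
assert abs(f(M * L) - (f(M) + f(L))) < 1e-9
```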

\[ \begin{align*} H(p_1, \ldots, p_M) = {}& H\left(\sum_{i=1}^{r} p_i,\; \sum_{i=r+1}^{M} p_i\right) \\ &+ \left(\sum_{i=1}^{r} p_i\right) H\left(\frac{p_1}{\sum_{i=1}^{r} p_i}, \ldots, \frac{p_r}{\sum_{i=1}^{r} p_i}\right) \\ &+ \left(\sum_{i=r+1}^{M} p_i\right) H\left(\frac{p_{r+1}}{\sum_{i=r+1}^{M} p_i}, \ldots, \frac{p_M}{\sum_{i=r+1}^{M} p_i}\right) \end{align*} \]

This means that if you split \(X\) into two subgroups, then the uncertainty of \(X\) should equal the uncertainty of which subgroup the outcome falls into, plus the uncertainty within each subgroup weighted by the total probability of that subgroup.
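A numerical check of this grouping property, using the formula from the end of the post (with \(C = 1\), \(b = 2\), my choices) on a four-element distribution split down the middle:

```python
import math

def H(*p):
    """Entropy with C = 1, b = 2; zero-probability terms are skipped."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Split (p1, p2, p3, p4) into the subgroups {p1, p2} and {p3, p4}.
p = (0.1, 0.2, 0.3, 0.4)
q1, q2 = p[0] + p[1], p[2] + p[3]

grouped = (H(q1, q2)
           + q1 * H(p[0] / q1, p[1] / q1)
           + q2 * H(p[2] / q2, p[3] / q2))

# The grouped computation agrees with the direct one.
assert abs(H(*p) - grouped) < 1e-9
```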

\[ H(p, 1 - p) \,\, \text{is continuous} \]

This is for mathematical convenience when deriving the uncertainty function. So, given these four axioms, it turns out that the only function satisfying them is:

\[ H(p_1, \cdots, p_M) = -C \sum_{i=1}^M p_i \log_b{p_i} \]

where \(C\) is any positive number and \(b > 1\). Unfortunately I’ve yet to understand the proof behind this, so I’ll leave that for another day.
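A minimal sketch of this general form (parameter names are mine); \(C\) and \(b\) only rescale the result by a constant factor, which is consistent with any \(C > 0\) and \(b > 1\) satisfying the axioms:

```python
import math

def H(probs, C=1.0, b=2.0):
    """H(p_1, ..., p_M) = -C * sum_i p_i * log_b(p_i).

    Terms with p_i = 0 are skipped, following the convention
    that 0 * log(0) = 0.
    """
    return -C * sum(p * math.log(p, b) for p in probs if p > 0)

# With C = 1 and b = 2 the result is in bits: a fair coin is 1 bit.
bits = H([0.5, 0.5])
# Switching the base to e just rescales everything by ln(2).
nats = H([0.5, 0.5], b=math.e)
assert abs(bits * math.log(2) - nats) < 1e-9
```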