The following is a simple example of the principle of maximum entropy. It is an excerpt from our new book, Information Theory: A Concise Introduction. It shows how to find a probability distribution that is consistent with the known facts: the correct distribution maximizes the entropy, thereby introducing no additional assumptions.

Sparky is a tattoo artist who offers three different tattoos priced at 8, 12, and 16 dollars. At the end of the week he knows how many tattoos he did in total and how much money he made, but he forgot to keep track of how many of each of the three tattoos were sold. He asks Spike, his mathematician friend, to help him figure it out. Dividing the total amount Sparky made, \(A\), by the number of tattoos, \(N\), gives Spike the average cost of a tattoo, \(a=A/N\). Letting \(p_1\), \(p_2\), and \(p_3\) be the probabilities of the 8, 12, and 16 dollar tattoos respectively, he can set up the following equation for the average cost of a tattoo.

\[8p_1 + 12p_2 + 16p_3 = a\]

He gets another equation from the fact that the probabilities must sum to \(1\).

\[p_1 + p_2 + p_3 = 1\]

Now Spike has two equations with three unknowns, so there are many possible solutions. How can he find the correct one? He decides to use the distribution that maximizes the entropy, which is given by

\[H = -p_1\log(p_1)-p_2\log(p_2)-p_3\log(p_3)\]
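This entropy is easy to compute directly. Here is a minimal Python sketch (the function name `entropy` is my own, not from the text), using base-2 logarithms to match the numerical entropy values quoted later in this example:

```python
from math import log2

def entropy(ps):
    """Shannon entropy, in bits, of a discrete distribution ps."""
    # terms with p = 0 contribute nothing, since p*log(p) -> 0 as p -> 0
    return -sum(p * log2(p) for p in ps if p > 0)
```

For instance, `entropy([1/3, 1/3, 1/3])` returns \(\log_2(3) \approx 1.585\) bits, the maximum possible for three outcomes.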

Using the two previous equations, he can write \(p_2\) and \(p_3\) in terms of \(a\) and \(p_1\) as follows:

\[p_2 = 4 - \frac{a}{4} - 2 p_{1}\] \[p_3 = \frac{a}{4} - 3 + p_{1}\]
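These substitutions can be checked numerically: for any \(a\) and \(p_1\), the resulting \(p_2\) and \(p_3\) should satisfy both the normalization and average-cost equations. A small sketch (the function name and sample values are mine):

```python
def p2_p3(a, p1):
    """Eliminate p2 and p3 using the two constraint equations."""
    p2 = 4 - a / 4 - 2 * p1
    p3 = a / 4 - 3 + p1
    return p2, p3

# Spot check at an arbitrary feasible point, a = 10 and p1 = 0.6:
q2, q3 = p2_p3(10.0, 0.6)
assert abs(0.6 + q2 + q3 - 1) < 1e-12             # probabilities sum to 1
assert abs(8*0.6 + 12*q2 + 16*q3 - 10.0) < 1e-12  # average cost equals a
```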

Substituting these into the entropy equation gives him an expression for the entropy in terms of the probability \(p_1\). Next he finds the maximum of \(H(p_1)\) by taking the derivative with respect to \(p_1\), setting the result equal to zero and solving for \(p_1\). Checking to make sure he has a maximum and not a minimum, he gets the following expression for the value of \(p_1\) that maximizes the entropy.

\[p_1(a) = \frac{52-3a-\sqrt{64-3(a-12)^2}}{24}\]

The value of \(a\) must be in the range \([8,16]\). At \(a=8\) all the tattoos must have been the 8 dollar tattoo, and at \(a=16\) they must all have been the 16 dollar tattoo. Checking, he gets \(p_1(8)=1\) and \(p_1(16)=0\), which is correct. A plot of the three probabilities as a function of \(a\) is shown in the figure below.

The boundary values \(p_1(8)=1\) and \(p_1(16)=0\) could have been inferred without the maximum entropy principle, so a reasonable guess is the linear interpolation \(p_{1}(a)=(16-a)/8\). For \(a=12\) this gives the probabilities \(p_{1}(12)=1/2\), \(p_{2}(12)=0\), and \(p_{3}(12)=1/2\), with entropy \(H=1\). Maximum entropy instead gives \(p_{1}(12)=p_{2}(12)=p_{3}(12)=1/3\), with entropy \(H=\log(3)=1.5849625\ldots\). The higher entropy reflects fewer assumptions: the linear guess implicitly assumes that no 12 dollar tattoos were sold.
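The comparison at \(a=12\) is easy to reproduce with base-2 logarithms:

```python
from math import log2

def H(ps):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in ps if p > 0)

h_linear = H([0.5, 0.0, 0.5])     # linear guess: assumes no 12-dollar tattoos
h_maxent = H([1/3, 1/3, 1/3])     # maximum entropy solution
assert h_linear == 1.0
assert abs(h_maxent - log2(3)) < 1e-12
assert h_maxent > h_linear        # maxent makes the fewest assumptions
```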

© 2010-2015 Stefan Hollos and Richard Hollos
