What Did Bayes Really Say? — Endnotes

© 19 June 2022 by Michael A. Kohn


Introduction | Problem and Definitions | Propositions 1 – 7 | Bayes’s Billiards | Endnotes | References

Endnotes

#1 About forming the possessive of singular nouns

If you think I should be forming the possessive of our author’s surname in some way other than “Bayes’s”, read Strunk and White, Page 1, Rule 1.

Strunk W, White EB. The elements of style. 3d ed. New York: Macmillan; 1979. p. 1.

Have things changed? Not according to Benjamin Dreyer, Copy Chief at Random House, who wrote this in 2019:

… you’ll save yourself a lot of thinking time by not thinking about those s’s and just applying them. I’d even urge you to set aside the Traditional Exceptions for Antiquity and/or being the Son of God and go with:

Socrates’s,

Aeschylus’s,

Jesus’s

Dreyer B. Dreyer’s English: an utterly correct guide to clarity and style. First edition. New York: Random House; 2019. p. 39.

And what did Richard Price write in 1763, when he wanted to end the part of the essay written by Bayes and take over with his own abridgement?

Return to article.

#2 Utility vs. monetary value

Bayes may not have known that, in a 1738 essay, Daniel Bernoulli distinguished between price and utility:

the value of an item must not be based on its price, but rather on the utility it yields. The price of the item is dependent only on the thing itself and is equal for everyone; the utility, however, is dependent on the particular circumstances of the person making the estimate. Thus there is no doubt that a gain of one thousand ducats is more significant to a pauper than to a rich man though both gain the same amount.

Daniel Bernoulli, Exposition of a New Theory on the Measurement of Risk. Papers of the Imperial Academy of Sciences in St. Petersburg, 1738.

Bayes doesn’t bother with this distinction between monetary value and utility. Had he done so, he would have equated “value” with utility.

Return to article.

#3 Alternative explanation of Prop. 4

Dispense with the awkward N by setting it equal to 1, so that P(B) = b/N = b and P(B \cap A) = P/N = P. Also, let d_{B} be the first day on which B occurs.

\begin{align*} P(d_{B} = 1) &= b\\ P(d_{B} = 2) &= (1-b)b\\ P(d_{B} = 3) &= (1-b)^2b\\ P(d_{B} = i) &= (1-b)^{i-1}b \end{align*}

This is the “First Success” distribution.
The probability that i is the first day that B occurs and A also occurs on that day is

$$P(A \cap d_{B} = i) = (1-b)^{i-1} \textbf{P} $$

To get P(W), the probability of receiving N, sum over all i.

$$P(W) = \sum_{i=1}^\infty (1-b)^{i-1} \textbf{P}$$ $$P(W) = \frac{\textbf{P}}{b}$$
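As a sanity check on this derivation (not part of Bayes’s argument), here is a short Monte Carlo sketch in Python. The values b = 0.3 and P = 0.12 are chosen arbitrarily for illustration; the simulation waits for the first day on which B occurs and checks whether A occurred on that day too, which should happen with probability P/b = 0.4.

```python
import random

def simulate_win(b, P, trials=100_000, seed=0):
    """Estimate P(W): on the first day B occurs, does A also occur?

    Each day, B occurs with probability b; given B, A occurs with
    probability P/b, so P(A and B) = P on any single day.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        while True:
            if rng.random() < b:              # B occurs today
                if rng.random() < P / b:      # A also occurs, given B
                    wins += 1
                break                          # stop at the first B day
    return wins / trials

# Arbitrary illustrative values: P(W) should be near 0.12 / 0.3 = 0.4
est = simulate_win(0.3, 0.12)
```

The inner loop is exactly the First Success distribution: the day the loop breaks is geometrically distributed with parameter b, and summing over all possible stopping days reproduces P(W) = P/b.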
Return to article.

#4 More on the Odds Form of Bayes’s Rule

Here, again, is the odds form of Bayes’s Rule:

\begin{align*} \frac{P(A|B)}{P(A^c|B)} &= \frac{P(B|A)}{P(B|A^c)}\frac{P(A)}{P(A^c)} \\ Odds(A|B) &= LR_A(B) \times Odds(A) \end{align*}

We can convert the multiplication into addition by taking the (base 10) logarithm:

$$\log{Odds(A|B)} = \log{LR_A(B)} + \log{Odds(A)}$$

Jaynes (page 91) multiplies through by 10 to get

$$10\log{Odds(A|B)} = 10\log{LR_A(B)} + 10\log{Odds(A)}$$

He denotes 10\log{Odds(A)} as e(A) and 10\log{Odds(A|B)} as e(A|B), so

$$e(A|B) = 10\log{LR_A(B)} + e(A)$$

Jaynes’s units for e(A) are decibels, abbreviated db. A 1 db increase in e(A) corresponds to multiplying Odds(A) by 10^{0.1} \approx 1.26. If Odds(A) is a small number (e.g., <0.1), then a 1 db increase in e(A) also corresponds approximately to multiplying P(A) by 1.26. When Odds(A) is a large number, a 1 db increase in e(A) corresponds to a much smaller multiple of P(A). According to Jaynes (p. 93), “a 1 db change in [e(A)] is about the smallest increment in plausibility that is perceptible to our intuition.”

During his code breaking work in World War II, Alan Turing expressed plausibility using the same quantity, 10\log{Odds}, and called the units “decibans” instead of decibels. (Jaynes, p. 116)

The likelihood ratio for A of B is the ratio of posterior odds to prior odds,

$$ LR_A(B) = \frac{Odds(A|B)}{Odds(A)} .$$

But it is not the “Odds Ratio”. The Odds Ratio for A with respect to B, OR_A(B), is the ratio of posterior odds of A given B to posterior odds of A given B^c:

$$OR_A(B) = \frac{Odds(A|B)}{Odds(A|B^c)} .$$

LR_A(B) and OR_A(B) have the same numerator but different denominators.
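The distinction is easy to see numerically. In this Python sketch the 2×2 counts are invented purely for illustration; the same data give a likelihood ratio of 2.25 but an odds ratio of 6:

```python
# Hypothetical 2x2 counts (rows A / A^c, columns B / B^c),
# chosen only for illustration
a_b, a_notb = 30, 10       # A & B,   A & B^c
na_b, na_notb = 20, 40     # A^c & B, A^c & B^c

odds_A_given_B = a_b / na_b            # Odds(A|B)   = 30/20 = 1.5
odds_A_given_notB = a_notb / na_notb   # Odds(A|B^c) = 10/40 = 0.25
odds_A = (a_b + a_notb) / (na_b + na_notb)  # prior Odds(A) = 40/60

# Same numerator, different denominators:
likelihood_ratio = odds_A_given_B / odds_A           # LR_A(B)
odds_ratio = odds_A_given_B / odds_A_given_notB      # OR_A(B)

# Cross-check: LR_A(B) also equals P(B|A) / P(B|A^c)
lr_check = (a_b / (a_b + a_notb)) / (na_b / (na_b + na_notb))
```

Both quantities divide Odds(A|B) by something, but only the likelihood ratio divides by the prior odds; the odds ratio compares the two posteriors to each other.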

Return to article.
