2.5 Statistics, Conditional Probability, and Bayes' Rule
So far we have counts and probabilities, but what of the probability of X when Y is known to have occurred (where X depends on Y)? How do we account for this greater state of knowledge? It turns out the answer was not put on a formal mathematical footing until halfway through the twentieth century, with the Cox derivation [101].
2.5.1 The Calculus of Conditional Probabilities: The Cox Derivation
The rules of probability, including those describing conditional probabilities, can be obtained using an elegant derivation by Cox [101]. The Cox derivation uses the rules of logic (Boolean algebra) and two simple assumptions, stated in terms of "b|a," where b|a ≡ the "likelihood" of proposition b when proposition a is known to be true. (The interpretation of "likelihood" as "probability" falls out of the derivation.) The first assumption is that the likelihood c‐and‐b|a is determined by a function of the likelihoods b|a and c|b‐and‐a:
(Assumption 1) c‐and‐b|a = F(c|b‐and‐a, b|a),
for some function F. Consistency with the Boolean algebra then restricts F such that (Assumption 1) reduces to:

f(c‐and‐b|a) = C f(c|b‐and‐a) f(b|a),

where f is a function of one variable and C is a constant. For the trivial choice of function and constant (f the identity and C = 1) there is:
p(c,b|a) = p(c|b,a) p(b|a),

which is the conventional rule for conditional probabilities (where c‐and‐b|a is rewritten as p(c,b|a), etc.). The second assumption relates the likelihoods of propositions b and ~b when the proposition a is known to be true:
(Assumption 2) ~b|a = S(b|a),
for some function S. Consistency with the Boolean algebra of propositions then forces two relations on S (one of them the involution S(S(b|a)) = b|a, since ~(~b) = b), which together can be solved to give:

[p(b|a)]^m + [p(~b|a)]^m = 1,
where m is an arbitrary constant. For m = 1 we obtain the relation p(b|a) + p(~b|a) = 1, the ordinary rule for probabilities. In general, the conventions for Assumption 1 can be matched to those for Assumption 2, such that the likelihood relations reduce to the conventional relations on probabilities. Note that conditional probability relationships can be grouped:

p(x, y) = p(x|y) p(y) = p(y|x) p(x),

to obtain the classic Bayes' theorem: p(x|y) = p(y|x) p(x)/p(y).
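As a quick numerical check of this grouping, the following sketch builds a small joint distribution p(x, y) (the table values are illustrative, not from the text), forms the marginals and conditionals, and verifies that p(x, y) = p(x|y) p(y) = p(y|x) p(x), and hence Bayes' theorem, hold:

```python
# Minimal numerical check of the grouping rule on an assumed joint table.
import numpy as np

# joint distribution p(x, y): rows index x, columns index y (illustrative values)
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

p_x = p_xy.sum(axis=1)              # marginal p(x)
p_y = p_xy.sum(axis=0)              # marginal p(y)
p_x_given_y = p_xy / p_y            # p(x|y): each column normalized by p(y)
p_y_given_x = p_xy / p_x[:, None]   # p(y|x): each row normalized by p(x)

# grouping: p(x, y) = p(x|y) p(y) = p(y|x) p(x)
assert np.allclose(p_x_given_y * p_y, p_xy)
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)

# Bayes' theorem: p(x|y) = p(y|x) p(x) / p(y)
assert np.allclose(p_x_given_y, p_y_given_x * p_x[:, None] / p_y)
print("grouping rule and Bayes' theorem hold for the example table")
```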
2.5.2 Bayes' Rule
The derivation of Bayes' rule is obtained from the property of conditional probability:

p(xi, yj) = p(xi|yj) p(yj) = p(yj|xi) p(xi), so that p(xi|yj) = p(yj|xi) p(xi)/p(yj), with p(yj) = Σk p(yj|xk) p(xk).
Bayes' Rule provides an update rule for probability distributions in response to observed information. Terminology:
p(xi) is referred to as the "prior distribution on X" in this context.
p(xi|yj) is referred to as the "posterior distribution on X given Y."
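The following is a minimal sketch of the update rule in code, with an assumed three-outcome prior p(xi) and an assumed conditional table p(yj|xi) (the numbers are illustrative only): the observed yj selects a likelihood column, and normalizing by p(yj) = Σk p(yj|xk) p(xk) yields the posterior.

```python
# Minimal sketch of the Bayes update: prior -> posterior given an observed y.
import numpy as np

p_x = np.array([0.5, 0.35, 0.15])      # prior distribution on X (assumed values)
p_y_given_x = np.array([[0.7, 0.3],    # p(y_j | x_i): row i, column j (assumed)
                        [0.4, 0.6],
                        [0.1, 0.9]])

observed_j = 1                             # suppose the second outcome of Y is observed
likelihood = p_y_given_x[:, observed_j]    # p(y_j | x_i) for each x_i
evidence = np.sum(likelihood * p_x)        # p(y_j) = sum_k p(y_j|x_k) p(x_k)
posterior = likelihood * p_x / evidence    # p(x_i | y_j), the updated distribution on X

print("prior    :", p_x)
print("posterior:", posterior)
```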
2.5.3 Estimation Based on Maximal Conditional Probabilities
There are two ways to do an estimation given a conditional probability. The first is to seek the maximal probability over the choice of outcome, with the conditioning fixed (maximum a posteriori [MAP]); the second is to seek the maximal probability (referred to as a "likelihood" in this context) over the choice of conditioning (maximum likelihood [ML]).
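A minimal sketch of the contrast, reusing the assumed prior and conditional table from the previous sketch: the ML estimate maximizes the likelihood p(yj|xi) over i alone, while the MAP estimate maximizes p(yj|xi) p(xi), so a strong prior can shift the choice.

```python
# Minimal sketch contrasting ML and MAP estimation on assumed values.
import numpy as np

p_x = np.array([0.5, 0.35, 0.15])      # prior on X (assumed values)
p_y_given_x = np.array([[0.7, 0.3],    # p(y_j | x_i) (assumed values)
                        [0.4, 0.6],
                        [0.1, 0.9]])
observed_j = 1                          # observed outcome of Y

likelihood = p_y_given_x[:, observed_j]
ml_estimate = np.argmax(likelihood)            # argmax_i p(y_j | x_i)
map_estimate = np.argmax(likelihood * p_x)     # argmax_i p(y_j | x_i) p(x_i)

print("ML  choice of x index:", ml_estimate)   # index 2 here: largest likelihood
print("MAP choice of x index:", map_estimate)  # index 1 here: the prior shifts the choice
```

In this toy example the likelihood alone favors the third outcome of X, but the prior weight on the second outcome shifts the MAP choice to it, illustrating how the two estimators can disagree.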