
$$
\begin{aligned}
\mathrm{Var}(X) = \sum_{i=1}^{L} \bigl(x_i - E(X)\bigr)^{2}\, p(x_i)
&= \sum_{\{x_i :\, |x_i - E(X)| > k\}} \bigl(x_i - E(X)\bigr)^{2}\, p(x_i) \\
&\quad + \sum_{\{x_i :\, |x_i - E(X)| \le k\}} \bigl(x_i - E(X)\bigr)^{2}\, p(x_i) \\
&\ge k^{2}\, P\bigl(|X - E(X)| > k\bigr)
\end{aligned}
$$
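      As a quick numerical check of this bound (a minimal sketch; the discrete distribution below is made up for illustration only), the variance on the left is never smaller than k²P(|X − E(X)| > k) for any choice of k:

# Numerical check of the Chebyshev bound: Var(X) >= k^2 * P(|X - E(X)| > k)
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # outcomes x_i
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])      # probabilities p(x_i), summing to 1

mean = np.sum(x * p)                          # E(X)
var = np.sum((x - mean) ** 2 * p)             # Var(X), the quantity bounded above

for k in (0.5, 1.0, 2.0):
    tail = np.sum(p[np.abs(x - mean) > k])    # P(|X - E(X)| > k)
    print(f"k={k}: Var(X)={var:.3f} >= k^2 * P = {k * k * tail:.3f}")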

      So far we have counts and probabilities, but what of the probability of X when you know Y has occurred (where X is dependent on Y)? How do we account for this greater state of knowledge? It turns out the answer was not put on a formal mathematical footing until halfway through the twentieth century, with the Cox derivation [101].

      2.5.1 The Calculus of Conditional Probabilities: The Cox Derivation

      The rules of probability, including those describing conditional probabilities, can be obtained using an elegant derivation by Cox [101]. The Cox derivation uses the rules of logic (Boolean algebra) and two simple assumptions. The first assumption is in terms of “b|a,” where b|a ≡ “likelihood” of proposition b when proposition a is known to be true. (The interpretation of “likelihood” as “probability” will fall out of the derivation.) The first assumption is that likelihood c‐and‐b|a is determined by a function of the likelihoods b|a and c|b‐and‐a:

      (Assumption 1) c‐and‐b|a = F(c|b‐and‐a, b|a),

      for some function F. Consistency with the Boolean algebra then restricts F such that (Assumption 1) reduces to:

$$C\, f(c\text{-and-}b \mid a) = f(c \mid b\text{-and-}a)\, f(b \mid a)$$

      where f is a function of one variable and C is a constant. For the trivial choice of function and constant there is:

$$p(c, b \mid a) = p(c \mid b, a)\, p(b \mid a)$$

      which is the conventional rule for conditional probabilities (and c‐and‐b|a is rewritten as p(c,b|a), etc.). The second assumption relates the likelihoods of propositions b and ~b when the proposition a is known to be true:

      (Assumption 2) ~b|a = S(b|a),

      for some function S. Consistency with the Boolean algebra of propositions then forces two relations on S:

$$S[S(x)] = x \qquad \text{and} \qquad x\, S\!\left[S(y)/x\right] = y\, S\!\left[S(x)/y\right]$$

      which together can be solved to give:

$$S(p) = \bigl(1 - p^{m}\bigr)^{1/m}$$

      for some constant m. With the conventional choice m = 1 this is S(p) = 1 − p, i.e. p(~b|a) = 1 − p(b|a), the familiar sum rule. Combining the product rule above with the symmetry p(a,b) = p(b,a) then gives

$$p(b \mid a) = \frac{p(a \mid b)\, p(b)}{p(a)}$$

      which is the classic Bayes' theorem.
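      The functional equations on S are easy to check numerically. The following is a minimal sketch (not from the text; the values of x, y, and m are chosen only so that every argument of S stays in [0, 1]) verifying that S(p) = (1 − p^m)^(1/m) satisfies both S[S(x)] = x and xS[S(y)/x] = yS[S(x)/y]:

# Sanity check of the two functional equations satisfied by S(p) = (1 - p^m)^(1/m)
def S(p, m):
    return (1.0 - p ** m) ** (1.0 / m)

x, y = 0.9, 0.8                               # chosen so all arguments of S lie in [0, 1]
for m in (1, 2, 3):
    involution = S(S(x, m), m)                # should return x
    lhs = x * S(S(y, m) / x, m)               # x S[S(y)/x]
    rhs = y * S(S(x, m) / y, m)               # y S[S(x)/y]
    print(f"m={m}: S[S(x)]={involution:.6f}, lhs={lhs:.6f}, rhs={rhs:.6f}")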

      2.5.2 Bayes' Rule

      Bayes' rule is derived from the property of conditional probability:

$$p(x_i, y_j) = p(x_i \mid y_j)\, p(y_j) = p(y_j \mid x_i)\, p(x_i)$$

$$p(x_i \mid y_j) = \frac{p(y_j \mid x_i)\, p(x_i)}{p(y_j)} = \frac{p(y_j \mid x_i)\, p(x_i)}{\sum_{k=1}^{L} p(y_j \mid x_k)\, p(x_k)}$$

      Bayes' Rule provides an update rule for probability distributions in response to observed information. Terminology:

       p(xi) is referred to as the “prior distribution on X” in this context.

       p(xi | yj) is referred to as the “posterior distribution on X given Y.”
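      As a concrete illustration of the update (a minimal sketch; the prior and likelihood values below are hypothetical), the posterior is the prior reweighted by the likelihood of the observed y and normalized by p(yj) = Σk p(yj|xk)p(xk):

# Bayes' rule as an update: posterior ~ likelihood * prior, normalized by p(y)
import numpy as np

prior = np.array([0.7, 0.3])                  # p(x_i) for a two-state X
likelihood_y = np.array([0.1, 0.8])           # p(y | x_i) for the observed y

joint = likelihood_y * prior                  # p(y | x_i) p(x_i)
posterior = joint / joint.sum()               # normalize by p(y) = sum_k p(y|x_k) p(x_k)

print("posterior p(x_i | y):", posterior)     # approximately [0.226, 0.774]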

      2.5.3 Estimation Based on Maximal Conditional Probabilities

      There are two ways to do an estimation given conditional probabilities. The first is to seek the outcome with maximal conditional probability, i.e. the optimal choice of outcome (maximum a posteriori [MAP]). The second is to seek the choice of conditioning that maximizes the probability of what was observed, where that probability is referred to as a “likelihood” in this context (maximum likelihood [ML]).
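      The difference matters when the prior is far from uniform. In the following minimal sketch (all numbers hypothetical), ML picks the state with the largest conditional probability p(y|xi), while MAP weights that conditional probability by the prior p(xi) and picks a different state:

# MAP vs. ML estimation for a two-state X given one observation y
import numpy as np

prior = np.array([0.90, 0.10])                # p(x_i): state 0 common, state 1 rare
likelihood_y = np.array([0.30, 0.60])         # p(y | x_i) for the observed y

ml_choice = np.argmax(likelihood_y)           # argmax_i p(y | x_i)        -> state 1
map_choice = np.argmax(likelihood_y * prior)  # argmax_i p(y | x_i) p(x_i) -> state 0

print("ML estimate:", ml_choice, " MAP estimate:", map_choice)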

      
