Read the book online - Informatics and Machine Learning. Stephen Winters-Hilt. Mathematics. LiveLib

to Frequencies to Probabilities

The conventional relations on probabilities say nothing about their interpretation. According to the Frequentist (frequency‐based) interpretation, probabilities are defined as fractions of a set of observations in the limit as the number of observations tends to infinity (where the Law of Large Numbers (LLN) works to our advantage). In practice, infinitely many observations are never made, and often only one observation is possible (predicting the winner of a marathon, for example). Even for a single race, however, it seems intuitive that prior information would still help in predicting the winner. Formally introducing prior probabilities brings us to the Bayesian interpretation. From the Bayesian perspective, prior probabilities can be encoded as “pseudocounts” in the frequentist framework (i.e., observation counts do not necessarily start from zero). The computer implementations used here typically have tuned or selected pseudocounts and minimum/maximum probability cutoffs, so they can be formally described on a Bayesian footing [1, 3].
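
As a minimal sketch of the pseudocount idea, the Python below seeds every outcome's count with a pseudocount before normalizing, then applies minimum/maximum probability cutoffs. The function name `estimate_probs`, the uniform pseudocount, the clip-then-renormalize step, and the default values are illustrative assumptions, not the book's implementation.

```python
from collections import Counter


def estimate_probs(observations, outcomes, pseudocount=1.0,
                   p_min=1e-3, p_max=1.0 - 1e-3):
    """Estimate outcome probabilities from counts seeded with pseudocounts.

    pseudocount, p_min and p_max are hypothetical tuning values for
    illustration only.
    """
    counts = Counter(observations)
    # Every outcome starts from `pseudocount` rather than zero: a uniform
    # prior expressed in frequency (count) form.
    seeded = {o: counts[o] + pseudocount for o in outcomes}
    total = sum(seeded.values())
    probs = {o: c / total for o, c in seeded.items()}
    # Apply the min/max cutoffs, then renormalize so the values sum to one
    # (renormalizing can move a value slightly past a cutoff).
    clipped = {o: min(max(p, p_min), p_max) for o, p in probs.items()}
    norm = sum(clipped.values())
    return {o: p / norm for o, p in clipped.items()}


# A single observed roll of a six, with outcomes 1..6
probs = estimate_probs([6], range(1, 7))
print(round(probs[6], 3), round(probs[1], 3))  # 0.286 0.143
```

With one observed six and a pseudocount of 1, the six gets probability 2/7 while each unseen face keeps 1/7 instead of dropping to zero; a larger pseudocount expresses a stronger prior toward the uniform distribution.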

Whenever you can list all the outcomes for some situation (like rolls of a six‐sided die), it is natural to think of the “probabilities” of those outcomes, and also natural for those outcome probabilities to sum to one. So, with probability we assume there are “rules” (the probability assignments), and using those rules we make predictions about future outcomes. The rules form a mathematical framework; thus probability is a mathematical encapsulation of outcomes.

How did we get the “rules,” the probability assignments on outcomes? This is the realm of statistics: we have a collection of data and want to distill whatever rules we can from it, such as a complete set of (observed) outcomes and their assigned (estimated) probabilities. If going from raw data to a probability model could somehow be done in one step, we could say that statistics is whatever takes you from raw data to a probability model, ideally without itself depending on a probability model. In practice, however, the statistical determination of a probability model suitable for a collection of data is like identifying a physical law in mathematical form from raw data: it is math and a lot more, including an iterative, creative process in which models are proposed, discarded, and built from existing ones.
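
The simplest instance of going from raw data to a probability model is estimating outcome probabilities from observed frequencies. The sketch below assumes a fair die (so the “true” model is known to be 1/6 per face) and uses a hypothetical helper `empirical_frequencies`; it shows the observed frequencies approaching 1/6 as the number of rolls grows, as the LLN predicts.

```python
import random
from collections import Counter


def empirical_frequencies(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return outcome frequencies."""
    rng = random.Random(seed)  # seeded for reproducibility
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


for n in (10, 1_000, 100_000):
    freqs = empirical_frequencies(n)
    # Largest gap between an observed frequency and the model value 1/6
    worst = max(abs(f - 1 / 6) for f in freqs.values())
    print(f"n={n:>7}: max |freq - 1/6| = {worst:.4f}")
```

In real data the model is not known in advance, which is exactly why the iterative model-building described above is needed.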

2.4 Identifying Emergent/Convergent Statistics and Anomalous Statistics

Expectation, E(X), of a random variable (r.v.) X:

$$E(X) \equiv \sum_{i=1}^{L} x_i\, p(x_i), \quad \text{if } x_i \in \Re$$

For example, let X be the total from rolling two six‐sided dice: X = 2 can occur in only one way, rolling “snake eyes,” whereas X = 7 can occur in six ways, and so on, giving E(X) = 7. For a single die, E(X) = 3.5. Notice that the value of the expectation need not be one of the possible outcomes (it is really hard to roll a 3.5).
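
The two dice values above can be checked by enumerating the 36 equally likely ordered pairs. This sketch uses exact `Fraction` arithmetic and a small hypothetical helper `expectation` that implements the sum of x_i p(x_i).

```python
from fractions import Fraction
from itertools import product


def expectation(dist):
    """E(X) = sum_i x_i p(x_i) for a discrete distribution {x_i: p(x_i)}."""
    return sum(x * p for x, p in dist.items())


# Single fair die: each face has probability 1/6.
one_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Total of two dice: accumulate 1/36 for each of the 36 ordered pairs.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

print(expectation(one_die))      # 7/2
print(expectation(two_dice))     # 7
print(two_dice[2], two_dice[7])  # 1/36 1/6
```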

The expectation, E(g(X)), of a function g of r.v. X:

$$E(g(X)) \equiv \sum_{i=1}^{L} g(x_i)\, p(x_i), \quad \text{if } x_i \in \Re$$
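
A sketch of E(g(X)) for a single fair die with g(x) = x², again with exact fractions; the helper name `expectation_of` is hypothetical. It also shows that E(g(X)) generally differs from g(E(X)).

```python
from fractions import Fraction


def expectation_of(g, dist):
    """E(g(X)) = sum_i g(x_i) p(x_i) for a discrete distribution {x_i: p(x_i)}."""
    return sum(g(x) * p for x, p in dist.items())


die = {face: Fraction(1, 6) for face in range(1, 7)}

e_x2 = expectation_of(lambda x: x * x, die)      # E(X^2)
g_of_e = expectation_of(lambda x: x, die) ** 2   # g(E(X)) = (E(X))^2
print(e_x2, g_of_e)  # 91/6 49/4
```

The gap 91/6 − 49/4 = 35/12 is exactly the die's variance, anticipating the identity Var(X) = E(X²) − (E(X))² given below.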

Consider the special case where g(x_i) = −log(p(x_i)):

$$H(X) \equiv E[g(X)] = -\sum_{i=1}^{L} p(x_i)\log(p(x_i)), \quad \text{if } p(x_i) \in \Re^{+},$$

which is the Shannon Entropy of the discrete distribution p(x_i). Similarly, for Mutual Information, use g(X,Y) = log(p(x_i, y_i)/(p(x_i)p(y_i))):

$$I(X;Y) \equiv E[g(X,Y)] \equiv \sum_{i=1}^{L} p(x_i, y_i)\log\left(\frac{p(x_i, y_i)}{p(x_i)\,p(y_i)}\right)$$

if p(x_i), p(y_i), and p(x_i, y_i) are all in ℜ⁺. This is the Relative Entropy between the joint distribution and the distribution the r.v.'s would have if they were independent (the product of the marginals): D(p(x_i, y_i) ‖ p(x_i)p(y_i)).
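
The definitions of Shannon Entropy, Relative Entropy, and Mutual Information translate directly into code. In this minimal sketch (function names hypothetical, logarithms in base 2 so results are in bits), the joint distribution is a 2-D table `joint[x][y]`, and the sum in I(X;Y) runs over every joint outcome (x, y), reading the index i in the formula as enumerating joint outcomes. `relative_entropy` assumes q_i > 0 wherever p_i > 0.

```python
import math


def entropy(p, base=2):
    """H(X) = -sum_i p(x_i) log p(x_i); zero-probability terms contribute 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)


def relative_entropy(p, q, base=2):
    """D(p || q) = sum_i p_i log(p_i / q_i) over outcomes with p_i > 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint, base=2):
    """I(X;Y) = D(p(x, y) || p(x) p(y)) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    pairs = [(i, j) for i in range(len(px)) for j in range(len(py))]
    flat_joint = [joint[i][j] for i, j in pairs]
    flat_indep = [px[i] * py[j] for i, j in pairs]
    return relative_entropy(flat_joint, flat_indep, base)


print(entropy([1 / 6] * 6))                              # about 2.585 bits (log2 6)
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: independent
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: Y determines X
```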

Jensen's Inequality:

Let φ(⋅) be a convex function on a convex subset χ of the real line, φ: χ → ℜ. By definition of convexity, φ(λ₁x₁ + … + λ_n x_n) ≤ λ₁φ(x₁) + … + λ_nφ(x_n), where λ_i ≥ 0 and ∑λ_i = 1. Setting λ_i = p(x_i) satisfies these conditions, since the probabilities of a discrete distribution are likewise nonnegative and sum to one (as are the weights in linear interpolation), so the inequality can be rewritten using the definition of Expectation:

$$\varphi(E(X)) \le E(\varphi(X))$$

Since φ(x) = −log(x) is a convex function:

$$\log(E(X)) \ge E(\log(X))$$

In particular, applied to the r.v. p(X), which takes the value p(x_i) with probability p(x_i):

$$\log(E(p(X))) \ge E(\log(p(X))) = -H(X)$$
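
A quick numerical check of Jensen's Inequality with φ(x) = −log(x), on a hypothetical four-outcome distribution with positive values (chosen only for illustration), including the special case E(log(p(X))) = −H(X):

```python
import math

# Hypothetical discrete r.v.: positive outcomes x_i with probabilities p(x_i).
xs = [1.0, 2.0, 4.0, 8.0]
ps = [0.1, 0.2, 0.3, 0.4]

e_x = sum(x * p for x, p in zip(xs, ps))
e_log_x = sum(math.log(x) * p for x, p in zip(xs, ps))
# Jensen with the convex phi(x) = -log(x): log(E(X)) >= E(log(X))
print(math.log(e_x), e_log_x)

# Special case: the r.v. that takes the value p(x_i) with probability p(x_i)
e_log_p = sum(math.log(p) * p for p in ps)
h = -sum(p * math.log(p) for p in ps)  # Shannon Entropy (natural log)
print(e_log_p, -h)  # the same value
```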

Variance:

$$\mathrm{Var}(X) \equiv E\left([X - E(X)]^2\right) = \sum_{i=1}^{L} (x_i - E(X))^2\, p(x_i) = E(X^2) - (E(X))^2$$
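
The two forms of the variance can be compared exactly for a single fair die; this is only an illustrative check using `Fraction` arithmetic:

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(x * p for x, p in die.items())                       # E(X)
var_def = sum((x - mean) ** 2 * p for x, p in die.items())      # E([X - E(X)]^2)
var_short = sum(x * x * p for x, p in die.items()) - mean ** 2  # E(X^2) - (E(X))^2

print(var_def, var_short)  # 35/12 35/12
```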

Chebyshev's Inequality:

For k > 0, P(|X − E(X)| > k) ≤ Var(X)/k²

Proof: Var(X) …
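
Chebyshev's bound can be checked exactly against the true tail probabilities for the total of two dice (E(X) = 7, Var(X) = 35/6). This sketch verifies the inequality numerically for a few values of k; it is an illustration, not a substitute for the proof:

```python
from fractions import Fraction
from itertools import product

# Distribution of the total of two fair six-sided dice.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

mean = sum(x * p for x, p in two_dice.items())
var = sum((x - mean) ** 2 * p for x, p in two_dice.items())

for k in range(1, 6):
    # Exact tail probability P(|X - E(X)| > k) versus the bound Var(X)/k^2
    tail = sum(p for x, p in two_dice.items() if abs(x - mean) > k)
    bound = var / k**2
    assert tail <= bound
    print(k, tail, bound)
```

The bound is loose here (for k = 1 it exceeds 1), which is typical: Chebyshev's Inequality uses only the variance, not the shape of the distribution.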