Data Science in Theory and Practice. Maria Cristina Mariani


$X$ is said to have a binomial distribution with parameters $n$ and $p$ if it has the pmf shown below:

$$P(x; p, n) = \binom{n}{x} p^{x} (1 - p)^{n - x} \quad \text{for } x = 0, 1, \ldots, n,$$

      where $p$ is the probability of success on an individual trial and $n$ is the number of trials in the binomial experiment.
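      As a quick numerical check of the formula above, the following minimal sketch evaluates the binomial pmf with Python's standard library. The function name `binomial_pmf` and the values `n = 10`, `p = 0.3` are illustrative assumptions, not taken from the book.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binom(n, p); returns 0 outside the support 0..n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Illustrative parameters (assumed for this example only).
n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                              # should be close to 1
mean = sum(x * q for x, q in enumerate(probs))  # should be close to n * p
print(total, mean)
```

      The probabilities over the support add up to 1 (up to floating-point error), and the mean agrees with $np$.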

      The multinomial distribution is a generalization of the binomial distribution. Specifically, assume that each of $n$ independent trials may result in one of $k$ outcomes, generically labeled $S = \{1, 2, \ldots, k\}$, with corresponding probabilities $(p_1, \ldots, p_k)$. Now define a vector $\mathbf{X} = (X_1, \ldots, X_k)$, where each $X_i$ counts the number of outcomes $i$ in the resulting sample of size $n$. The joint distribution of the vector $\mathbf{X}$ is

$$f(x_1, \ldots, x_k) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k} \, \mathbf{1}_{\{x_1 + \cdots + x_k = n\}}.$$

      In the same way as the binomial probabilities appear as the terms in the binomial expansion of $(p + (1 - p))^n$, the multinomial probabilities are the terms in the multinomial expansion of $(p_1 + \cdots + p_k)^n$, so they sum to 1. This expansion in fact gives the distribution its name.
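      The claim that the multinomial probabilities sum to 1, and that the case $k = 2$ reduces to the binomial pmf, can be checked by brute-force enumeration. This is a minimal sketch using only the standard library; the name `multinomial_pmf` and the parameter values are assumptions made for illustration.

```python
from itertools import product
from math import comb, factorial, prod


def multinomial_pmf(xs, n, ps):
    """Joint pmf f(x_1, ..., x_k); the indicator makes it 0 unless sum(xs) == n."""
    if min(xs) < 0 or sum(xs) != n:
        return 0.0
    coef = factorial(n) // prod(factorial(x) for x in xs)
    return coef * prod(p**x for p, x in zip(ps, xs))


# Illustrative parameters (assumed): n = 4 trials, k = 3 outcomes.
n, ps = 4, (0.5, 0.3, 0.2)
k = len(ps)

# Sum over every count vector in {0, ..., n}^k; vectors that do not add up to n contribute 0.
total = sum(multinomial_pmf(xs, n, ps) for xs in product(range(n + 1), repeat=k))
print(total)  # close to 1, since (p_1 + ... + p_k)^n = 1

# With k = 2 the formula coincides with the binomial pmf.
p = 0.3
gap = max(
    abs(multinomial_pmf((x, n - x), n, (p, 1 - p)) - comb(n, x) * p**x * (1 - p) ** (n - x))
    for x in range(n + 1)
)
print(gap)  # close to 0
```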

      If we label outcome $i$ as a success and everything else as a failure, then $X_i$ simply counts successes in $n$ independent trials, and thus $X_i \sim \text{Binom}(n, p_i)$. Hence the first moment of the random vector and the diagonal elements of the covariance matrix are easy to calculate as $n p_i$ and $n p_i (1 - p_i)$, respectively. The off-diagonal elements (covariances) are not complicated to calculate either: $\text{Cov}(X_i, X_j) = -n p_i p_j$ for $i \neq j$. The one-dimensional marginal distributions are binomial; however, the joint distribution of $(X_1, \ldots, X_r)$, the first $r$ components, is not multinomial, since their sum is random rather than fixed. Instead, suppose we group the first $r$ categories into one and let $Y = X_1 + \cdots + X_r$. Because the categories are linked, that is, $X_1 + \cdots + X_k = n$, we also have $Y = n - X_{r+1} - \cdots - X_k$. We can easily verify that the vector $(Y, X_{r+1}, \ldots, X_k)$, or equivalently $(n - X_{r+1} - \cdots - X_k, X_{r+1}, \ldots, X_k)$, has a multinomial distribution with associated probabilities $(p_Y, p_{r+1}, \ldots, p_k) = (p_1 + \cdots + p_r, p_{r+1}, \ldots, p_k)$.
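      The moments and the grouping property discussed above can be verified exactly for a small case by enumerating the support. The sketch below is illustrative only: the helper names and the choice of $n = 5$, $k = 4$, $r = 2$ are assumptions, and the covariance is compared against the standard value $-n p_i p_j$.

```python
from itertools import product
from math import factorial, prod


def multinomial_pmf(xs, n, ps):
    """Joint multinomial pmf; 0 unless the counts are nonnegative and sum to n."""
    if min(xs) < 0 or sum(xs) != n:
        return 0.0
    coef = factorial(n) // prod(factorial(x) for x in xs)
    return coef * prod(p**x for p, x in zip(ps, xs))


# Illustrative parameters (assumed): group the first r = 2 of k = 4 categories.
n, ps, r = 5, (0.2, 0.3, 0.1, 0.4), 2
k = len(ps)
support = [xs for xs in product(range(n + 1), repeat=k) if sum(xs) == n]


def expect(g):
    """Exact expectation of g(X) under the multinomial pmf."""
    return sum(g(xs) * multinomial_pmf(xs, n, ps) for xs in support)


mean0 = expect(lambda xs: xs[0])                          # compare with n * p_1
var0 = expect(lambda xs: xs[0] ** 2) - mean0**2           # compare with n * p_1 * (1 - p_1)
mean1 = expect(lambda xs: xs[1])
cov01 = expect(lambda xs: xs[0] * xs[1]) - mean0 * mean1  # compare with -n * p_1 * p_2

# Distribution of (Y, X_{r+1}, ..., X_k) with Y = X_1 + ... + X_r, built by summing the joint pmf.
grouped = {}
for xs in support:
    key = (sum(xs[:r]),) + xs[r:]
    grouped[key] = grouped.get(key, 0.0) + multinomial_pmf(xs, n, ps)

# Compare with a multinomial on the merged probabilities (p_1 + ... + p_r, p_{r+1}, ..., p_k).
grouped_ps = (sum(ps[:r]),) + ps[r:]
max_gap = max(abs(q - multinomial_pmf(key, n, grouped_ps)) for key, q in grouped.items())
print(mean0, var0, cov01, max_gap)
```

      Enumeration is only practical for small $n$ and $k$, but it checks the identities exactly rather than by simulation.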

      Next consider the conditional distribution of the first $r$ components given the last $k - r$ components. That is, the distribution of

$$(X_1, \ldots, X_r) \mid X_{r+1} = n_{r+1}, \ldots, X_k = n_k.$$
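      The excerpt breaks off before stating this conditional distribution. As a sketch of the standard result (not quoted from the book), write $m = n - n_{r+1} - \cdots - n_k$, a symbol introduced here for convenience. Dividing the joint pmf by the pmf of the last $k - r$ components, which follows from the grouping property of $(Y, X_{r+1}, \ldots, X_k)$, gives a multinomial distribution with $m$ trials and renormalized probabilities:

```latex
\begin{aligned}
P(X_1 = x_1, \ldots, X_r = x_r \mid X_{r+1} = n_{r+1}, \ldots, X_k = n_k)
&= \frac{\dfrac{n!}{x_1! \cdots x_r! \, n_{r+1}! \cdots n_k!} \,
         p_1^{x_1} \cdots p_r^{x_r} \, p_{r+1}^{n_{r+1}} \cdots p_k^{n_k}}
        {\dfrac{n!}{m! \, n_{r+1}! \cdots n_k!} \,
         (p_1 + \cdots + p_r)^{m} \, p_{r+1}^{n_{r+1}} \cdots p_k^{n_k}} \\[6pt]
&= \frac{m!}{x_1! \cdots x_r!}
   \prod_{i=1}^{r} \left( \frac{p_i}{p_1 + \cdots + p_r} \right)^{x_i}
   \mathbf{1}_{\{x_1 + \cdots + x_r = m\}} .
\end{aligned}
```

      The second line uses $x_1 + \cdots + x_r = m$ to split $(p_1 + \cdots + p_r)^m$ across the factors.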

      2.3.3 Multivariate Normal Distribution
