Data Science in Theory and Practice. Maria Cristina Mariani


\[
\begin{cases}
\cdots\,(1-x)^{\beta-1}, & \text{if } 0 < x < 1,\\
0, & \text{otherwise},
\end{cases}
\]

      where $\alpha > 0$ and $\beta > 0$.

      The Dirichlet distribution $\operatorname{Dir}(\boldsymbol{\alpha})$, named after Johann Peter Gustav Lejeune Dirichlet (1805–1859), is a multivariate distribution parameterized by a vector $\boldsymbol{\alpha}$ of positive parameters $(\alpha_1, \ldots, \alpha_n)$.

      Specifically, the joint density of an $n$-dimensional random vector $\mathbf{X} \sim \operatorname{Dir}(\boldsymbol{\alpha})$ is defined as:

\[
f(x_1, \ldots, x_n) = \frac{1}{\mathbf{B}(\boldsymbol{\alpha})} \left( \prod_{i=1}^{n} x_i^{\alpha_i - 1} \, \mathbf{1}_{\{x_i > 0\}} \right) \mathbf{1}_{\{x_1 + \cdots + x_n = 1\}},
\]

      where $\mathbf{1}_{\{x_1 + \cdots + x_n = 1\}}$ is an indicator function. In general, the indicator function of a subset $A$ of a set $X$ is the function

\[
\mathbf{1}_A : X \to \{0, 1\}
\]

      defined as

\[
\mathbf{1}_A(x) =
\begin{cases}
1, & \text{if } x \in A,\\
0, & \text{if } x \notin A.
\end{cases}
\]

      The components of the random vector $\mathbf{X}$ are thus always positive and satisfy $X_1 + \cdots + X_n = 1$. The normalizing constant $\mathbf{B}(\boldsymbol{\alpha})$ is the multinomial beta function, which is defined as:

\[
\mathbf{B}(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{n} \alpha_i\right)} = \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma(\alpha_0)},
\]

      where we used the notation $\alpha_0 = \sum_{i=1}^{n} \alpha_i$ and $\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t} \, dt$ for the Gamma function.
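      The density and its normalizing constant can be evaluated numerically. The following is a minimal sketch, not taken from the book: the function names and the tolerance `tol` used for the sum-to-one indicator are illustrative assumptions, and the computation is done on the log scale with the standard-library `math.lgamma` to avoid overflow in the Gamma function.

```python
import math


def log_multivariate_beta(alpha):
    """Log of B(alpha) = prod_i Gamma(alpha_i) / Gamma(alpha_0)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_pdf(x, alpha, tol=1e-9):
    """Dirichlet density at x; returns 0 where either indicator is 0.

    `tol` is an assumed floating-point tolerance for the constraint
    x_1 + ... + x_n = 1; it is not part of the mathematical definition.
    """
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    # Indicator functions: every x_i > 0 and the coordinates sum to 1.
    if any(xi <= 0 for xi in x) or abs(sum(x) - 1.0) > tol:
        return 0.0
    log_kernel = sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))
    return math.exp(log_kernel - log_multivariate_beta(alpha))
```

      For example, with $\boldsymbol{\alpha} = (1, 1, 1)$ we have $\mathbf{B}(\boldsymbol{\alpha}) = \Gamma(1)^3 / \Gamma(3) = 1/2$, so `dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])` evaluates to 2 up to rounding.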

      Because a draw from the Dirichlet distribution consists of $n$ positive numbers that always sum to 1, it is well suited to generating candidate probabilities for $n$ possible outcomes. It is closely related to the multinomial distribution, which requires $n$ numbers summing to 1 as the probabilities of its outcomes. The multinomial distribution is defined in Section 2.3.2.
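      To illustrate how a Dirichlet draw yields a candidate probability vector, the sketch below uses a standard construction that is not described in this excerpt: normalizing independent Gamma($\alpha_i$, 1) variates. The function name `sample_dirichlet`, the seed, and the parameter values are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, rng=random):
    """Draw one Dirichlet(alpha) vector.

    Assumed construction (not from the text): draw independent
    Gamma(alpha_i, 1) variates and divide each by their total.
    """
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)  # fixed seed so the run is reproducible
p = sample_dirichlet([2.0, 3.0, 5.0], rng)

# p is positive and sums to 1, so it can serve directly as the
# outcome probabilities for 10 trials over 3 categories.
outcomes = rng.choices(range(len(p)), weights=p, k=10)
```

      Each call returns a different point of the simplex, so repeated calls give a family of candidate probability vectors.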

      With the notation mentioned above and $\alpha_0$ as the sum of all parameters, we can calculate the moments of the distribution. The first moment vector has coordinates:

\[
E[X_i] = \frac{\alpha_i}{\alpha_0}.
\]

      The covariance matrix has elements:

\[
\operatorname{Var}(X_i) = \frac{\alpha_i (\alpha_0 - \alpha_i)}{\alpha_0^2 (\alpha_0 + 1)},
\]

      and when $i \neq j$

\[
\operatorname{Cov}(X_i, X_j) = \frac{-\alpha_i \alpha_j}{\alpha_0^2 (\alpha_0 + 1)}.
\]

      The covariance matrix is singular (its determinant is zero).
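      The moment formulas, and the reason the covariance matrix is singular, can be checked numerically. In the sketch below (function names and the example parameters are illustrative assumptions), every row of the covariance matrix sums to zero, because $\alpha_i(\alpha_0 - \alpha_i) - \alpha_i \sum_{j \neq i} \alpha_j = 0$. The all-ones vector is therefore mapped to zero, which is what singularity means here.

```python
def dirichlet_mean(alpha):
    """Mean vector with coordinates alpha_i / alpha_0."""
    a0 = sum(alpha)
    return [a / a0 for a in alpha]


def dirichlet_cov(alpha):
    """Covariance matrix built from the Var and Cov formulas."""
    a0 = sum(alpha)
    denom = a0 ** 2 * (a0 + 1.0)
    n = len(alpha)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cov[i][j] = alpha[i] * (a0 - alpha[i]) / denom
            else:
                cov[i][j] = -alpha[i] * alpha[j] / denom
    return cov


alpha = [2.0, 3.0, 5.0]  # example parameters, chosen for illustration
mean = dirichlet_mean(alpha)
cov = dirichlet_cov(alpha)

# Each row sums to zero (up to rounding), so cov times the all-ones
# vector is the zero vector and the matrix is singular.
row_sums = [sum(row) for row in cov]
```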

      Finally, the univariate marginal distributions are all beta: $X_i \sim \operatorname{Beta}(\alpha_i, \alpha_0 - \alpha_i)$. All of these results can be found in Balakrishnan and Nevzorov (2004).
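      As a consistency check, the beta marginals reproduce the mean and variance given earlier. The derivation below assumes the standard moments of a $\operatorname{Beta}(a, b)$ variable, which are not quoted in this excerpt; the symbols $a$ and $b$ are introduced only for this check.

```latex
% Assumed standard facts for Y ~ Beta(a, b):
%   E[Y] = a / (a + b),   Var(Y) = ab / ((a + b)^2 (a + b + 1)).
\begin{align*}
a &= \alpha_i, \qquad b = \alpha_0 - \alpha_i, \qquad a + b = \alpha_0, \\
E[X_i] &= \frac{a}{a + b} = \frac{\alpha_i}{\alpha_0}, \\
\operatorname{Var}(X_i) &= \frac{ab}{(a + b)^2 (a + b + 1)}
  = \frac{\alpha_i (\alpha_0 - \alpha_i)}{\alpha_0^2 (\alpha_0 + 1)}.
\end{align*}
```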

      Please refer to Lin (2016) for the proof of the properties of the Dirichlet distribution.

      2.3.2 Multinomial Distribution

      Definition 2.24 (Binomial distribution) A random variable
