Data Science in Theory and Practice. Maria Cristina Mariani

      Now we can define the moments of the random vector. The first moment is a vector

\[
E[\mathbf{X}] = \mu_{\mathbf{X}} =
\begin{pmatrix}
E[X_1] \\
\vdots \\
E[X_n]
\end{pmatrix}.
\]

      The expectation is applied to each component of the random vector. Expectations of functions of random vectors are computed just as with univariate random variables. We recall that the expectation of a random variable is its average value.

      The second moment requires calculating all the pairwise combinations of the components. The result can be presented in matrix form. The second central moment is the covariance matrix.

\[
\begin{aligned}
\operatorname{Cov}(\mathbf{X}) &= E\left[(\mathbf{X}-\mu_{\mathbf{X}})(\mathbf{X}-\mu_{\mathbf{X}})^{T}\right] \\
&=
\begin{pmatrix}
\operatorname{Var}(X_1) & \operatorname{Cov}(X_1,X_2) & \dots & \operatorname{Cov}(X_1,X_n) \\
\operatorname{Cov}(X_2,X_1) & \operatorname{Var}(X_2) & \dots & \operatorname{Cov}(X_2,X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(X_n,X_1) & \operatorname{Cov}(X_n,X_2) & \dots & \operatorname{Var}(X_n)
\end{pmatrix},
\end{aligned}
\tag{2.1}
\]

      where we used the transpose matrix notation and, since $\operatorname{Cov}(X_i, X_j) = \operatorname{Cov}(X_j, X_i)$, the matrix is symmetric.
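      As an illustration of the first moment and the covariance matrix, the following minimal sketch estimates both from simulated data and checks the symmetry property. The data-generating recipe, the helper names `mean_vector` and `covariance_matrix`, and the choice of the unbiased divisor (number of observations minus one) are assumptions made for this example, not part of the text.

```python
import random

def mean_vector(samples):
    """Component-wise sample mean of a list of equal-length vectors."""
    n_obs = len(samples)
    dim = len(samples[0])
    return [sum(x[i] for x in samples) / n_obs for i in range(dim)]

def covariance_matrix(samples):
    """Sample covariance matrix (assumed divisor: n_obs - 1)."""
    n_obs = len(samples)
    mu = mean_vector(samples)
    dim = len(mu)
    return [
        [sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / (n_obs - 1)
         for j in range(dim)]
        for i in range(dim)
    ]

# Hypothetical data: 1000 draws of a 3-dimensional vector whose
# components are built from two independent standard normals.
rng = random.Random(0)
samples = []
for _ in range(1000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    samples.append([z1, z1 + z2, 2.0 * z2 + 1.0])

mu_hat = mean_vector(samples)         # estimate of E[X]
cov_hat = covariance_matrix(samples)  # estimate of Cov(X)

# Cov(X_i, X_j) = Cov(X_j, X_i), so the estimate should be symmetric.
is_symmetric = all(
    abs(cov_hat[i][j] - cov_hat[j][i]) < 1e-12
    for i in range(3) for j in range(3)
)
print(mu_hat)
print(cov_hat)
print(is_symmetric)
```

      The diagonal entries of `cov_hat` are the sample variances of the individual components, and the off-diagonal entries are the sample covariances.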

      We note that the covariance matrix is positive semidefinite (nonnegative definite), i.e. for any vector $u \in \mathbb{R}^{n}$, we have $u^{T} \operatorname{Cov}(\mathbf{X})\, u \ge 0$.

      Now we explain why the covariance matrix has to be positive semidefinite. Take any vector $u \in \mathbb{R}^{n}$. Then the product $u^{T}\mathbf{X}$ is a scalar random variable, and its variance is

\[
\begin{aligned}
\operatorname{Var}(u^{T}\mathbf{X}) &= E\left[(u^{T}\mathbf{X} - u^{T}\mu_{\mathbf{X}})^{2}\right] \\
&= E\left[(u^{T}\mathbf{X} - u^{T}\mu_{\mathbf{X}})(u^{T}\mathbf{X} - u^{T}\mu_{\mathbf{X}})^{T}\right] \\
&= E\left[u^{T}(\mathbf{X} - \mu_{\mathbf{X}})(\mathbf{X} - \mu_{\mathbf{X}})^{T}(u^{T})^{T}\right] \\
&= u^{T} \operatorname{Cov}(\mathbf{X})\, u.
\end{aligned}
\]
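      The identity above can be checked numerically. The sketch below, which assumes an arbitrary simulated data set and an arbitrary vector `u` (both hypothetical), computes the sample variance of the projected values and compares it with the quadratic form built from the sample covariance matrix. When both use the same divisor, the two numbers agree up to floating-point rounding.

```python
import random

def sample_variance(values):
    """Sample variance of a list of numbers (divisor: n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def sample_covariance(samples):
    """Sample covariance matrix of a list of vectors (divisor: n - 1)."""
    n_obs, dim = len(samples), len(samples[0])
    mu = [sum(x[i] for x in samples) / n_obs for i in range(dim)]
    return [
        [sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / (n_obs - 1)
         for j in range(dim)]
        for i in range(dim)
    ]

def quadratic_form(u, a):
    """Compute u^T A u for a vector u and a square matrix A."""
    dim = len(u)
    return sum(u[i] * a[i][j] * u[j] for i in range(dim) for j in range(dim))

# Hypothetical data: 500 draws of a 3-dimensional random vector.
rng = random.Random(1)
samples = [
    [rng.gauss(0, 1), rng.uniform(-1, 1), rng.expovariate(1.0)]
    for _ in range(500)
]
u = [0.5, -2.0, 3.0]  # arbitrary vector in R^3

# Left-hand side: variance of the scalar u^T X.
projected = [sum(ui * xi for ui, xi in zip(u, x)) for x in samples]
lhs = sample_variance(projected)

# Right-hand side: u^T Cov(X) u.
rhs = quadratic_form(u, sample_covariance(samples))

print(lhs, rhs)
```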

      Since the variance is always nonnegative, the covariance matrix must be nonnegative definite (or positive semidefinite). We recall that a square symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite if $u^{T} A u \ge 0$ for all $u \in \mathbb{R}^{n}$, and positive definite if the inequality is strict for every $u \neq 0$. This difference is in fact important in the context of random variables, since you may be able to construct a linear combination $u^{T}\mathbf{X}$ with $u \neq 0$ which is not always constant but whose variance is equal to zero.
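      A deliberately simple case shows how a covariance matrix can be positive semidefinite without being positive definite. In the sketch below, the data are hypothetical and the second component is exactly twice the first, so the combination $2X_1 - X_2$ is zero for every draw (a stronger condition than the zero-variance case described above) and the quadratic form vanishes for the nonzero vector `u = [2, -1]`.

```python
def sample_covariance(samples):
    """Sample covariance matrix of a list of vectors (divisor: n - 1)."""
    n_obs, dim = len(samples), len(samples[0])
    mu = [sum(x[i] for x in samples) / n_obs for i in range(dim)]
    return [
        [sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / (n_obs - 1)
         for j in range(dim)]
        for i in range(dim)
    ]

def quadratic_form(u, a):
    """Compute u^T A u for a vector u and a square matrix A."""
    dim = len(u)
    return sum(u[i] * a[i][j] * u[j] for i in range(dim) for j in range(dim))

# Hypothetical data: X2 = 2 * X1 for every observation.
x1_values = [0.0, 1.0, 2.0, 3.0, 4.0]
samples = [[x1, 2.0 * x1] for x1 in x1_values]

cov = sample_covariance(samples)
print(cov)  # [[2.5, 5.0], [5.0, 10.0]]

u = [2.0, -1.0]                       # nonzero, yet 2*X1 - X2 has zero variance
degenerate = quadratic_form(u, cov)
print(degenerate)                     # 0.0

v = [1.0, 1.0]                        # a direction with positive variance
generic = quadratic_form(v, cov)
print(generic)                        # 22.5
```

      Because `degenerate` is zero for a nonzero `u`, this matrix satisfies the semidefinite inequality but not the strict one.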

      The covariance matrix is discussed in detail in Chapter 3.

      We now present examples of multivariate distributions.

      2.3.1 The Dirichlet Distribution

      Before we discuss the Dirichlet distribution, we define the Beta distribution.

      Definition 2.22 (Beta distribution) A random variable $X$ is said to have a Beta distribution with parameters $\alpha$ and $\beta$ if it has a pdf $f(x)$ defined as:

\[
f(x) =
\begin{cases}
\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} \cdots
\end{cases}
\]