Data Science in Theory and Practice. Maria Cristina Mariani

Чтение книги онлайн.

Читать онлайн книгу Data Science in Theory and Practice - Maria Cristina Mariani страница 24

Data Science in Theory and Practice - Maria Cristina Mariani

Скачать книгу

      This implies that the average dollar sales for two movies is $40.00. Therefore, the average amount of dollars that cost a movie is 20 dollars.

      The variance–covariance matrix (or simply the covariance matrix) of a random vector bold upper X is given by

Cov left-parenthesis bold upper X right-parenthesis equals upper E left-bracket left-parenthesis bold upper X minus upper E left-parenthesis bold upper X right-parenthesis right-parenthesis left-parenthesis bold upper X minus upper E left-parenthesis bold upper X right-parenthesis right-parenthesis Superscript upper T Baseline right-bracket comma

      where upper E left-parenthesis bold upper X right-parenthesis is the mean vector.

      In (3.1), the covariance matrix consists of the variances of the variables along the main diagonal and the covariances between each pair of variables in the other matrix positions. The sample covariance of the ith and kth variables, s Subscript i k, is calculated using the ith and kth columns of bold upper X:

      (3.2)s Subscript i k Baseline equals StartFraction 1 Over n minus 1 EndFraction sigma-summation Underscript j equals 1 Overscript n Endscripts left-parenthesis x Subscript j i Baseline minus x overbar Subscript i Baseline right-parenthesis left-parenthesis x Subscript j k Baseline minus x overbar Subscript k Baseline right-parenthesis comma i equals 1 comma 2 comma ellipsis comma p comma k equals 1 comma 2 comma ellipsis comma p comma

      where n is the number of measurements.

      For example if i equals 1 comma k equals 2:

      (3.3)s 12 equals StartFraction 1 Over n minus 1 EndFraction sigma-summation Underscript j equals 1 Overscript n Endscripts left-parenthesis x Subscript j Baseline 1 Baseline minus x overbar Subscript 1 Baseline right-parenthesis left-parenthesis x Subscript j Baseline 2 Baseline minus x overbar Subscript 2 Baseline right-parenthesis

      and if i equals 1 comma k equals 1:

      we have the sample variance.

      The sample covariance measures the association between the ith and kth variables. The sample covariance reduces to the sample variance when i equals k as observed in (3.4). We note that the sample covariance matrix (3.1) is symmetric, i.e. s Subscript i k Baseline equals s Subscript k i for all i and k because of its definition. Other names used for the covariance matrix are variance matrix, variance–covariance matrix, and dispersion matrix. In finance the concept of covariance is applied in portfolio theory, in the diversification method, that reduces the risk by choosing assets that do not present a high positive covariance with each other.

      If bold upper X is a random vector taking on any possible value in a multivariate population, the population covariance matrix is defined as

Скачать книгу