Data Science in Theory and Practice. Maria Cristina Mariani

$X_{ik}$ for $i \neq k$.

      The notation $\Sigma$ for the covariance matrix is widely used and seems natural because $\Sigma$ is the uppercase version of $\sigma$.

      Example 3.3 Consider the following data matrix introduced in Example 3.1:

$$
\mathbf{X} = \begin{pmatrix} 48 & 3 \\ 22 & 1 \\ 50 & 2 \end{pmatrix}.
$$

      Each receipt yields a pair of measurements: total dollar sales and number of movies sold. Since there are three receipts, we have three observations on each variable. We find the sample variances and covariance, which form the matrix $\mathbf{S}_n$, as follows:

$$
\begin{aligned}
s_{11} &= \frac{1}{2}\sum_{j=1}^{3}\left(x_{j1}-\bar{x}_1\right)^2 \\
       &= \frac{1}{2}\left((48-40)^2+(22-40)^2+(50-40)^2\right) = 244, \\
s_{22} &= \frac{1}{2}\sum_{j=1}^{3}\left(x_{j2}-\bar{x}_2\right)^2 \\
       &= \frac{1}{2}\left((3-2)^2+(1-2)^2+(2-2)^2\right) = 1, \\
s_{12} &= \frac{1}{2}\sum_{j=1}^{3}\left(x_{j1}-\bar{x}_1\right)\left(x_{j2}-\bar{x}_2\right) \\
       &= \frac{1}{2}\left((48-40)(3-2)+(22-40)(1-2)+(50-40)(2-2)\right) = 13, \\
s_{21} &= s_{12}.
\end{aligned}
$$

      Therefore,

$$
\mathbf{S}_n = \begin{pmatrix} 244 & 13 \\ 13 & 1 \end{pmatrix}.
$$
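      The arithmetic of Example 3.3 can be checked with a few lines of code. The following is a minimal sketch in plain Python, assuming the data matrix is stored row by row and the divisor is n - 1 as in the example; the function names `column_means` and `sample_covariance` are illustrative and do not come from the book.

```python
# Data matrix from Example 3.3: each row is one receipt,
# columns are (total dollar sales, number of movies sold).
X = [[48, 3],
     [22, 1],
     [50, 2]]


def column_means(data):
    """Return the mean of each column of a row-major data matrix."""
    n = len(data)
    p = len(data[0])
    return [sum(row[i] for row in data) / n for i in range(p)]


def sample_covariance(data):
    """Return the p x p sample covariance matrix using the divisor n - 1."""
    n = len(data)
    p = len(data[0])
    means = column_means(data)
    return [
        [
            sum((row[i] - means[i]) * (row[k] - means[k]) for row in data) / (n - 1)
            for k in range(p)
        ]
        for i in range(p)
    ]


S = sample_covariance(X)
print(column_means(X))  # [40.0, 2.0]
print(S)                # [[244.0, 13.0], [13.0, 1.0]]
```

      The printed matrix agrees with the hand computation: the diagonal holds the two sample variances and the off-diagonal entries hold the shared covariance.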

      A correlation matrix is a table showing correlation coefficients between variables. Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. The sample correlation between the ith and kth variables is defined as

$$
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}}
       = \frac{\sum_{j=1}^{n}\left(x_{ji}-\bar{x}_i\right)\left(x_{jk}-\bar{x}_k\right)}{\sqrt{\sum_{j=1}^{n}\left(x_{ji}-\bar{x}_i\right)^2}\,\sqrt{\sum_{j=1}^{n}\left(x_{jk}-\bar{x}_k\right)^2}} \tag{3.7}
$$

      for $i = 1, 2, \ldots, p$ and $k = 1, 2, \ldots, p$, where

$$
\begin{aligned}
s_{ik} &= \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ji}-\bar{x}_i\right)\left(x_{jk}-\bar{x}_k\right), \quad i = 1, 2, \ldots, p \text{ and } k = 1, 2, \ldots, p, \\
s_{ii} &= \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ji}-\bar{x}_i\right)^2, \quad i = 1, 2, \ldots, p, \\
s_{kk} &= \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{jk}-\bar{x}_k\right)^2, \quad k = 1, 2, \ldots, p.
\end{aligned}
$$

      The factors $1/(n-1)$ cancel between numerator and denominator, which is why they do not appear in the second form of (3.7). We note that the sample correlation is symmetric since $r_{ik} = r_{ki}$ for all $i$ and $k$.
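      As an illustration, formula (3.7) can be applied to the data of Example 3.3. This is a minimal sketch in plain Python; the helper names `sample_correlation` and `cross` are assumptions made for the example, not notation from the book.

```python
import math

# Data matrix from Example 3.3 (rows are receipts, columns are variables).
X = [[48, 3],
     [22, 1],
     [50, 2]]


def sample_correlation(data):
    """Return the p x p matrix of sample correlations r_ik from formula (3.7)."""
    n = len(data)
    p = len(data[0])
    means = [sum(row[i] for row in data) / n for i in range(p)]

    def cross(i, k):
        # Sum of products of deviations from the column means.
        return sum((row[i] - means[i]) * (row[k] - means[k]) for row in data)

    return [
        [cross(i, k) / (math.sqrt(cross(i, i)) * math.sqrt(cross(k, k)))
         for k in range(p)]
        for i in range(p)
    ]


R = sample_correlation(X)
print(round(R[0][1], 4))                # 0.8322
print(math.isclose(R[0][1], R[1][0]))   # True: r_12 equals r_21
```

      The value 0.8322 matches the ratio 13 / (sqrt(244) * sqrt(1)) built from the variances and covariance of Example 3.3, and the diagonal entries equal 1 up to floating-point rounding.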

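      The sample correlation coefficient is a measure of the linear association between two variables and does not depend on the units of measurement; that is, the units cancel when the coefficient is constructed. The sample correlation matrix is analogous to the covariance matrix with correlations in place of covariances. Writing it as $\mathbf{R}$, and noting from (3.7) that each $r_{ii} = 1$:

$$
\mathbf{R} = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1p} \\
r_{21} & 1 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & 1
\end{pmatrix}.
$$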
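      The claim about units can be checked numerically. The sketch below, in plain Python, re-expresses the sales column of Example 3.3 in cents instead of dollars; the conversion to cents and the function names are assumptions made purely for illustration.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length lists, divisor n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Sample correlation: covariance divided by the product of standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


sales_dollars = [48, 22, 50]   # first column of the Example 3.3 data
movies = [3, 1, 2]             # second column of the Example 3.3 data
sales_cents = [100 * d for d in sales_dollars]  # same sales, different unit

print(covariance(sales_dollars, movies))  # 13.0
print(covariance(sales_cents, movies))    # 1300.0
print(math.isclose(correlation(sales_dollars, movies),
                   correlation(sales_cents, movies)))  # True
```

      The covariance grows by the factor of 100 introduced by the change of unit, while the correlation is unchanged because the same factor appears in both the numerator and the denominator.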

      The population correlation matrix similar to (
