Data Science in Theory and Practice. Maria Cristina Mariani

The notation $\Sigma$ for the covariance matrix is widely used and seems natural because $\Sigma$ is the uppercase version of $\sigma$.
Example 3.3 Consider the following data matrix introduced in Example 3.1:
Each receipt yields a pair of measurements: total dollar sales and number of movies sold. Since there are three receipts, we have three observations on each variable. We find the sample variances and covariance as follows:
Therefore,
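The computation in this example can be sketched in a few lines of Python. The receipt values below are hypothetical and are not the data of Example 3.1; the function name `sample_covariance_matrix` and the `ddof` parameter are likewise assumptions made for illustration. Because the divisor used by the text is not visible in this excerpt, `ddof=1` (divide by n - 1) is only a default, and `ddof=0` gives the divide-by-n version.

```python
def sample_covariance_matrix(rows, ddof=1):
    """Return the p x p sample covariance matrix of an n x p data matrix.

    rows: list of n observations, each a list of p measurements.
    ddof: 1 divides by n - 1, 0 divides by n (assumption: pick the one
          that matches the definition used in the text).
    """
    n = len(rows)
    p = len(rows[0])
    # Sample mean of each variable (column).
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            # Sum of cross-products of deviations from the column means.
            cross = sum((row[j] - means[j]) * (row[k] - means[k]) for row in rows)
            cov[j][k] = cross / (n - ddof)
    return cov


# Hypothetical receipts (NOT the values from Example 3.1):
# each row is [total dollar sales, number of movies sold].
receipts = [[30.0, 2.0], [50.0, 3.0], [40.0, 4.0]]
S = sample_covariance_matrix(receipts)
print(S)
```

The diagonal entries of `S` are the sample variances of the two variables and the off-diagonal entries are their sample covariance, so the matrix is symmetric by construction.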
3.5 Correlation Matrices
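A correlation matrix is a table showing correlation coefficients between variables. Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. The sample correlation between the $j$th and $k$th variables is defined as
$$r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}}\,\sqrt{s_{kk}}}, \tag{3.6}$$
where $s_{jk}$ is the sample covariance between the $j$th and $k$th variables and $s_{jj}$ and $s_{kk}$ are the corresponding sample variances. Substituting the expressions for $s_{jk}$, $s_{jj}$, and $s_{kk}$ into (3.6) and canceling the common normalizing factor, we obtain
$$r_{jk} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}\,\sqrt{\sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2}} \tag{3.7}$$
for $j = 1, \ldots, p$ and $k = 1, \ldots, p$, where $x_{ij}$ is the $i$th of $n$ observations on the $j$th variable and $\bar{x}_j$ is the sample mean of that variable. We note that the sample correlation is symmetric since $r_{jk} = r_{kj}$ for all $j$ and $k$.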
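As a rough illustration, formula (3.7) can be evaluated directly as the sum of cross-products of deviations divided by the product of the square roots of the two sums of squared deviations. The function name `sample_correlation` and the data values are hypothetical and chosen only for this sketch.

```python
import math


def sample_correlation(x, y):
    """Sample correlation of two equal-length sequences, computed from
    deviation sums so that no normalizing constant (n or n - 1) is needed."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


# Hypothetical measurements on two variables (not taken from the text).
sales = [30.0, 50.0, 40.0]
movies = [2.0, 3.0, 4.0]

r = sample_correlation(sales, movies)
r_swapped = sample_correlation(movies, sales)

# Re-expressing sales in cents rescales the deviations in the numerator and
# the denominator by the same factor, so the coefficient is unchanged.
sales_in_cents = [100.0 * s for s in sales]
r_cents = sample_correlation(sales_in_cents, movies)

print(r, r_swapped, r_cents)
```

Swapping the arguments leaves the value unchanged, which is the symmetry noted above, and changing the units of one variable has no effect on the result up to floating-point rounding.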
The sample correlation coefficient is a measure of the linear association between two variables. It does not depend on the units of measurement, because the units cancel between the numerator and the denominator when the coefficient is constructed. The sample correlation matrix is analogous to the covariance matrix with correlations in place of covariances:
$$R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix} \tag{3.8}$$
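One way to build the correlation matrix is to start from a covariance matrix and divide each entry by the square roots of the two corresponding diagonal entries, as in (3.6). The sketch below assumes a small hypothetical covariance matrix; the function name `correlation_from_covariance` is an illustrative choice, not something defined in the text.

```python
import math


def correlation_from_covariance(S):
    """Convert a p x p covariance matrix (list of lists) into the
    corresponding correlation matrix by scaling each entry with the
    standard deviations taken from the diagonal."""
    p = len(S)
    sd = [math.sqrt(S[j][j]) for j in range(p)]
    return [[S[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]


# Hypothetical 2 x 2 sample covariance matrix (values are illustrative only).
S = [[100.0, 5.0],
     [5.0, 1.0]]

R = correlation_from_covariance(S)
print(R)
```

The diagonal of `R` consists of ones because each variable is perfectly correlated with itself, and `R` inherits the symmetry of `S`.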