Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis

      2.1.4 Joint Probability Densities: Bivariate and Multivariate Distributions

      A univariate density expresses the probability of a single random variable within a specified interval of values along the abscissa. A joint probability density, analogous to a joint probability, expresses the probability of simultaneously observing two random variables over a given interval of values. The bivariate normal density is given by:

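$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{z}{2(1-\rho^2)}\right], \qquad z = \frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}$$

      where μ1, μ2 and σ1, σ2 are the means and standard deviations of x1 and x2, and ρ² is the squared Pearson correlation coefficient between x1 and x2.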
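
      As a minimal sketch, the standard form of the bivariate normal density can be written out term by term in base R. The function name dbvnorm and its argument names (mu1, mu2, sigma1, sigma2, rho) are illustrative assumptions, not part of the text. A simple sanity check is that with rho = 0 the joint density should equal the product of the two univariate normal densities.

```r
# Bivariate normal density written out term by term.
# dbvnorm and its argument names are hypothetical, chosen for illustration.
dbvnorm <- function(x1, x2, mu1 = 0, mu2 = 0, sigma1 = 1, sigma2 = 1, rho = 0) {
  z1 <- (x1 - mu1) / sigma1
  z2 <- (x2 - mu2) / sigma2
  z  <- z1^2 - 2 * rho * z1 * z2 + z2^2          # quadratic form in the exponent
  exp(-z / (2 * (1 - rho^2))) /
    (2 * pi * sigma1 * sigma2 * sqrt(1 - rho^2))
}

# With rho = 0 the joint density factors into the two marginal densities.
d_joint   <- dbvnorm(0.5, -1, rho = 0)
d_product <- dnorm(0.5) * dnorm(-1)

# Height of the density at the means for standardized variables with rho = 0.5.
d_peak <- dbvnorm(0, 0, rho = 0.5)
```

      Comparing d_joint with d_product (for example with all.equal) gives a quick check that the formula has been transcribed correctly.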

      Empirical bivariate distributions (as opposed to bivariate densities) are those showing the joint occurrence of two variables. For instance, again using Galton's data, we plot child height against parent height and fit both regression lines (see Chapter 7) using lm:

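      > plot(parent, child, main = "Bivariate Plot of Parent and Child Height")
      > abline(lm(child ~ parent))    # child regressed on parent
      > fit.pc <- lm(parent ~ child)  # parent regressed on child
      > abline(a = -coef(fit.pc)[1] / coef(fit.pc)[2], b = 1 / coef(fit.pc)[2])  # same line, solved for child so it matches the plot axes

Schematic illustration of bivariate density.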

      Note the relation between parent height and child height. Recall that a mathematical relation is a subset of the Cartesian product. The Cartesian product in the plot consists of all theoretically possible parent–child pairings. The fact that shorter than average parents tend to have shorter than average children and taller than average parents tend to have taller than average children reveals the linear form of the mathematical relation. In the plot are regression lines for child height as a function of parent height and parent height as a function of child height. Computing both the mean of child and of parent, we obtain:

      > mean(child)
      [1] 68.08847
      > mean(parent)
      [1] 68.30819
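
      One reason the two means are worth computing is that each least-squares line (fitted with an intercept) passes through the point of means, so the two regression lines cross there. The sketch below illustrates this with simulated heights; the data and the variable names parent_sim and child_sim are assumptions made for the example and are not Galton's data.

```r
# Simulated stand-in for parent/child heights (NOT Galton's data).
set.seed(1)
parent_sim <- rnorm(200, mean = 68, sd = 1.8)
child_sim  <- 24 + 0.65 * parent_sim + rnorm(200, sd = 2.2)

fit_cp <- lm(child_sim ~ parent_sim)   # child as a function of parent
fit_pc <- lm(parent_sim ~ child_sim)   # parent as a function of child

# Each fitted line, evaluated at the mean of its predictor,
# returns the mean of its response.
pred_child  <- unname(predict(fit_cp, newdata = data.frame(parent_sim = mean(parent_sim))))
pred_parent <- unname(predict(fit_pc, newdata = data.frame(child_sim = mean(child_sim))))

# The product of the two slopes equals the squared Pearson correlation.
slope_product <- unname(coef(fit_cp)[2] * coef(fit_pc)[2])
r_squared     <- cor(parent_sim, child_sim)^2
```

      Here pred_child agrees with mean(child_sim) and pred_parent with mean(parent_sim), while slope_product agrees with r_squared, which ties the two regression lines back to the squared correlation that appears in the bivariate normal density.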

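      Turning now to multivariate distributions, the multivariate normal density is given by:

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$$

      where p is the number of variables, μ is the mean vector, and Σ is the covariance matrix with determinant |Σ|.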
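
      A minimal base R sketch of the multivariate normal density in its standard form follows. The function name dmvnorm_sketch is a hypothetical helper written for this example (it is not a base R function), and mu and Sigma denote the mean vector and covariance matrix. Two checks are included: with a single variable the result should match dnorm, and with an identity covariance matrix it should equal the product of univariate standard normal densities.

```r
# Multivariate normal density for a single observation vector x.
# dmvnorm_sketch is a hypothetical helper, not a base R function.
dmvnorm_sketch <- function(x, mu, Sigma) {
  p   <- length(mu)
  dev <- x - mu
  q   <- drop(t(dev) %*% solve(Sigma) %*% dev)   # squared Mahalanobis distance
  exp(-q / 2) / ((2 * pi)^(p / 2) * sqrt(det(Sigma)))
}

# One variable: reduces to the univariate normal density (sd = 2, variance = 4).
d_uni     <- dmvnorm_sketch(1.2, mu = 0, Sigma = matrix(4))
d_uni_ref <- dnorm(1.2, mean = 0, sd = 2)

# Two uncorrelated standardized variables: product of the marginals.
x         <- c(0.3, -1.1)
d_biv     <- dmvnorm_sketch(x, mu = c(0, 0), Sigma = diag(2))
d_biv_ref <- prod(dnorm(x))
```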

      Most multivariate procedures make some assumption regarding the multivariate normality of sampling distributions. Evaluating such an assumption is intrinsically difficult due to the high dimensionality of the data. The best researchers can usually do is attempt to verify univariate and bivariate normality through such devices as histograms and scatterplots. Fortunately, as is the case for methods assuming univariate normality, multivariate procedures are relatively robust, in most cases, to modest violations. Though Mardia's test (Mardia, 1970) is favored by some (e.g., Romeu and Ozturk, 1993), no single method for evaluating multivariate normality appears to be fully adequate. Visual inspections of Q–Q plots (to be discussed) are usually sufficient for applied purposes.
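
      One commonly used visual device, offered here only as an illustration and not necessarily the one the author has in mind, is a chi-square Q–Q plot of squared Mahalanobis distances: under multivariate normality these distances are approximately chi-square distributed with p degrees of freedom. The sketch uses simulated data and base R functions (mahalanobis, qchisq, ppoints); the object names are assumptions made for the example.

```r
# Simulated multivariate normal data (illustrative only).
set.seed(42)
n <- 500
p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Squared Mahalanobis distance of each row from the sample centroid.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Theoretical chi-square quantiles to compare against the ordered distances.
theo <- qchisq(ppoints(n), df = p)

# Draw the plot only in an interactive session.
if (interactive()) {
  plot(theo, sort(d2),
       xlab = "Chi-square quantiles", ylab = "Ordered squared distances",
       main = "Chi-square Q-Q Plot of Mahalanobis Distances")
  abline(0, 1)   # points near this line are consistent with normality
}

# Numerical summary of how straight the Q-Q pattern is.
qq_cor <- cor(theo, sort(d2))
```

      Marked curvature away from the reference line, particularly in the upper tail, would suggest a departure from multivariate normality or the presence of outlying cases.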

Schematic illustration of a three-dimensional scatterplot with density contour and points.

      Source: Figure taken from JMP 12 Essential Graphing, Copyright © 2015, SAS Institute Inc., USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc, Cary, NC.

      The chi‐square distribution is given by:

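$$f(x) = \frac{1}{2^{v/2}\,\Gamma(v/2)}\, x^{v/2-1} e^{-x/2} \quad \text{for } x > 0, \qquad f(x) = 0 \quad \text{otherwise}$$

      where v is the degrees of freedom and Γ is the gamma function.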
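
      As a quick check on the standard chi-square density with v degrees of freedom, the formula can be coded directly and compared with R's built-in dchisq. The function name chisq_density and the evaluation points are assumptions chosen for illustration.

```r
# Chi-square density written from the formula (hypothetical helper).
chisq_density <- function(x, v) {
  ifelse(x > 0,
         x^(v / 2 - 1) * exp(-x / 2) / (2^(v / 2) * gamma(v / 2)),
         0)
}

x       <- c(0.5, 1, 2.5, 7)          # arbitrary positive evaluation points
manual  <- chisq_density(x, v = 4)
builtin <- dchisq(x, df = 4)          # reference values from base R
```

      The vectors manual and builtin agree to numerical precision, which can be confirmed with all.equal(manual, builtin).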

      The chi‐square distribution plays an important role in mathematical statistics and is associated with a number of tests on model coefficients in a variety of statistical methods. The multivariate analog to the chi‐square distribution is the Wishart distribution (see Rencher, 1998, p. 53, for details).

      The chi‐square goodness‐of‐fit test is one such statistical method that utilizes the chi‐square test statistic to evaluate the tenability of a null hypothesis. Recall that such a test is suitable for categorical data in which counts (i.e., instead of means, medians, etc.) are computed within each cell of the design. The goodness‐of‐fit test is given by

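$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

      where O_i and E_i are the observed and expected counts in cell i, and the sum runs over all k cells of the design.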
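
      The goodness-of-fit computation can be sketched in a few lines of R. The counts below are hypothetical, and the null hypothesis of equal cell probabilities is an assumption made for the example. The statistic is computed by hand as the sum of squared observed-minus-expected differences divided by the expected counts, and then compared with the result of the base R function chisq.test.

```r
# Hypothetical observed counts in four categories.
observed <- c(18, 22, 20, 40)
p_null   <- rep(1 / 4, 4)                 # assumed null: equal probabilities
expected <- sum(observed) * p_null        # 25 expected per cell

# Chi-square statistic computed by hand.
stat    <- sum((observed - expected)^2 / expected)
df      <- length(observed) - 1
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# Same test using the built-in function.
builtin <- chisq.test(observed, p = p_null)
```

      For these counts stat is 12.32 on 3 degrees of freedom, and both stat and p_value match builtin$statistic and builtin$p.value.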
