Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis

being communicated by the notation “$E(y_i - \mu)^2$.” We can “unpack” this expression to read

$$E(y_i - \mu)^2 = \sum_i (y_i - \mu)^2 \, p(y_i)$$

      where $p(y_i)$ is the probability of the given deviation, $(y_i - \mu)$, for what is, in this case, a discrete random variable.
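      As a minimal sketch of the unpacked expression for $E(y_i - \mu)^2$ in the discrete case, the following Python sums each squared deviation weighted by its probability. The fair six-sided die is a hypothetical example chosen only for illustration, and the helper names `expectation` and `variance` are our own. Exact fractions are used so the result can be checked by hand.

```python
from fractions import Fraction

def expectation(values, probs):
    """E(y) = sum over i of y_i * p(y_i) for a discrete random variable."""
    return sum(y * p for y, p in zip(values, probs))

def variance(values, probs):
    """E(y_i - mu)^2 = sum over i of (y_i - mu)^2 * p(y_i)."""
    mu = expectation(values, probs)
    return sum((y - mu) ** 2 * p for y, p in zip(values, probs))

# Hypothetical example: a fair six-sided die, p(y_i) = 1/6 for y_i = 1, ..., 6
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

mu = expectation(values, probs)
sigma2 = variance(values, probs)
print(mu, sigma2)  # 7/2 35/12
```

      The probabilities must sum to 1 for these sums to be a valid expectation and variance; the sketch does not check this.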

      When we speak of moments of a distribution or of a random variable, we are referring to such things as the mean, variance, skewness, and kurtosis.

      The first moment of a distribution is its mean. For a discrete random variable $y_i$, the expectation is given by:

$$E(y_i) = \sum_i y_i \, p(y_i)$$

      where $y_i$ is the given value of the variable, and $p(y_i)$ is its associated probability. When $y$ is a continuous random variable with density $p(y)$, the expectation is given by:

$$E(y) = \int_{-\infty}^{\infty} y \, p(y) \, dy$$

      Notice again that in both cases, whether the variable is discrete or continuous, we are simply summing products of values of the variable with its probability (or its density, if the variable is continuous). In the discrete case, the products are “explicit”: the notation tells us to take each value of $y$ (i.e., $y_i$) and multiply it by the probability of that value, $p(y_i)$. In the continuous case, the products are more implicit, since the probability of any particular value of a continuous random variable is equal to 0. Hence, the product $y \, p(y)$ is the given value of $y$ multiplied by its corresponding density, and the integral takes the place of the sum.
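      For the continuous case, the integral can be approximated numerically. The sketch below assumes, purely for illustration, an exponential density $p(y) = \lambda e^{-\lambda y}$ for $y \ge 0$ with $\lambda = 2$, whose mean is known to be $1/\lambda = 0.5$. It applies the trapezoidal rule to $y \, p(y)$ and truncates the infinite upper limit at 30; the function names are hypothetical.

```python
import math

def exponential_density(y, rate=2.0):
    """Hypothetical example density: p(y) = rate * exp(-rate * y) for y >= 0."""
    return rate * math.exp(-rate * y) if y >= 0 else 0.0

def continuous_expectation(density, lower, upper, n=100_000):
    """Approximate E(y) = integral of y * p(y) dy over [lower, upper]
    using the composite trapezoidal rule with n subintervals."""
    h = (upper - lower) / n
    # Endpoints get half weight in the trapezoidal rule.
    total = 0.5 * (lower * density(lower) + upper * density(upper))
    for k in range(1, n):
        y = lower + k * h
        total += y * density(y)
    return total * h

# The infinite upper limit is truncated at 30; the omitted tail of y * p(y)
# is negligible here (on the order of 1e-25).
approx_mean = continuous_expectation(exponential_density, 0.0, 30.0)
print(round(approx_mean, 6))  # 0.5, i.e. 1 / rate
```

      The density values themselves are not probabilities; only their integral over an interval is, which is why the continuous expectation weights each $y$ by $p(y)\,dy$ rather than by $p(y)$ alone.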

      2.6.1 Sample and Population Mean Vectors

      We often wish to analyze data simultaneously on several response variables. For this, we require vector and matrix notation to express our responses. The matrix operations presented here are surveyed more comprehensively in the Appendix and in any book on elementary matrix algebra.

[Figure: Schematic illustration of a weighing balance.]

      Consider the following vector:

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

      where $y_1$ is the first observation and $y_n$ is the $n$th (last) observation.

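      We can write the sample mean vector $\bar{\mathbf{y}}$ for several variables $y_1$ through $y_p$ as

$$\bar{\mathbf{y}} = \begin{bmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_p \end{bmatrix}$$

      where $\bar{y}_p$ is the mean of the $p$th variable.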
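      Computing $\bar{\mathbf{y}}$ amounts to averaging each column of an $n \times p$ data matrix. A minimal sketch in plain Python, using a small hypothetical data set of $n = 4$ observations on $p = 3$ variables (rows are observations, columns are variables):

```python
def sample_mean_vector(data):
    """Return [ybar_1, ..., ybar_p] for an n x p data matrix given as a list of rows."""
    n = len(data)
    p = len(data[0])
    # Average down each column j to get the mean of the jth variable.
    return [sum(row[j] for row in data) / n for j in range(p)]

# Hypothetical data: n = 4 observations on p = 3 variables
data = [
    [2.0, 10.0, 5.0],
    [4.0, 12.0, 7.0],
    [6.0, 14.0, 6.0],
    [8.0, 16.0, 2.0],
]
ybar = sample_mean_vector(data)
print(ybar)  # [5.0, 13.0, 5.0]
```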

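      The expectation of each individual observation within the vector is equal to the population mean $\mu$, and so the expectation of the sample vector $\mathbf{y}$ is equal to the population mean vector, $\boldsymbol{\mu}$. This is simply an extension of scalar algebra to that of matrices:

$$E(\mathbf{y}) = \begin{bmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{bmatrix} = \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix} = \boldsymbol{\mu}$$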

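      Likewise, the expectations of the individual sample means $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_p$ are equal to their population counterparts, $\mu_1, \mu_2, \ldots, \mu_p$. The expectation of the sample mean vector $\bar{\mathbf{y}}$ is therefore equal to the population mean vector, $\boldsymbol{\mu}$:

$$E(\bar{\mathbf{y}}) = \begin{bmatrix} E(\bar{y}_1) \\ E(\bar{y}_2) \\ \vdots \\ E(\bar{y}_p) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix} = \boldsymbol{\mu}$$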

      We note also that $\bar{\mathbf{y}}$ is an unbiased estimator of $\boldsymbol{\mu}$, since $E(\bar{\mathbf{y}}) = \boldsymbol{\mu}$.
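      The unbiasedness of $\bar{\mathbf{y}}$ can be illustrated (not proved) by simulation: averaging the sample mean vector over many repeated samples should give a vector close to $\boldsymbol{\mu}$. The sketch assumes, hypothetically, $p = 3$ independent normal variables with means $(10, 20, 30)$ and a common standard deviation of 5; the seed, sample size, and number of replications are arbitrary choices.

```python
import random

def sample_mean_vector(data):
    """Column means of an n x p data matrix given as a list of rows."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]

def average_of_sample_means(mu, sigma, n, reps, seed=0):
    """Average the sample mean vector over `reps` simulated samples of size n.

    Hypothetical population: p independent normal variables with means `mu`
    and common standard deviation `sigma`.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(mu)
    for _ in range(reps):
        # One sample: n rows, each row an observation on all p variables.
        data = [[rng.gauss(m, sigma) for m in mu] for _ in range(n)]
        ybar = sample_mean_vector(data)
        totals = [t + yb for t, yb in zip(totals, ybar)]
    return [t / reps for t in totals]

mu = [10.0, 20.0, 30.0]
avg = average_of_sample_means(mu, sigma=5.0, n=10, reps=5000)
print(avg)
```

      With these settings, each entry of `avg` should land within a few hundredths of the corresponding $\mu_j$, since the standard error of the averaged means is $5/\sqrt{10 \cdot 5000} \approx 0.022$.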

      Recall that we said that the mean is the first moment of a distribution. We discuss the second moment of a distribution, that of the variance, shortly. Before we do so, a brief discussion of estimation is required.

      The goal of statistical inference is, in general, to estimate parameters of a population. We distinguish between point estimators and interval estimators. A point estimator is a function of a sample and is used to estimate a parameter in the population. Because the estimates an estimator produces vary from sample to sample, and thus have a probability distribution associated with them, estimators are themselves random variables. For example, the sample mean $\bar{y}$ is an estimator of the population mean $\mu$. However, if we repeatedly draw samples from a population whose actual mean is $\mu$ and compute $\bar{y}$ for each, we know, both from experience and from statistical theory, that $\bar{y}$ will vary from sample to sample. This is why the estimator $\bar{y}$ is a random variable: each of its possible values has an associated probability (density) of occurrence. When we use the estimator to obtain a particular number, that number is known as an estimate. An interval estimator provides a range of values within which the true parameter is hypothesized to lie with some specified probability. A popular interval estimator is the confidence interval, a topic we discuss later in this chapter.
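      The distinction between an estimator, an estimate, and an interval estimator can be sketched by simulation. Assuming, for illustration only, a normal population with $\mu = 100$ and known $\sigma = 15$, the code below draws repeated samples of size $n = 25$, records each point estimate $\bar{y}$, and checks how often the interval $\bar{y} \pm 1.96\,\sigma/\sqrt{n}$ contains $\mu$. The known-$\sigma$ normal interval is a simplifying assumption; the confidence intervals discussed later in the chapter may be constructed differently.

```python
import math
import random
import statistics

def simulate_estimates(mu, sigma, n, reps, seed=1):
    """Draw `reps` samples of size n from a hypothetical normal population.

    Returns each sample's point estimate (ybar) and the proportion of samples
    whose 95% interval ybar +/- 1.96 * sigma / sqrt(n) (sigma assumed known)
    contains mu.
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    estimates, covered = [], 0
    for _ in range(reps):
        ybar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        estimates.append(ybar)  # one estimate produced by the estimator ybar
        if ybar - half_width <= mu <= ybar + half_width:
            covered += 1
    return estimates, covered / reps

estimates, coverage = simulate_estimates(mu=100.0, sigma=15.0, n=25, reps=4000)
print(estimates[:3])                # three different point estimates of the same mu
print(statistics.stdev(estimates))  # close to sigma / sqrt(n) = 3.0
print(coverage)                     # close to 0.95
```

      The spread of `estimates` is the sampling variability of the estimator $\bar{y}$; any single entry is one estimate, and `coverage` reflects the long-run behavior of the interval estimator rather than a property of any one interval.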

      More generally, if T is some statistic, then we can use T as an estimator of a population parameter θ. Whether the estimator T is any good depends on
