Statistics and Probability with Applications for Engineers and Scientists Using MINITAB, R and JMP. Bhisham C. Gupta
Figure 2.5.1 Frequency distributions showing the shape and location of measures of centrality.
Definition 2.5.3
A data set is symmetric when the values in the data set that lie equidistant from the mean, on either side, occur with equal frequency.
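For a small data set, Definition 2.5.3 can be checked directly by counting how often each deviation from the mean occurs. The sketch below is a minimal illustration, assuming the data are given as integers, decimal strings, or fractions; the function name `is_symmetric` is hypothetical and not part of the text.

```python
from collections import Counter
from fractions import Fraction
from statistics import mean


def is_symmetric(data):
    """Check Definition 2.5.3: values equidistant from the mean, on either
    side, occur with equal frequency.

    Values are converted to Fraction so that distances from the mean are
    compared exactly rather than with floating-point round-off.
    """
    values = [Fraction(x) for x in data]
    center = mean(values)  # statistics.mean returns a Fraction for Fraction input
    # Count how often each signed deviation from the mean occurs.
    deviations = Counter(x - center for x in values)
    # Symmetric: deviation d and deviation -d occur equally often.
    # (Counter returns 0 for a missing key without inserting it.)
    return all(deviations[d] == deviations[-d] for d in deviations)


print(is_symmetric([2, 4, 4, 6]))  # True
print(is_symmetric([1, 2, 2, 9]))  # False
```

Exact fractions are used because a symmetry test on floats would have to choose an arbitrary tolerance for "equidistant".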
Definition 2.5.4
A data set is left‐skewed when values in the data set that are greater than the median occur with relatively higher frequency than those values that are smaller than the median. The values smaller than the median are scattered far to the left of the median.
Definition 2.5.5
A data set is right‐skewed when values in the data set that are smaller than the median occur with relatively higher frequency than those values that are greater than the median. The values greater than the median are scattered far to the right of the median.
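A quick numerical hint about the direction of skewness is to compare the mean with the median. The sketch below relies on a common rule of thumb, which is an assumption and not the definitions above: a long left tail usually pulls the mean below the median, and a long right tail usually pulls it above. The function name `skew_hint` and the data are made up for illustration.

```python
from statistics import mean, median


def skew_hint(data):
    """Rule-of-thumb skewness hint based on mean versus median.

    Assumption: a long left tail usually pulls the mean below the median,
    and a long right tail usually pulls it above. This is a heuristic, not
    Definitions 2.5.4 and 2.5.5, and it can mislead for small or discrete data.
    """
    m, md = mean(data), median(data)
    if m < md:
        return "left"
    if m > md:
        return "right"
    return "none"


print(skew_hint([1, 7, 8, 8, 9]))   # left: the small value 1 lies far below the median
print(skew_hint([1, 2, 2, 3, 10]))  # right: the large value 10 lies far above the median
```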
2.5.2 Measures of Dispersion
In the previous section, we discussed measures of centrality, which provide information about the location of the center of the frequency distribution of a data set. Measures of central tendency, however, do not portray the whole picture of a data set. For example, the two frequency distributions shown in Figure 2.5.2 have the same mean, median, and mode, yet they are very different. The major difference between them is the variation among the values associated with each distribution. It is therefore important to know about the variation among the values of a data set. Information about variation is provided by measures known as measures of dispersion. In this section, we study three measures of dispersion: range, variance, and standard deviation.
Figure 2.5.2 Two frequency distribution curves with equal mean, median, and mode values.
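The situation in Figure 2.5.2 can be mimicked numerically. The two data sets below are made up for this sketch: they share the same mean, median, and mode, but their range and population variance (both discussed below) differ sharply. The helper name `summary` is hypothetical.

```python
from statistics import mean, median, mode, pvariance

# Two small illustrative data sets (made up for this sketch).
narrow = [4, 5, 5, 5, 6]
wide = [1, 5, 5, 5, 9]


def summary(data):
    """Measures of centrality alongside two measures of dispersion."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": mode(data),
        "range": max(data) - min(data),
        "variance": pvariance(data),  # population variance
    }


for name, data in [("narrow", narrow), ("wide", wide)]:
    print(name, summary(data))
```

Both data sets report 5 for every measure of centrality, while the range is 2 versus 8.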
Range
The range of a data set is the easiest measure of dispersion to calculate. Range is defined as

$$\text{Range} = \text{Largest value} - \text{Smallest value} \qquad (2.5.5)$$
The range is not an efficient measure of dispersion because it takes into consideration only the largest and the smallest values and none of the remaining observations. For example, if a data set has 100 distinct observations, it uses only two observations and ignores the remaining 98 observations. As a rule of thumb, if the data set contains 10 or fewer observations, the range is considered a reasonably good measure of dispersion. For data sets containing more than 10 observations, the range is not considered to be an efficient measure of dispersion.
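To see why ignoring the interior observations matters, the sketch below builds two made-up data sets with identical extremes. Their ranges agree, but the population variance (introduced below) shows that the values in between are spread very differently. The function name `data_range` is hypothetical.

```python
from statistics import pvariance


def data_range(data):
    """Range as in Equation (2.5.5): largest value minus smallest value."""
    return max(data) - min(data)


# Two made-up data sets with the same extremes but very different interiors.
clustered = [0, 50, 50, 50, 50, 50, 100]
split = [0, 0, 0, 50, 100, 100, 100]

print(data_range(clustered), data_range(split))  # same range
print(pvariance(clustered), pvariance(split))    # very different spread
```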
Example 2.5.9 (Tensile strength) The following data give the tensile strength (in psi) of a sample of a certain material submitted for inspection. Find the range for this data set:
8538.24, 8450.16, 8494.27, 8317.34, 8443.99, 8368.04, 8368.94, 8424.41, 8427.34, 8517.64
Solution: The largest and the smallest values in the data set are 8538.24 and 8317.34, respectively. Therefore, the range for this data set is

$$\text{Range} = 8538.24 - 8317.34 = 220.90 \text{ psi}$$
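The arithmetic of Example 2.5.9 can be reproduced in a few lines. This is a minimal sketch; rounding to two decimals is an assumption made only to hide binary floating-point noise in the subtraction.

```python
# Tensile strengths (psi) from Example 2.5.9.
tensile = [8538.24, 8450.16, 8494.27, 8317.34, 8443.99,
           8368.04, 8368.94, 8424.41, 8427.34, 8517.64]

largest, smallest = max(tensile), min(tensile)
range_psi = round(largest - smallest, 2)  # round away floating-point noise
print(largest, smallest, range_psi)  # 8538.24 8317.34 220.9
```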
Variance
One of the most interesting pieces of information associated with any data set is how the values in the data set vary from one another. Of course, the range can give us some idea of variability. Unfortunately, the range tells us nothing about how the values are spread around the center of the data. To better understand variability, we rely on more powerful indicators such as the variance, which measures how far the observations within a data set deviate from their mean.
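If the values in a population data set are $x_1, x_2, \ldots, x_N$ with mean $\mu$, then the population variance, denoted by $\sigma^2$, is defined as

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \qquad (2.5.6)$$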
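Further, the sample variance, denoted by $S^2$, of a sample with values $x_1, x_2, \ldots, x_n$ and sample mean $\bar{x}$ is defined as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (2.5.7)$$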
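Equations (2.5.6) and (2.5.7) translate directly into code. The sketch below is illustrative only: the function names and the data set `scores` are made up, and `math.fsum` is used simply to keep the summations accurate. The standard library's `statistics.pvariance` and `statistics.variance` compute the same two quantities and serve as a cross-check.

```python
import statistics
from math import fsum, isclose


def population_variance(values):
    """Equation (2.5.6): mean squared deviation from the population mean."""
    N = len(values)
    mu = fsum(values) / N
    return fsum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """Equation (2.5.7): squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = fsum(values) / n
    return fsum((x - xbar) ** 2 for x in values) / (n - 1)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, mean 5

print(population_variance(scores), statistics.pvariance(scores))
print(sample_variance(scores), statistics.variance(scores))
print(isclose(sample_variance(scores), statistics.variance(scores)))
```

The only difference between the two functions is the divisor, $N$ versus $n - 1$, so the sample variance is always slightly larger than the population formula applied to the same numbers.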
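For computational purposes, we give below the simplified forms of the population variance and the sample variance:

$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right], \qquad S^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] \qquad (2.5.8)$$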
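The simplified forms in (2.5.8) need only the running sums $\sum x_i$ and $\sum x_i^2$, so the mean does not have to be computed first. The sketch below (function names hypothetical) checks that the shortcut agrees with the definitional form (2.5.7) on the tensile data of Example 2.5.9.

```python
from math import fsum, isclose


def population_variance_shortcut(values):
    """Equation (2.5.8), population form: [sum x^2 - (sum x)^2 / N] / N."""
    N = len(values)
    s, s2 = fsum(values), fsum(x * x for x in values)
    return (s2 - s * s / N) / N


def sample_variance_shortcut(values):
    """Equation (2.5.8), sample form: [sum x^2 - (sum x)^2 / n] / (n - 1)."""
    n = len(values)
    s, s2 = fsum(values), fsum(x * x for x in values)
    return (s2 - s * s / n) / (n - 1)


def sample_variance_definition(values):
    """Equation (2.5.7), for comparison."""
    n = len(values)
    xbar = fsum(values) / n
    return fsum((x - xbar) ** 2 for x in values) / (n - 1)


# Tensile strengths (psi) from Example 2.5.9.
tensile = [8538.24, 8450.16, 8494.27, 8317.34, 8443.99,
           8368.04, 8368.94, 8424.41, 8427.34, 8517.64]

print(sample_variance_shortcut(tensile), sample_variance_definition(tensile))
print(isclose(sample_variance_shortcut(tensile), sample_variance_definition(tensile)))
```

One design caveat: in floating-point arithmetic the shortcut subtracts two large, nearly equal numbers. When the values are large compared with their spread, as the tensile strengths are, some precision can be lost. The definitional form is therefore usually the safer choice on a computer, while the shortcut remains convenient for hand calculation.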
Note that one difficulty in using the variance as a measure of dispersion is that its units are not the same as those of the data values. Rather, the variance is expressed in the square of the units used for the data values. For example, if the data values are dollar amounts,