Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis
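We can say that over all samples of a given size n, the probability is 0.95 for the following event to occur:
$$-1.96\,\sigma_M \le \bar{y} - \mu \le +1.96\,\sigma_M \qquad (2.2)$$
How was (2.2) obtained? Recall the calculation of a z‐score for a mean, where $\sigma_M = \sigma/\sqrt{n}$ is the standard error of the mean:
$$z_M = \frac{\bar{y} - \mu}{\sigma_M}$$
Suppose now that we want an area of 0.025 in each tail of the normal distribution. This corresponds to a z‐score of ±1.96, since the probability of a z‐score at least as extreme as ±1.96 is 2(1 – 0.9750021) = 0.0499958, which is approximately 5% of the total area under the curve. So, from the z‐score, we have
$$-1.96 \le \frac{\bar{y} - \mu}{\sigma_M} \le +1.96$$
Multiplying through by $\sigma_M$ gives (2.2). We can rearrange the inequality slightly to get the following:
$$\bar{y} - 1.96\,\sigma_M \le \mu \le \bar{y} + 1.96\,\sigma_M \qquad (2.3)$$
We interpret (2.3) as follows:
Over all possible samples, the probability is 0.95 that the range between $\bar{y} - 1.96\,\sigma_M$ and $\bar{y} + 1.96\,\sigma_M$ will include the true mean, μ.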
It is very important to note that, in the above statement, μ is not the random variable. What is random is the sample on which the interval is computed. That is, the probability statement is not about μ but about samples. The population mean μ is assumed to be fixed. The 95% confidence interval tells us that if we continued to sample repeatedly, and computed a confidence interval on each sample, then in the long run 95% of these intervals would include the true parameter.
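The repeated-sampling interpretation can be checked with a small simulation. The following is a minimal sketch, not the author's own procedure: the population values (μ = 100, σ = 15), the sample size, the number of replications, and the function name `coverage` are all assumptions chosen for illustration, and σ is treated as known.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu=100.0, sigma=15.0, n=25, reps=5000, level=0.95, seed=1):
    """Proportion of simulated confidence intervals that contain mu.

    mu and sigma are hypothetical population values; sigma is assumed known.
    """
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for level = 0.95.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sigma / n ** 0.5  # standard error of the mean
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = fmean(sample)
        # mu stays fixed; the interval changes from sample to sample.
        if ybar - z * se <= mu <= ybar + z * se:
            hits += 1
    return hits / reps


print(coverage())            # should be close to 0.95
print(coverage(level=0.99))  # should be close to 0.99
```

In each replication μ never moves. Only the interval endpoints vary, which is why the probability statement is about samples rather than about μ.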
The 99% confidence interval for the mean is likewise given by:
$$\bar{y} - 2.58\,\sigma_M \le \mu \le \bar{y} + 2.58\,\sigma_M \qquad (2.4)$$
Notice that the only difference between (2.3) and (2.4) is the choice of different critical values on either side of μ (i.e., 1.96 for the 95% interval and 2.58 for the 99% interval).
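The critical values quoted above can be reproduced from the standard normal distribution. This sketch uses Python's standard-library `statistics.NormalDist`; the helper names `two_tailed_area` and `critical_z` are introduced here for illustration only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_tailed_area(z):
    """Probability of a z-score at least as extreme as +/- z."""
    return 2 * (1 - std_normal.cdf(z))


def critical_z(level):
    """Two-sided critical value for a given confidence level (0 < level < 1)."""
    return std_normal.inv_cdf(1 - (1 - level) / 2)


print(round(two_tailed_area(1.96), 7))  # 0.0499958
print(round(critical_z(0.95), 3))       # 1.96
print(round(critical_z(0.99), 3))       # 2.576
```

The 99% critical value is 2.576 to three decimals, which rounds to the 2.58 used in the text.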
Though of course not very useful, a 100% confidence interval, if constructed, would be defined as:
$$-\infty \le \mu \le +\infty$$
If you think about it carefully, the 100% confidence interval should make perfect sense. If you would like to be 100% “sure” that the interval will cover the true population mean, then you have to extend your limits to negative and positive infinity; otherwise, you could not be fully confident. Likewise, at the other extreme, a 0% interval would simply have the sample mean as both its lower and upper limit:
$$\bar{y} \le \mu \le \bar{y}$$
That is, if you want to have zero confidence in guessing the location of the population mean, μ, then guess the sample mean, $\bar{y}$.
2.14 MAXIMUM LIKELIHOOD
When we speak of likelihood, we mean the probability of some sample data or set of observations conditional on some hypothesized parameter or set of parameters (Everitt, 2002). Conditional probability statements such as p(D | H0) can very generally be considered simple examples of likelihoods, where the set of parameters may, in this case, be simply μ and σ². A likelihood function treats this quantity as a function of the parameter, with the data held fixed (see Fox, 2016).
When we speak of maximum‐likelihood estimation, we mean the process of maximizing a likelihood subject to certain parameter conditions. As a simple example, suppose we obtain 8 heads on 10 flips of a presumably fair coin. Our null hypothesis was that the coin is fair, meaning that the probability of heads is p(H) = 0.5. However, our actual obtained result of 8 heads on 10 flips would suggest the true probability of heads to be closer to p(H) = 0.8. Thus, writing θ for the unknown probability of heads, we ask the question:
Which value of θ makes the observed result most likely?
If we had only two choices of θ to select from, 0.5 and 0.8, our answer would have to be 0.8, since this value of the parameter θ makes the sample result of 8 heads out of 10 flips most likely (a binomial probability of about 0.302, versus about 0.044 under θ = 0.5). That is the essence of how maximum‐likelihood estimation works (see Hays, 1994, for a similar example). ML is the most common method of estimating parameters in many models, including factor analysis, path analysis, and structural equation models to be discussed later in the book. There are very good reasons why mathematical statisticians generally approve of maximum likelihood. We summarize some of the most favorable properties of ML estimators.
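The coin example can be worked through numerically. The sketch below assumes the flips are independent, so the number of heads follows a binomial distribution. The function name `binomial_likelihood` and the grid of candidate θ values are illustrative choices, not part of the original text.

```python
from math import comb


def binomial_likelihood(theta, heads=8, flips=10):
    """Probability of observing `heads` heads in `flips` independent flips."""
    return comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)


# Compare the two candidate values discussed in the text.
for theta in (0.5, 0.8):
    print(theta, round(binomial_likelihood(theta), 4))

# Widen the search to a grid of candidates between 0 and 1.
grid = [i / 100 for i in range(101)]
theta_hat = max(grid, key=binomial_likelihood)
print(theta_hat)  # 0.8, the observed proportion of heads
```

Over the whole grid, the likelihood peaks at the observed proportion 8/10, which is the maximum‐likelihood estimate for this simple binomial setup.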
Firstly, ML estimators are asymptotically unbiased, which means that bias essentially vanishes as sample size increases without bound (Bollen, 1989). Secondly, ML estimators are consistent and asymptotically efficient, the latter meaning that the estimator has a small asymptotic variance relative to many other estimators. Thirdly, ML estimators are asymptotically