Applied Univariate, Bivariate, and Multivariate Statistics Using Python. Daniel J. Denis

Чтение книги онлайн.

Читать онлайн книгу Applied Univariate, Bivariate, and Multivariate Statistics Using Python - Daniel J. Denis страница 16

Applied Univariate, Bivariate, and Multivariate Statistics Using Python - Daniel J. Denis

Скачать книгу

an option since we can operationalize the independent variable as a binary dummy-coded predictor. When we flip things around, such that the grouping variable is now the response and VO2-max is the predictor, we are in the realm of discriminant analysis or logistic regression on two groups. Here, we would like to predict group membership based on the continuous predictor. Notice that these models are answering different research questions, but at their core, it stands that they must have great technical similarity. As we will see as we progress, indeed they do. Within a t-test, for example, can be considered, at least on a conceptual level, to house a very primitive discriminant function! When doing a t-test, we don’t “see” the idea of a discriminant function simply because it is not a question we are asking. Nonetheless, it is there in concept underlying the technique. Once you understand the commonality of what underlies virtually all of these models, they will quickly lose their mystery. You will be less inclined to survey a decision-tree using statistical methods and see different procedures. What you will rather see is one larger model with special cases and peculiarities in each method.

       Most statistical models, even if used for different research purposes and to answer different research questions, are quite technically similar at their core. One of the goals of learning and understanding statistical modeling is to grasp as quickly as possible this similarity so that you realize that differences in approach often have more to do with differences in research questions rather than differences in underlying technical details.

      1.6.1 Continuity Is Not Always Clear-Cut

      1.7 Using Abstract Systems to Describe Physical Phenomena: Understanding Numerical vs. Physical Differences

      One of the key starting points to using and applying statistics to real phenomena is to understand and appreciate the difference between the tool you are using and the “stuff” you are applying it to. They are often not one-to-one. Simply because we represent a difference numerically does not imply that the difference exists on a physical level. Making this distinction is extremely important, especially in today’s age where everything is about “data” and hence it is simply taken for granted that what we choose to measure is “real” and our measuring tool and system can capture such differences. In some cases, it can, but in others, automatically equating numerical differences with actual substantive differences is foolish.

      As an example, suppose I developed a questionnaire to assess your degree of pizza preference. Suppose I scaled the questionnaire from 0 to 10, where “0” indicates a dislike for pizza and “10” indicates a strong preference. Suppose you circle “7” as your choice and your friend circles “5.” Does that mean you prefer pizza more than your friend? Not necessarily. Simply because you have selected a higher number may not mean you enjoy pizza more. It may simply mean you selected a higher number. The measured distance between 5 and 7 may not equate to an actual difference in pizza preference.

      Scales of measurement (Stevens, 1946) have been developed to try to highlight these and other issues, but, as we will see, they are far from adequate in solving the measurement problem. Everything we measure is based on a scale. We attempt to capture the phenomena and assign a numerical measurement to it. A nominal scale is one in which labels are simply given to values of the variable. For example, “short” vs. “tall” when measuring height would represent a variable measurable on a nominal scale. However, we can do better. Since “tall” presumably contains more height than “short,” we can say tall > short (i.e. tall is greater than short) and assign the variable measurable on an ordinal scale. The next level of measurement is that of an interval scale in which distances between values on the scale are presumed to be equal. For example, the difference in the number of coins in my pocket from 0 to 5 is the same distance between the number of coins from 5 to 10. If the scale has an absolute zero point, meaning that a measurement of “0” actually means “zero coins,” then the scale takes on the extra property of being a ratio scale.

      As an example, in chemistry and nutrition, the oxidative stability of an oil is a measure of how quickly the oil starts to degrade when heated and exposed to light. Presumably, consumers would prefer, on this basis, an oil with more oxidative stability than less (frying at very high temperatures can apparently degrade the oil). In a recent study (Guillaume and Ravetti, 2018), it was found that the oxidative stability for olive oil was higher than the oxidative ability of, say, sunflower oil. Hence, one might be tempted to select olive oil instead of sunflower oil on this basis. However, does the difference in oxidative values translate into anything meaningful, or is it simply a measure of numerical difference that for all purposes is somewhat academic? Olive oil may be more stable, but is that “more” amount really worth not using sunflower oil if you indeed prefer sunflower? It’s very easy when analyzing and interpreting data to fall into the ranking trap, where simply because one element ranks higher than another falsely implies a pragmatic or even meaningful increase on a physical level. The headline may be that “Olive oil is #1,” but is #10 practically pretty much the same anyway, or is the utility of the difference in oils enough to influence one’s decision? The ranking differences may be inconsequential to the decision. For example, if I told you your primary doctor ranked 100th out of 100 individuals graduating out of his or her graduating class, you may at first assume your doctor is not very good. However, the differences between ranking quantities may be extremely slight or so small when translated on a practical level to not matter at all or, at minimum, be negligible. Differences may even be due to measurement error and hence not exist beyond chance. Likewise,

Скачать книгу