Biostatistics Decoded. A. Gouveia Oliveira


AUR is a binary variable because it is displayed as a single value. Adverse events is a multi‐valued attribute and its values are binary variables, and we know that because the percentages do not sum to 100%.

      Why does data analysis not stop once descriptive statistics have been obtained from the data? After all, we have obtained estimates of population means and proportions, which is the information we were looking for. That is probably the thinking of many people who believe that they have to present a statistical analysis only because otherwise they will not get their paper published.

      Actually, sample means and proportions are even called point estimates of a population mean or proportion, because they are unbiased estimators of those quantities. However, that does not mean that the sample mean or sample proportion has a value close to the population mean or the population proportion. We can verify that with a simple experiment.

Illustration of the phenomenon of sampling variation. Above are shown plots of the values in random samples of size n of an interval variable; horizontal lines represent the sample means. Below is shown a histogram of the distribution of the sample means of a large number of random samples.

      Now let us continue taking random samples of size n from that variable. The mean will come up with a different value every time. So we plot the means in a histogram to see if there is some discernible pattern in the frequency distribution of sample means. What we will eventually find is the histogram shown in Figure 1.22. If we could take an infinite number of samples and plot the sample means, we would end up with a graph with the shape of the curve in the same figure.

      What we learn from that experiment is that the means of interval attributes of independent samples of a given size n, obtained from the same population, are subject to random variation. Therefore, sample means are random variables: they are variables because they can take many different values, and they are random because the values they take are determined by chance.
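      The experiment described above can be reproduced on a computer. The following Python code is only a minimal sketch under stated assumptions: the population (10,000 values drawn uniformly between 0 and 100), the sample size n = 30, the number of samples (2,000) and the function names sample_means and text_histogram are all hypothetical choices made for illustration and are not taken from the book. It draws repeated random samples from the same population, computes the mean of each one, and prints a crude text histogram of those means.

```python
import random
import statistics


def sample_means(population, n, n_samples, seed=0):
    """Draw n_samples random samples of size n and return the mean of each."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(n_samples)]


def text_histogram(values, n_bins=10, width=50):
    """Return one text line per bin, with a bar proportional to the bin count."""
    lo, hi = min(values), max(values)
    bin_width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / bin_width), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    return [
        f"{lo + i * bin_width:8.2f} | {'#' * round(width * c / peak)}"
        for i, c in enumerate(counts)
    ]


# Hypothetical population (an assumption for illustration only):
# 10,000 values of an interval variable, uniform between 0 and 100.
pop_rng = random.Random(42)
population = [pop_rng.uniform(0, 100) for _ in range(10_000)]

# Take 2,000 independent random samples of size n = 30 and keep their means.
means = sample_means(population, n=30, n_samples=2_000, seed=1)

print(f"population mean:        {statistics.mean(population):.2f}")
print(f"smallest sample mean:   {min(means):.2f}")
print(f"largest sample mean:    {max(means):.2f}")
print(f"mean of sample means:   {statistics.mean(means):.2f}")
print("\n".join(text_histogram(means)))
```

      Changing the seed gives different individual sample means, which is the sampling variation discussed here, while the histogram of many sample means should keep roughly the same shape.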

      We also learn, by looking at the graph in Figure 1.22, that sample means can have very different values, so we can never assume that the population mean has a value close to the value of the sample mean, let alone that the two are the same. So an important conclusion is that one must never, ever draw conclusions about a population based on the value of sample means. Sample means only describe the sample, never the population.

      Actually, if we went around taking some kind of interval‐based measurements (e.g. length, weight, concentration) from samples of any type of biological materials and plotted them in a histogram, we would find this shape almost everywhere. This pattern is so repetitive that it has been compared to familiar shapes, like bells or Napoleon hats.

      In other circumstances, outside the world of mathematics, people would say that we have here some kind of natural phenomenon. It seems as if some law, of physics or whatever, dictates the rules that variation must follow. This would imply that the variation we observe in everyday life is not chaotic in nature, but actually ruled by some universal law. If this were true, and if we knew what that law says, perhaps we could understand why, and especially how, variation appears.

Graph depicts the frequency distributions of some biological variables.

      So, what is the nature of that law, and is it known already? Yes, it is, and it is actually very easy to understand how it works. Let us conduct a little experiment to see if we can create something whose values have a bell‐shaped distribution.

      We can, and the result is also presented in Figure 1.24. We simply write down all the possible combinations of values of the four equal variables and see in each case what the value of the fifth variable is. If all four variables have value 1, then the fifth variable will have value 4. If three variables have value 1 and one has value 2, then the fifth variable will have value 5. This may occur in four different ways – either the first variable had the value 2, or the second, or the third, or the
