A Gentle Introduction to Statistics Using SAS Studio. Ron Cody
Learn more about this author by visiting his author page at http://support.sas.com/cody. There you can download free book excerpts, access example code and data, read the latest reviews, get updates, and more.
Acknowledgments
There is only one name on the cover of this book—mine. However, that doesn’t mean I am the only person who put in long hours creating this book.
First, I would like to thank and acknowledge the fantastic work of Lauree Shepard, my acquisitions and developmental editor. Lauree provided technical support, coordinated the team of editors, reviewers, and cover artist, and, most importantly, provided psychological support to me! This is the second book I have produced with Lauree, and she is a delight to work with. Thank you, Lauree!
Because this is a book about statistics and SAS, I had a team of technical reviewers who had expertise in either statistics, SAS, or both. Once again I need to give a huge shout-out to Paul Grant. I’m pretty sure Paul has reviewed every book I have published with SAS Press. Not only does he carefully read every word, he also runs all of the programs to be sure the programs that you can download from my SAS author site match the programs printed in the book. That is an amazing amount of work, and I don’t understand why he keeps coming back for more. I have known Jeff Smith for over 40 years and co-authored two books with him. Jeff makes sure that my sometimes loose discussion of statistical topics will not upset “real” statisticians. Holly Sweeney is both a statistician and SAS expert. I was so fortunate to have her carefully read every word of this book. Her critiques and comments really helped make this a better book. My last technical reviewer, Amy Peters, is a developer in the SAS Studio group and helped me in so many ways. It’s such a pleasure to have an expert like Amy ready to assist anytime I call or email. So, a hearty thanks to Paul, Jeff, Holly, and Amy!
There is a team made up of Denise Jones (technical publishing specialist), Robert Harris (graphics designer), Suzanne Morgen (copy editor), and Melissa Hannah (digital marketing specialist) who all played key roles in getting this book to press (or e-Book). It really takes a team to produce a book like this, especially when it needs to be available in print form and several different electronic media. Thank you all!
I already mentioned Robert Harris (graphics designer) but he needs special thanks for creating three different cover designs for me to choose from. I liked all three, but the one you see here was my favorite (and my wife’s favorite also).
Speaking of wives, thank you, Jan, for your support and for making me take a break once in a while. You even took my picture for the back cover!
Chapter 1: Descriptive and Inferential Statistics
Overview
Many people have a misunderstanding of what statistics entails. The trouble stems from the fact that the word “statistics” has several different meanings. One meaning relates to numbers such as batting averages and political polls. When I tell people that I’m a statistician, they assume that I’m good with numbers. Actually, without a computer I would be lost.
The other meaning, the topic of this book, concerns describing collections of numbers, such as test scores, and the properties of those numbers. This subset of statistics is known as descriptive statistics. Another subset of statistics, inferential statistics, takes up a major portion of this book. One of the goals of inferential statistics is to determine whether your experimental results are “statistically significant.” In other words, what is the probability that the result that you obtained could have occurred by chance alone, rather than reflecting an actual effect?
Descriptive Statistics
I am sure every reader of this book is already familiar with some aspects of descriptive statistics. From early in your education, you were assigned a grade in a course, based on your average. Averages (there are several types) describe what statisticians refer to as measures of location or measures of central tendency. Most basic statistics books describe three indicators of location: the mean, median, and mode. To compute a mean, you add up all the numbers and divide by how many numbers you have. For example, if you took five tests and your scores were 80, 82, 90, 96 and 96, the mean would be (80 + 82 + 90 + 96 + 96)/5 or 88.8. To compute a median, you arrange the numbers in order from lowest to highest and then find the middle—half the numbers will be below the median and half of the numbers will be above the median. In the example of the five test scores (notice that they are already in order from lowest to highest), the median is 90. If you have an even number of numbers, one method of computing the median is to average the two numbers in the middle. The last measure of central tendency is called the mode. It is defined as the most common number. In this example, the mode is 96 because it occurs more than any other number. If all the numbers are different, the mode is not defined.
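As a quick check of the arithmetic above, here is a minimal sketch in Python (not SAS, and not taken from the book) that computes the three measures of location for the five test scores. The four-score list used for the even-sized case is a made-up illustration, not data from the text.

```python
import statistics

# The five test scores from the example above
scores = [80, 82, 90, 96, 96]

# Mean: add up all the numbers and divide by how many there are
mean = sum(scores) / len(scores)

# Median: the middle value once the numbers are in order
median = statistics.median(scores)

# Mode: the most common value
mode = statistics.mode(scores)

# Hypothetical even-sized list (the first four scores only) to show
# the "average the two middle numbers" rule
even_median = statistics.median([80, 82, 90, 96])

print(mean, median, mode, even_median)
```

One caveat: when all the values are different, Python 3.8 and later return the first value from `statistics.mode` rather than reporting that the mode is undefined, so that case needs a separate check.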
Besides knowing the mean or median (the mode is rarely used), you can also compute several measures of dispersion. Dispersion describes how spread out the numbers are. One very simple measure of dispersion is the range, defined as the difference between the highest and lowest value. In the test score example, the range is 96 – 80 = 16. This is not a very good indicator of dispersion because it is computed using only two numbers—the highest and lowest value.
The most common measure of dispersion is called the standard deviation. The computation is a bit complicated, but a good way to think about the standard deviation is that it is similar to the average amount by which each of the numbers differs from the mean, treating each of the differences as a positive number. To actually compute a standard deviation, you take the difference of each number from the mean, square all the differences (that makes all the values positive), add up all the squared differences, divide by one less than the number of values, and then take the square root of the result. Because this calculation is a lot of work, we will let the computer do the calculation rather than doing it by hand.
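To make the steps concrete, the following sketch works through the range and the standard deviation for the same five test scores. It is written in Python purely as an illustration of the arithmetic; the variable names are my own, and it is not the SAS code behind the book's output.

```python
import math

# The five test scores from the example above
scores = [80, 82, 90, 96, 96]
n = len(scores)
mean = sum(scores) / n

# Range: highest value minus lowest value
value_range = max(scores) - min(scores)

# Step 1: difference of each number from the mean, squared
squared_diffs = [(x - mean) ** 2 for x in scores]

# Step 2: add up the squared differences and divide by n - 1
variance = sum(squared_diffs) / (n - 1)

# Step 3: take the square root
std_dev = math.sqrt(variance)

print(value_range)
print(round(variance, 4))
print(round(std_dev, 4))
```

For these scores the squared differences sum to 228.8, so dividing by 4 gives 57.2, and the square root is roughly 7.56.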
Figure 1.1 below shows part of the output from SAS when you ask it to compute descriptive statistics on the five test scores:
Figure 1.1: Example of Output from SAS Studio
This shows three measures of location and several measures of dispersion (labeled Variability in the output). The value labeled “Std Deviation” is the standard deviation described previously, and the range is the same value that you calculated. The variance is the standard deviation squared, and it is used in many of the statistical tests that we discuss in this book.
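The relationship between the variance and the standard deviation can be confirmed with a short sketch. It uses Python's standard `statistics` module rather than SAS, and the values are computed independently of the figure, which is not reproduced here.

```python
import math
import statistics

# The five test scores from the example above
scores = [80, 82, 90, 96, 96]

# Both functions treat the data as a sample and divide by n - 1
sample_sd = statistics.stdev(scores)
sample_var = statistics.variance(scores)

# The variance should equal the standard deviation squared
print(math.isclose(sample_sd ** 2, sample_var))
```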
Descriptive statistics includes many graphical techniques such as histograms and scatter plots that you will learn about in the chapter on SAS Studio descriptive statistics.
Inferential Statistics
Let’s imagine an experiment where you want to test if drinking regular coffee has an effect on heart rate. You want to do this experiment because you believe caffeine might increase heart rate, but you are not sure. To start, you get some volunteers who are willing to drink regular coffee or decaf coffee and have