Statistics for HCI. Alan Dix



Statistics for HCI - Alan Dix Synthesis Lectures on Human-Centered Informatics


within a few points of each other you really have no idea who will win.

      For those who’d like to understand the detailed stats for this (skip if you don’t!) …

      These three cases are simplified forms of the precise mathematical formula for the variance of a Binomial distribution, np(1 − p), where n is the number in the sample and p is the true population proportion for the thing you are measuring. When you are dealing with fairly small proportions, the (1 − p) term is close to 1, so the whole variance is close to np, that is, the number with the given value. You then take the square root to give the standard deviation. The factor of 2 is because about 95% of measurements fall within 2 standard deviations. The reason this becomes 1.5 in the middle is that you can no longer treat (1 − p) as nearly 1: for p = 0.5, it makes the standard deviation smaller by a factor of the square root of 0.5, which is about 0.7. Two times 0.7 is (about) one and a half (I did say quick and dirty!).
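As a rough check on these rules of thumb, the sketch below compares the exact 95% range, 2 × √(np(1 − p)), with the quick approximations 2√(np) and 1.5√(np), treating np as the count with the given value. It is a minimal Python sketch; the function names and the example sample size and proportions are illustrative assumptions, not figures from the text.

```python
import math


def binomial_sd(n: int, p: float) -> float:
    """Standard deviation of a Binomial(n, p) count: sqrt(n * p * (1 - p))."""
    return math.sqrt(n * p * (1 - p))


def compare_rules(n: int, p: float) -> dict:
    """Exact ~95% range (2 standard deviations) versus the quick rules."""
    count = n * p  # np: the expected number giving this answer
    return {
        "count": count,
        "exact": 2 * binomial_sd(n, p),         # uses the full np(1 - p)
        "small_p_rule": 2 * math.sqrt(count),   # treats (1 - p) as roughly 1
        "middle_rule": 1.5 * math.sqrt(count),  # 2 * sqrt(0.5) ~ 1.41, rounded up
    }


# Hypothetical survey of 1000 people: a rare answer and a 50/50 answer.
for p in (0.05, 0.5):
    r = compare_rules(1000, p)
    print(
        f"p={p}: count={r['count']:.0f}, exact ±{r['exact']:.1f}, "
        f"2√count ±{r['small_p_rule']:.1f}, 1.5√count ±{r['middle_rule']:.1f}"
    )
```

For a count of 50 out of 1000 the 2√count rule gives about ±14.1 against an exact ±13.8; at p = 0.5 the 1.5√count rule gives about ±33.5 against an exact ±31.6, slightly generous, as you would expect from rounding 1.41 up to 1.5.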

      However, for survey data, or indeed any kind of data, these calculations of variability are in the end far less critical than ensuring that the sample really does adequately measure the thing you are after.


      Figure 2.3: Monty Hall problem. Should you swap doors? (source: https://en.wikipedia.org/wiki/Monty_Hall_problem#/media/File:Monty_open_door.svg).

      Is it fair? Has the way you selected people made one outcome more likely? For example, if you run an election opinion poll among your Facebook friends, the result may not be indicative of the country at large!

      For surveys, has there been self-selection? Maybe you asked a representative sample, but who actually answered? Often you get more responses from those who have strong feelings about the issue. For usability of software, this probably means those who have had a problem with it.

      Have you phrased the question fairly? For example, people are far more likely to answer “yes” to a question, so if you ask “do you want to leave?” you might get 60% saying “yes” and 40% saying “no,” but if you asked the question the opposite way, “do you want to stay?”, you might still get 60% saying “yes.”

      We will discuss these kinds of issues in greater detail in Chapter 11.

      Simple techniques can help, but even mathematicians can get it wrong.

      It would be nice if there was a magic bullet to make all of probability and statistics easy. I hope this book will help you make more sense of statistics, but there will always be difficult cases—our brains are just not built for complex probabilities. However, it may help to know that even experts can get it wrong!

      We’ll look now at two complex issues in probability that even mathematicians sometimes find hard: the Monty Hall problem and DNA evidence. We’ll also see how a simple technique can help you tune your common sense for this kind of problem. This is not the magic bullet, but it may sometimes help.

      There was a quiz show in the 1950s where the star prize was a car. After battling their way through previous rounds the winning contestant had one final challenge. There were three doors, behind one of which was the prize car, but behind each of the other two was a goat.

      The contestant chose a door, but to increase the drama of the moment, the quizmaster did not immediately open the chosen door. Instead, they opened one of the others. The quizmaster, who knew which was the winning door, would always open a door with a goat behind. The contestant was then given the chance to change their mind. Imagine you are the contestant. What do you think you should do?

      • Should you stick with the original choice?

      • Should you change to the remaining unopened door?

      • Or, doesn’t it make any difference?

      Although there is a correct answer, there are several apparently compelling arguments in either direction:

      One argument is that, as there were originally three closed doors, the chance of the car being behind the door you chose first was 1 in 3, whereas now that there are only two closed doors to choose from, the chance of it being behind the one you didn’t choose originally is 1 in 2, so you should change. However, the astute may have noticed that this is a slightly flawed probabilistic argument, as the probabilities don’t add up to one.

      A counter argument is that at the end there are two closed doors, so the chances are even as to which has the car behind it, and hence there is no advantage to changing.

      An information theoretic argument is similar—the remaining closed doors hide the car equally before and after the other door has been opened: you have no more knowledge, so why change your mind?

      Even mathematicians and statisticians can argue about this, and when they work it out by enumerating the cases, they do not always believe the answer. It is one of those cases where common sense simply does not help … even for a mathematician!

      Before revealing the correct answer, let’s have a thought experiment.

      Imagine if instead of three doors there were a million doors. Behind 999,999 doors are goats, but behind the one lucky door there is a car.

      I am the quizmaster and ask you to choose a door. Let’s say you choose door number 42. I now open 999,998 of the remaining doors, being careful to open only doors that hide goats. You are left with two doors: your original choice and the one door I have not opened. Do you want to change your mind?


      Figure 2.4: Monty Hall with a million doors?

      This time it is pretty obvious that you should change. There was virtually no chance of you having chosen the right door to start with, so it was almost certainly (999,999 out of a million) one of the others—I have helpfully discarded all the rest so the remaining door I didn’t open is almost certainly the correct one.
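The arithmetic behind the million-door case can be written down exactly. The Python sketch below assumes, as in the story, that the quizmaster knows where the car is, opens every door except your pick and one other, and never opens the car's door; the function name monty_hall_odds is illustrative.

```python
from fractions import Fraction


def monty_hall_odds(n_doors: int) -> tuple[Fraction, Fraction]:
    """Exact (stick, switch) win probabilities when the quizmaster, who knows
    where the car is, opens every door except your pick and one other and
    never opens the car's door."""
    if n_doors < 3:
        raise ValueError("the game needs at least three doors")
    stick = Fraction(1, n_doors)  # chance that your first pick hides the car
    switch = 1 - stick            # otherwise the car is behind the one door left closed
    return stick, switch


stick, switch = monty_hall_odds(1_000_000)
print(f"stick: {stick}, switch: {switch}")  # stick: 1/1000000, switch: 999999/1000000
```

Using exact fractions rather than floating point keeps the 999,999 out of a million visible as written.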

      It is as if, before I opened the 999,998 ‘goat’ doors, I’d asked you, “do you think the car is precisely behind door 42, or any of the others?”

      In fact, exactly the same reasoning holds for three doors. In that case there was a 2/3 chance that the car was behind one of the two doors you did not choose, and as the quizmaster I discarded one of those, the one that hid a goat. So the car is twice as likely to be behind the door I did not open as behind your original choice. Regarding the information theoretic argument: opening the goat door does add information, because the quizmaster knows which door hides the car and only ever opens a goat door. However, it still feels a bit like smoke and mirrors with three doors, even though the million-door version is obvious.
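One way to make the three-door case feel less like smoke and mirrors is to list every case mechanically. The Python sketch below assumes each car position and each first pick is equally likely, and that when the quizmaster has two goat doors to choose from he picks either with probability 1/2; the names DOORS and enumerate_three_doors are illustrative.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def enumerate_three_doors() -> tuple[Fraction, Fraction]:
    """Exact (stick, switch) win probabilities, found by listing every case."""
    stick = Fraction(0)
    switch = Fraction(0)
    for car, pick in product(DOORS, DOORS):
        p_case = Fraction(1, 9)  # all 9 (car, pick) pairs are equally likely
        # The quizmaster opens a goat door that the contestant did not pick.
        openable = [d for d in DOORS if d != pick and d != car]
        for opened in openable:
            p = p_case / len(openable)  # split evenly if he has a choice
            remaining = next(d for d in DOORS if d not in (pick, opened))
            if pick == car:
                stick += p
            if remaining == car:
                switch += p
    return stick, switch


stick, switch = enumerate_three_doors()
print(f"stick: {stick}, switch: {switch}")  # stick: 1/3, switch: 2/3
```

The six cases where your first pick was a goat each leave the car behind the remaining door, which is where the 2/3 comes from.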

      Using the extreme case helps tune your common sense, often allowing you to see flaws in mistaken arguments, or work out the true explanation. It is not an infallible heuristic (sometimes arguments do change with scale), but it is often helpful.
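Where you are unsure whether an argument survives a change of scale, a quick simulation at both ends is a further cross-check. The Python sketch below plays the game many times for three doors and for a million doors, assuming the car and the first pick are uniformly random and that the quizmaster leaves exactly one other door closed without ever revealing the car; the function names, trial count and seed are illustrative choices.

```python
import random


def play(n_doors: int, switch: bool, rng: random.Random) -> bool:
    """Play one game; the quizmaster opens every door except the pick and one
    other, never revealing the car. Returns True if the contestant wins."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    if pick == car:
        # Every other door hides a goat: leave a random one of them closed.
        other = rng.randrange(n_doors - 1)
        left_closed = other if other < pick else other + 1
    else:
        # The quizmaster cannot open the car's door, so it stays closed.
        left_closed = car
    final = left_closed if switch else pick
    return final == car


def win_rate(n_doors: int, switch: bool, trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated games won; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    wins = sum(play(n_doors, switch, rng) for _ in range(trials))
    return wins / trials


for n in (3, 1_000_000):
    print(f"{n:,} doors: stick {win_rate(n, False):.4f}, switch {win_rate(n, True):.4f}")
```

Both runs should land close to the exact values, about 1/3 against 2/3 for three doors and almost 0 against almost 1 for a million, so here the argument does not change with scale.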
