Statistics for HCI. Alan Dix


Statistics for HCI - Alan Dix Synthesis Lectures on Human-Centered Informatics


DNA evidence is just such an example. Although each person’s DNA is almost unique, DNA testing is imperfect and has the possibility of error.

      Suppose there has been a murder, and traces of DNA have been found at the scene. The lab's DNA matching has a 1 in 100,000 chance of giving a false match.

      Imagine two scenarios.

      Case 1: Shortly prior to the body being found, the victim had been known to have had a violent argument with a friend. The police match the DNA of the friend with DNA found at the murder scene. The friend is arrested and taken to court.

      Case 2: The police look up the DNA in the national DNA database and find a positive match. The matched person is arrested and taken to court.

      Similar cases have occurred and led to convictions based heavily on the DNA evidence. However, while in case 1 the DNA is strong corroborating evidence, in case 2 it is not. Yet courts, guided by ‘expert’ witnesses, have not understood the distinction and convicted people in situations like case 2. Belatedly, the problem has been recognised and in the UK there have been a number of appeals where longstanding cases have been overturned, sadly not before people have spent considerable periods behind bars for crimes they did not commit. One can only hope that similar evidence has not been crucial in jurisdictions with a death penalty.

      If you were the judge or jury in such a case would the difference be obvious to you?

      If not, we can use a similar trick to the one we used in the Monty Hall problem. There, we made the numbers a lot bigger; here we will make the numbers less extreme. Instead of a 1 in 100,000 chance of a false DNA match, let’s make it 1 in 100. While this is still useful, though not overwhelming, corroborative evidence in case 1, it is pretty obvious that if there are more than a few hundred people in the police database, then you are bound to find a match.

      It is as if a red Fiat Uno had been spotted outside the victim’s house. If the friend’s car was a red Fiat Uno it would be good additional circumstantial evidence, but simply arresting any red Fiat Uno owner would clearly be silly.

      If we return to the original 1 in 100,000 figure for a DNA match, it is the same. If there are more than a few hundred thousand people in the database then you are almost bound to find a match. This might be a way to find people to investigate further by looking for other evidence; indeed, that is how several cold cases have been solved in recent years. However, the DNA evidence would not in itself be strong.
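The "bound to find a match" argument can be checked with a little arithmetic. The following is a minimal sketch, assuming every comparison against the database is independent and has the same false-match rate; the database sizes (300 and 300,000) are illustrative stand-ins for "a few hundred" and "a few hundred thousand", and the function names are hypothetical.

```python
def prob_at_least_one_false_match(false_match_rate: float, database_size: int) -> float:
    """Chance that at least one innocent person in the database matches in error.

    Assumes each comparison is independent with the same false-match rate.
    """
    return 1.0 - (1.0 - false_match_rate) ** database_size


def expected_false_matches(false_match_rate: float, database_size: int) -> float:
    """Average number of erroneous matches across the whole database."""
    return false_match_rate * database_size


# (false-match rate, illustrative database size)
scenarios = [
    (1 / 100, 300),          # the scaled-down version of the puzzle
    (1 / 100_000, 300_000),  # the original figure
]

for rate, size in scenarios:
    p_any = prob_at_least_one_false_match(rate, size)
    n_expected = expected_false_matches(rate, size)
    print(f"rate={rate:g}, database={size}: "
          f"P(at least one false match)={p_any:.3f}, "
          f"expected false matches={n_expected:.1f}")
```

Under these assumptions both scenarios give roughly a 95% chance of at least one false match, which is why a database hit on its own is weak evidence, whereas a match against a single, independently suspected person (case 1) has only the 1 in 100,000 chance of being an error.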

      In summary, some diverting puzzles and also some very serious problems involving probability can be very hard to understand. Our common sense is not well tuned to probability. Even trained mathematicians can get confused, which is one of the reasons we turn to formulae and calculations. However, we saw that changing the scale of numbers in a problem can sometimes help your common sense to understand them.

      CHAPTER 3

       Properties of randomness

      We’ve seen how wild random phenomena can be; however, this does not mean they cannot be understood and at least partially tamed.

      When you take a measurement, whether it is the time for someone to complete a task using some software, or a preferred way of doing something, you are using that measurement to find out something about the ‘real’ world—the average time for completion, or the overall level of preference amongst your users.

      Two of the core things you need to know about are bias (is it a fair estimate of the real value) and variability (how likely is it to be close to the real value). Are your results fair and are they reliable?

      The word ‘bias’ in statistics has a precise meaning, but it is very close to its day-to-day meaning. Bias is about systematic effects that skew your results in one way or another. In particular, if you use your measurements to predict some real-world effect, is that prediction likely to over- or under-estimate the true value? In other words, is it a fair estimate?

      Say you take 20 users, and measure their average time to complete some task. You then use that as an estimate of the ‘true’ value, the average time to completion of all your users. Your particular estimate may be low or high (as we saw with the coin tossing experiments). However, if you repeated that experiment very many times would the average of your estimates end up being the true average?

      If the complete user base were employees of a large company, and the company forced them to engage in your study, you could randomly select your 20 users, and in that case, yes, the estimate based on the users would be unbiased.1
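The idea of repeating the experiment very many times can be tried out in simulation. This is only a sketch under stated assumptions: the population of 10,000 completion times is invented (drawn from a log-normal distribution purely for illustration), and names such as `estimate_once` are hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical user base: task completion times in seconds for 10,000 users.
population = [rng.lognormvariate(3.0, 0.5) for _ in range(10_000)]
true_mean = statistics.fmean(population)


def estimate_once(sample_size: int = 20) -> float:
    """Randomly select users and return their average completion time."""
    return statistics.fmean(rng.sample(population, sample_size))


# Repeat the 20-user study many times.
estimates = [estimate_once() for _ in range(5_000)]
mean_of_estimates = statistics.fmean(estimates)
spread_of_estimates = statistics.stdev(estimates)

print(f"true mean:               {true_mean:.2f}")
print(f"mean of the estimates:   {mean_of_estimates:.2f}")
print(f"spread of the estimates: {spread_of_estimates:.2f}")
print(f"lowest / highest single estimate: "
      f"{min(estimates):.2f} / {max(estimates):.2f}")
```

Any single estimate may be noticeably low or high (that is the variability), but with truly random selection the average of the estimates settles close to the true mean (that is the absence of bias).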

      However, imagine you are interested in the popularity of Ariana Grande and issued a survey on a social network as a way to determine this. The results would be very different depending on whether you chose to use LinkedIn or TikTok. No matter how randomly you select users from LinkedIn, they are probably not representative of the population as a whole, so you would end up with a biased estimate of Grande’s popularity.2

      However, the good news is that sometimes it is possible to model bias and correct for it. For example, you might ask questions about age or other demographics and then use known population demographics to add weight to groups under-represented in your sample … although I doubt this would work for the Ariana Grande example: if there are 15-year-old members of LinkedIn, they are unlikely to be typical 15-year-olds!
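The weighting idea can be sketched as follows. All the numbers here are hypothetical, chosen only to show the mechanics: a survey sample that heavily under-represents younger people, invented population shares for two age groups, and invented response counts.

```python
# Hypothetical share of each age group in the real population.
population_share = {"under_25": 0.30, "25_and_over": 0.70}

# Hypothetical survey results: group -> (number asked, number who like the artist).
survey = {"under_25": (50, 40), "25_and_over": (950, 190)}


def naive_estimate(survey: dict) -> float:
    """Proportion of all respondents who like the artist, ignoring demographics."""
    asked = sum(n_asked for n_asked, _ in survey.values())
    liked = sum(n_liked for _, n_liked in survey.values())
    return liked / asked


def weighted_estimate(survey: dict, population_share: dict) -> float:
    """Weight each group's proportion by that group's share of the population."""
    return sum(
        population_share[group] * n_liked / n_asked
        for group, (n_asked, n_liked) in survey.items()
    )


print(f"naive estimate:    {naive_estimate(survey):.2f}")
print(f"weighted estimate: {weighted_estimate(survey, population_share):.2f}")
```

With these made-up figures the naive estimate is 0.23 while the weighted estimate is 0.38. The correction is only as good as its assumption that the respondents in each group are typical of that group in the wider population, which is exactly what is doubtful for the 15-year-olds on LinkedIn.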

