Statistics for HCI. Alan Dix
There are some excellent books on advanced statistical techniques within HCI: Robertson and Kaptein’s collection Modern Statistical Methods for HCI [62] and Cairns’ Doing Better Statistics in Human–Computer Interaction [9]. This book is intended to complement these, allowing you to follow statistical arguments without necessarily knowing how to perform each of the analyses yourself, and, if you are using more advanced techniques, to understand them more thoroughly.
This book arose from a course on “Understanding Statistics” at CHI 2017, which itself drew on earlier short courses and tutorials from 20 years before. The fundamentals of statistics changed little in those 20 years; indeed, I could and should have written this book then. However, there have been two main developments, which have intensified both the need for the book and its timeliness. The first is the increased availability, usability, and power of statistical tools such as R. These make it so much easier to apply statistics but can also lead to a false sense of security when complex methods are applied without understanding their purpose, assumptions and limitations. The second change has been growing publicity about the problems of badly applied statistics, the ‘statistical crisis’: topics that were once only discussed amongst professional statisticians are now a matter of intense debate on the pages of Nature and in the halls of CHI. Again, this awareness is a very positive step but comes with the danger that HCI researchers and UX practitioners may reach for new forms of statistics with even less understanding and greater potential for misuse. Even worse, the fear of doing it wrong may lead some to avoid using statistics where appropriate, or serve as an excuse for abandoning statistics entirely.
We are in a world where big data rules, and nowhere more than in HCI, where A–B testing and similar analysis of fine-grained logging mean that automated analysis appears to be overtaking design expertise. To make sense of big data, as well as the results of smaller laboratory experiments, surveys or field studies, it is essential that we understand the statistics needed to interpret quantitative data, the limitations of numbers, and how quantitative and qualitative methods can work together.
By the end of the book, you should have a richer understanding of: the nature of random phenomena and different kinds of uncertainty; the different options for analysing data and their strengths and weaknesses; ways to design studies and experiments to increase ‘power’, that is, the likelihood of successfully uncovering real effects; and the pitfalls to avoid and issues to consider when dealing with empirical data. I hope that you will be better equipped to understand reports, data, and academic papers that use statistical techniques and to critically assess the validity of their results and how they may apply to your own practice or research. Most importantly, you will be better placed to design studies that use available resources efficiently and to analyse the results appropriately, effectively, and reliably.
INTENDED READERSHIP
This book is intended for both experienced researchers and students who have already engaged, or intend to engage, in quantitative analysis of empirical data or other forms of statistical analysis. It will also be of value to practitioners using quantitative evaluation. There will be occasional formulae, but the focus of the book is on conceptual understanding, not mathematical skills.
Alan Dix
April 2020
Acknowledgments
First, I would like to thank Fiona, my wife, for her ongoing support and for reading this manuscript with her customary attention to detail, not least by highlighting my continual tendency to write ‘it’ and ‘this’ when it is not at all clear what they refer to. Thanks also to the reviewers, whose constructive comments led to quite substantial changes to the structure of this book, and to the attendees at various tutorials and courses over the years who have given feedback on earlier versions of this material, including Ben for pointing out various errors (including one very embarrassing one) in a late draft. The photo of me on the cover was taken by Daniel Parry, who managed to fit me in at short notice just before the country shut down due to COVID-19. Many thanks, of course, to all the staff at Morgan & Claypool, especially Diane, Tondo, and Christine, and I’m sure many others whom I don’t know by name but who have contributed in many ways to ensuring this book is of the highest quality.
Finally, writing this under coronavirus lockdown has reinforced for me the importance of understanding quantitative data. I would like to dedicate this book to the frontline workers across the world during this critical time; in the UK, especially the staff of the NHS, but also all those providing essential services, from pharmacists and workers in care homes to supermarket checkout assistants and parcel deliverers. Looking at the UK income distribution in Section 4.13, it is sobering to think that many of those who are putting their health and lives on the line will have incomes at the lowest end of these graphs. Behind every number is a human life. We can either use statistics to distance ourselves from the harsh reality of life or as a window to expose the neglected and overlooked. I hope that this book can help you achieve the latter.
Alan Dix
April 2020
CHAPTER 1
Introduction
In this introductory chapter we consider:
• the nature of human cognition, which makes it hard to understand probability, and hence why we need formal statistics;
• whether you need to worry about statistics at all;
• the way statistics operates to offer us insight into the complexities of the world; and
• the different phases in research and software development and where different forms of qualitative and quantitative analysis are appropriate.
1.1 WHY ARE PROBABILITY AND STATISTICS SO HARD?
Do you find probability and statistics hard? If so, don’t worry, it’s not just you; it’s basic human psychology.
We have two systems of thought¹: (i) subconscious reactions that are based on semi-probabilistic associations, and (ii) conscious thinking that likes to have one model of the world and is really bad at probability. This is why we need to use mathematics and other explicit techniques to help us deal with probabilities. Furthermore, statistics needs both this mathematics of probability and an appreciation of what it means in the real world. Understanding this means you don’t have to feel bad about finding stats hard, and also helps to suggest ways to make it easier.
1.1.1 IN TWO MINDS
Skinner’s famous experiments with pigeons (Fig. 1.1) showed how certain kinds of learning could be studied in terms of associations between stimuli and rewards. If you present a reward enough times with the behaviour you want, the pigeon will learn to do it even when the original reward no longer happens. The learning is semi-probabilistic in the sense that if rewards are more common the learning is faster, or if rewards and penalties both happen at different frequencies, then you get a level of trade-off in the learning. At a cognitive level one can think of strengths of association