Data Theory. Simon Lindgren
Many popular examples illustrate how rapidly datafication is intensifying. The most famous is Moore’s Law, according to which the number of components that fit on an integrated circuit doubles at regular intervals, so that computers and their memory and storage become ever more powerful over time (Moore, 1965). Another telling comparison is this one: The Great Library of Alexandria, which was established in the third century BCE, was regarded as the centre of knowledge in the ancient world. It was believed to hold within it the sum total of all human knowledge. Today, however, the amount of data in the world has been estimated at 1,200 million terabytes, enough to give each person alive more than 300 times as much information as historians believe the library’s entire collection contained (Cukier and Mayer-Schoenberger, 2013).
We are no doubt in the midst of an ongoing data explosion, and along with it the development of ‘data science’. Data science is an interdisciplinary specialisation at the intersection of statistics and computer science, focusing on machine learning and other forms of algorithmic processing of large datasets to ‘liberate and create meaning from raw data’ rather than on hypothesis testing (Efron and Hastie, 2016, p. 451). Data science is a successor to the form of ‘data analysis’ proposed by the statistician John W. Tukey, whose analytical framework focused on ‘looking at data to see what it seems to say’, making partial descriptions and trying ‘to look beneath them for new insights’. In this exploratory vein, Tukey (1977, p. v) also emphasised that this type of analysis was concerned ‘with appearance, not with confirmation’. This focus on mathematical structure and algorithmic thinking, rather than on inferential statistical justification, is a precursor to the flourishing of data science in the wake of datafication.
Everything that people do online in the context of social media generates vast volumes of sociologically interesting data. Such data have been approached in highly data-driven ways within the field of data science, where the aim is often to get a general picture of some particular social pattern or process. Being data-driven is not a bad thing, but there must always be a balance between data and theory, that is, between information and its interpretation. This is where sociology and social theory come into the picture, as they offer a wide range of conceptual frameworks and theories that can aid in the analysis and understanding of the large amounts and many forms of social data that proliferate in today’s world.
But where we do see big data being analysed, there is far too often a disconnect between the data and the theory. One explanation for this may be that the popularity and impact of data science cause its data-driven ethos to spill over into the academic fields that try to learn from it. This means that we risk forgetting about theoretical analysis, which may fade in the light of sparkling infographics.
My argument is that social research that relies heavily on the computational amassing and processing of data must also be theoretically sensitive. While purely computational methods are extremely helpful when wrangling the units of information, the meanings behind the messy social data generated in this age of datafication can be better untangled if we also make use of the rich interpretive toolkit provided by sociological theories and theorising. The data do not speak for themselves, even though some big data evangelists have claimed that to be the case (Anderson, 2008).
Big data and data science are partly technological phenomena, which are about using computing power and algorithms to collect and analyse comparatively large datasets of often unstructured information. But they are also, and most prominently, cultural and political phenomena. They come with the idea that huge unstructured datasets, often based on social media interactions and other digital traces left by people, can offer a higher form of truth when paired with methods like machine learning and natural language processing: a truth that can be computationally distilled rather than interpretively achieved.
Such mythological beliefs are not new, however, as there has long been, if not a hierarchy, at least a strict division of research methods within the cultural and social sciences, where some methods – those that have come to be labelled ‘quantitative’, and that analyse data tables with statistical tools – have been vested with an ‘aura of truth, objectivity, and accuracy’ (boyd and Crawford, 2012, p. 663). Other methods – those commonly named ‘qualitative’, and involving close readings of textual data from interviews, observations, and documents – are seen as more interpretive and subjective, rendering richer but also (allegedly) more problematic results. This book rests on the belief that this distinction is not only annoying, but also wrong. We can get at approximations of ‘the truth’ by analysing social and cultural patterns, and those analyses are by definition interpretive, no matter the chosen methodological strategy. Especially in this day and age where data, the bigger the better, are fetishised, it is high time to move on from the unproductive dichotomy of ‘qualitative’ versus ‘quantitative’.
Data theory
Pure data science tends to focus strongly on whatever is researchable. It goes for the issues for which there are data, regardless of whether those issues have any real-life urgency. The last decade has seen parts of the field of data science and parts of the social sciences become entangled in ways that risk a loss of theoretical grounding. In a seminal paper outlining the emerging discipline of ‘computational social science’, David Lazer and colleagues wrote that:
We live life in the network. We check our e-mails regularly, make mobile phone calls from almost any location, swipe transit cards to use public transportation, and make purchases with credit cards. Our movements in public places may be captured by video cameras, and our medical records stored as digital files. We may post blog entries accessible to anyone, or maintain friendships through online social networks. Each of these transactions leaves digital traces that can be compiled into comprehensive pictures of both individual and group behavior, with the potential to transform our understanding of our lives, organizations, and societies.
(Lazer et al., 2009, p. 721)
Furthermore, they argued that there was an inherent risk in the fact that existing social theories were ‘built mostly on a foundation of one-time “snapshot” data’ and that they therefore may not be fit to explain the ‘qualitatively new perspectives’ on human behaviour offered by the ‘vast, emerging data sets on how people interact’ (Lazer et al., 2009, p. 723). While I agree that social analysis must be re-thought in light of these developments, I am not so sure that it is simply about ‘compiling’ the data, and then being prepared that existing theories may no longer work. Rather, I argue, we should trust a bit more that even though the size and dynamics of the data may be previously unseen, the social patterns that they can lay bare – if adequately analysed – can still largely be interpreted with the help of ‘old’ theories, and with an ‘old’ approach to theorising. After all, theories are not designed to understand particular forms of data, but instead the sociality to which they bear witness.
My point is that data need theory, for considering the data, the methods, the ethics, and the results of the research alike. By extension, theories may always need to be updated, revised, discarded, or newly invented, but that has always been true. This book is therefore positioned within the broad field of ‘digital sociology’ as outlined by authors such as Deborah Lupton (2014) and Noortje Marres (2017). One strand within the debate about what digital sociology is, and what it entails, relates to the emergence of ‘digital methods’. In general, there is widespread disagreement about what such methods are, and whether there should be a focus on continuity with established social research traditions, or on revolutionary innovation. In a sense, this book can be read as one of many possible ventures in the direction pointed out by Noortje Marres when she writes:
The digitization of social life and social research opens up anew long-standing questions about the relations between different methodological traditions in social enquiry: what are the defining methods of sociological research? Are some methods better attuned to digital environments, devices and practices than others? Do interpretative