Data Theory. Simon Lindgren

that are more true than what can be achieved through other data and methods.

      We are no doubt in the midst of an ongoing data explosion and, along with it, the development of ‘data science’. Data science is an interdisciplinary specialisation at the intersection of statistics and computer science, focusing on machine learning and other forms of algorithmic processing of large datasets to ‘liberate and create meaning from raw data’ rather than on hypothesis testing (Efron and Hastie, 2016, p. 451). Data science is a successor to the form of ‘data analysis’ proposed by the statistician John W. Tukey, whose analytical framework focused on ‘looking at data to see what it seems to say’, making partial descriptions and trying ‘to look beneath them for new insights’. In his exploratory vein, Tukey (1977, p. v) also emphasised that this type of analysis was concerned ‘with appearance, not with confirmation’. This focus on mathematical structure and algorithmic thinking, rather than on inferential statistical justification, is a precursor to the flourishing of data science in the wake of datafication.

      Everything that people do online in the context of social media generates vast volumes of sociologically interesting data. Such data have been approached in highly data-driven ways within the field of data science, where the aim is often to get a general picture of some particular social pattern or process. Being data-driven is not a bad thing, but there must always be a balance between data and theory – between information and its interpretation. This is where sociology and social theory come into the picture, as they offer a wide range of conceptual frameworks and theories that can aid in the analysis and understanding of the large amounts and many forms of social data that proliferate in today’s world.

      My argument is that social research which relies heavily on the computational amassing and processing of data must also be theoretically sensitive. While purely computational methods are extremely helpful when wrangling the units of information, the meanings behind the messy social data generated in this age of datafication can be better untangled if we also make use of the rich interpretive toolkit provided by sociological theories and theorising. The data do not speak for themselves, even though some big data evangelists have claimed that to be the case (Anderson, 2008).

      Big data and data science are partly technological phenomena, which are about using computing power and algorithms to collect and analyse comparatively large datasets of often unstructured information. But they are also, most prominently, cultural and political phenomena that come along with the idea that huge unstructured datasets, often based on social media interactions and other digital traces left by people, can, when paired with methods like machine learning and natural language processing, offer a higher form of truth which can be computationally distilled rather than interpretively achieved.

      Pure data science tends to focus strongly on what is researchable. It goes for the issues for which there are data, regardless of whether those issues have any real-life urgency. The last decade has seen parts of the field of data science and parts of the social sciences become entangled in ways that risk a loss of theoretical grounding. In a seminal paper outlining the emerging discipline of ‘computational social science’, David Lazer and colleagues wrote that:

      We live life in the network. We check our e-mails regularly, make mobile phone calls from almost any location, swipe transit cards to use public transportation, and make purchases with credit cards. Our movements in public places may be captured by video cameras, and our medical records stored as digital files. We may post blog entries accessible to anyone, or maintain friendships through online social networks. Each of these transactions leaves digital traces that can be compiled into comprehensive pictures of both individual and group behavior, with the potential to transform our understanding of our lives, organizations, and societies.

      (Lazer et al., 2009, p. 721)

      My point is that data need theory, for considering the data, the methods, the ethics, and the results of the research alike. By extension, theories may still need to be updated, revised, discarded, or newly invented – but that has always been true. As such, this book is positioned within the broad field of ‘digital sociology’ as outlined by authors such as Deborah Lupton (2014) and Noortje Marres (2017). One strand within the debate about what digital sociology is, and what it entails, relates to the emergence of ‘digital methods’. In general, there is widespread disagreement about what such methods are, and whether there should be a focus on continuity with established social research traditions, or on revolutionary innovation. In a sense, this book can be read as one of many possible ventures in the direction pointed out by Noortje Marres when she writes:

      The digitization of social life and social research opens up anew long-standing questions about the relations between different methodological traditions in social enquiry: what are the defining methods of sociological research? Are some methods better attuned to digital environments, devices and practices than others? Do interpretative