Innovations in Digital Research Methods. Multiple authors

example, the Extreme Science and Engineering Discovery Environment (XSEDE) www.xsede.org/web/guest/gateways-listing

      2 The Changing Social Science Data Landscape

      Kingsley Purdam

      Mark Elliot

      2.1 Introduction

      2.1.1 The Age of Data

      More than a century after the ground-breaking social surveys of Booth in London 1 and Rowntree in York in the UK, and the subsequent development of mass observation methods in the 1930s, we are now in an age of almost overwhelming volumes of data about many people’s attitudes, circumstances and behaviour. Such data extends from people’s views to images of them, their locations and movements, and their communications. The data is very diverse; it includes lifelong health and prescription records, genetic biomarker profiles and family histories, satellite images, digital passports and their use, databases from product warranty forms, consumption transactions, online browsing records, email and web communications, social media, and mobile phone use. As Berners-Lee and Shadbolt (2011:1) highlight, ‘data is the new raw material of the 21st Century’.

      Social science and the societies that it studies have entered the age of data, though not necessarily the age of data access. Nevertheless, access to this data is increasing; for example, access to administrative record data held by public bodies, including government departments, is being widened.2, 3 The term ‘big data’ has been much used to describe the data revolution and, whilst a little simplistic as a concept, it moves us forward from Sweeney’s (2001) discussion of the ‘information explosion’, insofar as it captures the growth in the collection and availability of information (for discussions see boyd and Crawford, 2012; Mayer-Schönberger and Cukier, 2013; O’Reilly Radar Team, 2011). Big data denotes volumes of data so large that they are kept in so-called data warehouses, which are digital data storage facilities often cutting across different national borders and data regulation regimes. It is the volume of data (when information about all, or nearly all, of a particular population is potentially included, as opposed to a sample), the variety of the variables, and the speed with which it can be discovered and accessed that open up new opportunities for research and methodological innovation (Mayer-Schönberger and Cukier, 2013; IBM, 2013).

      The term ‘big data’ is used differently by different authors, with some including orthodox or well-established forms of social science data, such as survey responses and focus group transcripts (Elliot et al., 2013).4 The new types of data can have very different origins and structures. Some might be collected primarily for research use, whilst other data might be produced as a secondary outcome to another activity, for example, buying a product online or posting views on a blog. Some of this new data has been around in some form and quantity for some time, but its use in social science research has been limited, perhaps because of access and infrastructural constraints, methodological uncertainties and a lack of interest in, or opportunity for, social research use (Elliot et al., 2013).

      Conceptually, the term ‘big data’ fails in many ways to capture the all-encompassing nature of the socio-technical transformation that is upon us. Many people who use the term qualify it by stating that big data is not just about volume but also about other features: that data can be captured, updated and analysed in (almost) real-time and that it can be linked through multiple data capture points and processes. However, such characterizations are not sufficient; they still express the notion of data as something we have, whereas the reality and scale of the data transformation is that data is now something we are becoming immersed and embedded in. We are generators of, but are also generated in, the data environment. Our behaviour is increasingly documented and collated. Instead of people being researched, they are the research. Hence, we use the term the age of data to capture the historical phase that large parts of society have now entered, and we use the term data environment (see Elliot et al. 2008 and 2010 for discussion of the term) to capture the reality of the new relationship between people and what is known about them. Research in this environment can focus not only on explaining why something might have happened, but also on what is currently happening and what is going to happen.

      If they are going to be used effectively for research, the new data types and large-scale datasets require new approaches to analysis and new skills for social scientists. After all, social science should be capable of producing testable hypotheses using robust research designs and data quality assurance measures even where new types of data are being used. Such data also has its limitations and is not always accessible for social science research use. Moreover, big data does not mean we all have access to the data or that we know everything. There is still a need for purpose-specific data and for approaches based on testing theories.

      In this chapter we consider some examples of the new types of social data, including their formats, content, meanings, and the changing relationship between people’s digital and non-digital identities. We use real world examples to explore how social science might utilize new types of data to understand social phenomena in new ways and from new perspectives. As well as the data itself, we consider access modalities and processes. It is clear that what is happening in the data environment will change not just how we do social science research but who does it, where it is done and, indeed, what research means. However, as a recent consultation (Elliot et al., 2013: 4) on the use of digital data by social scientists highlighted, some concerns have been raised:

      There is more data for social research but can people use it, under what conditions and do they know how to? (Social scientist, stakeholder interview, 2012)

      There is a growth of under-theorised empiricism in social science…uncritical use of data with limitations in coverage or definitions and the steering of research to things that happened to be measured. (Social scientist, survey respondent, 2012)

      2.1.2 What is Data?

      Data is information or knowledge about an individual, object or event. Data can comprise numerical values, quantities of text, sounds or images, memories or perceptions. Often the concept of data suggests information that has a structure and which has been through some kind of processing.

      Many examples of new types of data have very different and sometimes unstructured formats, for example, tweets or documents released under a Freedom of Information (FOI) request. In order to develop our understanding of the changing data environment, we outline below a typology of different data types. This typology is based on the idea of data as knowledge, and also on the idea that each data item carries with it implicit or explicit metadata, that is, data about the data item, such as its origin, ownership, terms of use and coverage. There are various ways to consider the nature of data, but here we combine the key issues into a single framework. We draw on work by Elliot et al. (2010) on behalf of the Office for National Statistics (ONS) in the UK, which examined the nature of public data, comparing information that is formally in the public domain, such as public administrative records (e.g., the Electoral Register, share holdings and professional occupation lists), with data that is informally in the public domain, such as that posted on the Internet (e.g., via Facebook and blogs). For a related discussion of what they term datafication, which refers to the process of recording and quantifying behaviour and events for analysis, see Mayer-Schönberger and Cukier (2013: 73).
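      The idea of a data item that carries its own metadata can be illustrated with a minimal sketch. It is not part of the typology itself: the class names, the field names and the example values are hypothetical, chosen only to mirror the four kinds of metadata named above (origin, ownership, terms of use and coverage) and the distinction between explicit and implicit metadata.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Metadata:
    """Data about a data item (field names are illustrative assumptions)."""
    origin: str          # where the item came from
    ownership: str       # who holds or controls it
    terms_of_use: str    # conditions attached to its use
    coverage: str        # who or what the item covers
    explicit: bool = True  # False when the metadata is implicit (inferred)


@dataclass
class DataItem:
    """A piece of content paired with its metadata."""
    content: Any
    metadata: Metadata


# Hypothetical example: a public social media post whose metadata is
# implicit, because nothing about it is formally documented with the item.
post = DataItem(
    content="Example text of a public post",
    metadata=Metadata(
        origin="social media post",
        ownership="assumed: account holder and platform",
        terms_of_use="assumed: platform terms of service",
        coverage="one user at one point in time",
        explicit=False,
    ),
)

print(post.metadata.origin, "| explicit metadata:", post.metadata.explicit)
```

      An administrative record, by contrast, would typically be represented with explicit set to True, since its origin and terms of use are formally documented.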

      We develop our approach here to focus on
