Innovations in Digital Research Methods. Group of authors

rather than the actual type of data or whether the data is qualitative or quantitative. The issue of origin is interdependent with issues of data ownership, quality, access and use. A key aspect of this is the law and codes of practice around the recognition of what is ‘personal’ data. Under the UK Statistics and Registration Service Act (2007) (SRSA) personal information is defined as ‘information which relates to and identifies a particular person (including a body corporate)’. Information identifies a particular person if the identity of that person ‘(a) is specified in the information, (b) can be deduced from the information, or (c) can be deduced from the information taken together with any other published information’.[5] The disclosure of personal information by public bodies, such as the ONS, is a criminal offence. For further information see the UK Anonymization Network[6] and also a recent report by the Information Commissioner (ICO, 2012).

      In terms of the metadata-of-origin approach, we propose an eight-point typology based on the type of generation process involved. Given the complexity and changing nature of the data environment, it can be argued that mapping the data generation process is the only stable way of understanding the variety of data and of developing good practice around the use of different data types.

      2.1.3 Data Origin Typology

      1 Orthodox intentional data: Data collected and used with the respondent’s explicit agreement. All so-called orthodox social science data (e.g. survey, focus group or interview data and also data collected via observation) would come into this category. New orthodox methods continue to be developed.

      2 Participative intentional data: In this category data are collected through some interactive process. This includes some new data forms such as crowdsourced data (e.g. the Everyday Sexism project; see http://everydaysexism.com) and is a potential growth area.

      3 Consequential data: Information that is collected as a necessary transaction that is secondary to some (other) interaction (e.g. administrative records, electronic health records, commercial transaction data and data from online game playing).

      4 Self-published data: Data deliberately self-recorded and published that can potentially be used for social science research either with or without explicit permission, given the information has been made public (e.g. long-form blogs, CVs and profiles).

      5 Social media data: Data generated through some public, social process that can potentially be used for social science research either with or without permission (e.g. micro-blogging platforms such as Twitter and Facebook, and, perhaps, online game data).

      6 Data traces: Data that is ‘left’ (possibly unknowingly) through digital encounters, such as online search histories and purchasing, which can be used for social science research either under default use agreements or with explicit permission.

      7 Found data: Data that is available in the public domain, such as observations of public spaces, which can include covert research methods.

      8 Synthetic data: Where data has been simulated, imputed or synthesized. This can be derived from, or combined with, other data types.
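
      The eight origin types above can be recorded as metadata attached to each dataset. The following is a minimal sketch of that idea, not a scheme proposed in the text: the class names, the field names and the three catalogue entries are all hypothetical. A dataset is allowed more than one origin type, since the typology notes that types can overlap (online game data, for instance, is listed under consequential data and, tentatively, under social media data).

```python
from dataclasses import dataclass
from enum import Enum


class DataOrigin(Enum):
    """The eight origin types, numbered as in the typology above."""
    ORTHODOX_INTENTIONAL = 1
    PARTICIPATIVE_INTENTIONAL = 2
    CONSEQUENTIAL = 3
    SELF_PUBLISHED = 4
    SOCIAL_MEDIA = 5
    DATA_TRACES = 6
    FOUND = 7
    SYNTHETIC = 8


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalogue entry; the field names are illustrative only."""
    name: str
    origins: frozenset  # one dataset may fall under several origin types


def by_origin(records, origin):
    """Return the names of the records tagged with the given origin type."""
    return [r.name for r in records if origin in r.origins]


# Illustrative entries only; these are not real datasets.
catalogue = [
    DatasetRecord("household survey",
                  frozenset({DataOrigin.ORTHODOX_INTENTIONAL})),
    DatasetRecord("online game logs",
                  frozenset({DataOrigin.CONSEQUENTIAL, DataOrigin.SOCIAL_MEDIA})),
    DatasetRecord("imputed survey file",
                  frozenset({DataOrigin.SYNTHETIC, DataOrigin.ORTHODOX_INTENTIONAL})),
]

print(by_origin(catalogue, DataOrigin.ORTHODOX_INTENTIONAL))
```

      Tagging by origin rather than by format keeps the record stable when the same data is later stored or analysed in a different form.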

      We utilize this typology further in our discussions below, including the possible overlaps between the data origin types, and how the different types may be used, but we first focus in more detail on the changing nature of the data environment and social science research.

      2.2 The Social Science Data Present

      2.2.1 The Data Landscape

      It has been clear since the 1980s that the half century either side of the millennium would be characterized by an information revolution (Purdam et al., 2004; Sweeney, 2001). One key aspect of this is the massive increase not just in the amount of data but also in the types of data sources available and in the range of organizations and individuals collecting, storing and using data. For example, it is estimated that in 2014 there are 1.3 billion active Facebook accounts, 0.6 billion active Twitter accounts and 58 million tweets per day (Datablog, 2014).

      The growth in different data types, formats and coverage allows new approaches to social science research and evidence-based policy processes. It is perhaps useful to consider an example: the UK government has launched an initiative to measure the nation’s happiness and well-being using questions in the Integrated Household Survey.[7] This intentional data gathering sample survey includes questions such as:

      How satisfied are you with your life nowadays? How happy did you feel yesterday? How anxious did you feel yesterday? To what extent do you feel the things you do in your life are worthwhile?

      The data is collected by professional fieldworkers and the survey takes over a year to complete. The sampling strategy enables inferences to be made to the UK population. At the same time, a university-based project in the UK is measuring happiness by texting a purposive, non-representative sample of volunteers who have signed up to be part of a mobile phone-based study.[8] Every few days the participants are asked to rate how they feel on a scale and to say who they are with, where they are and what they are doing. They are also able to submit a photo should they wish to. From this near-real-time, repeated-response data, happiness maps can be produced. Other techniques for measuring happiness might include analysing Twitter posts for expressions of happiness, or analysing search engine records for evidence of future planning, which can be calibrated as a proxy for happiness (again, though, based on non-representative samples). See, for example, Preis et al. (2012), who used Google Trends data in a cross-national study of orientation towards the future and optimism. The self-published, consequential and trace data forms are very different from data gathered as part of random sample surveys. However, all these data and methods for measuring happiness have different explanatory power and value.
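
      One way to picture how repeated text responses could feed a happiness map is to average scores by location. The sketch below is an illustration under stated assumptions, not the project's actual method: the response records, the 1 to 10 scale and the location labels are all invented. Scores are averaged per participant first so that frequent responders do not dominate an area's figure. The result still describes only the volunteers who took part, not the wider population.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses: (participant id, location label, score on an assumed 1-10 scale)
responses = [
    ("p1", "north", 7),
    ("p1", "north", 5),
    ("p2", "north", 8),
    ("p3", "south", 4),
    ("p3", "south", 6),
]


def mean_score_by_location(rows):
    """Average each participant's scores within a location,
    then average those participant means for the location."""
    per_person = defaultdict(list)
    for participant, location, score in rows:
        per_person[(location, participant)].append(score)

    per_location = defaultdict(list)
    for (location, _participant), scores in per_person.items():
        per_location[location].append(mean(scores))

    return {location: mean(values) for location, values in per_location.items()}


area_scores = mean_score_by_location(responses)
print(area_scores)
```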

      The opportunity for social scientists and policy makers is that citizens are – deliberately or consequentially – creating their own digital archives. This means that data generation with self-published, consequential and trace data is not a distinct (or costly) stage in the research process but is integral to the activity being undertaken. As we discuss below, such data can be collated, visualized and analysed in near real time, and updated continually. Citizens have the tools to document their own lives almost effortlessly and in more detail than ever before through access to monitoring technology and potential access to data about their health, movements and communications. See, for example, the development of so-called life logging and the Quantified Self.[9] Data generation can also take the form of crowdsourced data, where collective intelligence and effort in the form of observations, data preparation tasks, idea generation and individual-level data are deposited and uploaded by volunteers, usually via the internet.[10] Such data can also be collated automatically using software that captures information, including text and images on websites, to build databases. This can include collecting contact information, such as email and postal addresses, to produce samples for more traditional research methods such as surveys.[11]

      As social science researchers looking at the wealth of new types of data, we must be mindful of the famous aphorism: ‘the medium is the message’ (McLuhan, 1964). All data collection instruments, as we consider below, are subjective and performative media (although to differing extents).
