An Introduction to Text Mining. Gabe Ignatow

or indirectly impacted by the research, there is considerable evidence that even “anonymized” data sets contain personal information that allows the individuals who produced it to be identified. Researchers continue to debate how to adequately protect individuals when working with such data sets (e.g., Narayanan & Shmatikov, 2008, 2009; Sweeney, 2003). These debates are important because they concern the fundamental ethical principle of minimizing harm: the connection between a person’s online data and his or her physical person could lead to psychological, economic, or even physical harm. Thus, as a researcher, you must consider whether your data can be linked back to the people who produced it and whether there are scenarios in which this link could cause them harm.

      Professional research associations such as the British Psychological Society (www.bps.org.uk/system/files/Public%20files/inf206-guidelines-for-internet-mediated-research.pdf) and the American Psychological Association (APA; www.apa.org/science/leadership/bsa/internet/internet-report.aspx) have developed their own reports and ethical guidelines for online research. But because not all professional research associations have developed such guidelines, it is critical that you submit your research proposal to your IRB for review before collecting or analyzing data.

      Institutional Review Boards

      IRBs are university committees that approve, monitor, and review behavioral and biomedical research involving humans. Within higher education institutions, ethical approval from a university-level ethics committee is required for any research involving human participants. IRBs and other university ethics committees continue to develop and revise standards to keep up with evolving social media and big data technologies.

      Since the 1990s, a consensus has emerged that the study of computer-mediated and Internet-based communication often requires that IRBs modify their human subjects principles and research ethics policies. Such modifications are necessary because in online environments it is often impossible to gain the consent of research participants (Sveningsson, 2003), and there is often an expectation of public exposure by users. Researchers and ethics professionals who write and revise university research ethics policies continue to grapple with several issues that we address next, including privacy, informed consent, manipulation of human subjects, and publishing ethics.

      Privacy

      In 1996, the Internet researchers Sudweeks and Rafaeli argued that social scientists should treat “public discourse on computer-mediated communication as just that: public” and that, therefore, “such study is more akin to the study of tombstone epitaphs, graffiti, or letters to the editor. Personal? Yes. Private? No” (p. 121). Sudweeks and Rafaeli’s position may be convenient for the practice of research, but it has not always proved sufficient for research using data from contemporary social media platforms. In many cases there is no consensus about whether people who have posted messages on the Internet should be considered “participants” in research or whether research that uses their messages as data should be viewed as the analysis of secondary data that already existed in the public domain.

      Some researchers have argued that publicly available data carry no expectation of privacy, while many researchers who have carried out studies of online messages (e.g., Attard & Coulson, 2012; Coulson, Buchanan, & Aubeeluck, 2007) have deemed the data to be in the public domain yet have sought IRB approval from within their own institutions anyway.

      A number of Internet researchers have concluded that where data can be accessed without site membership, such data can be considered public domain (Attard & Coulson, 2012; Haigh & Jones, 2005, 2007). Therefore, if data can be accessed by anyone, without website registration, it would be reasonable to consider the data to be within the public domain of the Internet.

      There appears to be agreement that websites that require registration and password-protected data should be considered private domain (Haigh & Jones, 2005) because users posting in password-protected websites are likely to have expectations of privacy. Websites that require registration are often copyrighted, which raises a legal issue of ownership of the data and whether posts and messages can be legally and ethically used for research purposes.

      The Cornell–Facebook study is widely seen as having invaded the privacy of Facebook users. Some websites and social media platforms have privacy policies that set expectations for users’ privacy, and researchers can use these as guidelines for whether it is ethical to treat the site’s data as being in the public domain or whether informed consent may be required. But in most cases, such guidelines are insufficient and at best provide minimum standards that may not meet the standards set by universities’ IRBs. For example, the European Union has stringent privacy laws that may have been violated by the Facebook study. Adding to the difficulties for researchers attempting to follow privacy laws, it is unclear which data protection laws apply: those of the jurisdiction where research participants reside, the jurisdiction where the researchers reside, the jurisdiction of the IRB, the location of the server, the location where the data are analyzed, or some combination of these.

      Because acquiring users’ textual data from online sources is a passive method of information gathering that generally involves no interaction with the individual about whom data are being collected, for the most part text mining research is not as ethically challenging as experiments and other methods that involve recruiting participants and that may involve deception. Nevertheless, universities’ IRBs are increasingly requiring participant consent (see the next section) in cases where users can reasonably expect that their online discussions will remain private. At the very least, in almost all cases social scientists are required to anonymize (use pseudonyms for) users’ user names and full names.
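      As a rough illustration of the pseudonymization step described above, the following Python sketch replaces user names with stable pseudonyms derived from a keyed hash. The function names, the `author` field, and the `user_` prefix are assumptions made for this example, not part of any particular platform's data format, and the secret key is assumed to be stored separately from the data. Note that replacing names does not by itself make a data set anonymous: as discussed earlier, the text of a post may still allow its author to be identified.

```python
import hashlib
import hmac


def pseudonymize(username: str, secret_key: bytes, length: int = 10) -> str:
    """Map a user name to a stable pseudonym using a keyed hash (HMAC-SHA256).

    The same user name always yields the same pseudonym for a given key,
    so threads and repeat posters can still be followed in the analysis.
    """
    # Normalize so that "Alice" and " alice " are treated as the same user.
    normalized = username.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:length]


def pseudonymize_posts(posts, secret_key):
    """Return copies of the posts with the (hypothetical) 'author' field replaced."""
    return [
        {**post, "author": pseudonymize(post["author"], secret_key)}
        for post in posts
    ]


if __name__ == "__main__":
    # Illustrative data only; the key below is a placeholder, not a real secret.
    key = b"replace-with-a-long-random-secret"
    sample = [
        {"author": "Alice", "text": "First post"},
        {"author": "bob_1984", "text": "A reply"},
        {"author": "alice", "text": "A follow-up"},
    ]
    for post in pseudonymize_posts(sample, key):
        print(post["author"], "|", post["text"])
```

      A keyed hash is used here rather than a plain hash so that someone who does not hold the key cannot simply hash a list of known user names and match them against the pseudonyms.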

      It has also been suggested that although publicly available online interactions exist within the public domain, site members may view their online interactions as private. Hair and Clark (2007) have warned researchers that members of online communities often have no expectation that their discussions are being observed and may not be accepting of being observed.

      In order for text mining research using user-generated data to progress, researchers must make several determinations. First, they must use all available evidence to determine whether the data should be considered to be in the public or private domain. Second, if data are in the public domain, the researcher must determine whether users have a reasonable expectation of privacy. In order to make these determinations, researchers should note whether websites, apps, and other platforms require member registration and whether they include privacy policies that specify users’ privacy expectations.
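      The determinations above can be written down as a simple checklist. The Python sketch below is only a simplified illustration of that reasoning: the `SourceProfile` fields and the wording of the recommendations are assumptions made for this example, and the output is a starting point for discussion with an IRB, not a substitute for its review.

```python
from dataclasses import dataclass


@dataclass
class SourceProfile:
    """Facts a researcher records about a data source (hypothetical fields)."""
    requires_registration: bool
    password_protected: bool
    policy_signals_privacy: bool  # the privacy policy sets an expectation of privacy


def assess_source(profile: SourceProfile) -> str:
    """Return a rough recommendation following the two determinations in the text."""
    # First determination: public or private domain?
    if profile.requires_registration or profile.password_protected:
        return "private: seek informed consent and IRB guidance"
    # Second determination: do users have a reasonable expectation of privacy?
    if profile.policy_signals_privacy:
        return "public, but users may expect privacy: consult the IRB about consent"
    return "public: anonymize user names and submit the proposal for IRB review"


if __name__ == "__main__":
    open_forum = SourceProfile(
        requires_registration=False,
        password_protected=False,
        policy_signals_privacy=False,
    )
    print(assess_source(open_forum))
```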

      Informed Consent

      Informed consent refers to the process by which individuals explicitly agree to be participants in a research project based on a comprehensive understanding of what will be required of them. The Belmont Report (discussed previously) identified three elements of informed consent: information, comprehension, and voluntariness. The principle of respect for persons implies that participants should be presented with relevant information in a comprehensible format and then should voluntarily agree to participate. Participants in research projects who have given their informed consent are not expected to be informed of a study’s theories or hypotheses, but they are expected to be informed of what data the researcher will be collecting and what will happen to that data as well as of their rights to withdraw from the research.

      Informed consent is a core principle of human research ethics established in the aftermath of the Second World War. In important cases where the question is deemed vital and consent isn’t possible (or would
