An Introduction to Text Mining. Gabe Ignatow

institutional review board. The IRB would have gone through the proposal in detail, making sure that the participants gave a level of informed consent appropriate to the design of the study and the risks of the research. What’s more, the researchers and their colleagues would share a sense of professional ethics, taking into account respect for participants, balancing risks and benefits, and the integrity of the research process itself. The process is slow but careful; it is deliberately, explicitly, extensively public-spirited.

      There were also legal concerns. It is unclear whether the population sample used in the study contained people under 18 years of age or people from outside the United States who may be subject to different levels of scrutiny in a research study than are U.S. citizens.

      Many researchers have defended the study, pointing out that Facebook and other Internet companies routinely perform such studies for their own benefit or as part of social experiments. For instance, Newsweek reported that the popular online dating site OkCupid has engaged in manipulation of user-generated site content for years. The president of OkCupid observed that “if you use the Internet, you’re the subject of hundreds of experiments at any given time, on every site” (Wofford, 2014). Also defending the study, the bioethicist Meyer (2014) argued that the sort of research done on the Facebook data was scientifically important and that the scientific community should not respond to it in such a way as to drive such research underground or discourage companies from joining forces with social scientists in the future.

      Facebook claims to have revised its ethics guidelines since the emotion study was conducted and has proposed that studies now undergo three internal reviews, including one centered on privacy for user data. Regardless of its ultimate effects on social science research, the Facebook study certainly provides an opportunity to carefully consider the ethics of text mining research. In the remainder of this chapter, we consider the most critical ethical issues that were brought to the fore in the Cornell–Facebook study and that must be addressed in any text mining study, including the cornerstones of human subjects research (respect for persons, beneficence, and justice); ethical guidelines; IRBs; privacy; informed consent; and manipulation. We also review ethical issues involved in authorship and publishing.

      Respect for Persons, Beneficence, and Justice

      One cornerstone of modern research ethics is the Belmont Report, which was commissioned by the U.S. government in response to ethical failures in medical research and published in 1979. Written by a panel of experts, the Belmont Report identifies three principles that should underlie the ethical conduct of research involving human subjects: respect for persons, beneficence, and justice. These principles were later operationalized into the rules and procedures of the Common Rule, which governs research at U.S. universities (www.hhs.gov/ohrp/regulations-and-policy/regulations/common-rule). In the Belmont Report, respect for persons consists of two principles: that individuals should be treated as autonomous and that individuals with diminished autonomy are entitled to additional protections. This is interpreted to mean that researchers should, if possible, receive informed consent from participants (informed consent is discussed later in the chapter). Beneficence can be understood to mean having the interests of research participants in mind. This principle requires that researchers minimize risks to participants and maximize benefits to participants and society. The principle of justice addresses the distribution of the costs and benefits of research so that one group in society does not bear the costs of research while another group benefits. Issues of justice tend to relate to questions about the selection of participants.

      Today, the Belmont Report continues as an essential reference for IRBs that review research proposals involving human subjects, in order to ensure that the research meets the ethical foundations of the regulations (IRBs are discussed later in this chapter). It also serves as a reference for ethical guidelines developed by professional associations, including associations whose members work with Internet data.

      Ethical Guidelines

      Influenced by the Belmont Report, but also by the special challenges of performing research on human subjects online, many professional associations have published guidelines for ethical decision making in online research. One influential set of guidelines was published by the Association of Internet Researchers (AoIR) in 2002 and again in 2012 (http://aoir.org/ethics). The original 2002 AoIR guidelines discuss issues pertaining to informed consent and the ethical expectations of online users. The group’s more recent 2012 guidelines draw particular attention to three areas that need to be negotiated by researchers using user-generated online data: the concept of human subjects, public versus private online spaces, and data or persons. The 2012 guidelines do not prescribe a set of dos and don’ts but instead recommend a series of questions for researchers to consider when thinking about the ethical dimensions of their study.

      For human subjects, the AoIR guidelines state as a key guiding principle that because “all digital information at some point involves individual persons, consideration of principles related to research on human subjects may be necessary even if it is not immediately apparent how and where persons are involved in the research data.” However, while the term human subject persists as a guiding concept for ethical social research, in Internet research this gets a bit tricky:

      “Human subject” has never been a good fit for describing many internet-based research environments. Ongoing debates among our community of scholars illustrate a diverse, educated range of standpoints on the answers to the question of what constitutes a “human subject.” We agree with other regulatory bodies that the term no longer enjoys the relatively straightforward definitional status it once did. As a community of scholars, we maintain the stance that when considered outside a regulatory framework, the concept of “human subject” may not be as relevant as other terms such as harm, vulnerability, personally identifiable information, and so forth. We encourage researchers to continue vigorous and critical discussion of the concept of “human subject,” both as it might be further specified in internet related research or as it might be supplanted by terms that more appropriately define the boundaries for what constitutes inquiry that might be ethically challenging. (p. 6)

      A second major consideration in the AoIR ethics guidelines is the idea of public versus private data. While privacy is a concept that must include a consideration of expectations and consensus, a “clearly recognizable boundary” between public and private does not exist:

      Individual and cultural definitions and expectations of privacy are ambiguous, contested, and changing. People may operate in public spaces but maintain strong perceptions or expectations of privacy. Or, they may acknowledge that the substance of their communication is public, but that the specific context in which it appears implies restrictions on how that information is—or ought to be—used by other parties. Data aggregators or search tools make information accessible to a wider public than what might have been originally intended. (p. 7)

      The third consideration or tension in the AoIR guidelines is that between data and persons. The report’s authors noted the following:

      The internet complicates the fundamental research ethics question of personhood. Is an avatar a person? Is one’s digital information an extension of the self? In the U.S. regulatory system, the primary question has generally been: Are we working with human subjects or not? If information is collected directly from individuals, such as an email exchange, instant message, or an interview in a virtual world, we are likely to naturally define the research scenario as one that involves a person.

      For example, if you are working with a data set that contains thousands of tweets or Facebook posts, it may appear that your data are far removed from the people who did the actual tweeting or posting. While it may be hard to believe that the people
