An Introduction to Text Mining. Gabe Ignatow

to organize and analyze the Twitter data.

      A second thematic analysis study that uses Twitter data is by the mental health researchers Shepherd, Sanders, Doyle, and Shaw (2015). The researchers assessed how individuals with experience of mental health problems use Twitter by following the hashtag #dearmentalhealthprofessionals and conducting a thematic analysis to identify common themes of discussion. They identified 515 unique communications related to this conversation. Most of the material related to four overarching themes: (1) the impact of diagnosis on personal identity and as a facilitator for accessing care, (2) the balance of power between professional and service user, (3) the therapeutic relationship and developing professional communication, and (4) support provision through medication, crisis planning, service provision, and the wider society.
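      The collection and tallying steps in a hashtag study like this can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Shepherd et al.'s actual procedure. The posts and theme labels are invented. Hashtag matching is case-insensitive. Treating retweets and whitespace-only repeats as non-unique is an assumption of the sketch. In thematic analysis the themes are assigned by human coders, so the code only counts codes that a person has already entered.

```python
from collections import Counter

HASHTAG = "#dearmentalhealthprofessionals"

# Hypothetical toy posts, invented for illustration only.
posts = [
    "Please listen before you diagnose #DearMentalHealthProfessionals",
    "Please  listen before you diagnose #DearMentalHealthProfessionals",  # repeat (extra space)
    "RT @someone: Please listen before you diagnose #dearmentalhealthprofessionals",
    "My diagnosis finally got me into care #dearmentalhealthprofessionals",
    "Ask about my crisis plan, not just my meds #dearmentalhealthprofessionals",
    "Unrelated post about the weather",
]


def unique_hashtag_posts(texts, hashtag):
    """Return posts containing `hashtag`, skipping retweets and duplicates.

    Matching is case-insensitive and whitespace-normalized; treating
    retweets as non-unique is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for text in texts:
        norm = " ".join(text.split()).lower()
        if hashtag.lower() not in norm or norm.startswith("rt @"):
            continue
        if norm not in seen:
            seen.add(norm)
            kept.append(text)
    return kept


corpus = unique_hashtag_posts(posts, HASHTAG)

# Hypothetical codes entered by a human coder, keyed by position in `corpus`.
codes = {
    0: ["therapeutic relationship"],
    1: ["diagnosis and identity"],
    2: ["support provision"],
}

# Count how many posts were assigned each theme.
theme_counts = Counter(theme for labels in codes.values() for theme in labels)

print(len(corpus))
for theme, n in theme_counts.most_common():
    print(f"{theme}: {n}")
```

      In a real study the list of posts would come from an archive or a platform API, and the coding would be done in a qualitative analysis tool. The deduplication rule is the main design decision, because it determines what counts as a "unique communication."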

      Conclusion

      This chapter has addressed the role played by data in social science research and outlined the advantages and limitations of digital data as a way to collect information about people for human-centered research projects. It has also surveyed a number of online data sources, with forward pointers to Chapters 5 and 6, which address data collection and data sampling in detail. Examples of social science research projects that use information obtained from digital resources were also provided, mainly to illustrate the kinds of research questions this kind of data can answer; more such examples appear in Chapters 10 through 12.

      Key Term

       Unstructured data

      Highlights

       Social science research has traditionally relied on surveys, but new computational approaches have made it possible to use unstructured data sources to learn about people.

       Surveys are structured data sets that include clear, targeted information collected in controlled settings. Their disadvantage is that they are expensive to run, which limits how often surveys can be conducted and how many can be used in a study.

       Unstructured data sets are very large, “always on,” naturally occurring digital resources that can be used to extract or infer information about people. They have their own disadvantages: the information obtained from these resources is often inexact and incomplete, and it is subject to the biases of the groups of people who generate these data sources.

       Digital resources can be accessed as collections available through institutional memberships (e.g., LexisNexis), via APIs provided by various platforms (e.g., Twitter API), or through scraping and crawling, as described in Chapter 6.

      Discussion Questions

       Describe a social science research project you know of that was based on survey data, and discuss how the same project could be conducted using digital data resources. What kinds of resources would you use? What challenges do you expect to run into?

       While digital resources have their own advantages, as discussed in this chapter, certain types of information cannot be collected from such unstructured data. Give examples of information that can be collected only through surveys.

       Now consider a research project in which you need to combine the benefits of unstructured data (e.g., Twitter) and structured data (e.g., surveys). In other words, your project requires that every subject in your data set provide both their Twitter data and responses to a set of surveys. How would you go about collecting such a mixed data set for your project?
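      Suppose consenting participants have written their Twitter handles on the survey form and their tweets have already been collected. One possible way to organize the resulting mixed data set is to link the two sources by participant, as in the Python sketch below. Everything in it is hypothetical:

- the field names `participant_id`, `twitter_handle`, and `life_satisfaction`;
- the toy records;
- the (handle, text) format of the tweets.

The sketch keeps only tweets from participants and drops the handle from the linked records, so that each record is keyed by a pseudonymous ID. That choice anticipates the privacy concerns discussed in Chapter 3.

```python
from collections import defaultdict

# Hypothetical survey export: each consenting respondent supplied a handle.
survey_rows = [
    {"participant_id": "P01", "twitter_handle": "@alice_r", "life_satisfaction": 7},
    {"participant_id": "P02", "twitter_handle": "BobT", "life_satisfaction": 4},
]

# Hypothetical tweets as (author handle, text) pairs.
tweets = [
    ("alice_r", "Great day at the park"),
    ("bobt", "Exhausted after work again"),
    ("alice_r", "Reading a new book"),
    ("stranger99", "Not in the study"),
]


def normalize_handle(handle):
    """Compare handles case-insensitively and without a leading '@'."""
    return handle.lstrip("@").lower()


def link_survey_and_tweets(rows, posts):
    """Attach each participant's tweets to their survey record.

    Tweets from accounts not in the survey are discarded, and the handle
    is removed so records are keyed only by the pseudonymous participant ID.
    """
    by_handle = defaultdict(list)
    for handle, text in posts:
        by_handle[normalize_handle(handle)].append(text)

    linked = []
    for row in rows:
        record = {k: v for k, v in row.items() if k != "twitter_handle"}
        # .get() avoids inserting empty entries for participants with no tweets.
        record["tweets"] = by_handle.get(normalize_handle(row["twitter_handle"]), [])
        linked.append(record)
    return linked


dataset = link_survey_and_tweets(survey_rows, tweets)
for record in dataset:
    print(record["participant_id"], record["life_satisfaction"], len(record["tweets"]))
```

      Linking on a handle the participant typed is fragile, because of typos and renamed accounts. A real project might instead ask participants to authorize access to their own account, but the overall shape of the linked records would be similar.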

      3 Research Ethics

      Learning Objectives

      The goals of Chapter 3 are to help you to do the following:

      1 Identify appropriate ethical guidelines for your research project.

      2 Decide whether your research needs approval from an institutional review board (IRB).

      3 Design a study that protects participants’ privacy, and follow the requirements for obtaining consent where participant consent is required.

      4 Observe ethical issues in the areas of authorship and publishing.

      Introduction

      For over a week in early January 2012, the news feeds of almost 700,000 Facebook users subtly changed. Researchers were manipulating the content in these users’ news feeds without notifying them. To learn how friends’ emotions affect each other, a team of researchers from Cornell University and Facebook removed content that contained positive words for one group of users and content that contained negative words for another group. The researchers found that users who saw more positive posts tended to write slightly more positive status updates and that users who had been exposed to more negative posts wrote slightly more negative updates.

      The Cornell–Facebook study was published in the Proceedings of the National Academy of Sciences in 2014 (Kramer, Guillory, & Hancock, 2014). It sparked outrage after a blogger claimed the study had used Facebook users as “lab rats.” After this early reaction from bloggers, the study drew harsh criticism from both individual researchers and professional research associations. Facebook’s advertising also aims to alter people’s behavior, by encouraging them to buy products and services from Facebook advertisers, but users can recognize it as advertising; the changes to users’ news feeds, by contrast, were made without the users’ knowledge or explicit consent. Yet whether the study was unethical remains debatable. While there are no black-and-white answers, understanding the ethical dimensions of the Facebook emotion study can help you to plan your own study so that it meets the highest possible ethical standards.

      Gorski, a surgeon, researcher, and editor of the blog Science-Based Medicine (https://www.sciencebasedmedicine.org), wrote on his blog in 2014 that the reaction to the Cornell–Facebook study showed a “real culture gap” between social science researchers on one side and technology companies on the other. At a minimum, he argued, users should have been given the choice not to participate in the study, because it is “absolutely ridiculous to suggest that clicking a box on a website constitutes informed consent” (see the Informed Consent section). Moreno, a professor of medical ethics and health policy at the University of Pennsylvania, also criticized the study for “sending people whose emotional state you don’t know anything about communications that they might find disturbing” (Albergotti & Dwoskin, 2014). Broaddus (2014), a social psychologist at the Medical College of Wisconsin, identified a lack of transparency as a problem with the study. Grimmelmann, a law professor at the University of Maryland, pointed out the following in a May 2015 Slate article:

      If it had been carried out in a university lab by university faculty on volunteers they recruited, the researchers would almost certainly have drawn up a detailed description
