An Introduction to Text Mining. Gabe Ignatow

of newsworthiness.

      In another CDA study, Baker and his colleagues (2008) analyzed a 140-million-word corpus of British news articles about refugees, asylum seekers, immigrants, and migrants. They used collocation and concordance analysis (see Appendix F) to identify common categories of representation of these four groups. They also discussed how collocation and concordance analysis can direct researchers to representative texts for subsequent qualitative analysis.

      Research in the Spotlight

      Combining Critical Discourse Analysis and Corpus Linguistics

      Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyżanowski, M., McEnery, T., & Wodak, R. (2008). A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & Society, 19(3), 273–306.

      In this critical discourse analysis (CDA) study, the linguist Baker and his colleagues analyzed a 140-million-word corpus of British news articles about refugees, asylum seekers, immigrants, and migrants. The authors used collocation and concordance analysis (see Appendix F) to identify common categories of representations of the four groups. The authors also discuss how collocation and concordance analysis can be used to direct researchers to representative texts in order to carry out qualitative analysis.

      Specialized software used:

      WordSmith

       www.lexically.net/wordsmith
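      WordSmith is a commercial package. For readers who want to experiment with collocation and concordance analysis before committing to specialized software, the sketch below uses the open-source Python library NLTK. It illustrates the general techniques only and is not a reproduction of Baker and colleagues’ workflow; the corpus file name is a placeholder.

import re
import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Read the corpus and tokenize it with a simple regular expression
# (this avoids downloading NLTK's tokenizer models).
with open("news_corpus.txt", encoding="utf-8") as f:
    tokens = re.findall(r"[a-z']+", f.read().lower())

# Collocation analysis: word pairs that co-occur more often than chance.
measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(5)                  # ignore rare pairs
print(finder.nbest(measures.likelihood_ratio, 20))

# Concordance analysis: keyword-in-context lines for a search term.
text = nltk.Text(tokens)
text.concordance("asylum", width=80, lines=10)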

      Content Analysis

      Content analysis adopts a quantitative, scientific approach to text analysis. Unlike CDA, content analysis generally focuses on texts themselves rather than on texts’ relations to their social and historical contexts. A classic definition characterizes content analysis as “a research technique for the objective, systematic, and quantitative description of the manifest content of communication” (Berelson, 1952, p. 18). At a practical level, content analysis involves developing a coding frame and applying it to textual data: texts are broken down into pertinent units of information that can then be coded and categorized.
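      As a concrete illustration, the sketch below applies a toy coding frame to sentence-level units in Python. The categories and keyword lists are invented for the example and do not come from any published coding scheme; in practice a coding frame is developed from theory and refined against the data.

import re
from collections import Counter

# Hypothetical coding frame: category name -> keywords that signal it.
coding_frame = {
    "economy": {"jobs", "wages", "economy", "employment"},
    "security": {"crime", "police", "border", "security"},
}

def code_text(text: str) -> Counter:
    counts = Counter()
    # Break the text into sentence-like units on end punctuation.
    for unit in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z']+", unit.lower()))
        for category, keywords in coding_frame.items():
            if words & keywords:             # unit mentions this category
                counts[category] += 1
    return counts

print(code_text("Wages fell last year. Police patrols increased near the border."))
# Counter({'economy': 1, 'security': 1})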

      Krippendorff’s (2013) classic textbook Content Analysis is the standard reference for work in this area. Many of the research design principles and sampling techniques covered in Chapter 5 of this textbook are shared with content analysis, although Krippendorff’s book goes into much greater detail on statistical sampling of texts and units of texts, as well as on statistical tests of interrater reliability.
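      Intercoder agreement on nominal codes can be estimated with NLTK’s agreement metrics, as in the brief sketch below; the two coders and their code assignments are fabricated for illustration, and Krippendorff (2013) should be consulted for guidance on interpreting the resulting coefficients.

from nltk.metrics.agreement import AnnotationTask

# Each triple is (coder, unit_id, assigned_code).
data = [
    ("coder_1", "u1", "economy"),  ("coder_2", "u1", "economy"),
    ("coder_1", "u2", "security"), ("coder_2", "u2", "security"),
    ("coder_1", "u3", "economy"),  ("coder_2", "u3", "security"),
    ("coder_1", "u4", "security"), ("coder_2", "u4", "security"),
]

task = AnnotationTask(data=data)
print("Krippendorff's alpha:", round(task.alpha(), 3))
print("Cohen's kappa:       ", round(task.kappa(), 3))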

      Foucauldian Analysis

      The philosopher and historian Foucault (1973) developed an influential conceptualization of intertextuality that differs significantly from Fairclough’s conceptualization in CDA. Rather than identifying the influence of external discourses within a text, for Foucault the meaning of a text emerges from its dialogue with other discourses. These engagements may be explicit or, more often, implicit. In Foucauldian intertextual analysis, the analyst asks what each text presupposes and with which discourses it is in dialogue. The meaning of a text therefore derives from its similarities to and differences from other texts and discourses, and from implicit presuppositions within the text that can be recognized through historically informed close reading.

      Foucauldian analysis of texts is performed in many theoretical and applied research fields. For instance, a number of studies have used Foucauldian intertextual analysis to analyze forestry policy (see Winkel, 2012, for an overview). Researchers working in Europe (e.g., Berglund, 2001; Franklin, 2002; Van Herzele, 2006), North America, and developing countries (e.g., Asher & Ojeda, 2009; Mathews, 2005) have used Foucauldian analysis to study policy discourses regarding forest management, forest fires, and corporate responsibility.

      Another example of Foucauldian intertextual analysis is a sophisticated study of the professional identities of nurses by Bell, Campbell, and Goldberg (2015). Bell and colleagues argued that nurses’ professional identities should be understood in relation to the identities of other occupational categories within the health care field. The authors collected their data from PubMed, a medical research database. Using PubMed’s own user interface, they acquired the abstracts of research papers that used the terms service or services in the abstract or keywords, covering the period from 1986 to 2013. The downloaded abstracts were added to an SQLite database, which was used to generate comma-separated values (CSV) files with the abstracts organized into 3-year periods. The authors then spent approximately 6 weeks of full-time work manually checking the data for duplicates and other errors. The final sample included over 230,000 abstracts. Bell and colleagues then used the text analysis package Leximancer (see Appendix C) to calculate frequency and co-occurrence statistics for all concepts in the abstracts (see also Appendix F). Leximancer also produced concept maps (see Appendix G) to visually represent the relationships between concepts. After viewing these initial concept maps and finding a number of irrelevant terms, the authors further cleaned their data and then used Leximancer to analyze the concept of nursing in terms of its co-occurrence with other concepts.
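      The data-management side of such a project can be approximated with a few lines of Python. The sketch below is not Bell and colleagues’ actual pipeline: it assumes a hypothetical SQLite table named abstracts with columns pmid, year, and abstract_text, and it writes one CSV file per 3-year period while dropping exact-duplicate abstracts.

import csv
import sqlite3

conn = sqlite3.connect("pubmed_abstracts.db")
cur = conn.cursor()

for start in range(1986, 2014, 3):               # 1986-1988, 1989-1991, ...
    end = min(start + 2, 2013)                   # final period is truncated
    rows = cur.execute(
        "SELECT pmid, year, abstract_text FROM abstracts "
        "WHERE year BETWEEN ? AND ?", (start, end),
    ).fetchall()

    seen = set()
    with open(f"abstracts_{start}_{end}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["pmid", "year", "abstract"])
        for pmid, year, text in rows:
            if text in seen:                     # crude exact-duplicate check
                continue
            seen.add(text)
            writer.writerow([pmid, year, text])

conn.close()

      A crude exact-match check like the one above would not, of course, replace the weeks of manual cleaning the authors describe; it only removes identical records before closer inspection.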

      Analysis of Texts as Social Information

      Another category of text analysis treats texts as reflections of the practical knowledge of their authors. This type of analysis is prevalent in grounded theory studies (see Chapter 4) as well as in applied studies of expert discourses. Interest in the informative analysis of texts is due in part to its practical value, because user-generated texts can potentially provide analysts with reliable information about social reality. Naturally, the quality of information about social reality that is contained in texts varies according to the level of knowledge of each individual who has participated in the creation of the text, and the information that subjects provide is partial insofar as it is filtered by their own particular point of view.

      An example of analysis of texts as social information is a 2012 psychological study by Colley and Neal on the topic of organizational safety. Starting with small representative samples of upper managers, supervisors, and workers in an Australian freight and passenger rail company, Colley and Neal conducted open-ended interviews with members of the three groups. These were transcribed and analyzed using Leximancer (see Appendix C) for map analysis (see also Appendix G). Comparing the concept maps produced for the three groups revealed significant differences between the “safety climate schema” of upper managers, supervisors, and workers.
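      The intuition behind such concept maps can be conveyed with a simple co-occurrence count, as in the sketch below. This is not Leximancer’s algorithm: the concept list and transcript fragment are invented, and co-occurrence is counted within sentences only.

import re
from collections import Counter
from itertools import combinations

# Hypothetical concept vocabulary for a workplace-safety transcript.
concepts = {"safety", "training", "reporting", "management", "incident"}

def cooccurrence(transcript: str) -> Counter:
    pairs = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", transcript.lower()):
        present = sorted(concepts & set(re.findall(r"[a-z']+", sentence)))
        for a, b in combinations(present, 2):    # every concept pair in the sentence
            pairs[(a, b)] += 1
    return pairs

example = ("Management reviews every incident. "
           "Safety training covers incident reporting.")
print(cooccurrence(example))
# Counter({('incident', 'management'): 1, ('incident', 'reporting'): 1, ...})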

      Challenges and Limitations of Using Online Data

      Having introduced text mining and text analysis, in this section we review some lessons that have been learned from other fields about how best to adapt social science research methods to data from online environments. This section is short but critically important for students who plan to perform research with data taken from social media platforms and websites.

      Methodologies
