An Introduction to Text Mining. Gabe Ignatow

and returning to them often as they work their way through the early chapters and begin to design their own research projects (see Chapter 5).

      If you think of your text mining research project as a house, then the chapters in Part I are instructions for building the foundation. Just as a house with a flaw in its foundation will not last long, a research project with a shaky logical foundation or questionable ethics may look good at the start, but it is inevitable that at some point its flaws will be exposed. Chapter 5 on research design provides architectural instruction for building the framework of your house. Designing a research project that can address, and perhaps conclusively answer, a research question or questions is a challenging task, and it is useful to know the kinds of research designs that have a track record of success in research using text mining tools and methodologies. Parts III through V survey text mining and analysis methodologies, the equivalent of proven house-building methods. Appendix A provides a partial survey of online sources of textual data, which is the raw material of your research project. Appendices B through G provide, as it were, a survey of the practical tools that are available for house construction, from hand tools to heavy-duty machinery. While setting the foundation, designing the house, and choosing a construction method, it is a good idea to be aware of the types of practical tools that are available and within budget so that your project can reach a successful conclusion. Appendices H and I, as well as the Glossary, provide handy summaries of web resources, statistical tools, and key terms.

      Additional resources for instructors using An Introduction to Text Mining are also provided. Editable, chapter-specific Microsoft® PowerPoint® slides, as well as assignments and activities created by the authors, are available for download at: http:/

      Note to the Reader

      An Introduction to Text Mining grew out of our earlier SAGE methods guidebook Text Mining, which is a shorter volume intended to serve as a practical guidebook for graduate students and professional researchers. The two books share both a core mission and structure. Their mission is to enable readers to make better informed decisions about research projects that use text mining and text analysis methodologies. And they both survey text mining tools developed in multiple disciplines within the social sciences, humanities, and computer science.

      Where Text Mining was intended for advanced students and researchers, the current volume is a dedicated undergraduate or first-year graduate textbook intended for use in social science and data science courses. This book is thus longer than Text Mining, as it includes new material related to ethical and epistemological considerations in text-based research. There is a new chapter on how to write text-based social science research papers. And there are appendices that list and review data sources and software for preparing, cleaning, organizing, analyzing, and visualizing patterns in texts. Although these appendices were intended for students in undergraduate courses, we suspect that they will prove valuable for experienced researchers as well.

      GI and RM

      About the Authors

      Gabe Ignatow is an associate professor of sociology at the University of North Texas (UNT), where he has taught since 2007. His research interests are in the areas of sociological theory, text mining and analysis methods, new media, and information policy. Gabe’s current research involves working with computer scientists and statisticians to adapt text mining and topic modeling techniques for social science applications. Gabe has been working with mixed methods of text analysis since the 1990s and has published this work in the following journals: Social Forces, Sociological Forum, Poetics, the Journal for the Theory of Social Behaviour, and the Journal of Computer-Mediated Communication. He is the author of over 30 peer-reviewed articles and book chapters and serves on the editorial boards of the journals Sociological Forum, the Journal for the Theory of Social Behaviour, and Studies in Media and Communication. He has served as the UNT Department of Sociology’s graduate program codirector and undergraduate program director and has been selected as a faculty fellow at the Center for Cultural Sociology at Yale University. He is also a cofounder and the CEO of GradTrek, a graduate degree search engine company.

      Rada Mihalcea is a professor of computer science and engineering at the University of Michigan. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She serves or has served on the editorial boards of the following journals: Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, Research on Language and Computation, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She was a general chair for the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL, 2015) and a program cochair for the Conference of the Association for Computational Linguistics (2011) and the Conference on Empirical Methods in Natural Language Processing (2009). She is the recipient of a National Science Foundation CAREER award (2008) and a Presidential Early Career Award for Scientists and Engineers (2009). In 2013, she was made an honorary citizen of her hometown of Cluj-Napoca, Romania.

Part I Foundations

      1 Text Mining and Text Analysis

      Learning Objectives

      The goals of Chapter 1 are to help you to do the following:

      1. Familiarize yourself with a variety of research projects accomplished using text mining tools.

      2. Address different research questions using text mining tools.

      3. Differentiate between text mining and text analysis methodologies.

      4. Compare major theoretical and methodological approaches to both text mining and text analysis.


      Text mining is an exciting field that encompasses new research methods and software tools that are being used across academia as well as by companies and government agencies. Researchers today are using text mining tools in ambitious projects to attempt to predict everything from the direction of stock markets (Bollen, Mao, & Zeng, 2011) to the occurrence of political protests (Kallus, 2014). Text mining is also commonly used in marketing research and many other business applications as well as in government and defense work.

      Over the past few years, text mining has started to catch on in the social sciences, in academic disciplines as diverse as anthropology (Acerbi, Lampos, Garnett, & Bentley, 2013; Marwick, 2013), communications (Lazard, Scheinfeld, Bernhardt, Wilcox, & Suran, 2015), economics (Levenberg, Pulman, Moilanen, Simpson, & Roberts, 2014), education (Evison, 2013), political science (Eshbaugh-Soha, 2010; Grimmer & Stewart, 2013), psychology (Colley & Neal, 2012; Schmitt, 2005), and sociology (Bail, 2012; Heritage & Raymond, 2005; Mische, 2014).
