Natural Language Processing for Social Media. Diana Inkpen
As we mentioned, standard NLP approaches applied to social media data are confronted with difficulties due to non-standard spelling, noise, limited sets of features, and errors. Some NLP techniques, including normalization, term expansion, improved feature selection, and noise reduction, have therefore been proposed to improve clustering performance in Twitter news [Beverungen and Kalita, 2011]. Identifying proper names and language switches within a sentence requires rapid and accurate named entity recognition and language detection techniques. Recent research efforts focus on the analysis of language in social media for understanding social behavior and building socially aware systems. The goal is the analysis of language with implications for fields such as computational linguistics, sociolinguistics, and psycholinguistics. For example, Eisenstein [2013a] studied phonological variation and the factors that influence how it is transcribed into social media text.
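To make the normalization and noise-reduction steps mentioned above more concrete, here is a minimal sketch in Python. It is not the method of Beverungen and Kalita [2011]; it is only an illustration, under assumptions, of what such a step might look like. The lookup table `SLANG`, the placeholder tokens `<url>` and `<user>`, and the function name `normalize` are all hypothetical choices made for this example.

```python
import re

# Hypothetical, tiny lookup table of non-standard spellings (an assumption for
# this sketch); a real system would use a much larger lexicon or a learned model.
SLANG = {"u": "you", "gr8": "great", "2moro": "tomorrow", "pls": "please"}

URL_RE = re.compile(r"https?://\S+")      # web links
MENTION_RE = re.compile(r"@\w+")          # user mentions such as @bob
REPEAT_RE = re.compile(r"(.)\1{2,}")      # a character repeated 3+ times
TOKEN_RE = re.compile(r"<url>|<user>|\w+|[^\w\s]")


def normalize(text):
    """Return a list of normalized tokens for one social media message."""
    # Noise reduction: replace links and mentions with placeholder tokens.
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    text = text.lower()
    # Squeeze character elongations to two characters ("soooo" becomes "soo").
    text = REPEAT_RE.sub(r"\1\1", text)
    # Split into placeholders, word-like tokens, and single punctuation marks.
    tokens = TOKEN_RE.findall(text)
    # Term normalization: map known non-standard spellings to standard forms.
    return [SLANG.get(tok, tok) for tok in tokens]


print(normalize("@bob U r gr8 soooo cool http://t.co/x"))
# ['<user>', 'you', 'r', 'great', 'soo', 'cool', '<url>']
```

Note that "r" and "soo" are left unchanged because they are not in the small lookup table, which shows why dictionary-based normalization alone has limited coverage on social media text.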
Several workshops organized by the Association for Computational Linguistics (ACL) and special issues in scientific journals dedicated to semantic analysis in social media show how active this research field is. We enumerate some of them here (we also mentioned them in the Preface):
• The EACL 2014 Workshop on Language Analysis in Social Media (LASM 2014)⁴
• The NAACL/HLT 2013 Workshop on Language Analysis in Social Media (LASM 2013)⁵
• The EACL 2012 Workshop on Semantic Analysis in Social Media (SASM 2012)⁶
• The NAACL/HLT 2012 Workshop on Language in Social Media (LSM 2012)⁷
• The ACL/HLT 2011 Workshop on Language in Social Media (LSM 2011)⁸
• The WWW 2015 Workshop on Making Sense of Microposts⁹
• The WWW 2014 Workshop on Making Sense of Microposts¹⁰
• The WWW 2013 Workshop on Making Sense of Microposts¹¹
• The WWW 2012 Workshop on Making Sense of Microposts¹²
• The ESWC 2011 Workshop on Making Sense of Microposts¹³
• The COLING 2014 Workshop on Natural Language Processing for Social Media (SocialNLP)¹⁴
• The IJCNLP 2013 Workshop on Natural Language Processing for Social Media (SocialNLP)¹⁵
In this book, we will cite many papers from conferences such as ACL, WWW, etc.; many workshop papers from the above-mentioned workshops and more; several books; and many journal papers from various relevant journals.
1.4 SEMANTIC ANALYSIS OF SOCIAL MEDIA
Our goal is to focus on innovative NLP applications (such as opinion mining, information extraction, summarization, and machine translation), tools, and methods that integrate appropriate linguistic information in various fields such as social media monitoring for healthcare, security and defense, business intelligence, and politics. The book contains four major chapters, together with this introduction and a concluding chapter. The six chapters are organized as follows.
• Chapter 1: This chapter highlights the need for applications that use social media messages and meta-data. We also discuss the difficulty of processing social media data vs. traditional texts such as news articles and scientific papers.
• Chapter 2: This chapter discusses existing linguistic pre-processing tools such as tokenizers, part-of-speech taggers, parsers, and named entity recognizers, with a focus on their adaptation to social media data. We briefly discuss evaluation measures for these tools.
• Chapter 3: This chapter is the heart of the book. It presents the methods used in applications for semantic analysis of social network texts, in conjunction with social media analytics, as well as methods for information extraction and text classification. We focus on tasks such as geo-location detection, entity linking, opinion mining and sentiment analysis, emotion and mood analysis, event and topic detection, summarization, machine translation, and others. These methods tend to pre-process the messages with some of the tools mentioned in Chapter 2 in order to extract the knowledge needed at the next processing levels. For each task, we discuss the evaluation metrics and any existing test datasets.
• Chapter 4: This chapter presents higher-level applications that use some of the methods from Chapter 3. We look at healthcare applications, financial applications, predicting voting intentions, media monitoring, security and defense applications, NLP-based information visualization for social media, disaster response applications, NLP-based user modeling, and applications for entertainment.
• Chapter 5: This chapter discusses complementary aspects such as data collection and annotation in social media, privacy issues in social media, and spam detection to keep spam out of the collected datasets. We also describe some of the existing evaluation benchmarks that make available data collected and annotated for various tasks.
• Chapter 6: The last chapter summarizes the methods and applications described in the preceding chapters. We conclude with a discussion of the high potential for research, given the social media analysis needs of end-users.
As mentioned in the Preface, the intended audience of this book is researchers who are interested in developing tools and applications for automatic analysis of social media texts. We assume that readers have basic knowledge of natural language processing and machine learning. Nonetheless, we will try to define as many notions as we can, in order to make the material easier to understand for beginners in these two areas. We also assume basic knowledge of computer science in general.
1.5 SUMMARY
In this chapter, we reviewed the structure of social networks and of social media data as a collection of textual information on the Web. We presented semantic analysis in social media as a new opportunity for big data analytics and for intelligent applications. Monitoring and analyzing the continuous flow of user-generated content in social media provides an additional dimension of valuable information that would not be available from traditional media and newspapers. In addition, we mentioned the challenges posed by social media data, which are due to their large size and to their noisy, dynamic, and unstructured nature.
1 http://people.eng.unimelb.edu.au/tbaldwin/pubs/starsem2014.pdf
2 http://www.statista.com/
3 http://www.cision.com/uk/files/2013/10/social-journalism-study-2013.pdf
4 https://aclweb.org/anthology/W/W14/#1300
5 https://aclweb.org/anthology/W/W13/#1100
6 https://aclweb.org/anthology/W/W12/#2100
7 https://aclweb.org/anthology/W/W12/#2100