Natural Language Processing for Social Media. Diana Inkpen


Natural Language Processing for Social Media - Diana Inkpen. Synthesis Lectures on Human Language Technologies


the tools. The second way of adapting the tools is to retrain them on annotated social media data. This significantly improves performance, although the amount of annotated data available for retraining is still small. More annotated social media data sets need to be developed before very high levels of performance can be reached.

      In the next chapter, we will look at advanced methods for various NLP tasks for social media texts. These tasks use as components some of the tools discussed in this chapter.


      Figure 2.8: Accuracies of the character-based n-gram Naïve Bayes classifiers for the six divisions/groups [Sadat et al., 2014a].

       1 https://sites.google.com/site/empirist2015/

       2 The F-score usually gives the same weight to precision and to recall, but it can weight one of them more when needed for an application.

       3 http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html

       4 This data set is available at http://code.google.com/p/ark-tweet-nlp/downloads/list.

       5 A bracketing is a pair of matching opening and closing brackets in a linearized tree structure.

       6 http://www.ark.cs.cmu.edu/TweetNLP/#tweeboparser_tweebank

       7 http://www.cs.technion.ac.il/~gabr/resources/data/ne_datasets.html

       8 LDA is a method that assumes a number of hidden topics for a corpus and discovers a cluster of words for each topic, with associated probabilities. Then, for each document, LDA can estimate a probability distribution over the topics. The topics (word clusters) do not have names, but names can be assigned, for example, by choosing the word with the highest probability in each cluster.

       9 http://nlp.stanford.edu/downloads/

       10 http://opennlp.apache.org/

       11 http://nlp.lsi.upc.edu/freeling/

       12 http://nltk.org/

       13 http://gate.ac.uk/

       14 http://php-nlp-tools.com/

       15 https://gate.ac.uk/wiki/twitie.html

       16 http://www.ark.cs.cmu.edu/TweetNLP/

       17 https://github.com/aritter/twitter_nlp

       18 https://github.com/saffsd/langid.py

       19 http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html

       20 http://www.google.com/chrome

       21 https://code.google.com/p/language-detection/

       22 https://github.com/martin-majlis/YALI

       23 http://odur.let.rug.nl/~vannoord/TextCat/

       24 https://github.com/shuyo/ldig

       25 http://en.wikipedia.org/wiki/Trie

       26 http://www.win.tue.nl/~mpechen/projects/smm/

       27 http://people.eng.unimelb.edu.au/tbaldwin/data/lasm2014-twituser-v1.tgz

       28 http://en.wikipedia.org/wiki/Geographic_distribution_of_Arabic#Population

       29 We will describe the concept of Naïve Bayes classifiers in detail in this section because they tend to work well on textual data and they are fast in terms of training and testing time.
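
      To make the idea of a character-based n-gram Naïve Bayes classifier more concrete, here is a minimal sketch in Python. It is not the implementation of Sadat et al.; it assumes a multinomial model with add-one smoothing over character trigrams, and the names `char_ngrams` and `CharNgramNaiveBayes`, as well as the tiny English/French training set, are invented for illustration only.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of `text` (lowercased)."""
    padded = f" {text.lower()} "  # pad so that word boundaries yield n-grams too
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.class_counts = Counter()             # number of training texts per class
        self.ngram_counts = defaultdict(Counter)  # n-gram frequencies per class
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            # log P(class) + sum of log P(n-gram | class), smoothed
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical, far too small for real use).
train_texts = [
    "the cat is on the table",
    "this is the weather today",
    "where is the station",
    "le chat est sur la table",
    "quel temps fait-il aujourd'hui",
    "où est la gare",
]
train_labels = ["en", "en", "en", "fr", "fr", "fr"]

clf = CharNgramNaiveBayes(n=3).fit(train_texts, train_labels)
print(clf.predict("the dog is here"))
print(clf.predict("le chien est ici"))
```

      Working in log space avoids numerical underflow when many small probabilities are multiplied, and the add-one smoothing keeps an n-gram unseen in training from driving a class probability to zero. Both are common choices rather than requirements of the method.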

      CHAPTER 3

       Semantic Analysis of Social Media Texts

      In this chapter, we discuss current NLP methods for social media applications that aim at extracting useful information from social media data. Examples of such applications include geo-location detection, opinion mining, emotion analysis, event and topic detection, summarization, and machine translation. We survey the current techniques and briefly define the evaluation measures used for each application, followed by examples of results.

      Section 3.2 presents geo-location detection techniques. Section 3.3 discusses entity linking and disambiguation, a task that links detected entities to a database of known entities. Section 3.4 discusses the methods for opinion mining and sentiment analysis, including emotion and mood analysis. Section 3.5 presents event and topic detection. Section 3.6 highlights the various issues in automatic summarization in social media. Section 3.7 presents the adaptation of statistical machine translation for social media text. Section 3.8 summarizes this chapter.

      One of the important topics in semantic analysis of social media is the identification of geo-location information for social content such as blog posts or tweets. By geo-location we mean a real location in the world, such as a region, a city, or a point described by longitude and latitude. Automatic detection of event locations for individuals or groups of individuals with common interests is important for marketing purposes, and also for detecting potential threats to public safety.
