Domain-Sensitive Temporal Tagging. Jannik Strötgen


The decisions of a temporal tagger can be categorized using the confusion matrix

       3.2 Evaluation example

       3.3 Overview of temporal tagging research competitions

       3.4 English news corpora annotated with temporal expressions

       3.5 Non-English temporally annotated corpora containing news articles

       4.1 Non-news corpora containing manual annotations of temporal expressions

       4.2 Characteristics, challenges, and examples of the four domains that can be distinguished in the area of temporal tagging

       4.3 Statistics of the four temporally annotated corpora

       4.4 Six challenges that have to be addressed by a domain-sensitive temporal tagger

       5.1 Comparison of the temporal taggers TIPSem, HeidelTime, SUTime, and UWTime

       5.2 The official English TempEval-3 results of three HeidelTime versions, SUTime, and TIPSem

       5.3 The official Spanish TempEval-3 results of HeidelTime and TIPSemB

       5.4 Evaluation results of UWTime and HeidelTime 2.0 on the English TempEval-3 test data reported after the competition

       5.5 UWTime’s and HeidelTime’s evaluation results on the WikiWars corpus

       5.6 Evaluating HeidelTime, UWTime, SUTime, and TIPSemB on corpora of four different domains

       5.7 Comparison of HeidelTime’s manually developed resources and automatically created resources

       Preface

      Time matters! Whatever document we read, be it a news article, biography, some microblog, or a patient’s record, to name but a few examples, temporal information embedded in the documents typically helps us determine the course of events and actions, to correlate events, and eventually to get an overview of the documents’ content. Driven by the continuously increasing amount of textual data that is available on the Web, in electronic archives, and Intranet document repositories, the computer-supported analysis and exploration of textual data has become a necessity and also a challenge in numerous application domains. Named Entity Recognition (NER), that is, the task of information extraction that aims at detecting and classifying elements in some text into predefined classes, such as locations, persons, organizations, and temporal expressions, has become a cornerstone of tools and techniques that help to address this challenge.

      Only in the past two decades has the topic of temporal tagging as a specific type of NER task become a major focus in research and development. Temporal tagging addresses the extraction, classification, and normalization of temporal expressions that occur in text documents, and it is the prerequisite for temporal information extraction. By now, the important role of temporal tagging has been well recognized in application domains such as text summarization, question answering, information retrieval, and topic detection and tracking. In these applications of temporal tagging, results can be as simple as the fully automated construction of a timeline of events detected in a document’s content and can be as complex as revealing the temporal discourse structure in documents.

      To date, there is no book that provides a comprehensive overview of the various methods, tools, evaluation competitions, and challenges the tasks of temporal tagging are faced with in the presence of diverse types of textual data and application domains. This book aims at closing this gap. Starting from the very fundamental role and concepts of time in documents, it provides an up-to-date overview of annotation standards, techniques, and competitions for evaluating the quality of temporal taggers, annotated corpora (including non-English texts) used for evaluations and developments, as well as a detailed overview of temporal taggers.

      As the title indicates, this book focuses particularly on temporal tagging of documents from different domains, including text data different from the well-studied domain of news articles. For this, we discuss the challenges and approaches temporal taggers have to consider when processing news-style, narrative-style, colloquial-style, and so-called autonomic-style documents, the latter covering documents that contain many temporal expressions that cannot be normalized to real points in time, but only according to some local or autonomic time frame. Examples of autonomic-style documents are specific types of scientific texts and literary works.

      We believe that this book provides researchers, practitioners, and developers a valuable resource for designing and improving temporal tagging techniques and tools, or just for applying them in a useful manner as part of more complex text analysis and exploration pipelines. While publicly available temporal taggers already provide sophisticated output for several application scenarios, there is still a lot of work in this area ahead of us. This book aims at providing a solid foundation on which such work can be built.

      Jannik Strötgen and Michael Gertz

      Saarbrücken, Germany and Heidelberg, Germany

      July 2016

       Acknowledgments

      This book gives an in-depth overview of methods, tools, and techniques of temporal tagging in different domains. Judging by the number of publications and evaluation competitions in the past few years, this field has clearly attracted enormous interest in the research community and industry. We thus would like to thank all researchers who actively contribute new ideas to this field, organize evaluation competitions, and provide temporal tagging tools and resources for other researchers and the public.

      Although this book is about temporal tagging in general and not just about our temporal tagger HeidelTime, we want to take the opportunity to thank all contributors to HeidelTime for their great work and the many users for helpful feedback to further improve the tool. We also would like to thank the many students at Heidelberg University who contributed in the form of student projects and bachelor’s and master’s theses.

      In particular, we thank Anne-Lyse Minard and Steven Bethard for their valuable reviews of the draft of this book. They put a lot of effort into the reviews and provided numerous valuable comments as well as suggestions to significantly improve the book. Finally, we want to thank the series editor Graeme Hirst for his great support and his instant replies to all our questions. It is time for a big thank you!

      CHAPTER 1

       Introduction

      Temporal tagging is a specific task in natural language processing (NLP), in which temporal expressions are extracted from text documents and normalized to some standard format. Since temporal expressions are prevalent in many types of documents and because temporal information is an important dimension in any information space, applications of several domains can benefit from the output of temporal taggers.
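      To make the two steps of the task concrete, the following is a minimal sketch in Python, not the method of any tagger discussed in this book. It extracts explicit English date expressions of the form "Month Year" or "Month Day, Year" and normalizes each to an ISO 8601-style value. The names MONTHS, DATE_PATTERN, and tag_dates are hypothetical and chosen for illustration only; relative expressions (e.g., "yesterday") and underspecified ones (e.g., "in March") are deliberately out of scope here.

```python
import re

# Map English month names to their numbers (hypothetical helper table).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Extraction step: match "July 2016" or "March 3, 2015".
DATE_PATTERN = re.compile(
    r"\b(?P<month>" + "|".join(MONTHS) + r")"
    r"(?:\s+(?P<day>\d{1,2}),)?"
    r"\s+(?P<year>\d{4})\b"
)


def tag_dates(text):
    """Return (expression, start, end, normalized value) for each explicit date."""
    results = []
    for match in DATE_PATTERN.finditer(text):
        # Normalization step: build a YYYY-MM or YYYY-MM-DD value.
        value = f"{int(match['year']):04d}-{MONTHS[match['month']]:02d}"
        if match["day"]:
            value += f"-{int(match['day']):02d}"
        results.append((match.group(0), match.start(), match.end(), value))
    return results


if __name__ == "__main__":
    sample = "The preface is dated July 2016; the workshop took place on March 3, 2015."
    for expression, start, end, value in tag_dates(sample):
        print(f"{expression!r} [{start}:{end}] -> {value}")
```

      The sketch keeps extraction (the regular expression) and normalization (the construction of the value string) as separate steps, mirroring the task definition above. A single hand-written pattern covers only a small fraction of the temporal expressions found in real documents.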

      This book covers the topic of temporal tagging and is structured as follows. In this chapter, we describe the task of temporal tagging, and then present some examples of NLP and NLP-related application scenarios in which temporal information can be exploited to provide more meaningful and useful results. In Chapter 2, we provide background knowledge and cover basic concepts related to temporal information. The foundations of temporal tagging are described in
