Domain-Sensitive Temporal Tagging. Jannik Strötgen


Domain-Sensitive Temporal Tagging - Jannik Strötgen Synthesis Lectures on Human Language Technologies


tagging of different types of documents and thus domain-sensitive temporal tagging are explained in Chapter 4. An overview of existing techniques and tools for temporal tagging including our own system HeidelTime is provided in Chapter 5. Finally, future research directions are discussed in Chapter 6. However, to guarantee the correct understanding of two important terms frequently used in this book, we start with defining the concepts “temporal expression” and “value of a temporal expression”.

      • A temporal expression is either an expression referring to a date or time of any granularity (e.g., “March 11, 2007”, “yesterday”, “June 2016”, “20th century”, “9 pm”), an expression referring to a duration (e.g., “three years”, “several months”), or an expression referring to the periodical aspect of an event (e.g., “every Monday”, “twice a week”).

      • The value (of a temporal expression) covers the (most important) semantics of the temporal expression in a standard format, that is, the normalized information of the expression.

      Examples of different types of temporal expressions, and more details about them and about annotation standards for temporal expressions, will be covered later in this book. The definitions above are nevertheless crucial for understanding the task of temporal tagging, which is defined and explained next.
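
To make the two definitions concrete, the following minimal sketch pairs a few of the example expressions from above with a type and a normalized value. The class name, the type labels, and the ISO 8601-style value strings are assumptions chosen for illustration (they follow common TimeML-style conventions); the annotation standards themselves are covered later in the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalExpression:
    text: str   # surface form as it appears in the document
    type: str   # assumed labels: "DATE", "TIME", "DURATION", "SET"
    value: str  # normalized value in an ISO 8601-style format (assumed)


# Expressions taken from the definitions above; values are illustrative.
examples = [
    TemporalExpression("March 11, 2007", "DATE", "2007-03-11"),
    TemporalExpression("June 2016", "DATE", "2016-06"),
    TemporalExpression("three years", "DURATION", "P3Y"),
    TemporalExpression("every Monday", "SET", "XXXX-WXX-1"),
]


def by_type(expressions, wanted):
    """Return the surface forms of all expressions with the given type."""
    return [e.text for e in expressions if e.type == wanted]
```

The surface form and the value are kept as separate fields because the same value can be expressed by many different surface forms.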

      Temporal tagging addresses the extraction, classification, and normalization of temporal expressions occurring in text documents. It is a prerequisite of the full task of temporal annotation (temporal information extraction), which concerns the detection and interpretation of temporal expressions, events, and temporal relations between events and between temporal expressions and events [Verhagen et al., 2009]. However, temporal tagging is not only valuable in the context of temporal information extraction, but also in many research areas and application scenarios as will be detailed in Section 1.2.

      In general, temporal tagging can be considered as a specific type of named entity recognition and normalization. Although the three standard named entity types are person, organization, and location [Nadeau and Sekine, 2007], “the notion of named entity is commonly extended to include things that are not entities per se, but nevertheless have practical importance and do have characteristic signatures that signal their presence” [Jurafsky and Martin, 2008, p. 762]. Thus, further types of information are sometimes also covered under the named entity umbrella, for example, genes and proteins, numbers, and temporal expressions.

      The classical tasks of named entity recognition (NER) tools are to identify the spans of named entities in texts and to classify the extracted named entities into pre-defined classes of entities. Normalizing entities to a unique identifier or to some value in a standard format is a separate step (depending on the type of entity also referred to as disambiguation, linking, or resolution) and is performed only if a tool explicitly addresses it. In contrast, a temporal tagger identifies the spans of temporal expressions in texts and normalizes the expressions according to some standard format. Depending on the annotation specifications, expressions are sometimes also classified according to their type, e.g., whether an expression is a date (e.g., May 3, 2009) or a duration (e.g., three days). However, this classification of temporal expressions can be considered a part of the normalization process, and thus one can specify the two subtasks of temporal tagging as follows.

      • Extraction: given a text, determine the spans of all temporal expressions.

      • Normalization: given a text and a set of extracted temporal expressions, assign the temporal semantics to each expression in the form of normalized values in a standard format that adheres to some annotation specification.
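
The two subtasks can be sketched as two separate functions. This is a toy illustration under strong assumptions, not the approach of HeidelTime or any other tagger: the patterns, the function names, and the choice of ISO 8601 date strings as the standard format are all hypothetical, and only two kinds of expressions are handled.

```python
import re
from datetime import date, timedelta

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Hypothetical patterns; a real tagger needs far broader coverage.
EXPLICIT_DATE = re.compile(
    r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")
RELATIVE_DAY = re.compile(r"\b(yesterday|today|tomorrow)\b", re.IGNORECASE)
DAY_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def extract(text):
    """Extraction: return (start, end, surface form) for each expression."""
    spans = []
    for pattern in (EXPLICIT_DATE, RELATIVE_DAY):
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), m.group(0)))
    return sorted(spans)


def normalize(expression, reference_date):
    """Normalization: map one extracted expression to an ISO 8601 date."""
    m = EXPLICIT_DATE.fullmatch(expression)
    if m:
        month, day, year = m.groups()
        return date(int(year), MONTHS[month], int(day)).isoformat()
    # Relative expressions can only be resolved against a reference date.
    offset = DAY_OFFSETS[expression.lower()]
    return (reference_date + timedelta(days=offset)).isoformat()


def tag(text, reference_date):
    """Run extraction, then normalization, on a text."""
    return [(surface, normalize(surface, reference_date))
            for _, _, surface in extract(text)]
```

For example, `tag("The meeting on March 11, 2007 was announced yesterday.", date(2007, 3, 12))` pairs "March 11, 2007" with "2007-03-11" and also resolves "yesterday" to "2007-03-11". The second case shows why normalization needs more than the expression itself: here the reference date is passed in explicitly.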

      Figure 1.1 illustrates the two tasks of a temporal tagger. Given a text document (left), the tagger determines the temporal expressions (middle) and assigns a normalized value in a standard format to each identified temporal expression (right). In Chapter 3, we will give an overview of existing annotation standards for temporal expressions. These define what should be considered a temporal expression and how temporal expressions are to be normalized. Before that, however, we will first outline some application scenarios in which temporal expressions can be exploited, and then have a closer look at the concept of time in Chapter 2.

      For well-known NLP tasks such as named entity recognition (NER), there are many motivating application scenarios described in the literature. In the following, to illustrate the utility of temporal tagging, we present some use cases in which applications can easily exploit extracted and normalized temporal information and thus benefit from the output of temporal taggers.

      Figure 1.1: The two tasks of temporal tagging: extraction and normalization.

       TEMPORAL TAGGING FOR INFORMATION EXTRACTION

      In many text documents, events play an important role. Typically, events happen at some specific time and some specific place [Strötgen and Gertz, 2012a]. The importance of temporal information when organizing and summarizing extracted events is intuitive: given a text document with event mentions, the chronological ordering of the described events obviously benefits from normalized temporal expressions. Similar to temporal information, geographic information is also important in this context. However, the geographic aspect of events is out of the scope of this book.

      As illustrated in Figure 1.2, many documents do not mention events in chronological order. Typically, documents are organized into sections about specific topics, and these sections contain temporally overlapping content. Further examples are biographies, which often contain temporally overlapping sections about, for instance, “private life” and “professional life”, and news articles, which report on recent happenings before referring to events that happened in the past. An example of such a news article is shown in Figure 1.3.
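
Once the temporal expressions attached to events are normalized, putting the events in chronological order reduces to a sort. The sketch below assumes that every event already carries a full date in ISO 8601 form (YYYY-MM-DD), which sorts correctly as a plain string; the event descriptions and dates are hypothetical.

```python
def order_chronologically(events):
    """Sort (description, normalized_date) pairs by their date value.

    Assumes full ISO 8601 dates (YYYY-MM-DD), whose string order
    matches their chronological order.
    """
    return sorted(events, key=lambda event: event[1])


# Hypothetical events in document order, which is not chronological.
events = [
    ("book published", "2016-06-15"),
    ("project started", "2007-03-11"),
    ("first prototype released", "2009-05-03"),
]

timeline = [description for description, _ in order_chronologically(events)]
```

Values of mixed granularity (for example, a year next to a full date) or underspecified values would need a more careful comparison than this string sort.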

      Similar to the task of summarizing and ordering events extracted from documents, temporal fact extraction also requires temporal tagging output. When collecting facts for a knowledge base, it should be taken into account that most facts are not static but either evolve with time or are valid only during a particular time period [Kuzey and Weikum, 2012]. For instance, “Bill Clinton holdsPoliticalPosition President of the United States” is a correct fact but only valid for a specific time period.
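
A fact with a validity period can be represented by attaching a time interval to the subject-relation-object triple. The class and field names below are hypothetical, and the interval (January 20, 1993 to January 20, 2001) is not given in the text; it is added here as the commonly known term of office.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporalFact:
    subject: str
    relation: str
    obj: str
    valid_from: date   # first day on which the fact holds
    valid_until: date  # first day on which it no longer holds (exclusive)

    def holds_on(self, day):
        """Return True if the fact is valid on the given day."""
        return self.valid_from <= day < self.valid_until


# Validity interval added for illustration; it is not stated in the text.
fact = TemporalFact(
    "Bill Clinton",
    "holdsPoliticalPosition",
    "President of the United States",
    valid_from=date(1993, 1, 20),
    valid_until=date(2001, 1, 20),
)
```

With this representation, `fact.holds_on(date(1995, 6, 1))` is true while `fact.holds_on(date(2005, 1, 1))` is false. The interval boundaries are exactly the kind of normalized values a temporal tagger provides.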

      While extracting events and temporal relations from single documents has a rather long tradition and was, for instance, addressed in the TempEval competitions at SemEval 2007 [Verhagen et al., 2007], 2010 [Verhagen et al., 2010], and 2013 [UzZaman et al., 2013], research was more recently extended to perform cross-document temporal relation extraction, as in the Timeline task of SemEval 2015 [Minard et al., 2015]. A further indication of the importance of temporal tagging in the context of information extraction is the fact that at the 2015 SemEval competition, in addition to the Timeline task, three additional shared tasks were organized, in which extracted and normalized temporal expressions are a prerequisite
