Domain-Sensitive Temporal Tagging. Jannik Strötgen
In contrast to entity-oriented search and GeoTime, which both directly benefit from extracted and normalized temporal expressions, time-related question answering often deals with more complex temporal phenomena [Pustejovsky et al., 2005]. In such cases, temporal tagging on its own is not sufficient, and temporal reasoning is often necessary, for example, to answer questions of the form “did event x happen before event y?”. To answer such questions automatically, the full task of temporal information extraction is required, including the subtasks of temporal tagging, event extraction, and temporal relation extraction. In the recent QA TempEval challenge at SemEval 2015, in which temporal information extraction systems were to be developed, the systems were evaluated solely on how well they answer such time-related questions for which temporal reasoning is important [Llorens et al., 2015]. In Chapter 3, we will detail how temporal taggers can be evaluated in general.
TEMPORAL TAGGING FOR SUMMARIZATION
While the value of temporal tagging for the above examples is quite straightforward, there are further application scenarios in which temporal tagging provides more indirect benefits. One such scenario is document summarization.
In the text summarization community, it is well known that coreference resolution is valuable to create better text summaries [Azzam et al., 1999, Steinberger et al., 2007]. Similar to coreference relations between (proper) nouns and pronouns, the relations between temporal expressions could also be taken into account to improve summaries. Assume the document that is to be summarized contains the following two sentences consecutively:
• s1 = 〈In 2010, something unimportant happened.〉
• s2 = 〈One year later, something important happened.〉
A good summary should contain the important information; in our example, s2 should be part of the summary, but s1 should not. However, without proper context information, the semantics of s2 is unclear due to the ambiguity of “One year later”. To fully understand s2, the reader requires a reference time to resolve the relative temporal expression. Unfortunately, this reference time is part of s1 (“2010”).
One solution to this issue is to include both sentences in the summary. However, the summary then contains unimportant content. A better approach is to exploit the information provided by a temporal tagger, namely that “One year later” in s2 refers to 2011. In this way, the unimportant sentence s1 can be skipped, and s2 can become part of the summary in a slightly modified form, for instance, starting with “One year later (2011), something important happened”. Note that even the first solution requires some information about the occurring temporal expressions, namely that s1 contains the reference time of s2.
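The rewriting step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any particular temporal tagger: the function names, the tiny number-word lexicon, and the regular expressions are assumptions made for this example, and it only handles expressions of the form “N year(s) later/earlier”.

```python
import re

# Hypothetical, deliberately tiny lexicon of number words (an assumption for
# this sketch); a real temporal tagger covers far more expressions.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

YEAR_PATTERN = re.compile(r"\b([12][0-9]{3})\b")
RELATIVE_PATTERN = re.compile(
    r"\b(one|two|three|four|five|\d+) years? (later|earlier)\b", re.IGNORECASE
)


def find_reference_year(sentence):
    """Return the first four-digit year mentioned in the sentence, or None."""
    match = YEAR_PATTERN.search(sentence)
    return int(match.group(1)) if match else None


def annotate_relative_year(sentence, reference_year):
    """Append the resolved year in parentheses after a relative expression."""
    if reference_year is None:
        return sentence  # nothing to resolve against; leave the text unchanged

    def resolve(match):
        quantity_text = match.group(1).lower()
        quantity = NUMBER_WORDS.get(quantity_text)
        if quantity is None:
            quantity = int(quantity_text)  # the expression used digits
        sign = 1 if match.group(2).lower() == "later" else -1
        return f"{match.group(0)} ({reference_year + sign * quantity})"

    return RELATIVE_PATTERN.sub(resolve, sentence, count=1)


s1 = "In 2010, something unimportant happened."
s2 = "One year later, something important happened."

# s1 supplies the reference time; only the annotated s2 enters the summary.
summary_sentence = annotate_relative_year(s2, find_reference_year(s1))
print(summary_sentence)  # One year later (2011), something important happened.
```

The sketch takes the reference time from the immediately preceding sentence, which matches the example but is a simplification; choosing the correct reference time in general is part of the normalization task.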
1.3 SUMMARY OF THE CHAPTER
In the context of temporal tagging, two tasks can be distinguished: extraction and normalization of temporal expressions. In several NLP-related research areas, and thus in many applications, the output of temporal tagging can be exploited to improve existing approaches. Note that for almost all applications and research topics exploiting temporal information, the normalization subtask is crucial.
1 Timeline: Cross-Document Event Ordering, http://alt.qcri.org/semeval2015/task4/ [last accessed: Nov 9, 2015].
2 Question Answering TempEval, http://alt.qcri.org/semeval2015/task5/ [last accessed: Nov 9, 2015].
3 Clinical TempEval, http://alt.qcri.org/semeval2015/task6/ [last accessed: Nov 9, 2015].
4 Diachronic Text Evaluation, http://alt.qcri.org/semeval2015/task7/ [last accessed: Nov 9, 2015].
CHAPTER 2
The Concept of Time
In the previous chapter, we have already implicitly exploited some characteristics of temporal information to explain the motivating examples. Now, we formulate the key characteristics of temporal information in a precise manner (Section 2.1). Then, we highlight the differences between multiple types of temporal expressions occurring in textual documents (Section 2.2) and analyze their possible textual realizations (Section 2.3).
2.1 KEY CHARACTERISTICS OF TEMPORAL INFORMATION
There are three key characteristics of temporal information that make this kind of information highly valuable for many search and exploration tasks. They can be formulated as follows [Alonso et al., 2011].
TEMPORAL INFORMATION IS WELL DEFINED
Given two points in time or two time intervals, the temporal relationship between them can always be determined, for example, as before or identical. In general, the relationship can be assumed to be one of the temporal relations defined by Allen [1983] in the context of temporal reasoning. In addition to the equality relation, there are six relations that each have an inverse, namely before, meets, overlaps, during, starts, and finishes [Allen, 1983]. In Figure 2.1, these relations are visualized following Allen’s presentation.
Figure 2.1: Temporal information is well-defined so that one of the relations defined by Allen [1983] holds between any intervals X and Y. Note that all relations except the equality relation have an inverse so that in total there are 13 possible relations between X and Y.
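To make the claim concrete, the following Python sketch determines which of the 13 relations holds between two intervals. It is an illustration under stated assumptions rather than code from the book: intervals are represented as (start, end) pairs of comparable values with start < end, the function name allen_relation is hypothetical, and inverse relations are simply labeled “inverse of ...”.

```python
def allen_relation(x, y):
    """Return the name of the Allen relation that holds between x and y.

    Each interval is a (start, end) pair with start < end (an assumption of
    this sketch). Inverse relations are reported as "inverse of <relation>".
    """
    x_start, x_end = x
    y_start, y_end = y
    if x_start >= x_end or y_start >= y_end:
        raise ValueError("intervals must satisfy start < end")

    if x_start == y_start and x_end == y_end:
        return "equals"
    if x_end < y_start:
        return "before"
    if x_end == y_start:
        return "meets"
    if x_start == y_start and x_end < y_end:
        return "starts"
    if x_end == y_end and x_start > y_start:
        return "finishes"
    if x_start > y_start and x_end < y_end:
        return "during"
    if x_start < y_start < x_end < y_end:
        return "overlaps"
    # None of the seven relations above holds for (x, y), so one of the six
    # non-equality relations must hold for (y, x).
    return "inverse of " + allen_relation(y, x)


print(allen_relation((2010, 2011), (2011, 2012)))  # meets
print(allen_relation((2011, 2012), (2010, 2011)))  # inverse of meets
```

Because exactly one branch applies to any pair of valid intervals, the function always returns a result, which mirrors the statement that the relationship between two intervals can always be determined.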
TEMPORAL INFORMATION CAN BE NORMALIZED
Regardless of the terms and even of the languages used, two temporal expressions with the same meaning can be normalized to the same value in some standard format. Thus, temporal information can be considered term- and language-independent. Understanding how temporal expressions can be normalized is one important step toward realizing how temporal information can be exploited in all kinds of application and research scenarios. While we will discuss the details when introducing annotation standards for temporal information in Section 3.1, an example with different temporal expressions carrying the same meaning is shown in Figure 2.2. Note that the expressions are uttered at various reference times (tref) and are normalized to the same value on the timeline t.
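The idea can be illustrated with a small Python sketch. The expressions, the lexicon of day offsets, and the function name normalize are assumptions chosen for this example (they are not taken from Figure 2.2 or from any annotation standard); the point is only that different expressions, uttered at different reference times and in different languages, end up with the same normalized value.

```python
import re
from datetime import date, timedelta

# Hypothetical lexicon of relative expressions (English and German) mapped to
# day offsets from the reference time; purely illustrative.
RELATIVE_OFFSETS = {
    "today": 0, "heute": 0,
    "yesterday": -1, "gestern": -1,
    "tomorrow": 1, "morgen": 1,
}

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

EXPLICIT_PATTERN = re.compile(r"^([a-z]+) (\d{1,2}), (\d{4})$")


def normalize(expression, reference_date):
    """Normalize an expression to an ISO-style date string (YYYY-MM-DD)."""
    text = expression.strip().lower()
    if text in RELATIVE_OFFSETS:
        # Relative expressions depend on the reference time.
        resolved = reference_date + timedelta(days=RELATIVE_OFFSETS[text])
        return resolved.isoformat()
    match = EXPLICIT_PATTERN.match(text)
    if match and match.group(1) in MONTHS:
        # Explicit expressions carry their full value; no reference time needed.
        month = MONTHS.index(match.group(1)) + 1
        return date(int(match.group(3)), month, int(match.group(2))).isoformat()
    raise ValueError(f"unsupported expression: {expression!r}")


utterances = [
    ("October 12, 2015", date(2015, 11, 1)),
    ("today", date(2015, 10, 12)),
    ("yesterday", date(2015, 10, 13)),
    ("gestern", date(2015, 10, 13)),
    ("tomorrow", date(2015, 10, 11)),
]
values = [normalize(expression, t_ref) for expression, t_ref in utterances]
print(values)  # every entry is '2015-10-12'
```

Once expressions are mapped to such values, they can be compared, sorted, and grouped independently of the wording or language in which they were originally written.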
Figure 2.2: Temporal information can be normalized; the expressions uttered at various times tref have the same value in standard format (2015-10-12). Note that explicit expressions such as “October