Domain-Sensitive Temporal Tagging. Jannik Strötgen
ANNOTATION SPECIFICATIONS FOR OTHER LANGUAGES THAN ENGLISH
While annotation standards have mostly focused on English or have been developed under the assumption of being largely language-independent, increasing effort has recently been put into developing language-specific annotation specifications that capture language characteristics. Most of the language-specific adaptations deal with specifying the extents of temporal expressions. For instance, TimeML specifies that determiners are typically included in, and prepositions excluded from, the extents of temporal expressions (e.g., in <TIMEX3>the year 2000</TIMEX3>). In other languages, however, prepositions and determiners are sometimes contracted (e.g., German “in dem” can be contracted to “im”), so the respective German phrase could be annotated either as <TIMEX3>im Jahr 2000</TIMEX3> or as im <TIMEX3>Jahr 2000</TIMEX3>. A decision is therefore needed on whether to include both or neither of them in the extents of temporal expressions.
Furthermore, the set of possible normalization values for the temporal expressions’ attributes has to be extended. For instance, the original TimeML TIMEX3 value attribute can specify a quarter of a year using “Q”, e.g., 2015-Q1 and 2015-Q2 for the first and second quarter of the year 2015, respectively, but it has no values for the three four-month periods of a year. This is reasonable for English, in which references to quarters are frequent and references to four-month periods are not. In other languages, however, such expressions occur frequently. For instance, in Spanish the phrase <TIMEX3>el primer cuatrimestre</TIMEX3> refers to the first four-month period of a year. It should be possible to normalize such expressions accordingly.
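To make the difference between the two kinds of periods concrete, the following minimal Python sketch maps a quarter value such as 2015-Q1 to its first and last day and applies the same arithmetic to four-month periods. The function names period_bounds and quarter_bounds are hypothetical and not part of TimeML or any tagger, and the sketch deliberately does not propose a value notation for four-month periods.

```python
import calendar
from datetime import date
from typing import Tuple


def period_bounds(year: int, index: int, months_per_period: int) -> Tuple[date, date]:
    """Return the first and last day of the index-th (1-based) period of a year.

    months_per_period=3 yields quarters, months_per_period=4 yields the
    three four-month periods of a year.
    """
    if months_per_period <= 0 or 12 % months_per_period != 0:
        raise ValueError("period length must divide 12")
    if not 1 <= index <= 12 // months_per_period:
        raise ValueError("period index out of range")
    first_month = (index - 1) * months_per_period + 1
    last_month = first_month + months_per_period - 1
    # monthrange returns (weekday of first day, number of days in month)
    last_day = calendar.monthrange(year, last_month)[1]
    return date(year, first_month, 1), date(year, last_month, last_day)


def quarter_bounds(value: str) -> Tuple[date, date]:
    """Parse a quarter value of the form 'YYYY-Qn' and return its bounds."""
    year_part, quarter_part = value.split("-Q")
    return period_bounds(int(year_part), int(quarter_part), 3)
```

Under these assumptions, quarter_bounds("2015-Q1") covers January through March 2015, while period_bounds(2015, 1, 4) covers January through April 2015, the span that “el primer cuatrimestre” would refer to for that year.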
Language-specific annotation guidelines and specifications following the English TimeML have been developed for several languages. Often, they have been developed in the context of some research competitions or together with a manually annotated corpus, which will be surveyed in Section 3.3 and Section 3.4, respectively. These efforts resulted in annotation guidelines and specifications, with some of them being very sophisticated, e.g., those for French [Bittar et al., 2011], Spanish [Saurí and Badia, 2012a, Saurí et al., 2010], and Italian (Ita-TimeML) [Caselli, 2010, Caselli et al., 2011]. It is interesting to note that many of the adaptations to the guidelines and specifications do not concern the annotations of temporal expressions but other parts of TimeML.
For Portuguese [Costa and Branco, 2012] and Romanian [Forascu and Tufis, 2012], English TimeML-annotated data was translated and the annotations were aligned. The authors of both works report that modifications to the original TimeML annotations were sometimes necessary due to language differences, but mostly concerned events and temporal relations, that is, not temporal expression annotations. First steps toward TimeML-compliant annotation specifications for further languages have been taken without focusing on temporal expressions, e.g., for Turkish [Seker and Diri, 2010]. For some languages, annotation efforts concentrated on TIMEX3 annotations only, e.g., for Vietnamese and Arabic [Strötgen et al., 2014a], Croatian [Skukan et al., 2014] and Turkish [Küçük and Küçük, 2015]. These, however, did not result in language-specific annotation specifications but have been carried out by following the English annotation guidelines for TIMEX3 annotations as closely as possible.
HANDLING THE UNCERTAINTY OF TEMPORAL EXPRESSIONS
According to both standards, TIDES TIMEX2 and TimeML with TIMEX3 tags, temporal expressions referring to points on timelines of any granularity are associated with a single value attribute. For instance, <TIMEX>the year 2000</TIMEX>, <TIMEX>March 2000</TIMEX>, and <TIMEX>March 11, 2000</TIMEX> are normalized to 2000, 2000-03, and 2000-03-11, respectively. As pointed out by Berberich et al. [2010], such temporal expressions carry some amount of uncertainty if they occur in a specific context. For instance, in the phrase “the FIFA world cup final 1998”, the final took place on a particular day and not during the whole year.
Thus, they suggest handling each date and time expression as a four-tuple with lower bounds (l) and upper bounds (u) for the begin and end times to cover this uncertainty, i.e., as 〈begin_l, begin_u, end_l, end_u〉. For single temporal expressions, the two lower bounds are identical and the two upper bounds are identical (e.g., the four-tuple representation of <TIMEX>May 2000</TIMEX> is 〈2000-05-01, 2000-05-31, 2000-05-01, 2000-05-31〉). For interval expressions, the four values are different, e.g., 〈2000-03-01, 2000-03-31, 2001-05-01, 2001-05-31〉 for <TIMEX>March 2000 to May 2001</TIMEX>.
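The four-tuple representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of Berberich et al. [2010]: the helper names bounds and four_tuple are hypothetical, and only values of the form YYYY, YYYY-MM, and YYYY-MM-DD are handled.

```python
import calendar
from datetime import date
from typing import Optional, Tuple

FourTuple = Tuple[date, date, date, date]


def bounds(value: str) -> Tuple[date, date]:
    """Earliest and latest day covered by 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD'."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(*parts)
        return day, day
    raise ValueError("unsupported value: " + value)


def four_tuple(begin: str, end: Optional[str] = None) -> FourTuple:
    """Return (begin_l, begin_u, end_l, end_u).

    For a single expression, pass only `begin`; the begin and end bounds
    then coincide. For an interval, pass both endpoints.
    """
    begin_l, begin_u = bounds(begin)
    end_l, end_u = bounds(end if end is not None else begin)
    return begin_l, begin_u, end_l, end_u
```

Here four_tuple("2000-05") reproduces the tuple given for “May 2000”, and four_tuple("2000-03", "2001-05") reproduces the one given for “March 2000 to May 2001”.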
When strictly following TimeML, the phrase “March 2000 to May 2001” is to be annotated as two date expressions (<TIMEX>March 2000</TIMEX> and <TIMEX>May 2001</TIMEX>) and a duration expression as an abstract tag whose value attribute covers the length of the interval (1 year and 3 months). In addition, the beginning and end of the interval are covered by the beginpoint and endpoint attributes, normalized as 2000-03 and 2001-05. However, as pointed out above, these empty TIMEX tags are often ignored, and thus the duration information about complex temporal expressions is typically not covered.
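As a rough illustration of how the length of such an interval could be derived from its two month-level endpoints, the Python sketch below formats the span as an ISO 8601-style duration string. The function name inclusive_month_span is hypothetical, and counting both the begin and the end month is an assumption made here because it yields the 1 year and 3 months stated above; the TimeML guidelines are not being quoted.

```python
def inclusive_month_span(begin: str, end: str) -> str:
    """Duration string for 'YYYY-MM' to 'YYYY-MM', counting both end months.

    Assumption: the begin month and the end month are both part of the
    interval, so March 2000 to May 2001 spans 15 months.
    """
    begin_year, begin_month = (int(p) for p in begin.split("-"))
    end_year, end_month = (int(p) for p in end.split("-"))
    months = (end_year - begin_year) * 12 + (end_month - begin_month) + 1
    if months <= 0:
        raise ValueError("end must not precede begin")
    years, rest = divmod(months, 12)
    result = "P"
    if years:
        result += f"{years}Y"
    if rest:
        result += f"{rest}M"
    return result
```

With this counting convention, inclusive_month_span("2000-03", "2001-05") gives the string P1Y3M.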
It is worth mentioning that such a crisp annotation of temporal expressions using the four-tuple representation is not always possible due to the fuzziness of language. For instance, temporal expressions with modifiers are more difficult to interpret. In such cases, TimeML makes use of the modifier attribute in addition to the value attribute, e.g., <TIMEX>the beginning of 2000</TIMEX> has a value attribute of 2000 and a modifier attribute of START. Thus, the annotation is left fuzzy on purpose. A direct resolution to the four-tuple representation is also difficult. Of course, because of the fuzziness, one could assign the same four values as if there were no modifier. However, large parts of the year are clearly not part of “the beginning of 2000”, and specifying the boundary is difficult, if not impossible. The boundary might also depend on when the expression is uttered. If the time of utterance is March 2000, then it is likely that March is not included in the time referred to as “the beginning of 2000”. In contrast, March might be included if the expression is uttered in 2002. The upper bound of the end time can thus not be determined at all.
SUMMARY
The TIDES TIMEX2 and TimeML annotation standards are widely accepted in the research community. Depending on the particular use case, they are sometimes extended to better cover the requirements of applications, as done by Berberich et al. [2010] in the context of temporal information retrieval. Owing to the large body of research on temporal relation extraction, TimeML is more widely used than TIDES TIMEX2.
Whenever one is faced with the task of temporal tagging, annotation specifications are required so that normalized information can be correctly interpreted. In addition, since almost all work in the area of temporal tagging follows one of the two standards, it is crucial to follow these annotation specifications when developing a temporal tagger. Otherwise, existing manually annotated corpora cannot be used for evaluations and no meaningful comparison to existing approaches is possible.
Based on both standards, several research competitions have been organized, and several corpora have been manually annotated to be used as benchmarks. In the following sections, we survey temporal tagging research competitions and present an overview of existing annotated corpora. As different measures have been used in the research competitions to evaluate temporal tagging performance, we first describe how temporal taggers can be evaluated and what issues have to be taken into consideration.
3.2 EVALUATING TEMPORAL TAGGERS
In general, as for many natural