Linked Lexical Knowledge Bases. Iryna Gurevych


Linked Lexical Knowledge Bases - Iryna Gurevych Synthesis Lectures on Human Language Technologies


many systems using WordNet rely on the sense ordering; see also examples in Chapter 4.

      Information Types The lexical information types prevailing in wordnets can be summarized as follows.

      • Sense definition—Wordnets provide sense definitions at the synset level, i.e., all senses in a synset share the same sense definition.

      • Sense examples—These are provided for individual senses.

      • Sense relations—Most sense relations in wordnets are given at the synset level, i.e., all senses in a synset participate in such a relation.

      – A special case in wordnets is synonymy, because it is represented via synsets, rather than via relations between senses.

      – Most other sense relations are given on the synset level, e.g., hyponymy.

      – Few sense relations are defined between senses, e.g., antonymy, which does not always generalize to all members of a synset.

      • Syntactic behavior—The degree of detail regarding syntactic behavior varies from wordnet to wordnet. While the Princeton WordNet distinguishes only a few subcat frames, the German wordnet GermaNet distinguishes about 200 very detailed subcat frames.

      • Related forms—The Princeton WordNet is rich in information about senses that are related via morphological derivation. Not all wordnets provide this information type.
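
The split between synset-level and sense-level information can be made concrete with a small data model. The following is a minimal sketch, not the storage format of any actual wordnet: the class names `Sense` and `Synset`, the method names, and the toy entries (including the definition strings) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One lemma in one synset (hypothetical structure)."""
    lemma: str
    examples: list = field(default_factory=list)   # sense examples: per sense
    antonyms: list = field(default_factory=list)   # antonymy: per sense


@dataclass
class Synset:
    """A set of synonymous senses (hypothetical structure)."""
    definition: str                                 # shared by all member senses
    senses: list
    hypernyms: list = field(default_factory=list)   # hyponymy/hypernymy: per synset

    def synonyms_of(self, lemma):
        # Synonymy is not stored as a relation; it follows from co-membership.
        return [s.lemma for s in self.senses if s.lemma != lemma]

    def antonyms_of(self, lemma):
        # Antonymy is looked up on the individual sense, not on the synset.
        for s in self.senses:
            if s.lemma == lemma:
                return s.antonyms
        raise KeyError(lemma)


# Toy data, invented for this sketch (not copied from a wordnet release).
move = Synset(definition="change position", senses=[Sense("move")])
rise = Synset(
    definition="move upward",
    senses=[
        Sense("rise", examples=["The smoke rose."], antonyms=["fall"]),
        Sense("ascend", examples=["The path ascends."], antonyms=["descend"]),
    ],
    hypernyms=[move],
)
```

In this sketch, `rise.synonyms_of("rise")` yields `["ascend"]` and both senses share `rise.definition`, whereas `rise.antonyms_of("rise")` and `rise.antonyms_of("ascend")` differ, mirroring the point that antonymy does not always generalize to all members of a synset.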

      LKBs modeled according to the theory of frame semantics [Fillmore, 1982] focus on word senses that evoke certain scenes or situations. These scenes are represented schematically by so-called frames. For instance, the “Killing” frame specifies a scene where “A Killer or Cause causes the death of the Victim.” It can be evoked by verbs such as assassinate, behead, and terminate, or by nouns such as liquidation and massacre.

      The participants of these scenes (e.g., “Killer” and “Victim” in the “Killing” frame example), as well as other important elements (e.g., “Instrument” as “The device used by the Killer to bring about the death of the Victim” or “Place” as “The location where the death took place”) constitute the semantic roles of the frame (called frame elements in frame semantics), and are typically realized in a sentence along with the frame-evoking element, as in: Someone[Killer] tried to KILL him[Victim] with a parcel bomb[Instrument].

      The inventory of semantic roles used in FrameNet is very large and subject to further extension as FrameNet grows. Many semantic roles have frame-specific names, such as the “Killer” semantic role defined in the “Killing” frame.

      Frames are the main organizational unit in framenets: they contain senses (represented by their lemma) that evoke the same frame. The majority of the frame-evoking words are verbs and other predicate-like lexemes: they can naturally be represented by frames, since predicates take arguments which can be characterized both syntactically (e.g., subject, direct object) and semantically via their semantic role.

      There are semantic relations between frames (e.g., the “Is_Causative_of” relation between “Killing” and “Death” or the “Precedes” relation between “Being_born” and “Death” or “Dying”), and also between frame elements.
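
The pieces described so far (a frame with its definition, its frame elements, its frame-evoking senses, and its relations to other frames) can be pictured as one record. The sketch below is an illustration only and does not reproduce FrameNet's actual data format: the `Frame` class, the field names, the helper `undeclared_roles`, and the span boundaries in the annotated sentence are assumptions. The quoted definitions come from the text; roles whose description is not quoted there are left as `None`.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical record for one frame."""
    name: str
    definition: str
    frame_elements: dict      # role name -> description (None if not quoted in the text)
    lexical_units: list       # lemmas of the senses that evoke the frame
    relations: list = field(default_factory=list)  # (relation name, target frame name)


killing = Frame(
    name="Killing",
    definition="A Killer or Cause causes the death of the Victim.",
    frame_elements={
        "Killer": None,
        "Cause": None,
        "Victim": None,
        "Instrument": "The device used by the Killer to bring about the death of the Victim",
        "Place": "The location where the death took place",
    },
    lexical_units=["kill", "assassinate", "behead", "terminate", "liquidation", "massacre"],
    relations=[("Is_Causative_of", "Death")],
)

# The example sentence from the text; span boundaries are an assumption.
annotated = {
    "target": "kill",
    "spans": [
        ("Someone", "Killer"),
        ("him", "Victim"),
        ("with a parcel bomb", "Instrument"),
    ],
}


def undeclared_roles(frame, annotation):
    """Return role labels used in an annotation that the frame does not define."""
    return [role for _, role in annotation["spans"] if role not in frame.frame_elements]
```

Here `undeclared_roles(killing, annotated)` returns an empty list, because every role label in the sentence is a frame element of “Killing”, and the frame-level relation is stored once for all lexical units of the frame.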

      The English FrameNet [Baker et al., 1998, Ruppenhofer et al., 2010] was the first frame-semantic LKB and remains the best known. Version 1.6 of FrameNet contains 1,205 frames. In FrameNet, senses are called lexical units. FrameNet does not provide explicit information about the syntactic behavior of word senses. However, the sense examples are annotated with syntactic information (FrameNet annotation sets), and subcat frames can be induced from these annotations.

      FrameNet is particularly rich in sense examples, which are selected based on lexicographic criteria, i.e., the sense examples are chosen to illustrate typical syntactic realizations of the frame elements. The sense examples are enriched with annotations of the frame and its elements, and thus provide information about the relative frequencies of the syntactic realizations of a particular frame element. For example, for the verb kill, a noun phrase with the grammatical function object is the most frequently used syntactic realization of the “Victim” role.
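
How relative frequencies fall out of annotated sense examples can be shown with a simple count. This is a sketch under stated assumptions: the triples below are invented toy annotations (not FrameNet data), and the function name `realization_frequencies` and the labels for phrase types and grammatical functions are hypothetical.

```python
from collections import Counter

# Toy annotations for the verb "kill", invented for illustration:
# (frame element, phrase type, grammatical function)
annotations = [
    ("Victim", "NP", "object"),
    ("Victim", "NP", "object"),
    ("Victim", "NP", "subject"),   # e.g., a passive clause
    ("Killer", "NP", "subject"),
    ("Killer", "PP", "dependent"),
]


def realization_frequencies(annotations, role):
    """Relative frequency of each (phrase type, grammatical function) for one role."""
    counts = Counter((pt, gf) for fe, pt, gf in annotations if fe == role)
    total = sum(counts.values())
    return {realization: n / total for realization, n in counts.items()}


victim = realization_frequencies(annotations, "Victim")
most_frequent = max(victim, key=victim.get)
```

With these toy counts, `most_frequent` is `("NP", "object")`, the same pattern the text reports for the “Victim” role of kill; the actual proportions in FrameNet would of course depend on its annotation sets.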

      Framenets in Other Languages The English FrameNet has spawned the construction of framenets in multiple other languages. For example, there are framenets for Spanish [Subirats and Sato, 2004], Swedish [Friberg Heppin and Toporowska Gronostaj, 2012], and Japanese [Ohara, 2012]. For Danish, there is an ongoing effort to build a framenet based on a large-scale valency LKB that is being manually extended with frame-semantic information [Bick, 2011]. For German, there is a corpus annotated with FrameNet frames called SALSA [Burchardt et al., 2006].

      Information Types The following information types in the English FrameNet are most salient.

      • Sense definition—For individual senses, FrameNet provides sense definitions, either taken from the Concise Oxford Dictionary or created by lexicographers. Furthermore, there is a sense definition for each frame, which is given by a textual description and shared by all senses in a frame.

      • Sense examples—FrameNet is particularly rich in sense examples which are selected based on lexicographic criteria.

      • Sense relations—FrameNet specifies sense relations on the frame level, i.e., all senses in a frame participate in the relation.

      • Predicate argument structure information—Semantic roles often have frame-specific names and are specified via a textual description. Some frame elements are further characterized via their semantic type, thus selectional preference information is provided as well.

      Most of the early work on LKBs for NLP considered valency a central information type, because it was essential for deep syntactic and semantic parsing with broad-coverage hand-written grammars (e.g., Head-Driven Phrase Structure Grammar [Copestake and Flickinger], or Lexical Functional Grammar as in the ParGram project [Sulger et al., 2013]). Valency is the lexical property of a word by which it requires certain syntactic arguments in order to be used in well-formed phrases or clauses. For example, the verb assassinate requires not only a subject, but also an object: *He assassinated. vs. He assassinated his colleague. Valency information is also included in MRDs, but it is often represented ambiguously and is thus hard to process automatically. Therefore, a number of valency LKBs have been built specifically for NLP applications. These LKBs use subcat frames to represent valency information.
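
The role of a subcat frame as a well-formedness constraint can be sketched in a few lines. This is an illustration under assumptions, not the format of any particular valency LKB: the sense identifier `assassinate#1`, the table `SUBCAT_FRAMES`, and the function `is_licensed` are hypothetical, and arguments are reduced to bare grammatical-function labels.

```python
# Hypothetical valency table: sense id -> list of licensed subcat frames.
# Each frame is a tuple of required grammatical functions.
SUBCAT_FRAMES = {
    "assassinate#1": [("subject", "object")],
}


def is_licensed(sense_id, observed_args):
    """True if the observed arguments match one of the sense's subcat frames."""
    return tuple(observed_args) in SUBCAT_FRAMES.get(sense_id, [])


# *He assassinated.
without_object = is_licensed("assassinate#1", ["subject"])
# He assassinated his colleague.
with_object = is_licensed("assassinate#1", ["subject", "object"])
```

In this sketch `without_object` is `False` and `with_object` is `True`. A real grammar would match phrase types and optional arguments as well; exact tuple comparison is used here only to keep the example short.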

      It is important to note that subcat frames are a lexical property of senses, rather than words. Consider the following example of the two senses of see and their sense-specific subcat frames (1) and (2): subcat frame (1) is only valid for the see—“interpret in a particular way” sense, but not for the see—“perceive with the eyes” sense:

      see—“interpret in a particular way:”

      subcat frame (1): (arg1:subject(nounPhrase),arg2:prepositionalObject(asPhrase))

      sense example: Some historians see his usurpation as a panic response to growing insecurity.

      see—“perceive with the eyes:”

      subcat
