Linked Lexical Knowledge Bases. Iryna Gurevych
sense example: Can you see the bird in that tree?
Subcat frames contain language-specific elements, even though some of their elements may be valid cross-lingually. For example, there are certain properties of syntactic arguments in English and German that correspond (both English and German are Germanic languages and hence closely related), while other properties, mainly morphosyntactic ones, diverge [Eckle-Kohler and Gurevych, 2012]. Examples of such divergences include the overt case marking in German (e.g., for the dative case) or the fact that the ing-form in English verb phrase complements is sometimes realized as zu-infinitive in German.
According to many researchers in linguistics, different subcat frames of a lexeme are associated with different but related meanings, an analysis which is called the “multiple meaning approach” by Hovav and Levin [2008]. The multiple meaning approach gives rise to different senses, i.e., pairs of lexeme and subcat frame. Hence, valency LKBs provide an implicit characterization of senses via subcat frames, which can be considered as abstractions of sense examples. Sense examples illustrating a lexeme in a particular subcat frame (e.g., extracted from corpora) might be provided in addition. However, valency LKBs do not necessarily assign unique identifiers to senses, or group (nearly) synonymous senses into entries (as MRDs do).
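The view of a sense as a pair of lexeme and subcat frame can be sketched as a small data structure. The following Python sketch is illustrative only: the class name `Sense`, the frame labels (`NP`, `that-S`), and the toy entries are assumptions made for this example and do not reflect the format of any particular valency LKB.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Sense:
    """A sense modeled as a pair of lexeme and subcat frame."""
    lexeme: str
    frame: Tuple[str, ...]          # syntactic arguments in order (labels are assumed)
    example: Optional[str] = None   # optional sense example


# Hypothetical toy entries. There are deliberately no sense identifiers:
# a sense is identified only by its lexeme and its subcat frame.
ENTRIES = [
    Sense("see", ("NP", "NP"), "Can you see the bird in that tree?"),
    Sense("see", ("NP", "that-S")),
]


def senses_of(lexeme: str, entries: List[Sense]) -> List[Sense]:
    """Return every (lexeme, frame) pair recorded for the given lexeme."""
    return [s for s in entries if s.lexeme == lexeme]


for sense in senses_of("see", ENTRIES):
    print(sense.lexeme, sense.frame, sense.example)
```

Because the frame is part of the key, two entries for the same lexeme with different frames count as two senses, which mirrors the implicit sense characterization described above.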
Examples of Valency Lexicons. COMLEX Syntax is an English valency LKB providing detailed subcat frames for about 38,000 headwords [Grishman et al., 1994]. Another well-known valency LKB is CELEX, which covers English, as well as Dutch and German. The PAROLE project (Preparatory Action for Linguistic Resources Organization for Language Engineering) initiated the creation of valency LKBs in 12 European languages (Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, and Swedish), all of which have been built on the basis of corpora. However, the resulting LKBs are much smaller. For example, the Spanish PAROLE lexicon contains syntactic information for only about 325 verbs [Villegas and Bel, 2015].
There are many valency LKBs in languages other than English. For German, an example of a large-scale valency LKB is IMSLex-Subcat, a broad-coverage subcategorization lexicon for German verbs, nouns and adjectives, covering about 10,000 verbs, 4,000 nouns, and 200 adjectives [Eckle-Kohler, 1999, Fitschen, 2004]. For verbs, about 350 different subcat frames are distinguished. IMSLex-Subcat was semi-automatically created: the subcat frames were automatically extracted from large newspaper corpora, and manually filtered afterward.
Information Types. In summary, the following lexical information types are salient for valency LKBs.
• Syntactic behavior—Valency LKBs provide lexical-syntactic information on predicate-like words by specifying their syntactic behavior via subcat frames.
• Sense examples—For individual pairs of lexeme and subcat frame, sense examples might be given as well.
1.1.4 VERBNETS
According to Levin [1993], verbs that share common syntactic argument alternation patterns also have particular meaning components in common and can therefore be grouped into semantic verb classes. Consider as an example verbs participating in the dative alternation, e.g., give and sell. These verbs can realize one of their arguments syntactically either as a noun phrase or as a prepositional phrase with to, i.e., they can be used with two different subcat frames:
Martha gives (sells) an apple to Myrna.
Martha gives (sells) Myrna an apple.
Verbs that share this alternation behavior can be grouped into a semantic class whose members share the meaning component “change of possession”; this shared meaning component characterizes the class.
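The grouping step can be illustrated with a minimal Python sketch that treats an alternation as a set of subcat frames and collects all verbs licensing every frame in that set. The frame labels, the toy lexicon, and the function name `verbs_with_alternation` are assumptions made for this example; they are not taken from Levin's classification or from any existing resource.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Frame = Tuple[str, ...]

# Frame labels are assumptions made for this sketch.
PP_TO_FRAME: Frame = ("NP", "V", "NP", "PP-to")       # Martha gives an apple to Myrna.
DOUBLE_OBJECT_FRAME: Frame = ("NP", "V", "NP", "NP")  # Martha gives Myrna an apple.

# An alternation is modeled as the set of frames a verb must license.
DATIVE_ALTERNATION: FrozenSet[Frame] = frozenset({PP_TO_FRAME, DOUBLE_OBJECT_FRAME})

# Hypothetical toy lexicon: verb -> subcat frames it can be used with.
LEXICON: Dict[str, Set[Frame]] = {
    "give": {PP_TO_FRAME, DOUBLE_OBJECT_FRAME},
    "sell": {PP_TO_FRAME, DOUBLE_OBJECT_FRAME},
    "eat": {("NP", "V", "NP"), ("NP", "V")},
}


def verbs_with_alternation(
    lexicon: Dict[str, Set[Frame]], alternation: FrozenSet[Frame]
) -> List[str]:
    """Return the verbs whose frames include every frame of the alternation."""
    return sorted(verb for verb, frames in lexicon.items() if alternation <= frames)


print(verbs_with_alternation(LEXICON, DATIVE_ALTERNATION))
```

In this toy lexicon, give and sell end up in the same group, while eat is excluded because it licenses neither frame of the alternation.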
The best-known verb classification based on the correspondence between verb syntax and verb meaning is Levin’s classification of English verbs [Levin, 1993]. Recent work on verb semantics provides additional evidence for this correspondence of verb syntax and meaning [Hartshorne et al., 2014, Levin, 2015].
The English VerbNet [Kipper et al., 2008] is a broad-coverage verb lexicon that is based on Levin’s classification and covers about 3,800 verb lemmas. VerbNet is organized into about 270 verb classes based on syntactic alternations. Verbs with common subcat frames and syntactic alternation behavior that also share common semantic roles are grouped into VerbNet classes, which are hierarchically structured to represent information about related subcat frames.
VerbNet not only includes the verbs from the original verb classification by Levin, but also more than 50 additional verb classes [Kipper et al., 2006] automatically acquired from corpora [Korhonen and Briscoe, 2004]. These classes cover many verbs taking non-finite verb phrases and subordinate clauses as complements, which were not included in Levin’s original classification. VerbNet (version 3.1) lists 568 subcat frames specifying syntactic types and semantic roles of the arguments, as well as selectional preferences, and syntactic and morpho-syntactic restrictions on the arguments.
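One way to picture a hierarchy of verb classes with related subcat frames is a tree in which a subclass adds frames to those of its parent. The Python sketch below is a simplified illustration under that assumption; the class identifiers, member names, and frame strings are hypothetical placeholders and do not reproduce actual VerbNet entries or the VerbNet file format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VerbClass:
    """A verb class with member verbs, subcat frames, and an optional parent."""
    class_id: str
    members: List[str]
    frames: List[str]
    parent: Optional["VerbClass"] = None

    def all_frames(self) -> List[str]:
        """Frames of this class plus those inherited from its ancestors.

        Inheritance from the parent class is an assumption of this sketch.
        """
        inherited = self.parent.all_frames() if self.parent else []
        return inherited + [f for f in self.frames if f not in inherited]


# Hypothetical classes; ids, members, and frames are placeholders.
parent = VerbClass(
    class_id="toy-1",
    members=["verb_a", "verb_b"],
    frames=["NP V NP PP-to", "NP V NP NP"],
)
child = VerbClass(
    class_id="toy-1-1",
    members=["verb_c"],
    frames=["NP V NP"],
    parent=parent,
)

print(child.class_id, child.all_frames())
```

Storing only the additional frames on the subclass keeps related frames in one place, which is the kind of information sharing a class hierarchy is meant to support.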
Although it might often be hard to pin down what the shared meaning components of VerbNet classes really are, VerbNet has successfully been used in various NLP tasks, many of them including the subtask of mapping syntactic chunks of a sentence to semantic roles [Pradet et al., 2014]; see also Chapter 6.1 for an example.
Verbnets in Other Languages. While the importance of having a verbnet-like LKB in less-resourced languages has been widely recognized, there have been few efforts to build verbnets of the same quality as the English one. Most previous work explored fully automatic approaches to transfer the English VerbNet to another language, thus introducing noise. Semi-automatic approaches are also often based on translating the English VerbNet into another language.
Most importantly, many of the detailed subcat frames available for English, as well as the syntactic alternations, cannot be carried over to other languages, since valency is largely language-specific (e.g., [Scarton and Aluísio, 2012]). Therefore, the development of high-quality verbnets in languages other than English requires the existence of a broad-coverage valency lexicon as a prerequisite. For this reason, valency lexicons, especially tools for their (semi-)automatic construction, are still receiving considerable attention.
A recent example of a high-quality verbnet in another language is the French verbnet (covering about 2,000 verb lemmas) [Pradet et al., 2014], which has been built semi-automatically from existing French resources (thus also including subcat frames) combined with a translation of the English VerbNet verbs.
Information Types. We summarize the main lexical information types for senses present in the English VerbNet.
• Sense definition—Verbnets do not provide textual sense definitions. A verb sense is defined extensionally by the set of verbs forming a VerbNet class; the verbs share common subcat frames, as well as semantic roles and selectional preferences of their arguments.
• Sense relations—The verb classes in verbnets are organized hierarchically and the subclass relation is therefore defined on the verb class level.
• Syntactic behavior—VerbNet lists detailed subcat frames for verb senses.
• Predicate argument structure information—In the English VerbNet, each individual verb sense is characterized by a semi-formal semantic predicate based on the event decomposition of Moens and Steedman [1988]. Furthermore, the semantic arguments of a verb are characterized by their semantic role and linked to their syntactic counterparts in the subcat frame. Most semantic arguments are additionally characterized by their semantic type (i.e., selectional preference information).
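The linking between semantic arguments and their syntactic counterparts listed above can be sketched in Python as follows. This is a minimal illustration, not VerbNet's actual representation: the class names, the role and type labels, the predicate string, and the position-based `link_roles` helper are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Argument:
    """One argument slot: syntactic realization linked to a semantic role."""
    syntax: str                     # syntactic type in the subcat frame
    role: str                       # semantic role (label assumed)
    sem_type: Optional[str] = None  # selectional preference, if any (label assumed)


@dataclass(frozen=True)
class VerbSense:
    verb: str
    arguments: Tuple[Argument, ...]
    predicate: str                  # semi-formal semantic predicate as plain text


# Hypothetical entry for the prepositional frame of "give".
GIVE_PP = VerbSense(
    verb="give",
    arguments=(
        Argument("NP", "Agent", "animate"),
        Argument("NP", "Theme"),
        Argument("PP-to", "Recipient", "animate"),
    ),
    predicate="transfer(Agent, Theme, Recipient)",  # placeholder, not a VerbNet predicate
)


def link_roles(sense: VerbSense, chunks: List[str]) -> Dict[str, str]:
    """Map syntactic chunks to semantic roles by their position in the frame."""
    if len(chunks) != len(sense.arguments):
        raise ValueError("number of chunks does not match the subcat frame")
    return {arg.role: chunk for arg, chunk in zip(sense.arguments, chunks)}


# Chunks of "Martha gives an apple to Myrna." (the verb itself is left out).
print(link_roles(GIVE_PP, ["Martha", "an apple", "to Myrna"]))
```

Matching chunks to roles purely by position is a strong simplification; it only shows how a frame that records both syntactic types and semantic roles makes such a mapping possible.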