Linked Lexical Knowledge Bases. Iryna Gurevych

Linked Lexical Knowledge Bases - Iryna Gurevych Synthesis Lectures on Human Language Technologies

for NLP. In Chapter 6 we briefly present multilingual applications, and computer-aided translation in particular, and show how they benefit from linked multilingual resources. Finally, in Chapter 7, we supplement our considerations of LLKB applications by discussing the enabling technologies, i.e., how LLKBs can be accessed via user interfaces and application programming interfaces. Based on the discussion of access paths for single resources, we describe how interfaces for current complex linked resources have evolved to cater to the needs of researchers and end users.

      Chapter 8 concludes this book and points out directions for future work.

       TYPOGRAPHIC CONVENTIONS

      • Newly introduced terms and example lemmas are typed in italics.

      • Synsets (groups of synonymous words) are enclosed by curly brackets, e.g., {car, automobile}.

      • Concepts are typed in small caps, e.g., STREET VEHICLE WITH FOUR WHEELS.

      • Relations between senses are written as pairs in parentheses, e.g., (car, vehicle).

      • Classes of the Lexical Markup Framework (LMF) standard are printed in a monospace font starting with an upper case letter (e.g., LexicalEntry).

      • LMF data categories are printed in a monospace font starting with a lower case letter (e.g., partOfSpeech).

      We acknowledge support by the Volkswagen Foundation as part of the Lichtenberg-Professorship Program under grant No. I/82806, by the German Institute for Educational Research (DIPF), and by the German Research Foundation under grant No. GU 798/17-1. We also thank our colleagues and students for their contributions to this book.

      Iryna Gurevych, Judith Eckle-Kohler, and Michael Matuschek

      July 2016

       4 http://www.meta-share.eu

       5 http://www.resourcebook.eu

       Acknowledgments

      …Mentors matter! The authors of the book are very grateful to each and every one who generously offered their guidance, support, advice, strategic feedback, and valuable insights of all kinds during our professional careers. This helped us grow, learn, and identify and accomplish the right goals, including this very book.

      Iryna Gurevych, Judith Eckle-Kohler, and Michael Matuschek

      July 2016

      CHAPTER 1

       Lexical Knowledge Bases

      In this chapter we give an overview of different types of lexical knowledge bases that are used in natural language processing (NLP). We cover widely known expert-built Lexical Knowledge Bases (LKBs), and also collaborative LKBs, i.e., those created by a large community of lay collaborators. First we define our terminology; then we give a broad overview of various kinds of LKBs that play an important role in NLP. For resource-specific details, we refer the reader to the respective reference publications.

      Definition Lexical Knowledge Base: Lexical knowledge bases (LKBs) are digital knowledge bases that provide lexical information on words (including multi-word expressions) of a particular language.1 By word, we mean word form, or more specifically, the canonical base word form, which is called the lemma. For example, write is the lemma of wrote. Most LKBs provide lexical information for lemmas. A lexeme is a word in combination with a part of speech (POS), such as noun, verb, or adjective. The majority of LKBs specify the part of speech of the lemmas listed, i.e., provide lexical information on lexemes.

      The pairings of lemma and meaning are called word senses or just senses. We use the terms meaning and concept synonymously in this book to refer to the possibly language-independent part of a sense. Each sense is typically identified by a unique sense identifier. For example, there are two meanings of the verb write which give rise to two different senses:2 (write, “to communicate with someone in writing”) and (write, “to produce a literary work”). Accordingly, an LKB might use identifiers such as write01 and write02 to distinguish between the former and the latter sense. The set of all senses listed in an LKB is called its sense inventory.
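
      The terminology above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the data model of any particular LKB: the class names Lexeme and Sense, the field names, and the helper senses_of are hypothetical, while the lemma, the sense identifiers, and the definitions are taken from the write example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Lexeme:
    """A lemma combined with a part of speech (hypothetical class name)."""
    lemma: str
    pos: str


@dataclass(frozen=True)
class Sense:
    """A pairing of a lexeme and a meaning, identified by a unique sense ID."""
    sense_id: str
    lexeme: Lexeme
    definition: str


WRITE_VERB = Lexeme(lemma="write", pos="verb")

# A toy sense inventory: the set of all senses listed in the (toy) LKB,
# keyed by sense identifier.
sense_inventory: Dict[str, Sense] = {
    "write01": Sense("write01", WRITE_VERB,
                     "to communicate with someone in writing"),
    "write02": Sense("write02", WRITE_VERB,
                     "to produce a literary work"),
}


def senses_of(lemma: str, pos: str, inventory: Dict[str, Sense]) -> List[Sense]:
    """Return all senses of the lexeme (lemma, pos), sorted by sense ID."""
    target = Lexeme(lemma, pos)
    matches = [s for s in inventory.values() if s.lexeme == target]
    return sorted(matches, key=lambda s: s.sense_id)


if __name__ == "__main__":
    for sense in senses_of("write", "verb", sense_inventory):
        print(sense.sense_id, "-", sense.definition)
```

      Keying the inventory by sense identifier rather than by lemma reflects the point made above: the information is attached to the sense, and one lemma can map to several entries.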

      Depending on their particular focus, LKBs can contain a variety of lexical information, including morphological, phonetic, syntactic, semantic, and pragmatic information. This book focuses on LKBs that provide lexical information on the word sense level, i.e., information that is sensitive to the meaning of a word and is therefore attached to a pairing of lemma and meaning rather than to the lemma itself. Not included in our definition are LKBs that only provide morphological information about the inflectional and derivational properties of words.

      The following list provides an overview of the main lexical information types distinguished at the level of word senses.

      • Sense definition—A definition of the sense in natural language (also called gloss) meant for human interpretation; for example, “to communicate with someone in writing” is a sense definition for the sense write01 given above.

      • Sense examples—Example sentences which illustrate the sense in context; for example, He wrote her an email. is a sense example of the sense write01.

      • Sense relations—Lexical-semantic relations to other senses. We list the most salient ones.

      – Synonymy connects senses which are lexically different but share the same meaning. Synonymy is reflexive, symmetrical, and transitive. For example, the verbs change and modify are synonyms3 as they share the meaning “cause to change.”

      Some resources such as WordNet subsume synonymous senses into synsets. However, for the linking algorithms presented in this book, we will usually not distinguish between sense and synset, as for most discussions and experiments in this particular context they can be used interchangeably.

      – Antonymy is a relation in which the source and target sense have opposite meanings (e.g., tall and short).

      – Hyponymy denotes a semantic relation where the target sense has a more specific meaning than the source sense (e.g., from limb to arm).

      – Hypernymy is the inverse relation of hyponymy and thus denotes a semantic relation in which the target sense has a more general meaning than the source sense.

      • Syntactic behavior—Lexical-syntactic properties, such as the valency of verbs, i.e., the number and type of syntactic arguments a verb takes; for example, the verb change (“cause to change”) can take a noun phrase subject and a noun phrase object as syntactic arguments, as in: She[subject] changed the rules[object].
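
      The sense relations in the list above can be sketched as a small labeled graph. This is a minimal illustration, not how any specific LKB stores its relations: the class SenseRelations, the INVERSE table, and the sense identifiers (limb01, arm01, change01, modify01) are hypothetical. The sketch takes hypernymy to be the inverse of hyponymy and synonymy to be symmetrical, as stated above; treating antonymy as symmetrical is an additional assumption.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# For every relation, the relation that holds in the opposite direction.
# Symmetrical relations are their own inverse (antonymy: an assumption).
INVERSE: Dict[str, str] = {
    "hyponym": "hypernym",
    "hypernym": "hyponym",
    "synonym": "synonym",
    "antonym": "antonym",
}


class SenseRelations:
    """Stores directed, labeled relations between sense identifiers."""

    def __init__(self) -> None:
        # Maps (source sense, relation name) to the set of target senses.
        self._edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        """Record the relation and its inverse in one step."""
        self._edges[(source, relation)].add(target)
        self._edges[(target, INVERSE[relation])].add(source)

    def targets(self, source: str, relation: str) -> Set[str]:
        """Return a copy of the target senses for (source, relation)."""
        return set(self._edges.get((source, relation), set()))


relations = SenseRelations()
# Hyponymy: the target sense (arm) is more specific than the source (limb).
relations.add("limb01", "hyponym", "arm01")
# Synonymy: change and modify share the meaning "cause to change".
relations.add("change01", "synonym", "modify01")

if __name__ == "__main__":
    print(relations.targets("arm01", "hypernym"))
    print(relations.targets("modify01", "synonym"))
```

      Storing the inverse edge at insertion time keeps lookups simple in both directions; a real resource might instead derive inverse relations on demand.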

      In LKBs, valency is represented by subcategorization frames (short: subcat frames). They specify syntactic arguments of verbs, but also of other predicate-like lexemes that can take