Statistical Significance Testing for Natural Language Processing. Rotem Dror


Synthesis Lectures on Human Language Technologies


       Contents

       Preface

       Acknowledgments

       1 Introduction

       2 Statistical Hypothesis Testing

       2.1 Hypothesis Testing

       2.2 P-Value in the World of NLP

       3 Statistical Significance Tests

       3.1 Preliminaries

       3.2 Parametric Tests

       3.3 Nonparametric Tests

       4 Statistical Significance in NLP

       4.1 NLP Tasks and Evaluation Measures

       4.2 Decision Tree for Significance Test Selection

       4.3 Matching Between Evaluation Measures and Statistical Significance Tests

       4.4 Significance with Large Test Samples

       5 Deep Significance

       5.1 Performance Variance in Deep Neural Network Models

       5.2 A Deep Neural Network Comparison Framework

       5.3 Existing Methods for Deep Neural Network Comparison

       5.4 Almost Stochastic Dominance

       5.5 Empirical Analysis

       5.6 Error Rate Analysis

       5.7 Summary

       6 Replicability Analysis

       6.1 The Multiplicity Problem

       6.2 A Multiple Hypothesis Testing Framework for Algorithm Comparison

       6.3 Replicability Analysis with Partial Conjunction Testing

       6.4 Replicability Analysis: Counting

       6.5 Replicability Analysis: Identification

       6.6 Synthetic Experiments

       6.7 Real-World Data Applications

       6.7.1 Applications and Data

       6.7.2 Statistical Significance Testing

       6.7.3 Results

       6.7.4 Results Summary and Overview

       7 Open Questions and Challenges

       8 Conclusions

       Bibliography

       Authors’ Biographies

      The field of Natural Language Processing (NLP) has made substantial progress in the last two decades. This progress stems from multiple sources: the data revolution that has made abundant textual data available from a variety of languages and linguistic domains, the development of increasingly effective predictive statistical models, and the availability of hardware that can apply these models to large datasets. This dramatic improvement in the capabilities of NLP algorithms carries the potential for great impact.

      The extended reach of NLP algorithms has also led NLP papers to place ever greater emphasis on their experimental and results sections, comparing multiple algorithms on various datasets from different languages and domains. It can safely be argued that the ultimate test of the quality of an NLP algorithm is its performance on well-accepted benchmark datasets, whose rankings are sometimes referred to as “leaderboards”. This emphasis on empirical results highlights the role of statistical significance testing in NLP research: if we rely on empirical evaluation to validate our hypotheses and reveal the correct language processing mechanisms, we had better be sure that our results are not coincidental.
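      To make this concrete before the formal treatment in later chapters, here is a minimal sketch of one popular way NLP researchers check that a result is not coincidental: a one-sided paired bootstrap test over per-example correctness of two systems on the same test set. The function name and the simple resampling scheme are illustrative choices, not a method prescribed by this book.

```python
import random

def paired_bootstrap_pvalue(correct_a, correct_b, n_resamples=1000, seed=0):
    """Estimate how often system A's advantage over system B disappears
    under resampling of the test set (one-sided paired bootstrap sketch).

    correct_a, correct_b: parallel 0/1 lists, one entry per test example.
    Returns the fraction of bootstrap resamples in which A does not beat B.
    """
    rng = random.Random(seed)
    n = len(correct_a)
    count = 0
    for _ in range(n_resamples):
        # Resample test-set indices with replacement, keeping the pairing.
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(correct_a[i] - correct_b[i] for i in idx)
        # Count resamples where A's advantage vanishes or reverses.
        if delta <= 0:
            count += 1
    return count / n_resamples

# A system that is correct everywhere vs. one correct nowhere:
print(paired_bootstrap_pvalue([1] * 10, [0] * 10))  # → 0.0
```

A small estimated p-value suggests the observed advantage is unlikely to be an artifact of which examples happened to land in the test set; the book's later chapters discuss when such tests are (and are not) appropriate for NLP evaluation measures.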

      The goal of this book is to discuss the main aspects of statistical significance testing in NLP. Particularly, we aim to briefly summarize the main concepts so that they are readily available to the interested researcher, address the key challenges of hypothesis testing in the context of NLP tasks and data, and discuss open issues and the main directions for future work.

      We start with two introductory chapters that present the basic concepts of statistical significance testing: Chapter 2 provides a brief presentation of the hypothesis
