The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira


doubly truncated data, and also raise awareness about the implications of double truncation on inferential procedures.

      Over these years, several researchers have collaborated with us in the fascinating adventure of investigating double truncation. Among them, we would like to mention Ingrid Van Keilegom, Micha Mandel, Rebecca Betensky, Luis Meira‐Machado and Roel Braekers. We have enjoyed co‐authoring a number of research papers with them. We also learned a lot about double truncation by studying real data problems posed by applied researchers; here we thank María José Bento, David Keith Simon, Zhi‐Sheng Ye, Ana Cristina Santos and Henrique Barros for fruitful discussions and cooperation.

      This book aims to serve as a companion for those interested in learning about doubly truncated data analysis and inference, presenting a wide range of tools for estimating distribution and regression models. All the methods presented in this book are accompanied by real data and simulated examples and, at the end of each chapter, the reader will find the do‐it‐yourself code, mostly based on the DTDA package. This book is not meant to be merely read: its main purpose is to invite the reader to think, explore and experience.

      This volume is also self‐contained, providing a general overview of the main results. Further technical details and some omitted proofs can be found in the original references. It is also our intention to leave several take‐home messages. First, correcting the potential sampling bias arising from double truncation may be critical in estimation and inference. Second, even though the Efron–Petrosian estimator is conceptually complicated and its asymptotic theory may be overwhelming, its practical application is relatively simple thanks to the available software packages and the good performance of resampling algorithms. Third, external information on the sampling bias should be used whenever available, since the Efron–Petrosian estimator may be very noisy or may not even exist, particularly when the sample size is small to moderate.

      We sincerely hope that the reader will enjoy (and experience!) the book at least as much as we have enjoyed writing it. Comments and suggestions from readers on this edition are welcome; please send them to [email protected] to help us improve the book.

      Parts of this book were written while the authors were supported by Grants MTM2017‐89422‐P (MINECO/AEI/FEDER, UE) (first author), UIDB/00013/2020 and UIDP/00013/2020 (second author), and MTM2016‐76969‐P (MINECO/AEI/FEDER, UE) (third author). This support is acknowledged.

May 2021
Jacobo de Uña‐Álvarez, Carla Moreira and Rosa M. Crujeiras
Vigo, V. N. Famalicão and Santiago de Compostela

      1.1 Random Truncation

      Random truncation generally refers to a situation in which a number of individuals of the target population cannot be sampled because a certain random event prevents their observation. When this random event is unrelated to the variables of interest, standard statistical methods apply, with the only inconvenience being a smaller sample size. In many practical cases, however, the truncation event is related to the variables under study, and specific methods to overcome the sampling bias must be considered.

      This book is focused on random truncation phenomena that arise (usually, but not only) when sampling time‐to‐event data. That is, the variable of interest is the time $X$ elapsed from a well‐defined origin to another well‐defined end point. In this setting, a truncated sample of $X$ is a set of independent and identically distributed (iid) random variables $X_1, \ldots, X_n$ with the conditional distribution of $X$ given $X \in \mathcal{A}$, where $\mathcal{A}$ is a random set. Since the truncation event $X \in \mathcal{A}$ is obviously related to $X$, standard statistical methods applied to the truncated sample may be systematically biased. For example, the ordinary empirical cumulative distribution function (ecdf) of $X_1, \ldots, X_n$ at point $x$, $F_n(x) = n^{-1} \sum_{i=1}^{n} I(X_i \le x)$, converges to $P(X \le x \mid X \in \mathcal{A})$ rather than to the target cumulative distribution function (cdf) $F(x) = P(X \le x)$. This problem has received remarkable attention since the seminal paper by Turnbull (1976). Special forms of truncation when sampling time‐to‐event data are reviewed in Sections 1.2 and 1.3.
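
      The bias of the ordinary ecdf can be seen in a small simulation. The following is a minimal sketch under assumptions that are not taken from the book: the target variable is exponential with rate 1, and the random set is the interval [U, U + 1] with U uniform on (0, 2). The function names are hypothetical, and the code does not use the DTDA package.

```python
import math
import random


def truncated_sample(n, rng):
    """Draw n observations of X ~ Exp(1) that fall in the random set [U, U + 1].

    Both distributions are illustrative assumptions, not the book's examples.
    """
    sample = []
    while len(sample) < n:
        x = rng.expovariate(1.0)    # target variable X
        u = rng.uniform(0.0, 2.0)   # left end of the random set (assumed)
        if u <= x <= u + 1.0:       # X is sampled only when it falls in the set
            sample.append(x)
    return sample


def ecdf(sample, x):
    """Ordinary empirical cdf of the sample evaluated at x."""
    return sum(xi <= x for xi in sample) / len(sample)


rng = random.Random(2021)
sample = truncated_sample(20000, rng)

x0 = 1.0
naive = ecdf(sample, x0)           # estimates P(X <= x0 | X in the random set)
target = 1.0 - math.exp(-x0)       # target cdf F(x0) for Exp(1)

print(f"ecdf of truncated sample at {x0}: {naive:.3f}")
print(f"target cdf at {x0}:               {target:.3f}")
```

      Under these assumed distributions the ecdf settles well below the target cdf at this point, however large the sample, which is the systematic bias described above.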

      Time‐to‐event data are relevant in fields like Survival Analysis and Reliability Engineering, in which random truncation often occurs. Random truncation is found in Astronomy too, where $X$ represents the luminosity of a stellar object that is subject to observation limits. Examples from these areas will be introduced and analysed throughout this book.

      1.2.1 Left‐truncation

A left‐truncation variable for $X$ is defined as a random variable $U$ such that $X$ is observed only when $U \le X$, determining the random set $\mathcal{A} = [U, \infty)$.
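
      As a worked illustration of the sampling bias under left‐truncation, write X for the variable of interest with cdf F, and U for the left‐truncation variable with cdf G. Suppose, as an assumption added here for illustration, that U is independent of X. Conditioning on X = t then gives P(U ≤ X | X = t) = G(t), so the cdf of the observed values can be sketched as:

```latex
\[
P(X \le x \mid U \le X)
  = \frac{P(X \le x,\; U \le X)}{P(U \le X)}
  = \frac{\int_{(-\infty,\,x]} G(t)\, dF(t)}{\int_{-\infty}^{\infty} G(t)\, dF(t)} .
\]
```

      Because G is nondecreasing, larger values of X receive more weight, so the observed sample tends to over‐represent large values of X.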
