The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira
Over these years, several researchers have collaborated with us in the fascinating adventure of investigating double truncation. Among them, we would like to mention Ingrid Van Keilegom, Micha Mandel, Rebecca Betensky, Luis Meira‐Machado and Roel Braekers. We have enjoyed co‐authoring a number of research papers with them. We also learned a lot about double truncation by studying real data problems posed by applied researchers; here we thank María José Bento, David Keith Simon, Zhi‐Sheng Ye, Ana Cristina Santos and Henrique Barros for fruitful discussions and cooperation.
Nowadays, there is a considerable statistical community doing research on exploratory and inferential methods for doubly truncated data, partly motivated by newly emerging applications in Biomedicine, Economics and Engineering, among other fields. At the time of writing, activity in this area of research is more intense than ever before, as is evident from the number of papers on the topic published in the last couple of years. And the interest in double truncation keeps growing!
This book aims to serve as a companion for those interested in learning about doubly truncated data analysis and inference, presenting a wide range of tools for estimating distribution and regression models. All the methods presented in this book are accompanied by real data and simulated examples and, at the end of each chapter, the reader will find the do‐it‐yourself code, mostly based on the DTDA package. This book is not meant merely to be read: its main purpose is to invite the reader to think, explore and experience.
This volume is also self‐contained, providing a general overview of the main results. Further technical details and some omitted proofs can be consulted in the original references. It is also our intention to leave several take‐home messages. First, the correction of the potential sampling bias arising from double truncation may be critical in estimation and inference. Second, even though the Efron–Petrosian estimator is conceptually complicated and its asymptotic theory may be overwhelming, its practical application is relatively simple thanks to the available software packages and the good performance of resampling algorithms. Third, external information on the sampling bias should be used whenever available, since the Efron–Petrosian estimator may be very noisy or even non‐existent, particularly when the sample size is small to moderate.
We sincerely hope that the reader will enjoy (and experience!) the book at least as much as we have enjoyed writing it! Comments and suggestions from readers on this edition are welcome; please send them to [email protected] to help us improve the book.
Parts of this book were written while the authors were supported by the grants MTM2017‐89422‐P (MINECO/AEI/FEDER, UE) (first author), UIDB/00013/2020 and UIDP/00013/2020 (second author), and MTM2016‐76969‐P (MINECO/AEI/FEDER, UE) (third author). This support is gratefully acknowledged.
Jacobo de Uña‐Álvarez, Carla Moreira and Rosa M. Crujeiras
Vigo, V. N. Famalicão and Santiago de Compostela
May 2021
1 Introduction
1.1 Random Truncation
Random truncation generally refers to a situation in which a number of individuals of the target population cannot be sampled because a certain random event precludes them. When this random event is unrelated to the variables of interest, standard statistical methods apply, the only inconvenience being a smaller sample size. In many practical cases, however, the truncation event is related to the variables under study, and specific methods to overcome the sampling bias must be considered.
This book is focused on random truncation phenomena that arise (usually, but not only) when sampling time‐to‐event data. That is, the variable of interest is the time $X$ elapsed from a well‐defined origin to another well‐defined end point. In this setting, a truncated sample of $X$ is a set of independent and identically distributed (iid) random variables with the conditional distribution of $X$ given $X \in A$, where $A$ is a random set. Since the truncation event is obviously related to $X$, standard statistical methods applied to the truncated sample may be systematically biased. For example, the ordinary empirical cumulative distribution function (ecdf) of the truncated sample at point $x$, $F_n(x)$, converges to $P(X \le x \mid X \in A)$ rather than to the target cumulative distribution function (cdf) $F(x) = P(X \le x)$. This problem has received remarkable attention since the seminal paper by Turnbull (1976). Special forms of truncation when sampling time‐to‐event data are reviewed in Sections 1.2 and 1.3.
Time‐to‐event data are relevant in fields like Survival Analysis and Reliability Engineering, in which random truncation often occurs. Random truncation is found in Astronomy too, where $X$ represents the luminosity of a stellar object that is subject to observation limits. Examples from these areas will be introduced and analysed throughout this book.
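The contrast between truncation that is unrelated to the variable of interest and truncation that depends on it can be illustrated with a small simulation. The following Python sketch is not taken from the book (whose own code is based on the DTDA package); the exponential target distribution, the uniform truncation variable `u`, the retention probability 0.5 and the helper names `ecdf` and `simulate` are all assumptions chosen for illustration only.

```python
import math
import random


def ecdf(sample, x):
    """Ordinary empirical cdf of `sample` evaluated at the point x."""
    return sum(1 for value in sample if value <= x) / len(sample)


def simulate(n_population=200_000, seed=1):
    """Draw a population and return two samples: one thinned at random,
    one truncated by an event that depends on the variable itself."""
    rng = random.Random(seed)

    # Assumed target variable: Exp(1), so the target cdf is 1 - exp(-x).
    population = [rng.expovariate(1.0) for _ in range(n_population)]

    # Case 1: the truncation event is unrelated to the variable of interest.
    # Each individual is kept with probability 0.5, whatever its value.
    unrelated = [x for x in population if rng.random() < 0.5]

    # Case 2: the truncation event is related to the variable of interest.
    # An individual is kept only when its value lies in the random set
    # [u, infinity), with u ~ Uniform(0, 2) drawn independently (assumption).
    related = [x for x in population if rng.uniform(0.0, 2.0) <= x]

    return unrelated, related


unrelated, related = simulate()
x0 = 1.0
target_cdf = 1.0 - math.exp(-x0)

# Limit of the ecdf in case 2 under the assumptions above:
# P(X <= 1 | U <= X) = (1 - 2 e^{-1}) / (1 - e^{-2}).
biased_limit = (1.0 - 2.0 * math.exp(-1.0)) / (1.0 - math.exp(-2.0))

print(f"target cdf at {x0}:          {target_cdf:.3f}")
print(f"ecdf, unrelated truncation: {ecdf(unrelated, x0):.3f}")
print(f"ecdf, related truncation:   {ecdf(related, x0):.3f}")
print(f"limit under truncation:     {biased_limit:.3f}")
```

Under these assumptions the ecdf of the randomly thinned sample stays close to the target cdf, at the price of a smaller sample, while the ecdf of the sample truncated through the variable itself settles near the conditional probability and clearly underestimates the target at small values.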
1.2 One‐sided Truncation
1.2.1 Left‐truncation
Left‐truncation is a common feature when sampling time‐to‐event data. A left‐truncation time for the target $X$ is defined as a random variable $U$ such that $X$ is observed only when $U \le X$, determining the random set