The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira


Left‐truncation occurs, for example, with cross‐sectional sampling, where the sampled individuals are those who are between the origin and the end point at a certain calendar time, the cross‐section date (Wang, 1991). That is, the observer arrives at the process at a given date and is allowed to observe the time‐to‐event $X^*$ and the left‐truncation time $U^*$ for the individuals 'in progress' by that date. With cross‐sectional sampling, the variable $U^*$ is simply defined as the time from onset to the cross‐section date. This sampling procedure is often applied because it entails relatively little effort to reach a pre‐specified sample size. In medical research, such a design leads to the sampling of the so‐called prevalent cases: patients already diagnosed with a certain disease of interest who survived beyond the cross‐section date. Clearly, such a sampling design implies an observational bias, in the sense that individuals with longer survival (the $X^*$ value) will be observed with a relatively large probability. There exist well‐investigated proposals to overcome such a bias, based on the simple idea of taking the observed left‐truncation times into account to define suitable risk sets. For this purpose, independence between $X^*$ and $U^*$ has traditionally been assumed. This independence assumption states that the time‐to‐event distribution remains unchanged over time, being unrelated to the date of onset. A classical example of left‐truncation is the Channing House data, where the age at death is measured for people living in that retirement centre; in this case, the target variable is left‐truncated by the age at entry into the residence (Klein and Moeschberger, 2003).
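
The observational bias described above can be illustrated with a small simulation. This is only a sketch under assumed distributions: the time‐to‐event is taken as exponential with mean 1 and the left‐truncation time as uniform on (0, 3), drawn independently. Neither choice comes from the text, and the function name is hypothetical.

```python
import random
import statistics


def simulate_left_truncation(n_population=50_000, seed=0):
    """Draw independent (time-to-event, truncation time) pairs and keep
    only the individuals whose truncation time does not exceed the event time."""
    rng = random.Random(seed)
    population, observed = [], []
    for _ in range(n_population):
        x = rng.expovariate(1.0)   # time-to-event (assumed exponential, mean 1)
        u = rng.uniform(0.0, 3.0)  # left-truncation time (assumed uniform)
        population.append(x)
        if u <= x:                 # still 'in progress' at the cross-section date
            observed.append(x)
    return population, observed


population, observed = simulate_left_truncation()
print("mean in the population:      ", statistics.mean(population))
print("mean in the truncated sample:", statistics.mean(observed))
```

Under these assumptions the truncated sample over‐represents long event times, so its naive mean overstates the population mean; this is the bias that methods based on risk sets are designed to correct.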

      Another feature leading to left‐truncation is delayed entry into the study. This happens when individuals enter the study only at some random time $U^*$ after onset. For example, diagnosis of a certain disease may not be ascertained until the first visit to the hospital. If the 'end‐of‐disease' event occurs before the potential date of visit, the time‐to‐event of such a patient will never be known, with the resulting difficulty in observing relatively small event times. Beyersmann et al. (2012) provide an illustrative example of this issue in the investigation of abortion times.

      1.2.2 Right‐truncation

Right‐truncation arises when the time‐to‐event $X^*$ is observed only for the individuals who experience the event before a certain calendar time $\tau$. A typical example of such a situation is the investigation of the incubation (or induction) times for AIDS; see for example Klein and Moeschberger (2003). The incubation time is defined as the time elapsed between the date of HIV infection, $d_0$ say, and the development of AIDS. If $X^*$ stands for the incubation time and $V^* = \tau - d_0$, then the incubation times of individuals developing AIDS prior to $\tau$ follow the distribution of $X^*$ conditionally on $X^* \le V^*$. Here, $V^*$ is called the right‐truncation time. An immediate effect of right‐truncation is that large values of $X^*$ are sampled with a relatively small probability.
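
The under‐sampling of large values can be made explicit with a short worked expression. It is a sketch under assumptions not stated in the text: the time‐to‐event and the right‐truncation time are independent, the time‐to‐event is non‐negative, and $F$ and $G$ (notation introduced here for illustration) denote the distribution functions of the time‐to‐event $X^*$ and the right‐truncation time $V^*$.

```latex
% Assumed notation: F is the distribution function of X^*, G that of V^*,
% and X^* and V^* are taken to be independent, with X^* >= 0.
\[
  P(X^* \le x \mid X^* \le V^*)
  = \frac{\int_{0}^{x} \{1 - G(t^-)\}\, dF(t)}
         {\int_{0}^{\infty} \{1 - G(t^-)\}\, dF(t)},
  \qquad x \ge 0 .
\]
```

The weight $1 - G(t^-) = P(V^* \ge t)$ is non‐increasing in $t$, so the observed distribution puts less mass on large event times than $F$ does.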

      1.2.3 Truncation vs. Censoring

      At this point, the reader may be curious about the difference between truncation and censoring. Right‐censoring is a very well‐known phenomenon in Survival Analysis and reliability studies, among other fields. It happens when the follow‐up of a given individual stops before the event of interest has taken place. In such a case, the observer only knows that the target variable is larger than the registered value, which is referred to as the censoring time. A sample made up of real and censored values is typically analysed by the Kaplan–Meier estimator (Kaplan and Meier, 1958), which corrects for the fact that some of the recorded values for $X^*$ are smaller than the true ones. With truncated data, every value in the sample corresponds to a true observation of $X^*$; however, the distribution of the observed values may be shifted with respect to the true one due to the truncation event. This difference between truncation and censoring suggests that specific methods should be employed to estimate the target distribution under random truncation. Indeed, Woodroofe (1985) provides a deep analysis of one‐sided truncation, introducing the original idea of Lynden‐Bell (1971) as a nonparametric maximum likelihood estimator (NPMLE) of the probability distribution in that setting. The estimator in Woodroofe (1985) is a particular case of the estimator corresponding to doubly truncated data, on which this book is focused.
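
To make the risk‐set idea concrete, the following is a minimal sketch of a product‐limit estimator of Lynden‐Bell type for left‐truncated data, where the risk set at time t contains the individuals whose truncation time is at most t and whose event time is at least t. The function name and the toy data are hypothetical, and the sketch ignores refinements such as handling risk sets that empty out before the largest event time.

```python
def lynden_bell_survival(u, x):
    """Product-limit survival estimate under left-truncation.

    u: observed left-truncation times, x: observed event times (u_i <= x_i).
    Returns the distinct event times and the estimated survival just after each.
    """
    if any(ui > xi for ui, xi in zip(u, x)):
        raise ValueError("every pair must satisfy u_i <= x_i")
    times = sorted(set(x))
    survival = []
    s = 1.0
    for t in times:
        # Risk set: individuals already under observation and not yet failed at t.
        at_risk = sum(1 for ui, xi in zip(u, x) if ui <= t <= xi)
        events = sum(1 for xi in x if xi == t)
        s *= 1.0 - events / at_risk
        survival.append(s)
    return times, survival


# Toy data (hypothetical): four truncation times and four event times.
times, survival = lynden_bell_survival([0, 1, 0, 3], [1, 2, 4, 5])
print(list(zip(times, survival)))
```

On the toy data the risk sets have sizes 3, 2, 2 and 1, so the estimate differs from the ordinary empirical survival function, which would treat all four individuals as being at risk from time zero.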

      A variable of interest $X^*$ is said to be doubly truncated by a couple of random variables $(U^*, V^*)$ if the observation of $X^*$ is possible only when $U^* \le X^* \le V^*$ occurs. In such a case, $U^*$ and $V^*$ are called left‐ and right‐truncation variables respectively. Double truncation reduces to left‐truncation when $V^*$ degenerates at $\infty$, while it corresponds to right‐truncation when $U^* = -\infty$. This book is focused on the problem of estimating the distribution of $X^*$, and other related curves, from a set of iid triplets with the distribution of $(U^*, X^*, V^*)$ given $U^* \le X^* \le V^*$.
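
The sampling scheme just described can be sketched as a rejection step: draw the variable of interest together with its left‐ and right‐truncation variables, and keep the triplet only when the variable of interest falls between the two. The distributions below, and the choice of a right‐truncation variable equal to the left one plus a fixed width, are assumptions made purely for illustration; the function names are hypothetical.

```python
import math
import random


def is_observed(u, x, v):
    """Double-truncation rule: x is observable only when u <= x <= v."""
    return u <= x <= v


def sample_doubly_truncated(n, width=2.0, seed=0):
    """Return n observed triplets (u, x, v) and the number of draws needed."""
    rng = random.Random(seed)
    triplets, draws = [], 0
    while len(triplets) < n:
        draws += 1
        x = rng.uniform(0.0, 4.0)     # variable of interest (assumed uniform)
        u = rng.uniform(-width, 4.0)  # left-truncation variable (assumed uniform)
        v = u + width                 # right-truncation variable (assumed: u + width)
        if is_observed(u, x, v):
            triplets.append((u, x, v))
    return triplets, draws


triplets, draws = sample_doubly_truncated(1000)
print("observed fraction:", len(triplets) / draws)

# Special cases: an infinite right limit leaves only left-truncation,
# and an infinite negative left limit leaves only right-truncation.
print(is_observed(0.5, 1.0, math.inf), is_observed(-math.inf, 1.0, 0.5))
```

Only the retained triplets are available to the analyst, which is why the estimation problem is posed conditionally on the truncation event.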
