The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira


sampling, where the sample is restricted to the individuals with event between two specific dates $d_0$ and $d_1$ (Zhu and Wang, 2012). Then, the right‐truncation time is $V = d_1 - B$, where $B$ denotes the date of onset for the time‐to‐event, and the left‐truncation time is $U = V - \tau$, where $\tau = d_1 - d_0$ is the interval width. The Childhood Cancer Data in Section 1.4.1 is an example of data obtained through interval sampling.
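The interval sampling mechanism can be sketched with a short simulation. This is only an illustration under stated assumptions: the symbols (`d0`, `d1` for the two dates, `b` for the date of onset, `x` for the time‐to‐event, `u` and `v` for the left‐ and right‐truncation times), the function name `interval_sample`, the uniform law for the onset dates and the exponential law for the time‐to‐event are all hypothetical choices and are not taken from the book or from the DTDA package.

```python
import random

def interval_sample(n, d0, d1, onset_lo, onset_hi, mean_time, seed=0):
    """Simulate interval sampling and return the observed (u, x, v) triples."""
    rng = random.Random(seed)
    tau = d1 - d0                              # interval width
    observed = []
    for _ in range(n):
        b = rng.uniform(onset_lo, onset_hi)    # date of onset (assumed uniform)
        x = rng.expovariate(1.0 / mean_time)   # time-to-event (assumed exponential)
        v = d1 - b                             # right-truncation time
        u = v - tau                            # left-truncation time, i.e. d0 - b
        if u <= x <= v:                        # event date b + x lies in [d0, d1]
            observed.append((u, x, v))
    return observed

sample = interval_sample(n=10_000, d0=1999.0, d1=2004.0,
                         onset_lo=1980.0, onset_hi=2004.0, mean_time=8.0)
print(len(sample), "of 10000 simulated individuals are observed")
```

Every retained triple satisfies `u <= x <= v`, and the difference `v - u` equals the interval width for all individuals, which is what links the two truncation times under this sampling scheme.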

      With interval sampling the variable $V - U$ is degenerate at $\tau$. This occurs in other sampling schemes too, in which $d_0$ and $d_1$ are certain subject‐specific event dates. An illustrative example is given by the Parkinson's Disease Data, see Section 1.4.5, where $V$ is the individual age at blood sampling. When $V - U$ is constant, the couple $(U, V)$ falls on a line, and its joint density does not exist, even when the truncating variables may be continuous.

      In other situations, the truncating variables $U$ and $V$ are not linked through the linear equation $U = V - \tau$. For example, $U$ and $V$ could represent some random observation limits beyond which the variable of interest $X$ cannot be sampled or detected. Situations like this occur, for example, in Astronomy, as illustrated in Section 1.4.4.

      With random double truncation, both large and small values of $X$ are observed in principle with a relatively small probability. However, the real observational bias for $X$ varies from application to application, depending on the joint distribution of $(U, V)$. We will see, for example, that the probability of sampling a value $X = x$, namely $P(U \le x \le V)$, may be roughly constant, inducing no observational bias; or that it may be roughly decreasing, indicating the dominance of the right‐truncation bias relative to the left‐truncation bias.
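The two behaviours of the sampling probability mentioned above can be illustrated in a simple special case. The sketch below assumes interval sampling between dates `d0` and `d1` with onset dates uniformly distributed on a window `[onset_lo, onset_hi]`, independently of the variable of interest; these assumptions, the function names and the numerical values are hypothetical and serve only as an example. Under them, a value `x` is sampled when the onset date falls in `[d0 - x, d1 - x]`, so the probability has a closed form.

```python
import random

def sampling_probability(x, d0, d1, onset_lo, onset_hi):
    """P(U <= x <= V) under interval sampling with uniform onset dates.

    U <= x <= V is equivalent to d0 - x <= B <= d1 - x for the onset date B,
    so the probability is the length of the overlap between [d0 - x, d1 - x]
    and [onset_lo, onset_hi], divided by onset_hi - onset_lo.
    """
    lo = max(d0 - x, onset_lo)
    hi = min(d1 - x, onset_hi)
    return max(hi - lo, 0.0) / (onset_hi - onset_lo)

def monte_carlo_probability(x, d0, d1, onset_lo, onset_hi, n=100_000, seed=1):
    """Monte Carlo check of sampling_probability for a single value x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        b = rng.uniform(onset_lo, onset_hi)    # simulated onset date
        if d0 - b <= x <= d1 - b:              # x lies between U and V
            hits += 1
    return hits / n

d0, d1 = 0.0, 5.0
# Scenario A: onset window long enough to cover every x in the grid.
flat = [sampling_probability(x, d0, d1, -30.0, 5.0) for x in (0, 5, 10, 20)]
# Scenario B: short onset window, so large values of x are under-sampled.
decreasing = [sampling_probability(x, d0, d1, -10.0, 5.0) for x in (0, 10, 12, 14)]
print(flat)
print(decreasing)
print(monte_carlo_probability(12, d0, d1, -10.0, 5.0))
```

In the first scenario the probability does not depend on `x` over the grid, so no observational bias is induced there; in the second it drops for large `x`, which corresponds to right‐truncation dominating.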

      Another issue of relevance is that of the identifiability of the distribution of $X$. Intuitively it is clear that with doubly truncated data it is only possible to estimate the distribution of $X$ conditional on $a_U \le X \le b_V$, where $a_U$ and $b_V$ denote respectively the lower and upper endpoints of the supports of $U$ and $V$ (see Chapter 2 for details). This may have important practical consequences, as we will see. On the other hand, in applications with doubly truncated survival data the estimates correspond to the susceptible population for which the terminal event of interest is sure. This is in contrast to the standard analysis of survival times where a portion of the individuals may belong to the so‐called cured fraction, or immunes. This should be taken into account when interpreting the results from the analysis.

      In this section we introduce the datasets that will be used throughout the book for illustration purposes. All of them suffer from double truncation. These examples are available in the latest version of the DTDA package (Moreira et al., 2021a).

      1.4.1 Childhood Cancer Data

      The Childhood Cancer Data were gathered from the IPO (Instituto Português de Oncologia) of Porto, Portugal, by the RORENO (Registro Oncológico do Norte) service. The information corresponds to all children diagnosed with cancer between 1 January 1999 ($d_0$) and 31 December 2003 ($d_1$) in the region of North Portugal, which includes five districts: Porto, Braga, Bragança, Vila Real and Viana do Castelo. The variable of main interest upper