The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira
Because of the interval sampling, the age at diagnosis $X^*$ is doubly truncated by the pair $(U^*, V^*)$. The right‐truncation variable $V^*$ is the time in years from birth (date of onset) to 31 December 2003, and $U^* = V^* - 5$. The triplets $(U_i, X_i, V_i)$, $i = 1, \ldots, n$, of values observed for $(U^*, X^*, V^*)$ were reported in Moreira and de Uña‐Álvarez (2010), while de Uña‐Álvarez (2020) included the cancer group in the statistical analysis. Ordinary descriptive statistics can be applied to the information gathered over this 5‐year window, for instance to compute the average age at cancer diagnosis. However, if the goal is to describe the population of children eventually developing cancer, the double truncation should be acknowledged and properly corrected so that potential biases are avoided.

Interestingly, the observed values of $U^*$ range between $-4.5$ and $14.5$ (years); equivalently, the observed values of $V^*$ range between $0.5$ and $19.5$. This means that the lower and upper endpoints of $U^*$ and $V^*$ satisfy $a_{U^*} \le a_{X^*}$ and $b_{X^*} \le b_{V^*}$, where $a_W$ and $b_W$ denote the lower and upper endpoints of the support of a variable $W$. Thus, in this case, the target variable is observable on its whole support $[a_{X^*}, b_{X^*}]$, and there are no identification issues for $F$, the cdf of $X^*$. Information on the observed ages at diagnosis is summarized in Table 1.1.

Table 1.1 Descriptive statistics for Childhood Cancer Data: sample size ($n$) and mean (and standard deviation, SD) for the age at diagnosis (years).

| Group | Subgroup | $n$ | Mean (SD) |
|---|---|---|---|
| All | | 406 | 6.47 (4.50) |
| By gender | Female | 178 | 6.43 (4.51) |
| | Male | 228 | 6.51 (4.51) |
| By ICCC group | Leukemia | 107 | 6.30 (4.15) |
| | Lymphoma | 57 | 8.66 (4.39) |
| | N. System Tumour | 94 | 6.38 (4.29) |
| | Neuroblastoma | 38 | 3.16 (3.47) |
| | Other | 105 | 6.87 (4.70) |
| | Missing | 5 | 3.92 (5.18) |
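The following minimal Python sketch makes the sampling mechanism concrete by showing how a single child would enter, or fail to enter, the registry sample. It writes $X^*$ for the age at diagnosis and $(U^*, V^*)$ for the truncation pair, with $V^*$ the time from birth to 31 December 2003 and $U^* = V^* - 5$. The helper names `truncation_triplet` and `is_sampled` are hypothetical. Converting days to years by dividing by 365.25 is an assumption of the sketch, so the 5‐year window is only approximated as 5 × 365.25 days.

```python
from datetime import date
from typing import Tuple

# Assumptions for this sketch (not from the text): ages in years are computed
# as days / 365.25; the window is the 5 years ending on 31 December 2003.
DAYS_PER_YEAR = 365.25
WINDOW_END = date(2003, 12, 31)
WINDOW_YEARS = 5.0


def truncation_triplet(birth: date, diagnosis: date) -> Tuple[float, float, float]:
    """Return (U, X, V) in years for one child (hypothetical helper).

    V: time from birth to the end of the window (right-truncation variable)
    U: V - 5 (left-truncation variable)
    X: age at diagnosis (target variable)
    """
    v = (WINDOW_END - birth).days / DAYS_PER_YEAR
    u = v - WINDOW_YEARS
    x = (diagnosis - birth).days / DAYS_PER_YEAR
    return u, x, v


def is_sampled(birth: date, diagnosis: date) -> bool:
    """A case is observed only when U <= X <= V, i.e. diagnosed inside the window."""
    u, x, v = truncation_triplet(birth, diagnosis)
    return u <= x <= v


# Example: a child born in 2000 and diagnosed in mid-2003 falls inside the window
print(is_sampled(date(2000, 1, 1), date(2003, 6, 1)))  # True
```

A case is recorded only when the diagnosis falls inside the window, which is exactly the condition $U^* \le X^* \le V^*$. Children diagnosed before the window are left‐truncated ($X^* < U^*$), and those diagnosed after it are right‐truncated ($X^* > V^*$).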
This dataset is used in Chapters 2, 3 and 5 and is accessible in the `DTDA` package as `ChildCancer`.
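A standard nonparametric way to carry out the bias correction mentioned above is the nonparametric maximum likelihood estimator (NPMLE) of the cdf $F$ of $X^*$ under double truncation. The sketch below implements the Efron-Petrosian fixed-point iteration in plain Python. It is not the implementation in the R package `DTDA`, and the names `npmle_double_truncation` and `corrected_mean` are hypothetical. The sketch assumes that every observed triplet satisfies $U_i \le X_i \le V_i$. Under that assumption each interval contains at least its own observation, so no division by zero occurs.

```python
# Sketch (assumption): plain-Python Efron-Petrosian fixed-point iteration for
# the NPMLE under double truncation; not the DTDA implementation.


def npmle_double_truncation(u, x, v, tol=1e-10, max_iter=10_000):
    """Return probability masses f[j] placed at the observed ages x[j].

    Assumes u[i] <= x[i] <= v[i] for every observed triplet i.
    """
    n = len(x)
    # J[i][j] = 1 if observation j lies inside the truncation interval of case i
    J = [[1.0 if u[i] <= x[j] <= v[i] else 0.0 for j in range(n)] for i in range(n)]
    f = [1.0 / n] * n  # start from the naive empirical weights
    for _ in range(max_iter):
        # F[i]: current estimate of P(u[i] <= X* <= v[i])
        F = [sum(J[i][j] * f[j] for j in range(n)) for i in range(n)]
        # Fixed-point update: f[j] proportional to 1 / sum_i J[i][j] / F[i]
        new = [1.0 / sum(J[i][j] / F[i] for i in range(n)) for j in range(n)]
        total = sum(new)
        new = [w / total for w in new]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f  # may not have converged within max_iter


def corrected_mean(x, f):
    """Mean of X* under the estimated masses f (truncation-corrected mean)."""
    return sum(w * xi for w, xi in zip(f, x))


# Toy illustration with four hypothetical triplets (not the ChildCancer data)
u = [0.0, 0.0, 1.0, 2.0]
x = [1.0, 2.0, 3.0, 4.0]
v = [2.0, 3.0, 4.0, 5.0]
f = npmle_double_truncation(u, x, v)
print("naive mean:", sum(x) / len(x))
print("corrected mean:", corrected_mean(x, f))
```

Each update gives observation $j$ a mass proportional to the inverse of $\sum_i J_{ij}/F_i$. As a result, ages covered by few truncation intervals receive more mass than the naive $1/n$. Applied to the 406 childhood cancer triplets, `corrected_mean(x, f)` would be the truncation‐corrected counterpart of the naive mean of 6.47 years in Table 1.1.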
1.4.2 AIDS Blood Transfusion Data
Kalbfleisch and Lawless (1989) reported 494 cases of transfusion‐related AIDS, corresponding to individuals diagnosed prior to 1 July 1986. The variable of ultimate interest, $X^*$, is the induction or incubation time, that is, the time elapsed from HIV infection to AIDS. Importantly, HIV was unknown before 1982, so cases that developed AIDS before then were not reported. Let $V^*$ denote the time from HIV infection to 1 July 1986 (in months), and introduce $U^*$, the time from HIV infection to the beginning of 1982; then, due to the interval sampling, only triplets satisfying