6

I'm currently reading The Statistical Analysis of Failure Time Data by Kalbfleisch and Prentice and had trouble arriving at the expression for the survivor function of a random variable $T$ having both discrete and continuous components. The setup is the following:

Let $T$ be a random variable on $[0,\infty)$ with survivor function $F(t)=P(T>t)$. Then

  • if $T$ is absolutely continuous with density $f$, then the hazard function $\lambda$ can be defined as $$ \lambda(t):=\lim_{h\to 0^+}\frac{P(t\leq T<t+h\mid T\geq t)}{h}=\frac{f(t)}{F(t)} $$ for $t\geq 0$, and hence we have $$ F(t)=\exp\left(-\int_0^t\lambda(s)\,\mathrm ds\right),\quad t\geq 0. $$

  • if $T$ is discrete taking on the values $0\leq a_1<a_2<\cdots$, then we define the hazard at $a_i$ as $$ \lambda_i=P(T=a_i\mid T\geq a_i),\quad i=1,2,\ldots. $$ Then we can show that $$ F(t)=\prod_{j\mid a_j\leq t}(1-\lambda_j),\quad t\geq 0. $$
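As a quick numerical sanity check of the two formulas above, here is a toy sketch of my own (the Weibull-type hazard and the geometric example are assumptions for illustration, not from the book):

```python
import math

# Continuous case: hazard lambda(t) = 2t (a Weibull-type toy example),
# so F(t) = exp(-t^2). Integrate the hazard by the midpoint rule.
t, n = 1.3, 100_000
h = t / n
integral = sum(2 * (i + 0.5) * h * h for i in range(n))  # approximates t^2
F_cont = math.exp(-integral)
assert abs(F_cont - math.exp(-t * t)) < 1e-9

# Discrete case: T geometric on a_i = i with constant hazard
# lambda_i = P(T = i | T >= i) = p, so F(t) = (1 - p)^floor(t).
p, t = 0.3, 5
F_disc = 1.0
for j in range(1, t + 1):        # product over {j : a_j <= t}
    F_disc *= 1 - p
assert abs(F_disc - (1 - p) ** t) < 1e-12
print("both hazard-based survivor formulas check out")
```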

I am fine with these expressions for the survivor functions. Now the authors write the following:

More generally, the distribution of $T$ may have both discrete and continuous components. In this case, the hazard function can be defined to have the continuous component $\lambda_c(t)$ and discrete components $\lambda_1,\lambda_2,\ldots$ at the discrete times $a_1<a_2<\cdots$. The overall survivor function can then be written $$ F(t)=\exp\left(-\int_0^t\lambda_c(s)\,\mathrm ds\right)\prod_{j\mid a_j\leq t}(1-\lambda_j).\tag{1} $$

That $T$ has both discrete and continuous components means that the distribution of $T$ is of the form $$ P_T(\mathrm dx)=f_c(x) \lambda(\mathrm dx)+\sum_{j=1}^\infty b_j \delta_{a_j}(\mathrm dx) $$ or equivalently $$ P(T\in A)=\int_A f_c(x)\,\mathrm dx+\sum_{j\mid a_j\in A} b_j $$ for some sequence $a_1<a_2<\cdots$, some $b_j\in (0,1)$, and some non-negative measurable function $f_c$ with $\int_0^\infty f_c\,\mathrm d\lambda+\sum_{j=1}^\infty b_j=1$ (here $\lambda$ denotes Lebesgue measure). If we define $$ \lambda_c(t)=\frac{f_c(t)}{P(T\geq t)}=\frac{f_c(t)}{F(t)},\quad t\neq a_i, $$ and $$ \lambda_i=P(T=a_i\mid T\geq a_i), $$ then how do I show (and is it even true) that the survivor function of $T$ is given by $(1)$?
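For what it's worth, the claim can be checked numerically. Below is a sketch with a toy mixed distribution of my own (sub-density $0.5e^{-x}$ plus atoms $b_1=0.3$ at $a_1=1$ and $b_2=0.2$ at $a_2=2$; all choices are assumptions for illustration), comparing $(1)$ with the exact survivor function:

```python
import math

# Hypothetical mixed distribution: continuous sub-density
# f_c(x) = 0.5 * exp(-x) on (0, inf) (total mass 1/2) plus
# atoms b_1 = 0.3 at a_1 = 1 and b_2 = 0.2 at a_2 = 2.
atoms = [(1.0, 0.3), (2.0, 0.2)]

def f_c(x):
    return 0.5 * math.exp(-x)

def F(t):       # exact survivor function F(t) = P(T > t)
    return 0.5 * math.exp(-t) + sum(b for a, b in atoms if a > t)

def F_left(t):  # left limit F(t-) = P(T >= t)
    return 0.5 * math.exp(-t) + sum(b for a, b in atoms if a >= t)

def survivor_via_hazard(t, n=100_000):
    # continuous hazard lambda_c = f_c / F, integrated by the midpoint
    # rule (midpoints avoid the atoms, where lambda_c is not defined)
    h = t / n
    integral = sum(f_c(x) / F(x) * h for x in ((i + 0.5) * h for i in range(n)))
    # discrete hazards lambda_j = P(T = a_j | T >= a_j) = b_j / F(a_j-)
    prod = 1.0
    for a, b in atoms:
        if a <= t:
            prod *= 1 - b / F_left(a)
    return math.exp(-integral) * prod   # formula (1)

for t in (0.5, 1.5, 2.5):
    assert abs(survivor_via_hazard(t) - F(t)) < 1e-4
print("formula (1) matches the exact survivor function")
```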

Stefan Hansen
  • 25,582
  • 7
  • 59
  • 91
  • 1
  • I don't see an issue here: the authors consider relations between the survivor function and the hazard function separately for the case when there is a density, and for the case when the distribution is purely discrete. In $(1)$ they combine the results for a distribution that has a continuous part and a purely discrete one. Now, for $t\neq a_j$ they define $\lambda_c(t) = f(t)/F(t)$, whereas for $t = a_j$ they use the second formula. – SBF Jun 25 '13 at 09:44
  • @Ilya: Thanks for the response. I guess my question is, what is the definition of a variable having both a continuous part and a purely discrete part? – Stefan Hansen Jun 25 '13 at 09:49
  • Well, I don't think that in a measure-theory oriented course you would find a formal definition of it, but most likely they mean that the distribution $\mu_T$ of $T$ is given by $$ \mu_T(\mathrm dx) = f_c(x)\lambda(\mathrm dx) + \sum_{j}b_j \delta_{a_j}(\mathrm dx) $$ where $f_c$ is some "sub-density" function, and $\lambda$ is the Lebesgue measure. – SBF Jun 25 '13 at 09:51
  • @Ilya: That makes sense, but is it obvious that if $T$ has distribution given by $\mu_T$ above, and we define $\lambda_c$ according to $f_c$, and $\lambda_1,\lambda_2,\ldots$ according to $a_1,a_2,\ldots$, then its survivor function is given by $(1)$? – Stefan Hansen Jun 25 '13 at 10:50
  • To be honest, in the current shape it is even not very obvious how do they define $\lambda_c$ and $\lambda_j$ in such a case. – SBF Jun 25 '13 at 10:51
  • @Ilya: I've edited the question, so that it's (hopefully) clear how $\lambda_c$ and $\lambda_j$ should be defined. – Stefan Hansen Jun 25 '13 at 11:07

2 Answers

3

The function $F(t) = \mathsf P(T>t) = 1-\mathsf P(T\leq t)$ is clearly of RCLL class on $[0,\infty)$. As a result, the definitions of the continuous part $\lambda_c$ of the hazard function and of the discrete parts allow you to compute $F$ by integrating $\lambda_c$ between the jumps and applying jump conditions at $t = a_j$. The latter take the following form: $$ \lambda_j = \mathsf P(T = a_j\mid T\geq a_j) = \frac{F(a_j-) - F(a_j)}{F(a_j-)}\implies F(a_j) = F(a_j-)(1-\lambda_j) $$ where $F(t-):=\lim_{s\uparrow t}F(s)$.
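A tiny numerical illustration of the jump condition, using a hypothetical mixture of my own (not from the answer): $T\sim\operatorname{Exp}(1)$ with probability $0.6$, and $T=2$ with probability $0.4$.

```python
import math

# Check F(a_j) = F(a_j-)(1 - lambda_j) for a distribution with one atom:
# T ~ Exp(1) with probability 0.6, T = 2 with probability 0.4.
a, b = 2.0, 0.4

def F(t):    # F(t) = P(T > t), right-continuous
    return 0.6 * math.exp(-t) + (b if t < a else 0.0)

F_left = 0.6 * math.exp(-a) + b     # F(a-) = P(T >= a)
lam = b / F_left                    # lambda_j = P(T = a | T >= a)
assert abs(F(a) - F_left * (1 - lam)) < 1e-12
print("jump condition verified")
```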

Stefan Hansen
SBF
  • 36,041
  • 1
  • Changed some $S$'s to $F$'s - hopefully it's what you intended to write. Thanks for your help. – Stefan Hansen Jun 25 '13 at 12:04
  • @StefanHansen: thanks – SBF Jun 25 '13 at 12:32
  • Stefan, what I don't understand is that in the book by Kalbfleisch and Prentice they define $F(t) = P(T \geq t)$ on page 6 instead of $F(t) = P(T > t)$ the way you define it. Hence, in the denominator of the fraction in the post by @Ilya, I fail to see how $P(T \geq a_j)$ can be turned into $\lim\limits_{t\to a_j^{-}}F(t)$; given the book's definition, shouldn't it be just $F(a_j)$? Which then raises the question: how did the authors get to the general survival function you quote in (1)? :S – user1200428 Dec 09 '14 at 14:18
  • Also their notation is horrible!! – user1200428 Dec 09 '14 at 14:27
1

This may be very late, but I would like to give my two cents on this question.

Suppose $\mu$ is a probability measure on $((0,\infty),\mathscr{B}((0,\infty)))$ and let $F(x):=\mu(0,x]$. The Integrated Hazard Function $Q$ of $\mu$ is defined as $$ Q(t)=\int_{(0,t]}\frac{1}{1-F(x-)}\mu(dx). $$ The function $S(t):=1-F(t)$ is a right-continuous monotone nonincreasing function. $Q$ is a right-continuous monotone nondecreasing function whose associated (Lebesgue-Stieltjes) measure $\mu_Q\ll\mu$ satisfies $$ \begin{align} \mu_{Q}(\{x\})&=\Delta Q(x)=\frac{\Delta F(x)}{S(x-)}\\ \mu_{Q_c}(dx)&=\frac{1}{S(x-)}\mu_{F_c}(dx)\\ S(x-)\mu_Q(dx)&=\mu(dx)=\mu_F(dx), \end{align} $$ where $F_c$ and $Q_c$ are the continuous parts of $F$ and $Q$, respectively. Then $Q$ and $F$ have the same points of discontinuity $\{x_j:j\in I\}$, and since $S(t)=1-F(t)=1-\int_{(0,t]}\mu(dx)$, $$ \begin{align} S(t)=S(0)-\int_{(0,t]}S(x-)\mu_Q(dx).\tag{1}\label{one} \end{align} $$ We will show that $S$ is the unique solution to $\eqref{one}$ that is bounded on any bounded set, and that $$ S(t)=\exp\big(-Q_c(t)\big)\prod_{0<x_j\leq t} (1-\Delta Q(x_j)). $$


The proof of this will be a consequence of the following theorem:

Theorem: Let $F$ be a right-continuous monotone nondecreasing function on $[0,\infty)$ and let $\mu_F$ be the unique measure on $(0,\infty)$ such that $\mu_F\big((a,b]\big)=F(b)-F(a)$. Let $\{x_j:j\in\mathbb{N}\}$ be the sequence of all discontinuities of $F$. If $v\in\mathcal{L}^{loc}_1(\mu_F)$, then for any number $H_0\geq0$ the function $$ \begin{align} H(t)=H_0\exp\Big(\int_{(0,t]}v(x)\mu_{F_c}(dx)\Big)\prod_{0<x_j\leq t}(1+v(x_j)\Delta F(x_j))\tag{2}\label{expo-form} \end{align} $$ is the unique solution in $t\geq0$ of the integral equation \begin{align} \label{integro-exp} H(t)=H(0)+\int_{(0,t]}H(x-)v(x)\mu_F(dx) \end{align} satisfying $\|H\mathbb{1}_{(0,t]}\|_u<\infty$ for all $t>0$.


The survivor function satisfies $\eqref{one}$; applying the Theorem above with $v\equiv-1$ (and with $F$ replaced by $Q$) gives existence and uniqueness, and $\eqref{expo-form}$ then yields the formula quoted in the OP.
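As a sanity check of the $v\equiv-1$ case, here is a purely discrete toy example of my own (atoms and masses are assumptions for illustration), verifying both the integral equation $S(t)=S(0)-\int_{(0,t]}S(x-)\,\mu_Q(dx)$ and the product formula (with $Q_c=0$):

```python
# Purely discrete toy measure mu with atoms (x_j, mu({x_j})).
atoms = [(1.0, 0.5), (2.0, 0.3), (3.0, 0.2)]

def S(t):       # survivor function S(t) = 1 - F(t)
    return 1.0 - sum(p for x, p in atoms if x <= t)

def S_left(t):  # left limit S(t-)
    return 1.0 - sum(p for x, p in atoms if x < t)

for t in (0.5, 1.0, 2.5, 3.0):
    # Delta Q(x_j) = Delta F(x_j) / S(x_j-)
    jumps = [(x, p / S_left(x)) for x, p in atoms if x <= t]
    # integral equation: S(t) = S(0) - sum_j S(x_j-) Delta Q(x_j)
    lhs_int = 1.0 - sum(S_left(x) * dq for x, dq in jumps)
    # product formula (Q_c = 0 here): S(t) = prod_j (1 - Delta Q(x_j))
    prod = 1.0
    for _, dq in jumps:
        prod *= 1 - dq
    assert abs(S(t) - lhs_int) < 1e-12 and abs(S(t) - prod) < 1e-12
print("integral equation and product formula agree with S")
```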


Since formula $\eqref{expo-form}$ appears often in applications to survival analysis and reliability theory, I think it is worthwhile to present a proof. It will be based entirely on Lebesgue integration by parts.

Preliminary notation:

For any real valued function $F$ on an interval $I$, denote by $\mu_F$ the Lebesgue-Stieltjes measure generated by $F$, so $\mu_F\big((a,b]\big)=F(b)-F(a)$ for all $[a,b]\subset I$.

  • Recall that for any real valued functions $F$, $G$ of locally finite variation in some interval $I$, $$ \int_{(a,b]} F(t)\,\mu_G(dt)=F(b)G(b)-F(a)G(a)-\int_{(a,b]}G(t-)\,\mu_F(dt) $$ for all $[a,b]\subset I$. This formula may be denoted as $$ d(FG)=F\,dG+ G_-\,dF $$ where $G_-(t):=G(t-)$ and $dF(x):=\mu_F(dx)$, that is, $dF\big((a,b]\big)=F(b)-F(a)$.

  • If $G$ is a continuous function of locally finite variation, then $$ dG^n = n G^{n-1}\,dG. $$ This can be shown by induction: for $n=1$ it holds trivially, and for $n\geq1$, using continuity of $G$ (so $G_-=G$) in the integration by parts formula, $$ d(G^{n+1})=d(G^n\,G)=G\,dG^n + G^n\,dG=nG^n\,dG+ G^n\,dG=(n+1) G^n\,dG. $$ From this, we obtain the well-known exponential formula for continuous measures: $$\begin{align} d e^G(t) = e^{G(t)}\,dG(t):= e^{G(t)}\,\mu_G(dt).\tag{3}\label{exp-for1} \end{align} $$
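The exponential formula for continuous measures can be verified numerically. Here is a toy check of my own with $G(t)=t^2$, so that $\int_{(0,1]} e^{G}\,dG = e^{G(1)}-e^{G(0)} = e-1$:

```python
import math

# Check d e^G = e^G dG for the continuous function G(t) = t^2:
# mu_G(dt) = 2t dt, so \int_{(0,1]} e^{t^2} 2t dt should equal e - 1.
n = 100_000
h = 1.0 / n
integral = sum(math.exp(((i + 0.5) * h) ** 2) * 2 * (i + 0.5) * h * h
               for i in range(n))   # midpoint rule
assert abs(integral - (math.e - 1)) < 1e-6
print("exponential formula verified:", integral)
```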

A technical result:

Lemma: Suppose $G$ is right-continuous and nondecreasing on the interval $[0,T)$ $(0<T\leq\infty)$. Then, for any $n\in\mathbb{N}$, $$ \int_{(0,t]}G^{n-1}(s-)\mu_G(ds)\leq \frac{G^n(t)-G^n(0)}{n}\leq\int_{(0,t]}G^{n-1}(s)\mu_G(ds) $$ for all $0<t<T$. (In differential notation, $nG^{n-1}_-\,dG\leq dG^n\leq nG^{n-1}\,dG$.)

Here is a short proof:

For $n\in\mathbb{N}$, $G^n$ is right-continuous and nondecreasing, and so the associated Lebesgue-Stieltjes measure $\mu_{G^n}$ is nonnegative. Repeated application of integration by parts gives $$ \begin{align} dG^n &= G^{n-1}_-\,dG + G\,dG^{n-1}=G^{n-1}_-\,dG + G (G^{n-2}_-\,dG + G\,dG^{n-2})\\ &= (G^{n-1}_-+GG^{n-2}_- +\ldots + G^{n-1})\,dG \end{align} $$ in differential notation. Since $G(s-)\leq G(s)$ for all $0<s\leq T$, each of the $n$ terms in the middle factor lies between $G^{n-1}_-$ and $G^{n-1}$, and we conclude that $$ n G^{n-1}_-\,dG \leq dG^n\leq n G^{n-1}\,dG. $$

Proof of main Theorem:

As $v\in \mathcal{L}^{loc}_1(\mu_F)$, also $v\in\mathcal{L}^{loc}_1(\mu_{F_I})$, where $F_I$ denotes the pure-jump part of $F$ (so $\mu_{F_I}=\sum_j\Delta F(x_j)\,\delta_{x_j}$), and hence $$ \|v\mathbb{1}_{(0,t]}\|_{\mathcal{L}_1(\mu_{F_I})}=\sum_{0<x_j\leq t}|v(x_j)|\Delta F(x_j)<\infty. $$ Consequently $H$ is bounded on each compact subinterval of $[0,\infty)$. Let $$ \begin{align} G_1(t)&=H_0\prod_{0<x_j\leq t}(1+v(x_j)\Delta F(x_j))\\ G_2(t)&=\exp\Big(\int_{(0,t]}v(x)\mu_{F_c}(dx)\Big). \end{align} $$ $G_1$ is a right-continuous pure-jump function of bounded variation which changes only at the $x_j$; moreover, $$ \begin{align} \Delta G_1(x_j)=G_1(x_j)-G_1(x_j-)&=G_1(x_j-)\big(1+v(x_j)\Delta F(x_j)\big)-G_1(x_j-)\\ &= G_1(x_j-)v(x_j)\Delta F(x_j). \end{align} $$ $G_2$ is a continuous function of locally finite variation with $$ \begin{align} \mu_{G_2}(dx)&=\exp\Big(\int_{(0,x]}v(y)\mu_{F_c}(dy)\Big)v(x)\mu_{F_c}(dx)\\ &= G_2(x)v(x)\mu_{F_c}(dx). \end{align} $$ Applying the integration by parts formula to $H(t)=G_1(t)G_2(t)$ gives $$ \begin{align} H(t)-H(0)&=\int_{(0,t]}G_1(x-)\mu_{G_2}(dx)+\int_{(0,t]}G_2(x)\mu_{G_1}(dx)\\ &= \int_{(0,t]}G_1(x-)G_2(x)v(x)\mu_{F_c}(dx)+ \sum_{0<x_j\leq t}G_2(x_j)G_1(x_j-)v(x_j)\Delta F(x_j)\\ &= \int_{(0,t]}H(x-)v(x)\mu_{F_c}(dx)+\int_{(0,t]}H(x-)v(x)\mu_{F_I}(dx)\\ &=\int_{(0,t]}H(x-)v(x)\mu_F(dx). \end{align} $$ It remains to prove uniqueness. Suppose $H_1$ and $H_2$ are two solutions and set $D=H_1-H_2$. Fix $t>0$, let $M:=\|D\mathbb{1}_{(0,t]}\|_u$, and set $\Lambda(t)=\int_{(0,t]}|v(x)|\mu_F(dx)$. Then $$ |D(t)|\leq \int_{(0,t]}|D(x-)||v(x)|\mu_F(dx)\leq M\int_{(0,t]}|v(x)|\mu_{F}(dx) = M\Lambda(t). $$ As $\Lambda$ is nondecreasing and right-continuous, $|D(x-)| \leq M\Lambda(x-)$. By the technical Lemma above, $$ \begin{align} |D(t)|&\leq M\int_{(0,t]}\Lambda(x-) |v(x)|\mu_F(dx) = M\int_{(0,t]}\Lambda(x-)\mu_\Lambda(dx)\leq \frac{M}{2} \Lambda^2(t). \end{align} $$ Continuing by induction we obtain $|D(t)|\leq \frac{M}{n!}\Lambda^n(t)$. Letting $n\rightarrow\infty$ gives $|D(t)|=0$.

Mittens
  • 39,145