# PhD thesis: Advances in frailty models

### What are frailty models?

Imagine we have a device that ticks every now and then, and denote the number of ticks up to time $t$ as
$N(t)$. Broadly speaking, the goal of survival analysis is to determine the *intensity* of this process,
so as to answer how likely it is that, after observing this device for a while, we would expect another
tick to happen in the next period of time.

This quantity is usually modeled as:
$$
h(t) = h_0(t) \exp(\beta^\top \mathbf{x}(t))
$$
where $h_0(t)$ is a *baseline* intensity and $\mathbf{x}(t)$ represents a number of possibly time-dependent factors,
or knowledge that we have about $N(t)$, that we believe might influence the intensity. This is called a
*proportional intensity* model and has been developed since the 1970s.
The term *survival analysis* comes from the situation where $N$ can only tick once, usually representing death.
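To make the proportionality concrete, here is a minimal Python sketch (the Weibull baseline and the coefficient value are illustrative assumptions, not taken from the text): the ratio of intensities between two covariate values does not depend on $t$.

```python
import math

def hazard(t, x, beta=0.5, shape=1.5):
    """Proportional intensity h(t) = h0(t) * exp(beta * x),
    with an illustrative Weibull baseline h0(t) = shape * t^(shape - 1)."""
    h0 = shape * t ** (shape - 1)
    return h0 * math.exp(beta * x)

# The ratio h(t | x=1) / h(t | x=0) is exp(beta) at every t:
for t in [0.5, 1.0, 2.0]:
    print(round(hazard(t, x=1) / hazard(t, x=0), 4))  # always exp(0.5) ≈ 1.6487
```

The baseline $h_0(t)$ cancels in the ratio, which is exactly what "proportional" means here.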

Frailty models come in for two big reasons:

- usually we do not know *all* the factors that may influence $h(t)$;
- when several devices are related in a way that is difficult to quantify (think similar unmeasured genetic make-up, or shared environmental conditions that cannot be measured), we know that the tickers are related, and this information must be incorporated somehow.

So in essence we claim that there is an unobservable variable $Z$ that affects $h(t)$. If we had observed $Z$ (which we don’t), then the intensity of $N(t)$ would be $$ h(t) = Z h_0(t) \exp(\beta^\top \mathbf{x}(t)). $$
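As a sketch of how a *shared* $Z$ creates dependence, the following Python simulation (all parameter values are illustrative assumptions) draws one gamma frailty per pair of devices and generates event times conditional on it; the shared $Z$ induces positive dependence between the two times even though they are independent given $Z$.

```python
import numpy as np

rng = np.random.default_rng(42)
n_pairs = 20000
theta = 2.0  # frailty variance; gamma frailty with mean 1, variance theta

# One frailty Z per pair, shared by both members of the pair
z = rng.gamma(shape=1.0 / theta, scale=theta, size=n_pairs)

# Conditional on Z, each member's time is exponential with rate Z * h0 (h0 = 1)
t1 = rng.exponential(scale=1.0 / z)
t2 = rng.exponential(scale=1.0 / z)

# Rank (Spearman) correlation is robust to the heavy tails induced by Z
r1 = t1.argsort().argsort()
r2 = t2.argsort().argsort()
spearman = np.corrcoef(r1, r2)[0, 1]
print(spearman > 0.3)  # clearly positive within-pair dependence
```

Given $Z$, the two times are independent; marginally they are not, which is the mechanism frailty models exploit.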

We can work out that this is essentially a random-intercept mixed model for the log-intensity. It turns out that, compared to regular linear mixed models, this leads to a large number of surprising results, while the estimation is in general challenging.
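Taking logarithms makes the mixed-model analogy explicit: with $b = \log Z$ playing the role of a random intercept,
$$
\log h(t) = \log h_0(t) + \beta^\top \mathbf{x}(t) + b, \qquad b = \log Z.
$$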

Models of this type are still controversial to some extent, as they do involve a healthy dose of speculation. However, we are often interested in the *conditional* treatment effect rather than the *marginal* one, that is, the effect on the individual and not that on the population.
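A small numeric illustration of this conditional/marginal distinction, assuming a gamma frailty with variance $\theta$ and a constant baseline intensity (the values here are assumptions for the sketch, not from the text): the conditional hazard ratio is $e^\beta$ at all times, while the marginal (population-level) hazard ratio starts at $e^\beta$ and attenuates toward 1 over time, because frail individuals leave the at-risk population faster.

```python
import math

beta = 1.0   # conditional log hazard ratio (assumed)
theta = 1.0  # gamma frailty variance (assumed)

def marginal_hazard(t, x):
    """Population-level hazard after integrating out a gamma frailty
    (mean 1, variance theta), with constant baseline h0 = 1 so H0(t) = t."""
    rate = math.exp(beta * x)
    return rate / (1.0 + theta * t * rate)

def marginal_hr(t):
    return marginal_hazard(t, 1) / marginal_hazard(t, 0)

print(round(marginal_hr(0.0), 3))   # 2.718, i.e. e^beta at t = 0
print(round(marginal_hr(10.0), 3))  # 1.061, attenuated toward 1
```

The individual-level (conditional) hazard ratio stays $e^\beta$ throughout; only the population-level contrast shrinks.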

### What is the purpose of my PhD?

The aim of my PhD is to study how these models behave and how they can be applied in real-world settings.

- In *Score test for association between recurrent events and a terminal event*, we developed, well… a score test to see whether a recurrent event process and a terminal event process are associated. The same idea can be used to test for dependent censoring of a recurrent event process. This is partially implemented in the `frailtyEM` package.
- In *Ascertainment correction in frailty models for recurrent events data*, we looked at what happens when individual histories of recurrent events are selected based on a number of events having happened in a certain calendar time period. We propose a fix for that and an iterative estimation method.
- In *frailtyEM: An R Package for Estimating Semiparametric Shared Frailty Models*, we implemented a general estimation method for semi-parametric frailty models. It is the only package (to my knowledge) that estimates gamma, positive stable, inverse Gaussian, and a number of other distributions for the random effect in semi-parametric shared frailty models. The original motivation was to have an R package that does things in a consistent way and is well documented; many other ideas transformed it into what it is. For example, you can get a predicted survival curve, with the frailty integrated out, for an individual with time-dependent covariates.
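The "frailty integrated out" prediction mentioned above rests on the identity $S_{\text{marg}}(t) = \mathbb{E}\left[e^{-Z H(t)}\right] = \mathcal{L}_Z(H(t))$: the marginal survival is the Laplace transform of the frailty distribution evaluated at the cumulative hazard. A minimal Python sketch for the gamma case (illustrative values; this is not the `frailtyEM` implementation):

```python
import math

def marginal_survival(H, theta):
    """Marginal survival for a gamma frailty (mean 1, variance theta):
    the Laplace transform E[exp(-Z * H)] = (1 + theta * H) ** (-1 / theta)."""
    return (1.0 + theta * H) ** (-1.0 / theta)

H = 2.0  # cumulative hazard at some time t (assumed)

# As theta -> 0 the frailty degenerates at Z = 1 and we recover exp(-H):
print(round(marginal_survival(H, 1e-8), 4))       # ≈ exp(-2) ≈ 0.1353
print(marginal_survival(H, 0.5) > math.exp(-H))   # True: frailty gives a heavier tail
```

With time-dependent covariates, $H(t)$ is accumulated piecewise over the covariate path before plugging it into the transform.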

We are also working on a number of other projects that will make it into this list as they get closer to completion.