--- title: "The Gompertz distribution" author: "Göran Broström" date: "`r Sys.Date()`" slug: eha output: bookdown::html_document2 link-citations: yes pkgdown: as_is: true bibliography: mybib.bib vignette: > %\VignetteIndexEntry{The Gompertz distribution} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} \usepackage[utf8]{inputenc} --- ```{r setup, include = FALSE} library(eha) options(digits = 4) ``` This vignette is still ongoing work, so if you are looking for something you cannot find, please [alert me](https://github.com/goranbrostrom/eha/issues/) and I will do something about it. # Background Despite stated otherwise by some, the *Gompertz* distribution can be parameterized both as a PH model *and* as an AFT one. The Gompertz families of distributions are defined in essentially two ways in the **R** package `eha`: The *rate* and the *canonical* representations. The reason for this is that the families need to be differently represented depending on whether proportional hazards or accelerated failure time models are under consideration. In the *proportional hazards* case, the *rate* formulation is used, and it is characterized by an exponentially increasing hazard function with fixed rate `r`: \begin{equation} h(t; p, r) = p e^{r t}, \quad p, t > 0; -\infty < r < \infty. \end{equation} As noted earlier, when $r < 0$, the hazard function $h$ is decreasing "too fast" to define a proper survival function, and $r = 0$ gives the *exponential distribution* as a special case. And for each fixed $r$, the family of distributions indexed by $p > 0$ constitutes a proportional hazards family of distributions, and the corresponding regression model is written as \begin{equation} h(t; x, p, r, \beta) = p e^{r t} e^{\beta x}, \quad t > 0. \end{equation} # Proportional hazards model In Figure \@ref(fig:gowecof8) a Gompertz fit is compared to ordinary Cox regression. ```{r gowecof8, fig.cap = "Baseline cumulative hazards for Cox and Gompertz regressions."} fit.c <- coxreg(Surv(enter - 60, exit - 60, event) ~ sex + region, data = oldmort) fit.g <- phreg(Surv(enter - 60, exit - 60, event) ~ sex + region, dist = "gompertz", param = "rate", data = oldmort) plot(fit.c, xlab = "Years above age 60.") haz.g <- hazards(fit.g, cum = TRUE) lines(haz.g$x, haz.g$y, lty = 2) legend("topleft", legend = c("Cox regression", "Gompertz regression"), lty = 1:2) ``` The Gompertz model fits the baseline hazard very well up until duration 30 (age 90), but after that the exponential growth slows down. The result of fitting the Gompertz model is shown here, ```{r gocofph8} summary(fit.g) ``` to be compared to the Cox regression results. ```{r cocofph8} summary(fit.c) ``` # Accelerated failure time model The *Gompertz* distribution is special in that it can be fit into both the AFT and the PH framework. Of course, as we have seen, this also holds for the Weibull distribution in a trivial way, the AFT and the PH models are the same, but for the Gompertz distribution, the AFT and PH approaches yield different models. For the AFT framework to be in place in the Gompertz case, it needs to be formulated with a rather unfamiliar parametrization, which is called *the canonical parametrization* in the package `eha`. It works as follows. The standard definition of the Gompertz hazard function is \begin{equation*} h_r(t; (\alpha, \beta)) = \alpha \exp(\beta t), \quad t > 0; \; \alpha > 0, -\infty < \sigma < \infty. \end{equation*} and it is called the *rate* parametrization in `eha` As noted earlier, in order for $h_G$ to determine a proper survival distribution, it must be required that $\beta \ge 0$. It was also noted that when $\beta = 0$, the resulting distribution is *exponential*. The *canonical* definition of the Gompertz hazard function is given by \begin{equation*} h_c(t; (\tau, \sigma)) = \frac{\tau}{\sigma} \exp\biggl(\frac{t}{\sigma}\biggr), \quad t > 0; \; \tau, \sigma > 0. \end{equation*} The transition from $h_r$ to $h_c$ is given by $\sigma = 1 / \beta, \, \tau = \alpha / \beta$, and note that this implies that the rate in the canonical form must be strictly positive. Furthermore, the exponential special case now appears in the limit as $\sigma \rightarrow \infty$. In practice this means that when the baseline hazard is only weakly increasing, $\sigma$ is very large and numerical problems in the estimation process is likely to occur. The conclusion of all this is that the AFT Gompertz model is suitable in situations where the intensity of an event is clearly increasing with time. A good example is adult mortality. We repeat the PH analysis but with the AFT model. ```{r gocofaft8} fit.gaft <- aftreg(Surv(enter - 60, exit - 60, event) ~ sex + region, id = id, param = "lifeExp", dist = "gompertz", data = oldmort) summary(fit.gaft) ```