crmPack Training for Merck

Advanced Training

Daniel Sabanés Bové

RPACT

March 31, 2026

Advanced Training Outline 📦

How to contribute code on GitHub
Custom JAGS model workflow
Prior construction:
- Minimally informative prior
- Mixture priors
Time-to-event (TITE) designs

How to contribute on GitHub

The package’s repo is hosted on GitHub at github.com/openpharma/crmPack
Contributions are very welcome:
- Questions
- Bug reports
- Feature requests
- Code contributions (under the license of the project, i.e. GPL v2 or higher)
We will also soon have a contributing guide in the repo with more detailed instructions on how to contribute code, but here are some general guidelines:
- Fork the repo for your work
- Make your changes and commit them with clear messages
- Push to your fork and create a pull request to the main crmPack repo
- I will review your pull request and provide feedback, and once it’s approved I will merge it into the main branch

Custom JAGS model workflow

It is actually quite straightforward to implement a custom JAGS model in crmPack:

Define a new S4 class that inherits from GeneralModel with setClass()
Implement a constructor / initialization function for the new class, which specifies the likelihood and prior distributions in the underlying GeneralModel slots:
1. The datamodel slot should contain the JAGS code for the likelihood
2. The priormodel slot should contain the JAGS code for the prior distributions
3. The modelspecs slot is a function that returns required JAGS inputs
4. The init slot is a function that returns initial values for the MCMC sampling
5. The datanames slot is a character vector of the names of the data inputs required for the likelihood
Implement dose() and prob() methods for the new class, which compute the dose and toxicity probability for given model parameters using setMethod()

Full examples are available in the JSS paper and in the parallelization vignette.

Prior construction

Minimally informative prior

As we have seen in the basic training, it is not trivial to construct a reasonable prior for the CRM model parameters.
It gets easier when we don’t directly specify priors for the model parameters, but instead specify a prior for the toxicity probabilities at the doses of interest.
For logistic (log-)normal models, the crmPack function Quantiles2LogisticNormal can be used to compute the model parameters that correspond to a given set of quantiles for the toxicity probabilities at the doses of interest.

Minimally informative prior (cont’d)

Specifically, Neuenschwander et al. (2008) introduced the concept of a “minimally informative prior” for the CRM, which is a prior for the logistic (log-) normal model that is as vague as possible while still being consistent with the prior information we have about the toxicity probabilities at the doses of interest
The algorithm can be summarized as follows:
- For a high dose, have a small chance (say 5%) of being below a low toxicity probability (say 20%)
- For a low dose have a small chance (say 5%) of being above a low toxicity probability (say 10%)
- For intermediate doses, assume prior medians are linear in log-dose on the logit scale
- Then iteratively compute the model prior parameters that come closest to satisfying these constraints

Example of a minimally informative prior

Note that we only use a coarse grid here to simplify the optimization problem a bit.

coarseGrid <- c(25, 100, 300)
min_result <- MinimalInformative(
    dosegrid = coarseGrid,
    refDose = 100,
    logNormal = TRUE, # use a log-normal distribution for the slope parameter
    threshmin = 0.1, # the quantile for the low dose
    threshmax = 0.2, # the quantile for the high dose
    seed = 432, # for reproducibility
    control = list(max.time = 30) # limit the optimization time here
)

It: 1, obj value (lsEnd): 0.06732883077 indTrace: 1
It: 214, obj value (lsEnd): 0.01474025243 indTrace: 214
It: 238, obj value (lsEnd): 0.01416193999 indTrace: 238
It: 460, obj value (lsEnd): 0.01295269405 indTrace: 460
timeSpan = 30.001304 maxTime = 30
Emini is: 0.01295269405
xmini are:
-1.280026342 0.6618977116 1.254935046 0.1500300078 0.5608228116 
Totally it used 30.00134 secs
No. of function call is: 10164

Comparing the prior result with the target quantiles

matplot(
    x = coarseGrid,
    y = min_result$required,
    type = "o",
    lty = 1,
    col = 1
)
matlines(
    x = coarseGrid,
    y = min_result$quantiles,
    type = "o",
    lty = 2,
    col = 1
)
legend(
    "topleft",
    legend = c("Target quantiles", "Obtained quantiles"),
    lty = 1:2,
    col = 1
)

Comparing the prior result with the target quantiles

Resulting prior parameters

min_result$model

A logistic log normal model will describe the relationship between dose and toxicity: $p(Tox | d) = f(X = 1 | \theta, d) = \frac{e^{\alpha + \beta \cdot log(d/d_{ref})}}{1 + e^{\alpha + \beta \cdot log(d/d_{ref})}}$ where d_ref denotes a reference dose.

The prior for θ is given by $\boldsymbol\theta = \begin{bmatrix}\alpha \\ log(\beta)\end{bmatrix}\sim N \left(\begin{bmatrix}-1.28 \\ 0.66\end{bmatrix} , \begin{bmatrix} 1.57 & 0.11 \\ 0.11 & 0.02\end{bmatrix} \right)$

The reference dose will be 100.00.

Mixture priors

Sometimes we have more specific prior information about the toxicity probabilities at the doses of interest, for example from preclinical data or from similar drugs.
In such cases, we can use mixture priors to incorporate this information into the CRM model
For example, we can use a logistic log-normal model with a mixture of two normal distributions, with the weights of the two components reflecting our uncertainty about which scenario is more likely
Applications of mixture priors include:
- bimodal: e.g. one representing a “low toxicity” scenario and one representing a “high toxicity” scenario
- borrowing from another compound: e.g. one representing the information from a similar drug and one representing a minimally informative prior
- borrowing from another population: e.g. one representing the behavior from a global study population and one minimally informative

Mixture priors in crmPack

crmPack currently includes the following classes for mixture priors:

LogisticNormalMixture:
- standard logistic regression model with a mixture of $K=2$ bivariate normal priors on the intercept and slope parameters
- the weight parameter is also estimated from the data and has a Beta hyperprior distribution
LogisticNormalFixedMixture
- standard logistic regression model with fixed mixture of multiple bivariate (log-) normal priors on the intercept and slope parameters
- the weights are fixed, and $K>2$ mixture components can be used

Example of LogisticNormalMixture

Let’s first define two bivariate normal components:

comp1 <- ModelParamsNormal(
    mean = c(-0.85, 1),
    cov = matrix(c(1, -0.5, -0.5, 1), nrow = 2)
)
comp2 <- ModelParamsNormal(
    mean = c(1, 1.5),
    cov = matrix(c(1.2, -0.45, -0.45, 0.6), nrow = 2)
)

Say we use a uniform $\textrm{Beta}(1,1)$ hyperprior for the weight parameter, and a reference dose of $d^{*} = 50$ :

normal_mix_prior <- LogisticNormalMixture(
    comp1 = comp1,
    comp2 = comp2,
    weightpar = c(a = 1, b = 1),
    ref_dose = 50
)
normal_mix_prior

A mixture of two logistic log normal models will describe the relationship between dose and toxicity: $p(Tox | d) = f(X = 1 | \theta, d) = \frac{e^{\alpha + \beta \cdot log(d/d^*)}}{1 + e^{\alpha + \beta \cdot log(d/d^*)}}$ where d* denotes a reference dose.

The prior for θ is given by $\theta = \begin{bmatrix} \alpha \\ log(\beta) \end{bmatrix} \sim w \cdot N \left(\begin{bmatrix}-0.85 \\ 1.00\end{bmatrix} , \begin{bmatrix} 1.00 & -0.50 \\ -0.50 & 1.00\end{bmatrix} \right) + (1 - w) \cdot N \left(\begin{bmatrix} 1.00 \\ 1.50\end{bmatrix} , \begin{bmatrix} 1.20 & -0.45 \\ -0.45 & 0.60\end{bmatrix} \right)$

and the prior for w is given by

$w \sim Beta( 1.00, 1.00)$

The reference dose will be 50.00.

Example of LogisticNormalFixedMixture

Here we need to fix the weights of the mixture components, for example to 0.5 each:

normal_mix_fix_weights_prior <- LogisticNormalFixedMixture(
    components = list(
        comp1 = comp1,
        comp2 = comp2
    ),
    weights = c(0.5, 0.5),
    ref_dose = 50
)
normal_mix_fix_weights_prior

A mixture of 2 logistic log normal models with fixed weights will describe the relationship between dose and toxicity: $p(Tox | d) = f(X = 1 | \theta, d) = \frac{e^{\alpha + \beta \cdot log(d/d^*)}}{1 + e^{\alpha + \beta \cdot log(d/d^*)}}$ where d* denotes a reference dose.

The prior for θ is given by $\theta = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \sim \sum_{i=1}^{2}w_i \cdot N \left( \mathbf{\mu}_i , \mathbf{\Sigma}_i \right)$ with $\sum_{i=1}^{2} w_i = 1$ The individual components of the mixture are $\boldsymbol\theta_1 = \begin{bmatrix}\alpha \\ \beta\end{bmatrix}\sim N \left(\begin{bmatrix}-0.85 \\ 1.00\end{bmatrix} , \begin{bmatrix} 1.00 & -0.50 \\ -0.50 & 1.00\end{bmatrix} \right)$

with weight 0.5 and $\boldsymbol\theta_2 = \begin{bmatrix}\alpha \\ \beta\end{bmatrix}\sim N \left(\begin{bmatrix} 1.00 \\ 1.50\end{bmatrix} , \begin{bmatrix} 1.20 & -0.45 \\ -0.45 & 0.60\end{bmatrix} \right)$

with weight 0.5 The reference dose will be 50.00.

Comparison of the two prior specifications

We can easily see the difference in the JAGS code:

body(normal_mix_prior@priormodel)

{
    w ~ dbeta(weightpar[1], weightpar[2])
    wc <- 1 - w
    comp0 ~ dbern(wc)
    comp <- comp0 + 1
    theta ~ dmnorm(mean[1:2, comp], prec[1:2, 1:2, comp])
    alpha0 <- theta[1]
    alpha1 <- theta[2]
}

body(normal_mix_fix_weights_prior@priormodel)

{
    comp ~ dcat(weights)
    theta ~ dmnorm(mean[1:2, comp], prec[1:2, 1:2, comp])
    alpha0 <- theta[1]
    alpha1 <- theta[2]
}

Yet another but different mixture prior

LogisticLogNormalMixture
- standard logistic model with online mixture of $K=2$ bivariate log normal priors on the intercept and slope parameters
- can be used when data is arising online from the informative component of the prior, at the same time with the data of the trial of main interest
- assumes the same prior distribution for the informative component and the current trial data (for simplicity), the only question is whether the parameters are identical or not
- needs to use DataMixture class to specify the data structure for the online data

Example of LogisticLogNormalMixture

Note that we just use one component, i.e. one prior here:

online_mix_prior <- LogisticLogNormalMixture(
    share_weight = 0.5, # probability that the current and external data are from the same distribution
    ref_dose = 50,
    mean = comp1@mean,
    cov = comp1@cov
)

Now we define the data, which consists of the current trial data and the external data:

data_share <- DataMixture(
    doseGrid = c(25, 50, 100, 200, 300),
    x = c(25, 25, 50, 50),
    y = c(0, 1, 0, 1),
    xshare = c(25, 25, 50, 50, 100, 100, 300, 300),
    yshare = c(0, 0, 0, 1, 0, 0, 0, 1)
)

Example of LogisticLogNormalMixture (cont’d)

Let’s look at the model definition:

body(online_mix_prior@priormodel)

{
    for (k in 1:2) {
        theta[k, 1:2] ~ dmnorm(mean, prec)
        alpha0[k] <- theta[k, 1]
        alpha1[k] <- exp(theta[k, 2])
    }
    comp ~ dcat(cat_probs)
}

body(online_mix_prior@datamodel)

{
    for (i in 1:nObs) {
        stand_log_dose[i] <- log(x[i]/ref_dose)
        logit(p[i]) <- alpha0[comp] + alpha1[comp] * stand_log_dose[i]
        y[i] ~ dbern(p[i])
    }
    for (j in 1:nObsshare) {
        stand_log_dose_share[j] <- log(xshare[j]/ref_dose)
        logit(pshare[j]) <- alpha0[2] + alpha1[2] * stand_log_dose_share[j]
        yshare[j] ~ dbern(pshare[j])
    }
}

We see that comp == 2 means that the current and external data are from the same distribution, while comp == 1 means they are from different distributions.

Example of LogisticLogNormalMixture (cont’d)

We can produce samples then as usually:

samples_share <- mcmc(data_share, online_mix_prior, McmcOptions())
plot(samples_share, online_mix_prior, data_share)

Probability that the current and external data are from the same distribution:

mean(samples_share@data$comp == 2)

[1] 0.5645

Time-to-event (TITE) designs

Motivation

The TITE-CRM design was introduced by Cheung and Chappell (2004) to address the issue of late-onset toxicities in dose-escalation trials (e.g. up to 6 months in radiotherapy trials)
It would be impractical to always wait for the full possible DLT assessment period to elapse for all patients before making dose-escalation decisions, as this would lead to very long trial durations (e.g. 12 years)
The TITE-CRM design allows for making dose-escalation decisions based on partial information about the occurrence of DLTs, by using a weighted likelihood approach that accounts for the time-to-event nature of the data
This leads to much shorter trial durations (e.g. 2-4 years) while still maintaining the CRM’s accuracy and safety

Required design inputs

In addition to the standard CRM design inputs, additional inputs are required for TITE-CRM designs:

A DLT observation time period of length $T_{\textrm{max}}$ , also called the DLT window
A weighting function $w(t)$ that assigns weights to the observed data based on the time of observation relative to the DLT window
The required follow-up time before enrolling the next cohort at a potentially higher dose level, called “safety window” in crmPack

Model specification

The TITE-CRM uses a pseudo-likelihood function:

$\prod_{i=1}^{n} \left[ (p(x_i)w(t_i))^{y_i} (1 - p(x_i)w(t_i))^{1 - y_i} \right]$

where $p(x_i)$ is a standard CRM model for the probability of DLT at dose $x_i$ , $y_i$ is the binary indicator of whether a DLT was observed for patient $i$ , and $w(t_i)$ is the weight assigned to patient $i$ based on the time of observation $t_i$ relative to the DLT window.

Note: In crmPack this is implemented with the “Poisson trick” in JAGS. In principle this could work for any CRM model, but we restrict it to LogisticLogNormal so far in crmPack.

Weighting functions

In crmPack you can choose between

the linear weighting function $w(t) = \min(1, t / T_{\textrm{max}})$ , which is the most commonly used weighting function in the literature, and
the adaptive weighting function proposed in formula (3) of Cheung and Chappell (2004): $w(t) = \frac{\kappa}{z + 1} + \frac{1}{z + 1}\left(\frac{t - t_{(\kappa)}}{t_{(\kappa + 1)} - t_{(\kappa)}}\right)$
- where $z$ is the total number of DLTs, $0 \equiv t_{(0)} < t_{(1)} \leq \cdots \leq t_{(z)} < t_{(z+1)} \equiv T_{\textrm{max}}$ are the ordered failure times, and $\kappa = \max_{0 \leq j \leq z}\{j : t \geq t_{(j)}\}$
- Basically, the adaptive function allocates more weight to each record than the linear function when DLTs are observed early and less weight when DLTs are observed late.

Illustration

From Fig. 1 from Werkhoven et al. (2020)

Example with crmPack

model <- TITELogisticLogNormal(
    mean = c(1.33, 1.49),
    cov = matrix(c(1.826, 0.0209, 0.0209, 0.0245), nrow = 2),
    ref_dose = 56,
    weight_method = "linear"
)
increments <- IncrementsRelative(intervals = c(0, 20), increments = c(2, 1))
next_best <- NextBestMTD(
    # Note that we use a "nonparametric" rule here.
    target = 0.3,
    derive = function(mtd_samples) mean(mtd_samples)
)
stopping <- StoppingMinPatients(n = 20)
size <- CohortSizeConst(3)
t_max <- 42
# Note that we need to use the DataDA class because we need to have
# DLT times in the data.
empty_data <- DataDA(doseGrid = seq(from = 2, to = 50, by = 2), Tmax = t_max)
safety_window <- SafetyWindowConst(gap = c(7, 5), follow = 7, follow_min = 14)

Example with crmPack (cont’d)

Some observations here:

We use a “nonparametric” rule for the next best dose, because we do not (yet) have a TITE-CRM specific model-based rule implemented
We use the DataDA class for the data, because we need to have DLT times in the data
The safety window can be a bit confusing:
- gap is the time between patients in the same cohort (so 7 days between the first and second patient, and 5 days between the second and third patient)
- follow is the time we need to wait after the last patient in the cohort before enrolling the next cohort
- follow_min is the time at least one patient needs to have been followed up for before enrolling the next cohort

Example with crmPack (cont’d)

We need to define the true scenario in terms of both the true toxicity probabilities at the doses of interest and the distribution of DLT times:

truth <- function(x) {
    plogis(2 + 3 * log(x / 56))
}

onset <- 3
exp_cond_cdf <- function(x) {
    1 -
        (pexp(x, 1 / onset, lower.tail = FALSE) -
            pexp(t_max, 1 / onset, lower.tail = FALSE)) /
            pexp(t_max, 1 / onset)
}

Example with crmPack (cont’d)

Let’s plot the CDF of this truncated exponential distribution of DLT times (constrained to the DLT window) to see how it looks like:

curve(exp_cond_cdf, from = 0, to = t_max, ylab = "CDF", xlab = "Time")
abline(v = onset)

Example with crmPack (cont’d)

Now we can define the design and run simulations:

design <- DADesign(
    model = model,
    increments = increments,
    nextBest = next_best,
    stopping = stopping,
    cohort_size = size,
    data = empty_data,
    safetyWindow = safety_window,
    startingDose = 2
)
sims <- simulate(
    object = design,
    truthTox = truth,
    truthSurv = exp_cond_cdf,
    trueTmax = t_max,
    nsim = 100,
    seed = 413,
    mcmcOptions = McmcOptions(
        burnin = 500,
        step = 2,
        samples = 200,
        rng_kind = "Mersenne-Twister",
        rng_seed = 413
    )
)

Example with crmPack (cont’d)

summary(sims, truth = truth, target = c(0.29, 0.31))

Summary of 100 simulations

Target toxicity interval was 29, 31 %
Target dose interval corresponding to this was 21.3, 22 
Intervals are corresponding to 10 and 90 % quantiles

Number of patients overall : mean 21 (21, 21) 
Number of patients treated above target tox interval : mean 8 (0, 12) 
Proportions of DLTs in the trials : mean 27 % (19 %, 33 %) 
Mean toxicity risks for the patients on active : mean 27 % (17 %, 35 %) 
Doses selected as MTD : mean 23.3 (18, 28) 
True toxicity at doses selected : mean 35 % (20 %, 48 %) 
Proportion of trials selecting target MTD: 25 %
Dose most often selected as MTD: 22 
Observed toxicity rate at dose most often selected: 28 %
Fitted toxicity rate at dose most often selected : mean 29 % (17 %, 46 %) 
Stop reason triggered:
 ≥ 20 patients dosed :  100 %

Practical aspects

Werkhoven et al. (2020) discuss practical aspects of running TITE-CRM designs:

They “have found that designing and running early phase trials using the TiTE-CRM can be complex, but is feasible and worthwhile”
Prior probability of DLTs at the dose grid:
- use the “background DLT probability”, i.e. without any experimental treatment, as reference (all probabilities need to be higher)
- effect of the addition of the experimental drug can then be approximated by multiplication with dose enhancement factors obtained from preclinical studies
Recommend to have the first few patients treated at the starting dose, to avoid aggressive escalation early in the trial
Don’t just report the final MTD estimate, but also the credible interval for it: particularly relevant with incomplete follow-up data

References

Cheung, Ying Kuen, and Rick Chappell. 2004. “Sequential Designs for Phase i Clinical Trials with Late-Onset Toxicities.” Biometrics 56 (4): 1177–82. https://doi.org/10.1111/j.0006-341X.2000.01177.x.

Neuenschwander, B, M Branson, and T Gsponer. 2008. “Critical Aspects of the Bayesian Approach to Phase i Cancer Trials.” Statistics in Medicine.

Werkhoven, Erik van, Samantha Hinsley, Eleni Frangou, et al. 2020. “Practicalities in Running Early-Phase Trials Using the Time-to-Event Continual Reassessment Method (TiTE-CRM) for Interventions with Long Toxicity Periods Using Two Radiotherapy Oncology Trials as Examples.” BMC Medical Research Methodology 20 (1): 162. https://doi.org/10.1186/s12874-020-01012-z.