A penalised multinomial logistic regression engine that incorporates Deep Mutational Scanning (DMS) escape scores as informative priors on variant fitness. This is valuable for early-emergence scenarios where a new lineage has few observed sequences but laboratory-measured phenotypic data (e.g., ACE2 binding affinity, antibody escape) are available.
Arguments
- data
An lfq_data object.
- dms_scores
Named numeric vector of DMS-derived fitness priors. Names correspond to lineage identifiers. Values are on the log growth rate scale (positive = fitter than average). Lineages not in the vector receive a prior of 0.
- lambda
Regularisation strength (penalty weight). Default 1.0. Larger values pull estimates more strongly toward the DMS prior. At lambda = 0, the result is identical to the standard MLR engine.
- pivot
Reference lineage name. Default NULL (automatic selection).
- ci_level
Confidence level. Default 0.95.
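The rule that lineages missing from dms_scores receive a prior of 0 can be pictured with a few lines of base R. This is an illustrative sketch only: the lineage names and the variable names (lineages, mu, known) are hypothetical, and the package's internal handling may differ.

```r
# Hypothetical lineage set; "C" and "ref" have no DMS measurement.
lineages <- c("A", "B", "C", "ref")
dms_scores <- c(A = 0.04, B = -0.02)

# Start from a prior of 0 for every lineage, then fill in the known scores.
mu <- setNames(numeric(length(lineages)), lineages)
known <- intersect(names(dms_scores), lineages)
mu[known] <- dms_scores[known]

mu
```

Lineages without a score are then shrunk toward 0 on the log growth rate scale, with strength set by lambda.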
Value
An lfq_fit object compatible with all downstream functions (forecast, growth_advantage, etc.).
Details
The approach uses penalised maximum likelihood, where the penalty is proportional to the squared difference between each estimated growth rate and its DMS-derived prior. The effect is empirical Bayes-style shrinkage: with abundant data the penalty has little influence, while with sparse data the estimates are pulled toward the DMS prior.
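To see why abundant data weakens the penalty, consider a one-parameter quadratic approximation. The symbols here are introduced for illustration only: \(\hat\delta\) is the unpenalised estimate of a growth rate, \(I\) its observed information, \(\mu\) the DMS prior, and \(\lambda\) the penalty weight. Maximising the approximate penalised objective gives a precision-weighted average:

$$\tilde\delta = \arg\max_{\delta}\left[-\frac{I}{2}(\delta - \hat\delta)^2 - \frac{\lambda}{2}(\delta - \mu)^2\right] = \frac{I\,\hat\delta + \lambda\,\mu}{I + \lambda} = \hat\delta - \frac{\lambda}{I + \lambda}\,(\hat\delta - \mu)$$

The shrinkage weight \(\lambda / (I + \lambda)\) tends to 0 as \(I\) grows with the number of sequences and approaches 1 when \(I\) is small. Under the same approximation the curvature at the optimum is \(I + \lambda\), so the approximate standard error is \((I + \lambda)^{-1/2}\).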
The penalised log-likelihood is:
$$\ell_{\text{pen}}(\alpha, \delta) = \ell(\alpha, \delta) - \frac{\lambda}{2} \sum_v (\delta_v - \mu_v)^2$$
where \(\ell\) is the standard multinomial log-likelihood, \(\delta_v\) is the growth rate for lineage \(v\), and \(\mu_v\) is the DMS prior for that lineage. The Hessian used for the confidence intervals includes the penalty term, so interval widths reflect the shrinkage.
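The objective can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes a linear-in-time predictor \(\alpha_v + \delta_v t\), the last column of the count matrix as the pivot (parameters fixed at 0), and a penalty on the non-pivot growth rates only. The function name neg_pen_loglik and the toy counts are hypothetical.

```r
# Toy counts: rows are time points, columns are lineages (last column = pivot).
counts <- matrix(c( 5, 10, 85,
                    8,  9, 83,
                   12,  8, 80,
                   18,  7, 75),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("A", "B", "ref")))
times <- seq_len(nrow(counts)) - 1
mu <- c(A = 0.04, B = -0.02)  # DMS priors for the non-pivot lineages

# Negative penalised log-likelihood; par = c(alpha, delta) for non-pivot lineages.
neg_pen_loglik <- function(par, counts, times, mu, lambda) {
  k <- ncol(counts) - 1
  alpha <- par[seq_len(k)]
  delta <- par[k + seq_len(k)]
  # Linear predictors alpha_v + delta_v * t; the pivot column is fixed at 0.
  eta <- cbind(sweep(outer(times, delta), 2, alpha, "+"), 0)
  # Row-wise log-softmax gives the log lineage frequencies.
  log_p <- eta - log(rowSums(exp(eta)))
  loglik <- sum(counts * log_p)
  penalty <- lambda / 2 * sum((delta - mu)^2)
  -(loglik - penalty)
}

fit_at <- function(lambda) {
  optim(rep(0, 4), neg_pen_loglik, counts = counts, times = times,
        mu = mu, lambda = lambda, method = "BFGS", hessian = TRUE)
}

fit_mle <- fit_at(0)    # no penalty: plain multinomial logistic regression
fit_pen <- fit_at(50)   # strong penalty: growth rates pulled toward mu

delta_mle <- fit_mle$par[3:4]
delta_pen <- fit_pen$par[3:4]

# Approximate standard errors from the Hessian of the penalised objective.
se_pen <- sqrt(diag(solve(fit_pen$hessian)))
```

Comparing delta_mle with delta_pen shows the growth rates moving toward mu as lambda increases. Because optim returns the Hessian of the penalised objective, the penalty's curvature is already included in se_pen.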
References
Dadonaite B, Crawford KHD, Radford CE, et al. (2023). A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike. Cell, 186(6), 1263–1278. doi:10.1016/j.cell.2023.02.001
Bloom JD, Neher RA (2023). Fitness effects of mutations to SARS-CoV-2 proteins. Virus Evolution, 9(2), vead055. doi:10.1093/ve/vead055
See also
fit_model for the standard MLR engine.
Examples
# \donttest{
sim <- simulate_dynamics(n_lineages = 3,
                         advantages = c("A" = 1.3, "B" = 0.9),
                         n_timepoints = 8, total_per_tp = 100, seed = 1)
# DMS suggests lineage A has a fitness advantage
dms <- c("A" = 0.04, "B" = -0.02)
fit_dms <- fit_dms_prior(sim, dms_scores = dms, lambda = 2)
growth_advantage(fit_dms)
#> # A tibble: 3 × 6
#>   lineage estimate  lower  upper type        pivot
#>   <chr>      <dbl>  <dbl>  <dbl> <chr>       <chr>
#> 1 A         0.227   0.150 0.304  growth_rate ref
#> 2 B        -0.0882 -0.186 0.0100 growth_rate ref
#> 3 ref       0       0     0      growth_rate ref
# }