
Compare epidemic curves using GAM smoothing, functional distances, clustering, and optional permutation/bootstrapping
Source:R/compare_curves.R
compare_curves.RdFits a generalized additive model (GAM) to replicated disease progress curves. By default, the response is modeled with beta regression (logit link) after a Smithson–Verkuilen adjustment to handle 0 and 1 values; if beta precision (theta) estimation fails, the model falls back to a quasibinomial GAM fit on the unclipped proportion response.
From the fitted model, the function derives (i) environment-adjusted treatment mean curves on a common time grid, (ii) L2 functional distances among treatment mean trajectories followed by hierarchical clustering, and (iii) curve-level fitted trajectories (excluding unit random effects) to build a curve-by-curve distance matrix. Optionally, it performs a restricted permutation test on the curve-level distance matrix for a priori group labels, and computes bootstrap envelopes and distance uncertainty summaries from resampled predicted curves.
Usage
compare_curves(
data,
time,
response,
treatment,
environment = NULL,
block = NULL,
unit = NULL,
response_scale = c("proportion", "percent"),
eps = 1e-04,
min_points = 5,
grid_n = 140,
env_ref = NULL,
k_smooth = 10,
k_env = 4,
k_trt = 6,
gamma = 1.4,
discrete = TRUE,
cluster_k = 4,
hc_method = "ward.D2",
test_factor = NULL,
n_perm = 999,
test_mode = c("auto", "none", "global", "global_pairwise"),
min_strata_for_pairwise = 6,
perm_unit = NULL,
perm_strata = NULL,
bootstrap = FALSE,
boot_B = 399,
boot_seed = 1,
boot_ci = c(0.025, 0.975),
bootstrap_mode = c("predicted", "refit"),
...
)Arguments
- data
A data.frame containing disease progress observations in long format.
- time
Character string naming the time variable (numeric or coercible to numeric).
- response
Character string naming the response variable (disease incidence/severity).
- treatment
Character string naming the treatment/cultivar factor.
- environment
Optional character string naming an environment factor (e.g., site-year).
- block
Optional character string naming a blocking factor.
- unit
Optional character string naming a unique experimental unit (curve) identifier. If
NULL, a unit is constructed from the interaction of available factors (environment, treatment, and block).- response_scale
Character string specifying the response scale:
"proportion"(0–1) or"percent"(0–100).- eps
Numeric small constant retained for API compatibility (not used for beta handling).
- min_points
Minimum number of observations required per curve (unit) to be retained.
- grid_n
Number of time points for evaluating predicted curves on a common grid.
- env_ref
Reference environment level used for adjusted treatment mean predictions. If
NULL, the first level ofenvironmentis used.- k_smooth
Basis dimension for the global smooth of time.
- k_env
Basis dimension for the environment-specific smooth (if
environmentis provided).- k_trt
Basis dimension for the treatment-specific smooth.
- gamma
Penalization parameter passed to
mgcv::bam().- discrete
Logical; whether to use discrete (approximate) fitting in
mgcv::bam().- cluster_k
Number of clusters used to cut the hierarchical tree for treatments.
- hc_method
Clustering linkage method passed to
stats::hclust().- test_factor
Optional a priori grouping used for a distance-based permutation test. Either (i) a single character naming a column in
datathat maps uniquely to the permutation unit (seeperm_unit), or (ii) a named vector whose names are permutation-unit identifiers and whose values are group labels.- n_perm
Number of permutations for the distance-based test.
- test_mode
Character; permutation test mode. One of
"auto","none","global", or"global_pairwise". In"auto", pairwise tests may be disabled when restricted strata support is limited.- min_strata_for_pairwise
Minimum number of strata levels (e.g., blocks) required to enable pairwise tests under
test_mode = "auto".- perm_unit
Optional character naming the unit at which group labels are permuted (e.g., genotype/cultivar). If
NULL, permutation is performed at the curve (unit) level.- perm_strata
Optional character naming a factor defining restricted permutation strata (e.g., block or environment). If
NULL, defaults toblockwhen provided.- bootstrap
Logical; if
TRUE, compute bootstrap envelopes and distance summaries from resampled predicted curves.- boot_B
Integer number of bootstrap replicates.
- boot_seed
Integer seed for bootstrap resampling.
- boot_ci
Length-2 numeric vector of quantiles for bootstrap intervals (e.g.,
c(0.025, 0.975)).- bootstrap_mode
Character; bootstrap strategy. Currently supports
"predicted"(resampling predicted curves)."refit"is not implemented in this version.- ...
Reserved for future extensions.
Value
An object of class "r4pde_compare_curves" containing:
gam: fitted GAM object andfamily_used;pred: environment-adjusted treatment mean curves on the common grid;distance: treatment-by-treatment functional distance matrix andhc;clusters: treatment cluster assignments and summary scores;curve_distance: curve-by-curve distance matrix;test: optional distance-based permutation test results;bootstrap: optional bootstrap envelopes and distance summaries;diagnostic fields including
settingsand captured beta-fit warnings.
Details
Functional distances among treatments are computed as an L2 norm over the predicted mean curves evaluated on a common time grid. Curve-level distances are computed analogously using fitted trajectories (excluding unit random effects), and are scaled by the median non-zero distance for numerical stability; interpret these distances as relative rather than absolute units.
The permutation test uses a PERMANOVA-like pseudo-\(F\) statistic computed from the curve-level distance matrix, with optional restricted permutations within strata. When enabled, pairwise comparisons are adjusted using Holm's method.