A helper function to guide selection of the k_smooth, k_trt,
k_env, and gamma parameters used by functional_curves.
It can infer the effective replication from a data frame, or accept scalar values
directly when the data are not yet available.
For sparse epidemic data it is strongly recommended to use
rule = "minimum" so that k values are based on the least-replicated
treatment-by-environment combination, guarding against over-fitting.
Arguments
- data
Optional data frame. If supplied, time, treatment, and optionally environment are used to compute the number of unique time points per treatment-by-environment combination and the number of unique environments per treatment.
- time
Unquoted column name for the time variable, or a character string naming the column.
- treatment
Unquoted column name for the treatment / cultivar variable, or a character string naming the column.
- environment
Optional unquoted column name for the environment variable, or a character string naming the column.
- n_time
Integer. Number of unique time points to use directly (ignored when data is supplied).
- n_env
Integer. Number of unique environments to use directly (ignored when data is supplied, or when there is no environment variable).
- rule
Character string. How to summarise the distribution of replication counts (time points per treatment-by-environment combination, and environments per treatment): "minimum" (default, conservative) or "median".
- smoothness
Character string controlling how liberal the recommendations are: "conservative" (default), "moderate", or "flexible".
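Accepting a column either as an unquoted name or as a character string is commonly done by capturing the argument unevaluated. The sketch below shows one base R way to do this with substitute(). as_column_name() and resolve_columns() are hypothetical helpers written only for illustration; the package itself may use a different mechanism (for example rlang's tidy evaluation).

```r
# Hypothetical helper (not part of the package): turn an argument captured
# with substitute() into a column name. Accepts an unquoted name or a string.
as_column_name <- function(expr) {
  if (is.null(expr)) return(NULL)                   # optional argument left unset
  if (is.character(expr)) return(expr)              # e.g. "cultivar"
  if (is.symbol(expr)) return(as.character(expr))   # e.g. cultivar
  stop("column must be an unquoted name or a character string")
}

# Hypothetical front end showing where the capture happens
resolve_columns <- function(data, time, treatment, environment = NULL) {
  cols <- list(
    time = as_column_name(substitute(time)),
    treatment = as_column_name(substitute(treatment)),
    environment = as_column_name(substitute(environment))
  )
  # unlist() drops the NULL environment entry when it is not supplied
  missing_cols <- setdiff(unlist(cols), names(data))
  if (length(missing_cols) > 0) {
    stop("columns not found in data: ", paste(missing_cols, collapse = ", "))
  }
  cols
}

df <- data.frame(time = 1:3, cultivar = "A", env = "E1")
# Mixing unquoted and quoted column names
str(resolve_columns(df, time, "cultivar", env))
```

Because substitute() only sees the expression written at the call site, this style works when the user calls the function directly. Wrappers that forward column arguments usually need to pass strings instead.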
Value
A named list with the following elements:
- time_summary
Named numeric vector with the minimum, median, and maximum number of unique time points per treatment-by-environment combination.
- environment_summary
Named numeric vector with the minimum, median, and maximum number of unique environments per treatment (or NULL when no environment variable is given).
- effective_n_time
The effective number of time points chosen by rule.
- effective_n_env
The effective number of environments chosen by rule, or NULL.
- k_smooth
Recommended basis dimension for the global time smooth.
- k_trt
Recommended basis dimension for the treatment-specific smooth.
- k_env
Recommended basis dimension for the environment random effect.
- gamma
Recommended penalisation multiplier.
- message
A short interpretation message.
Details
The function computes, for every treatment-by-environment combination, the
number of unique time points at which observations are available. It then
summarises these counts using the chosen rule to obtain
effective_n_time. Similarly, it computes, per treatment, the number
of unique environments, and summarises using the same rule to obtain
effective_n_env.
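The replication summaries described above can be sketched in base R as follows. replication_summary() is a hypothetical helper, not the package's implementation. For simplicity it takes column names as character strings, and it returns only the summary and effective_n_* elements, not the k recommendations.

```r
# Hypothetical sketch (not the package's code) of the replication summaries.
# Column names are passed as character strings for simplicity.
replication_summary <- function(data, time, treatment, environment = NULL,
                                rule = c("minimum", "median")) {
  rule <- match.arg(rule)
  # Collapse a vector of counts to one value according to `rule`
  pick <- function(x) if (rule == "minimum") min(x) else stats::median(x)
  # Minimum, median and maximum, mirroring time_summary / environment_summary
  summarise3 <- function(x) {
    c(min = min(x), median = stats::median(x), max = max(x))
  }

  # Unique time points per treatment (crossed with environment when given)
  groups <- list(data[[treatment]])
  if (!is.null(environment)) groups <- c(groups, list(data[[environment]]))
  n_time <- tapply(data[[time]], groups, function(t) length(unique(t)))
  n_time <- as.vector(n_time[!is.na(n_time)])  # drop empty combinations

  # Unique environments per treatment
  n_env <- NULL
  if (!is.null(environment)) {
    n_env <- tapply(data[[environment]], data[[treatment]],
                    function(e) length(unique(e)))
    n_env <- as.vector(n_env[!is.na(n_env)])
  }

  list(
    time_summary = summarise3(n_time),
    environment_summary = if (is.null(n_env)) NULL else summarise3(n_env),
    effective_n_time = pick(n_time),
    effective_n_env = if (is.null(n_env)) NULL else pick(n_env)
  )
}

# Balanced toy design: 3 cultivars x 2 environments x 6 time points
df <- data.frame(
  time = rep(1:6, times = 6),
  cultivar = rep(c("A", "B", "C"), each = 12),
  env = rep(rep(c("E1", "E2"), each = 6), times = 3)
)
res <- replication_summary(df, "time", "cultivar", "env", rule = "minimum")
res$effective_n_time  # 6
res$effective_n_env   # 2
```

In a balanced design like this one, "minimum" and "median" give the same result. They diverge only when some treatment-by-environment combinations have fewer time points, which is where "minimum" protects against over-fitting.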
Recommended k values follow these heuristics:
- k_smooth: global smooth over time; capped below effective_n_time - 1.
- k_trt: treatment-specific smooth; capped below k_smooth.
- k_env: environment random effect; capped below effective_n_env.
- gamma: penalty multiplier; higher values encourage smoother fits.
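The caps can be expressed as a short chain of min() calls. This is a sketch under stated assumptions. apply_k_caps() is hypothetical, "capped below" is read here as "no larger than", and the starting proposals (which in the package depend on smoothness) are supplied by the caller rather than derived.

```r
# Hypothetical sketch: apply the caps from the heuristics list to proposed k values.
# ASSUMPTION: "capped below X" is read as "at most X"; the package may use a
# strict inequality, and its starting proposals depend on `smoothness`.
apply_k_caps <- function(k_smooth, k_trt, k_env,
                         effective_n_time, effective_n_env = NULL) {
  # Global time smooth is limited by the available time points minus one
  k_smooth <- min(k_smooth, effective_n_time - 1)
  # Treatment-specific smooth is never richer than the global smooth
  k_trt <- min(k_trt, k_smooth)
  # Environment random effect is limited by the number of environments
  if (!is.null(effective_n_env)) {
    k_env <- min(k_env, effective_n_env)
  }
  list(k_smooth = k_smooth, k_trt = k_trt, k_env = k_env)
}

# Proposals of 10, 8 and 6 with 5 time points and 2 environments
str(apply_k_caps(10, 8, 6, effective_n_time = 5, effective_n_env = 2))
# k_smooth = 4, k_trt = 4, k_env = 2
```

Applying the k_smooth cap first matters: k_trt is compared with the already-capped k_smooth, so the ordering k_trt <= k_smooth always holds in the result.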
Examples
# Using explicit values
suggest_k(n_time = 5, n_env = 8)
# Inferring from a data frame (unquoted column names)
df <- data.frame(
time = rep(1:6, times = 6),
cultivar = rep(c("A", "B", "C"), each = 12),
env = rep(rep(c("E1", "E2"), each = 6), times = 3),
severity = runif(36, 0, 0.5)
)
suggest_k(
data = df,
time = time,
treatment = cultivar,
environment = env,
rule = "minimum",
smoothness = "conservative"
)
