Warmup Schedule

bgms uses a multi-stage warmup schedule to tune MCMC sampler parameters before collecting posterior samples. The three core stages (1, 2, 3a) follow Stan’s warmup scheme for NUTS. Stages 3b and 3c are bgms extensions that accommodate the spike-and-slab edge selection machinery. The schedule is defined in src/mcmc/execution/warmup_schedule.h.

Five stages

The warmup period is divided into five stages. Stages 3b and 3c only apply when edge selection is enabled.

| Stage | Name | Purpose |
|-------|------|---------|
| 1 | Initial step-size adaptation | Tune the step size with an identity mass matrix |
| 2 | Adaptation windows | Tune the step size (dual averaging) and the mass matrix (Welford) |
| 3a | Terminal step-size tuning | Final step-size pass with a frozen mass matrix |
| 3b | Proposal-SD tuning | Tune element-wise Metropolis proposal SDs (edge selection not yet active) |
| 3c | Re-adaptation under edge selection | Re-tune the step size after edge selection changes the geometry |

Stage 1: Initial step-size adaptation

A short burn-in phase (default: the first 75 iterations). The mass matrix is held at the identity while dual averaging tunes the step size toward a target acceptance rate of 0.8. The starting value of \(\epsilon\) comes from Algorithm 4 in Hoffman & Gelman (2014), which repeatedly doubles or halves a trial step size until the Metropolis acceptance ratio of a single leapfrog step crosses 1/2.
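As a rough illustration (not the bgms implementation), the doubling/halving heuristic can be sketched for a one-dimensional standard normal target, where the acceptance ratio of a single leapfrog step is easy to compute:

```cpp
#include <cmath>

// One leapfrog step for a 1-D standard normal target, U(q) = q^2/2.
// Returns the Metropolis acceptance ratio exp(H_old - H_new).
double leapfrog_accept_ratio(double q, double p, double eps) {
    double h_old = 0.5 * q * q + 0.5 * p * p;
    p -= 0.5 * eps * q;   // half-step momentum (grad U = q)
    q += eps * p;         // full-step position
    p -= 0.5 * eps * q;   // half-step momentum
    double h_new = 0.5 * q * q + 0.5 * p * p;
    return std::exp(h_old - h_new);
}

// Hoffman & Gelman (2014), Algorithm 4: double or halve a trial step
// size until the one-step acceptance ratio crosses 1/2.
double find_reasonable_epsilon(double q, double p) {
    double eps = 1.0;
    double ratio = leapfrog_accept_ratio(q, p, eps);
    double a = (ratio > 0.5) ? 1.0 : -1.0;  // doubling vs. halving
    while (std::pow(ratio, a) > std::pow(2.0, -a)) {
        eps *= std::pow(2.0, a);
        ratio = leapfrog_accept_ratio(q, p, eps);
    }
    return eps;
}
```

In practice the heuristic runs on the model's actual log density and gradient with the current mass matrix; the 1-D target here only makes the behavior easy to follow.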

Stage 2: Adaptation windows

The step size and mass matrix are tuned in expanding windows. The first window has 25 iterations. Each subsequent window doubles in size (25 → 50 → 100 → …) until the Stage 3a boundary is reached.
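A hypothetical sketch of how the window boundaries might be computed, mirroring Stan's expanding-window logic, in which a short final remainder is absorbed into the last window (the function name and signature are illustrative, not the bgms API):

```cpp
#include <vector>

// Hypothetical Stage 2 window boundaries: a first window of 25
// iterations, each later window doubling, with the last window
// extended to the Stage 3a boundary when the next doubled window
// would no longer fit.
std::vector<int> window_ends(int stage2_start, int stage3a_start,
                             int first_window = 25) {
    std::vector<int> ends;
    int pos = stage2_start, size = first_window;
    while (pos + size < stage3a_start) {
        // If the next doubled window would overrun the stage boundary,
        // extend the current window to the boundary instead.
        if (pos + size + 2 * size > stage3a_start) {
            ends.push_back(stage3a_start);
            return ends;
        }
        ends.push_back(pos + size);
        pos += size;
        size *= 2;
    }
    ends.push_back(stage3a_start);
    return ends;
}
```

For example, with Stage 2 spanning iterations 75 to 950, the windows end at 100, 150, 250, 450, and 950, matching the 25 → 50 → 100 → … doubling pattern.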

Within each window, a Welford running variance accumulates the sampled parameter vectors. At the end of each window, the diagonal mass matrix is set to

\[ \hat{M}^{-1}_j = \frac{n}{n + w}\,\hat{\sigma}^2_j + \frac{w}{n + w}\,\sigma^2_0, \]

blending the empirical variance \(\hat{\sigma}^2_j\) (from \(n\) samples) with a prior variance \(\sigma^2_0 = 10^{-3}\) at weight \(w = 5\). After each mass-matrix update:

  1. The step-size heuristic reruns with the new mass matrix
  2. Dual averaging restarts
  3. The Welford accumulator resets for the next window

Stage 3a: Terminal step-size tuning

The mass matrix is frozen. Dual averaging continues to give the step size a final tuning pass against the converged mass matrix. At the end of this stage, the step size is fixed at the dual-averaging estimate.

Stage 3b: Proposal-SD tuning

Step-size adaptation is frozen. For models with element-wise Metropolis proposals (e.g., the OMRF’s category parameters), the proposal standard deviations are tuned via Robbins–Monro updates:

\[ \sigma_{k+1} = \sigma_k \cdot \exp\!\bigl(w_k (\alpha_k - \alpha^*)\bigr) \]

where \(\alpha_k\) is the observed acceptance probability, \(\alpha^* = 0.44\) is the target, and \(w_k = k^{-0.75}\) is the decay weight. Edge selection is not yet active in this stage.
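A direct transcription of this update (the function name is hypothetical):

```cpp
#include <cmath>

// Robbins-Monro update for an element-wise Metropolis proposal SD:
// sigma_{k+1} = sigma_k * exp(w_k * (alpha_k - alpha*)),
// with target alpha* = 0.44 and decay weight w_k = k^{-0.75}.
double rm_update(double sigma, double accept_prob, long k,
                 double target = 0.44) {
    double w = std::pow(static_cast<double>(k), -0.75);
    return sigma * std::exp(w * (accept_prob - target));
}
```

Acceptance above the target inflates the proposal SD, acceptance below shrinks it, and the \(k^{-0.75}\) decay makes later updates progressively smaller so the SDs settle.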

Stage 3c: Re-adaptation under edge selection

Edge selection is activated. Edge indicators \(\gamma_{ij}\) are updated every iteration, changing the effective parameter space. Dual averaging restarts to re-tune \(\epsilon\) for the new geometry. For models with constraints, the constrained step-size heuristic (using RATTLE) is used.

Budget allocation

When edge selection is active, the total warmup budget is split as:

  • Stages 1–3a: 85% of total warmup
  • Stage 3b: 10% (skipped if fewer than 20 iterations)
  • Stage 3c: ~5%

When edge selection is inactive, stages 3b and 3c are skipped and the full budget is allocated to stages 1–3a.
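A hypothetical version of this split, using the percentages above (the actual boundary logic lives in src/mcmc/execution/warmup_schedule.h and may differ in rounding details):

```cpp
// Illustrative warmup budget split; names and rounding are assumptions.
struct WarmupSplit {
    int core;     // stages 1-3a
    int stage3b;  // proposal-SD tuning
    int stage3c;  // re-adaptation under edge selection
};

WarmupSplit split_warmup(int total, bool edge_selection) {
    if (!edge_selection) return {total, 0, 0};  // full budget to 1-3a
    int core = 85 * total / 100;
    int s3b  = 10 * total / 100;
    if (s3b < 20) s3b = 0;          // too short: skip stage 3b
    int s3c  = total - core - s3b;  // remainder, roughly 5%
    return {core, s3b, s3c};
}
```

For a 1000-iteration warmup with edge selection, this gives 850 / 100 / 50 iterations; for a 100-iteration warmup, stage 3b falls below the 20-iteration floor and is dropped.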

Warning system

The WarmupSchedule tracks a warning_type field that records when the warmup budget is insufficient:

| Code | Meaning |
|------|---------|
| 0 | No warning |
| 1 | Warmup too short for the standard schedule |
| 2 | Proportional fallback used (reduced window sizes) |
| 3 | Limited tuning (fewer adaptation windows) |
| 4 | Proposal-SD learning skipped (stage 3b dropped) |

The warmup_check table returned by bgm() reports these warnings per chain, accessible via fit$warmup_check in R.

Dual averaging

The step-size update follows Algorithm 6 in Hoffman & Gelman (2014) with parameters \(\gamma = 0.05\), \(t_0 = 10\), \(\kappa = 0.75\), and bias target \(\mu = \log(10\,\epsilon_0)\).
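A minimal transcription of the Algorithm 6 update (member names are illustrative; following Stan's scheme, the averaged iterate \(\bar{\epsilon}\) is the value frozen when adaptation ends):

```cpp
#include <cmath>

// Dual averaging for the log step size, Hoffman & Gelman (2014),
// Algorithm 6, with gamma = 0.05, t0 = 10, kappa = 0.75, and
// bias target mu = log(10 * eps0).
struct DualAveraging {
    double mu, log_eps, log_eps_bar = 0.0, h_bar = 0.0;
    long t = 0;
    static constexpr double gamma = 0.05, t0 = 10.0, kappa = 0.75;

    explicit DualAveraging(double eps0)
        : mu(std::log(10.0 * eps0)), log_eps(std::log(eps0)) {}

    // One update with the observed acceptance statistic alpha,
    // shrinking the running discrepancy toward the target delta.
    void update(double alpha, double delta = 0.8) {
        ++t;
        h_bar += (delta - alpha - h_bar) / (t + t0);
        log_eps = mu - std::sqrt(static_cast<double>(t)) / gamma * h_bar;
        double eta = std::pow(static_cast<double>(t), -kappa);
        log_eps_bar = eta * log_eps + (1.0 - eta) * log_eps_bar;
    }

    double eps() const { return std::exp(log_eps); }          // current
    double eps_bar() const { return std::exp(log_eps_bar); }  // averaged
};
```

Low acceptance pushes the step size down, high acceptance pushes it up, and the \(t^{-\kappa}\) weighting makes \(\bar{\epsilon}\) converge to a stable estimate.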

Querying the schedule

The WarmupSchedule struct exposes predicate methods used by the chain runner and samplers:

bool in_stage1(int i) const;
bool in_stage2(int i) const;
bool in_stage3a(int i) const;
bool in_stage3b(int i) const;
bool in_stage3c(int i) const;
bool sampling(int i) const;          // i >= total_warmup
bool selection_enabled(int i) const; // i >= stage3b_start (or never)

These predicates control when the adaptation controller accumulates samples, when edge selection is turned on, and when the chain transitions from warmup to sampling.
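A hypothetical minimal implementation of these predicates, assuming stage boundaries stored as half-open iteration ranges (field names are illustrative; here selection turns on at the start of stage 3c, following the stage descriptions above):

```cpp
// Sketch of the stage predicates, not the real WarmupSchedule struct.
// Boundaries are half-open: stage 1 is [0, stage1_end), stage 2 is
// [stage1_end, stage2_end), and so on up to total_warmup.
struct WarmupScheduleSketch {
    int stage1_end, stage2_end, stage3a_end, stage3b_end, total_warmup;
    bool edge_selection;

    bool in_stage1(int i) const { return i < stage1_end; }
    bool in_stage2(int i) const {
        return i >= stage1_end && i < stage2_end;
    }
    bool in_stage3a(int i) const {
        return i >= stage2_end && i < stage3a_end;
    }
    bool in_stage3b(int i) const {
        return i >= stage3a_end && i < stage3b_end;
    }
    bool in_stage3c(int i) const {
        return i >= stage3b_end && i < total_warmup;
    }
    bool sampling(int i) const { return i >= total_warmup; }
    bool selection_enabled(int i) const {
        return edge_selection && i >= stage3b_end;  // stage 3c onward
    }
};
```

Usage mirrors the chain runner: each iteration index is tested against the predicates to decide which adaptation machinery is live.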

References

Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15, 1593–1623. https://jmlr.org/papers/v15/hoffman14a.html