NUTS Algorithm

The No-U-Turn Sampler (NUTS) is the primary MCMC algorithm used in bgms for continuous parameter sampling. It extends Hamiltonian Monte Carlo by adaptively selecting the trajectory length, eliminating the need to tune the number of leapfrog steps.

The implementation is in src/mcmc/algorithms/nuts.h and nuts.cpp, based on the NUTS algorithm described in Hoffman & Gelman (2014) with the generalized U-turn criterion from Betancourt (2017).

Leapfrog integration

Each elementary step of HMC is a leapfrog integrator that approximately preserves the Hamiltonian $H(q, p) = -\log \pi(q) + \frac{1}{2} p^\top M^{-1} p$:

Half-step momentum: $p \leftarrow p + \frac{\epsilon}{2} \nabla \log \pi(q)$
Full-step position: $q \leftarrow q + \epsilon\, M^{-1} p$
Half-step momentum: $p \leftarrow p + \frac{\epsilon}{2} \nabla \log \pi(q)$

The implementation uses a diagonal mass matrix $M$, stored as the vector of inverse diagonal elements (inv_mass_diag).

Memoizer

The Memoizer class in leapfrog.h caches a single $(\log \pi, \nabla \log \pi)$ pair keyed by the position vector $q$. During NUTS tree building, each leapfrog step produces a new unique $q$, so the cache avoids redundant evaluations at tree-node boundaries where the same $q$ appears as both the end of one subtree and the start of another.

When RATTLE is active, the SHAKE projection moves $q$ to a new point on the constraint manifold, making the cached gradient stale. The Memoizer is explicitly invalidated after projection, forcing a fresh gradient evaluation at the projected position. The reversibility check in leapfrog_constrained_checked() uses a separate Memoizer instance for its backward verification step, so the caller’s cache remains pointed at the forward-step result.

Constrained leapfrog (RATTLE)

When a model has constraints (edge selection or a sparse graph in the mixed MRF), the standard leapfrog integrator cannot enforce zero entries for excluded edges. bgms uses a RATTLE-based constrained leapfrog scheme that projects position and momentum onto the constraint manifold at each step, with a runtime reversibility check that detects numerical failures.

The full algorithm, projection callbacks, reversibility check, and tolerance scaling are documented in Constrained Leapfrog (RATTLE).

Integrator dispatch

At each leaf of the NUTS tree (the base case of build_tree), the algorithm selects between two integrators:

If the model provides projection callbacks (non-null project_position and project_momentum), the constrained path runs leapfrog_constrained_checked() — the RATTLE integrator with reversibility verification.
Otherwise, the unconstrained path runs the standard Störmer–Verlet integrator with Memoizer gradient caching.

During sampling, a non-reversible RATTLE step terminates the NUTS tree (s_prime = 0), similar to a divergence. During warmup, non-reversible steps are recorded but do not terminate the tree.

Binary tree construction

NUTS builds a balanced binary tree by doubling the trajectory in a randomly chosen direction (forward or backward) at each depth level. The tree grows until one of these stopping conditions is met:

U-turn — The trajectory starts bending back. Detected by checking whether the momentum sum across the subtree is no longer moving away from the endpoints (the generalized U-turn criterion from Betancourt, 2017).
Divergence — The Hamiltonian error exceeds a threshold (default: $\Delta H > 1000$), indicating the integrator has entered a region of high curvature.
Maximum depth — The tree reaches max_tree_depth (default 10, corresponding to $2^{10} = 1024$ leapfrog steps).

BuildTreeResult

Each subtree returns a BuildTreeResult struct containing:

Field	Purpose
`theta_min`, `r_min`	Leftmost (backward) position and momentum
`theta_plus`, `r_plus`	Rightmost (forward) position and momentum
`theta_prime`, `r_prime`	Proposed sample from this subtree
`rho`	Running momentum sum (for U-turn check)
`p_sharp_beg`, `p_sharp_end`	Mass-preconditioned momentum $M^{-1}p$ at subtree endpoints (used in the generalized U-turn criterion)
`p_beg`, `p_end`	Raw momentum at subtree endpoints (used in cross-subtree U-turn checks at merge time)
`n_prime`	Number of valid states (for multinomial sampling)
`s_prime`	Continuation flag (0 = stop)
`alpha`, `n_alpha`	Acceptance probability accumulator
`divergent`	Whether a divergence was detected
`non_reversible`	Whether a RATTLE reversibility check failed

Proposal selection

bgms follows Stan’s base_nuts.hpp and uses multinomial candidate weighting throughout. Each leaf $i$ in a subtree carries weight $\log w_i = H_0 - h_i$, the Hamiltonian offset from the initial state; weights are accumulated with log-sum-exp. Within a subtree (build_tree) sibling subtrees are combined symmetrically: accept the new subtree with probability $\exp(w_{\text{new}} - \mathrm{LSE}(w_{\text{old}},\, w_{\text{new}}))$. At the top level (nuts_step) the new subtree is accepted via biased progressive sampling — accept with probability $\min(1,\, w_{\text{subtree}} / w_{\text{trajectory}})$ — so later subtrees are preferred on average. Divergence is detected directly by $h - H_0 > 1000$; no slice variable is used.

Divergence detection

A divergence occurs when a single leapfrog step produces a Hamiltonian error $\Delta H = H' - H_0 > 1000$. Divergences indicate that the step size is too large relative to the local curvature of the posterior. The number of divergent transitions is recorded in StepResult::NUTSDiagnostics and surfaced to the user via extract_nuts_diagnostics().

Warmup

NUTS adapts the step size $\epsilon$ and the diagonal mass matrix $M$ during a multi-stage warmup period before sampling begins. The three core stages (1, 2, 3a) follow Stan’s warmup scheme for NUTS. Stages 3b and 3c are bgms extensions that accommodate the spike-and-slab edge selection machinery.

The full warmup schedule — stage boundaries, budget allocation, adaptation algorithms, and warning system — is documented in Warmup Schedule.

NUTS diagnostics

Each NUTS step records five diagnostics:

Tree depth — Number of doubling steps (0 to max_tree_depth). Consistently hitting the maximum indicates the step size may be too small.
Divergent — Whether a divergence was detected in any leapfrog step during tree building.
Non-reversible — Whether a constrained leapfrog step (RATTLE) failed the round-trip reversibility check. Only relevant for models that use the constrained integrator (e.g., edge selection with continuous variables). A non-reversible step terminates the NUTS tree, similar to a divergence.
Energy — The Hamiltonian $H = -\log \pi(q) + \frac{1}{2} p^\top M^{-1} p$. Used to compute E-BFMI (energy Bayesian fraction of missing information).
Mean acceptance probability — The Metropolis acceptance probability averaged over every leapfrog step in the trajectory, paralleling Stan’s accept_stat__. Surfaced as fit$nuts_diag$accept_prob (chains × iterations) with a per-chain mean in the diagnostic summary; also drives dual-averaging step-size adaptation.

These diagnostics are collected per chain and accessible via extract_nuts_diagnostics() in R.

References

Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. arXiv Preprint. https://arxiv.org/abs/1701.02434

Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15, 1593–1623. https://jmlr.org/papers/v15/hoffman14a.html