Mixed MRF
Implementation details of the mixed Markov random field in bgms. Source: src/models/mixed/mixed_mrf_model.h and four .cpp files (mixed_mrf_model.cpp, mixed_mrf_gradient.cpp, mixed_mrf_likelihoods.cpp, mixed_mrf_metropolis.cpp).
Block structure
The mixed MRF handles \(p\) discrete and \(q\) continuous variables. The parameter space is organized into three interaction blocks plus main effects:
| Block | Parameters | Size |
|---|---|---|
| Main effects | Discrete thresholds + continuous means | \(\text{thresholds} + q\) |
| Discrete-discrete | Pairwise discrete interactions (symmetric) | \(p(p-1)/2\) |
| Cross-type | Discrete-continuous interactions | \(p \times q\) |
| Continuous-continuous | Cholesky factor of the precision matrix | \(q(q+1)/2\) |
All five parameter blocks (thresholds, discrete interactions, means, cross-type, and Cholesky; main effects split into thresholds and means) are updated jointly by NUTS. The continuous precision matrix is parameterized through its Cholesky factor \(R\) (with diagonal entries on the log scale) so that positive definiteness is guaranteed by construction. The gradient includes the Jacobian of the Cholesky-to-precision mapping.
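The log-scale Cholesky parameterization can be illustrated with a small self-contained sketch (not the bgms implementation; the layout of the unconstrained vector `psi` — diagonal entry first, then that row's off-diagonals — is an assumption for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch: build the precision matrix Omega = R^T R from an unconstrained
// vector psi of length q(q+1)/2. Diagonal entries of the upper-triangular
// factor R are stored on the log scale, so Omega is positive definite for
// any real psi. Matrices are row-major q x q.
std::vector<double> precision_from_cholesky(const std::vector<double>& psi, int q) {
    std::vector<double> R(q * q, 0.0);
    int k = 0;
    for (int i = 0; i < q; ++i) {
        R[i * q + i] = std::exp(psi[k++]);      // log-scale diagonal
        for (int j = i + 1; j < q; ++j)
            R[i * q + j] = psi[k++];            // free off-diagonals
    }
    // Omega = R^T R is symmetric positive definite by construction.
    std::vector<double> Omega(q * q, 0.0);
    for (int i = 0; i < q; ++i)
        for (int j = 0; j < q; ++j)
            for (int l = 0; l < q; ++l)
                Omega[i * q + j] += R[l * q + i] * R[l * q + j];
    return Omega;
}
```

At `psi = 0` the factor is the identity, so the precision matrix starts at the identity; no constraint checking is ever needed during leapfrog steps.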
Parameter vector layout
The model distinguishes three vector types:
Active vector (parameter_dimension())
Contains the parameters that NUTS updates, excluding edges set to zero by edge selection. The Cholesky block is always fully included:
[ thresholds | discrete interactions (active) | means | cross-type (active) | Cholesky (all) ]
The mass matrix and gradient are dimensioned to this vector.
Full vector (full_parameter_dimension())
Contains all parameters including inactive edges:
[ thresholds | discrete interactions (all) | means | cross-type (all) | Cholesky (all) ]
Used for storage and initialization.
Storage vector (storage_dimension())
Identical layout to the full vector. This is what gets saved to ChainResult::samples at each iteration.
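The relation between the active and full vectors is a scatter through an inclusion mask. A minimal sketch (hypothetical helper, not a bgms function), assuming one boolean per full-vector position, with thresholds, means, and the Cholesky block always flagged true:

```cpp
#include <cassert>
#include <vector>

// Sketch: scatter the active (NUTS) vector into the full layout.
// Positions excluded by edge selection keep their constrained value of zero.
std::vector<double> scatter_active(const std::vector<double>& active,
                                   const std::vector<bool>& active_mask) {
    std::vector<double> full(active_mask.size(), 0.0);
    std::size_t a = 0;
    for (std::size_t i = 0; i < active_mask.size(); ++i)
        if (active_mask[i]) full[i] = active[a++];
    return full;
}
```

The inverse gather (full to active) simply reads the flagged positions in order, which is why the mass matrix and gradient can be dimensioned to the active vector alone.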
Sampling strategy
Each MCMC iteration updates the full active parameter vector (all five blocks) in a single NUTS step. The logp_and_gradient() method computes the log-pseudoposterior and gradient for the entire vector, including the Cholesky block with its log-determinant Jacobian.
When edge selection is active (or the initial graph is sparse), the model reports has_constraints() = true and the NUTS sampler switches to constrained (RATTLE) integration, using logp_and_gradient_full() and projection callbacks to enforce zero entries for excluded edges (see Constrained Leapfrog (RATTLE)).
Edge indicator updates (when active) operate on all three interaction blocks.
The model also provides a component-wise adaptive Metropolis fallback (do_one_metropolis_step()), used when the sampler is set to "adaptive-metropolis". In Metropolis mode, each parameter is updated individually with Robbins–Monro-tuned proposal SDs targeting an acceptance rate of 0.44. Off-diagonal precision updates use a rank-2 Cholesky update; diagonal updates use rank-1.
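The Robbins–Monro adaptation in the Metropolis fallback can be sketched as follows (an illustrative recursion, not the bgms code; the gain exponent 0.75 is an assumption):

```cpp
#include <cassert>
#include <cmath>

// Sketch: Robbins-Monro update of a log proposal SD toward a target
// acceptance rate of 0.44 (the classic optimum for one-dimensional
// random-walk Metropolis). t is the iteration count; the gain decays
// so adaptation vanishes asymptotically, preserving ergodicity.
double update_log_proposal_sd(double log_sd, bool accepted, long t,
                              double target = 0.44) {
    double gain = 1.0 / std::pow(static_cast<double>(t), 0.75);
    return log_sd + gain * ((accepted ? 1.0 : 0.0) - target);
}
```

Acceptance nudges the proposal SD up, rejection nudges it down, so each parameter's SD drifts toward the scale at which roughly 44% of proposals are accepted.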
Gradient computation
The gradient (mixed_mrf_gradient.cpp) covers all five parameter blocks in the NUTS vector. It involves:
- Discrete conditionals — Category probabilities via Variable Helpers, contributing gradients for thresholds and discrete pairwise interactions
- Cross-type terms — Contributions from discrete-continuous interactions to both discrete and continuous conditional distributions
- Continuous means — Gradient of the continuous-variable contribution to the pseudolikelihood
- Cholesky block — The precision gradient \(\bar{\Omega}\) is computed from the GGM conditional contribution and the conditional-mean coupling, then mapped to the Cholesky gradient \(\bar{R} = R(\bar{\Omega} + \bar{\Omega}^\top)\). Diagonal entries use the log-scale Jacobian: \(\partial\ell / \partial\psi_j = \bar{R}_{jj} R_{jj} + (q - j + 1)\).
The log-prior gradient adds Cauchy penalties on off-diagonal precision entries and Gamma(1,1) penalties on diagonal entries.
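The pullback \(\bar{R} = R(\bar{\Omega} + \bar{\Omega}^\top)\) follows from the chain rule for \(\Omega = R^\top R\). A numeric sketch of just this mapping (hypothetical helper, row-major storage assumed):

```cpp
#include <cassert>
#include <vector>

// Sketch: map the precision gradient Omegabar to the Cholesky gradient
// Rbar = R * (Omegabar + Omegabar^T), for Omega = R^T R with R upper
// triangular. All matrices are row-major q x q.
std::vector<double> cholesky_grad(const std::vector<double>& R,
                                  const std::vector<double>& Obar, int q) {
    std::vector<double> S(q * q), Rbar(q * q, 0.0);
    for (int i = 0; i < q; ++i)              // symmetrize the precision gradient
        for (int j = 0; j < q; ++j)
            S[i * q + j] = Obar[i * q + j] + Obar[j * q + i];
    for (int i = 0; i < q; ++i)              // Rbar = R * S
        for (int j = 0; j < q; ++j)
            for (int l = 0; l < q; ++l)
                Rbar[i * q + j] += R[i * q + l] * S[l * q + j];
    return Rbar;
}
```

For example, with \(\ell = \Omega_{00}\) and \(R_{00} = 1\), the mapping returns \(\bar{R}_{00} = 2\), matching \(\partial R_{00}^2 / \partial R_{00}\). The log-scale diagonal then picks up the extra factor \(R_{jj}\) plus the Jacobian term described above.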
Pseudolikelihood options
The pseudolikelihood argument controls how the discrete full conditionals are computed. It does not affect the continuous block, which always uses the GGM likelihood \(\log p(y \mid x) \propto \tfrac{n}{2}\log|\Omega| - \tfrac{1}{2}\operatorname{tr}(\Omega D^\top D)\).
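The continuous-block likelihood above can be evaluated from a Cholesky factorization of \(\Omega\). A self-contained sketch (hypothetical helper, not the bgms code), taking the Gram matrix \(D^\top D\) of the centered continuous data as precomputed input:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch: evaluate (n/2) log|Omega| - (1/2) tr(Omega * D^T D).
// log|Omega| comes from the Cholesky factorization Omega = L L^T,
// so log|Omega| = 2 * sum_j log L_jj. Row-major q x q arrays.
double ggm_loglik(const std::vector<double>& Omega,
                  const std::vector<double>& DtD, int q, int n) {
    std::vector<double> L(q * q, 0.0);
    for (int j = 0; j < q; ++j) {            // standard Cholesky, lower triangular
        double s = Omega[j * q + j];
        for (int k = 0; k < j; ++k) s -= L[j * q + k] * L[j * q + k];
        L[j * q + j] = std::sqrt(s);
        for (int i = j + 1; i < q; ++i) {
            double t = Omega[i * q + j];
            for (int k = 0; k < j; ++k) t -= L[i * q + k] * L[j * q + k];
            L[i * q + j] = t / L[j * q + j];
        }
    }
    double logdet = 0.0, trace = 0.0;
    for (int j = 0; j < q; ++j) logdet += 2.0 * std::log(L[j * q + j]);
    for (int i = 0; i < q; ++i)              // tr(Omega * DtD)
        for (int j = 0; j < q; ++j)
            trace += Omega[i * q + j] * DtD[j * q + i];
    return 0.5 * n * logdet - 0.5 * trace;
}
```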
For a cross-model overview, see Pseudolikelihood.
- "conditional" (default) — Computes discrete full conditionals \(\log p(x_s \mid x_{-s}, y_{\text{obs}})\), conditioning on the observed continuous values. The rest score uses pairwise_effects_discrete directly. Faster because updating the continuous precision elements does not require re-evaluating the discrete pseudolikelihood.
- "marginal" — Computes discrete full conditionals \(\log p(x_s \mid x_{-s})\) by integrating out the continuous variables. Uses the marginal interaction matrix \(\Theta = 2A_{xx} + 2A_{xy}\Sigma_{yy}A_{xy}^\top\), which must be recomputed whenever the precision or cross-type parameters change. This adds coupling terms to the gradient (the \(\partial\Theta/\partial\Sigma\) chain rule contributes to the precision gradient via \(-2\Sigma A_{xy}^\top\bar\Theta A_{xy}\Sigma\)).
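The marginal interaction matrix is a pair of plain matrix products; a sketch assuming dense row-major storage (marginal_theta is a hypothetical helper, not a bgms function):

```cpp
#include <cassert>
#include <vector>

// Sketch: assemble Theta = 2*Axx + 2*Axy * Sigma_yy * Axy^T used by the
// "marginal" pseudolikelihood. Axx is p x p, Axy is p x q, Sigma_yy is
// q x q (the inverse of the continuous precision block); all row-major.
std::vector<double> marginal_theta(const std::vector<double>& Axx,
                                   const std::vector<double>& Axy,
                                   const std::vector<double>& Syy,
                                   int p, int q) {
    std::vector<double> T(p * p, 0.0), AS(p * q, 0.0);
    for (int i = 0; i < p; ++i)              // AS = Axy * Sigma_yy
        for (int j = 0; j < q; ++j)
            for (int l = 0; l < q; ++l)
                AS[i * q + j] += Axy[i * q + l] * Syy[l * q + j];
    for (int i = 0; i < p; ++i)              // Theta = 2*Axx + 2*AS*Axy^T
        for (int j = 0; j < p; ++j) {
            double s = Axx[i * p + j];
            for (int l = 0; l < q; ++l)
                s += AS[i * q + l] * Axy[j * q + l];
            T[i * p + j] = 2.0 * s;
        }
    return T;
}
```

Because \(\Sigma_{yy}\) and \(A_{xy}\) both enter this product, any update to the precision or cross-type blocks invalidates \(\Theta\), which is the cost the "conditional" option avoids.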
Missing data
The mixed MRF tracks missing indices separately for discrete and continuous variables (missing_index_discrete, missing_index_continuous). During imputation:
- Missing discrete values are sampled from their full conditional categorical distributions
- Missing continuous values are sampled from their conditional normal distributions given the current precision matrix
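The continuous imputation step follows the standard Gaussian full-conditional formula. A sketch (hypothetical helpers, not the bgms code), exploiting the fact that the conditional precision of one coordinate is just the corresponding diagonal entry of \(\Omega\):

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Sketch: for a Gaussian with mean mu and precision Omega (row-major
// q x q), the full conditional of y[j] given the other coordinates is
//   N( mu_j - sum_{k != j} Omega_jk (y_k - mu_k) / Omega_jj , 1 / Omega_jj ).
double conditional_mean(int j, const std::vector<double>& y,
                        const std::vector<double>& mu,
                        const std::vector<double>& Omega, int q) {
    double s = 0.0;
    for (int k = 0; k < q; ++k)
        if (k != j) s += Omega[j * q + k] * (y[k] - mu[k]);
    return mu[j] - s / Omega[j * q + j];
}

double impute_continuous(int j, const std::vector<double>& y,
                         const std::vector<double>& mu,
                         const std::vector<double>& Omega, int q,
                         std::mt19937& rng) {
    std::normal_distribution<double> d(conditional_mean(j, y, mu, Omega, q),
                                       std::sqrt(1.0 / Omega[j * q + j]));
    return d(rng);
}
```

Note that the conditional mean here ignores the cross-type coupling with the discrete variables; in the full model the discrete contributions enter through the conditional mean as well.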
Output assembly
The R output builder (build_output_mixed_mrf() in build_output.R) maps the flat storage vector back to the original variable ordering using fill_mixed_symmetric(). This function places discrete-discrete, continuous-continuous, and cross-type interactions into the correct cells of the \((p+q) \times (p+q)\) output matrix, respecting the original column order (which may interleave discrete and continuous variables).
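The index bookkeeping behind this reassembly can be sketched in C++ (an illustrative simplification, not the R fill_mixed_symmetric(); the block layouts — dd upper-triangular row-wise, xy row-major \(p \times q\), Omega dense \(q \times q\) — are assumptions):

```cpp
#include <cassert>
#include <vector>

// Sketch: scatter the discrete-discrete, cross-type, and
// continuous-continuous blocks into a symmetric (p+q) x (p+q) matrix
// that follows the original (possibly interleaved) column order.
std::vector<double> fill_mixed(const std::vector<bool>& is_discrete,
                               const std::vector<double>& dd,
                               const std::vector<double>& xy,
                               const std::vector<double>& Omega) {
    int dim = static_cast<int>(is_discrete.size()), p = 0, q = 0;
    std::vector<int> idx(dim);                  // within-type index per column
    for (int c = 0; c < dim; ++c) idx[c] = is_discrete[c] ? p++ : q++;
    std::vector<double> M(dim * dim, 0.0);
    for (int a = 0; a < dim; ++a)
        for (int b = a; b < dim; ++b) {
            bool da = is_discrete[a], db = is_discrete[b];
            double v = 0.0;
            if (da && db) {
                if (a == b) continue;           // no discrete diagonal entry
                int i = idx[a], j = idx[b];     // i < j since indices follow column order
                v = dd[i * p - i * (i + 1) / 2 + (j - i - 1)];
            } else if (!da && !db) {
                v = Omega[idx[a] * q + idx[b]]; // precision block (incl. diagonal)
            } else {
                int di = da ? idx[a] : idx[b];  // discrete row of xy
                int cj = da ? idx[b] : idx[a];  // continuous column of xy
                v = xy[di * q + cj];
            }
            M[a * dim + b] = v;
            M[b * dim + a] = v;                 // symmetric fill
        }
    return M;
}
```

With columns ordered discrete, continuous, discrete, the helper routes each pair to the right block even though the two discrete columns are not adjacent.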