Mixed MRF
Implementation details of the mixed Markov random field in bgms. Source: src/models/mixed/mixed_mrf_model.h and four .cpp files (mixed_mrf_model.cpp, mixed_mrf_gradient.cpp, mixed_mrf_likelihoods.cpp, mixed_mrf_metropolis.cpp).
Block structure
The mixed MRF handles \(p\) discrete and \(q\) continuous variables. The parameter space is organized into three interaction blocks plus main effects:
| Block | Parameters | Size |
|---|---|---|
| Main effects | Discrete thresholds + continuous means | \(\text{thresholds} + q\) |
| Discrete-discrete | Pairwise discrete interactions (symmetric) | \(p(p-1)/2\) |
| Cross-type | Discrete-continuous interactions | \(p \times q\) |
| Continuous-continuous | Cholesky factor of the precision matrix | \(q(q+1)/2\) |
All five parameter blocks (thresholds, discrete interactions, means, cross-type, and Cholesky; main effects split into thresholds and means) are updated jointly by NUTS. The continuous precision matrix is parameterized through its Cholesky factor \(R\) (with diagonal entries on the log scale) so that positive definiteness is guaranteed by construction. The gradient includes the Jacobian of the Cholesky-to-precision mapping.
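The log-scale Cholesky parameterization can be illustrated with a small self-contained sketch (not the bgms implementation; the layout of the unconstrained vector `psi` — diagonal entry first, then that row's off-diagonals — is an assumption for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch: build the precision matrix Omega = R^T R from an unconstrained
// vector psi of length q(q+1)/2. Diagonal entries of the upper-triangular
// factor R are stored on the log scale, so Omega is positive definite for
// any real psi. Matrices are row-major q x q.
std::vector<double> precision_from_cholesky(const std::vector<double>& psi, int q) {
    std::vector<double> R(q * q, 0.0);
    int k = 0;
    for (int i = 0; i < q; ++i) {
        R[i * q + i] = std::exp(psi[k++]);      // log-scale diagonal
        for (int j = i + 1; j < q; ++j)
            R[i * q + j] = psi[k++];            // free off-diagonals
    }
    // Omega = R^T R is symmetric positive definite by construction.
    std::vector<double> Omega(q * q, 0.0);
    for (int i = 0; i < q; ++i)
        for (int j = 0; j < q; ++j)
            for (int l = 0; l < q; ++l)
                Omega[i * q + j] += R[l * q + i] * R[l * q + j];
    return Omega;
}
```

At `psi = 0` the factor is the identity, so the precision matrix starts at the identity; no constraint checking is ever needed during leapfrog steps.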
Parameter vector layout
The model distinguishes three vector types:
Active vector (parameter_dimension())
Contains the parameters that NUTS updates, excluding edges set to zero by edge selection. The Cholesky block is always fully included:
[ thresholds | discrete interactions (active) | means | cross-type (active) | Cholesky (all) ]
The mass matrix and gradient are dimensioned to this vector.
Full vector (full_parameter_dimension())
Contains all parameters including inactive edges:
[ thresholds | discrete interactions (all) | means | cross-type (all) | Cholesky (all) ]
Used for storage and initialization.
Storage vector (storage_dimension())
Identical layout to the full vector. This is what gets saved to ChainResult::samples at each iteration.
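The relation between the active and full vectors is a scatter through an inclusion mask. A minimal sketch (hypothetical helper, not a bgms function), assuming one boolean per full-vector position, with thresholds, means, and the Cholesky block always flagged true:

```cpp
#include <cassert>
#include <vector>

// Sketch: scatter the active (NUTS) vector into the full layout.
// Positions excluded by edge selection keep their constrained value of zero.
std::vector<double> scatter_active(const std::vector<double>& active,
                                   const std::vector<bool>& active_mask) {
    std::vector<double> full(active_mask.size(), 0.0);
    std::size_t a = 0;
    for (std::size_t i = 0; i < active_mask.size(); ++i)
        if (active_mask[i]) full[i] = active[a++];
    return full;
}
```

The inverse gather (full to active) simply reads the flagged positions in order, which is why the mass matrix and gradient can be dimensioned to the active vector alone.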
Sampling strategy
Each MCMC iteration updates the full active parameter vector (all five blocks) in a single NUTS step. The logp_and_gradient() method computes the log-pseudoposterior and gradient for the entire vector, including the Cholesky block with its log-determinant Jacobian.
When edge selection is active (or the initial graph is sparse), the model reports has_constraints() = true and the NUTS sampler switches to constrained (RATTLE) integration, using logp_and_gradient_full() and projection callbacks to enforce zero entries for excluded edges (see Constrained Leapfrog (RATTLE)).
Edge indicator updates (when active) operate on all three interaction blocks.
The model also provides a component-wise adaptive Metropolis fallback (do_one_metropolis_step()), used when the sampler is set to "adaptive-metropolis". In Metropolis mode, each parameter is updated individually with Robbins–Monro-tuned proposal SDs targeting an acceptance rate of 0.44. Off-diagonal precision updates use a rank-2 Cholesky update; diagonal updates use rank-1.
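The Robbins–Monro adaptation in the Metropolis fallback can be sketched as follows (an illustrative recursion, not the bgms code; the gain exponent 0.75 is an assumption):

```cpp
#include <cassert>
#include <cmath>

// Sketch: Robbins-Monro update of a log proposal SD toward a target
// acceptance rate of 0.44 (the classic optimum for one-dimensional
// random-walk Metropolis). t is the iteration count; the gain decays
// so adaptation vanishes asymptotically, preserving ergodicity.
double update_log_proposal_sd(double log_sd, bool accepted, long t,
                              double target = 0.44) {
    double gain = 1.0 / std::pow(static_cast<double>(t), 0.75);
    return log_sd + gain * ((accepted ? 1.0 : 0.0) - target);
}
```

Acceptance nudges the proposal SD up, rejection nudges it down, so each parameter's SD drifts toward the scale at which roughly 44% of proposals are accepted.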
Gradient computation
The gradient (mixed_mrf_gradient.cpp) covers all five parameter blocks in the NUTS vector. It involves:
- Discrete conditionals — Category probabilities via Variable Helpers, contributing gradients for thresholds and discrete pairwise interactions
- Cross-type terms — Contributions from discrete-continuous interactions to both discrete and continuous conditional distributions
- Continuous means — Gradient of the continuous-variable contribution to the pseudolikelihood
- Cholesky block — The precision gradient \(\bar{\Omega}\) is computed from the GGM conditional contribution and the conditional-mean coupling, then mapped to the Cholesky gradient \(\bar{R} = R(\bar{\Omega} + \bar{\Omega}^\top)\). Diagonal entries use the log-scale Jacobian: \(\partial\ell / \partial\psi_j = \bar{R}_{jj} R_{jj} + (q - j + 1)\).
The log-prior gradient adds Cauchy penalties on off-diagonal precision entries and Gamma(1,1) penalties on diagonal entries.
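The pullback \(\bar{R} = R(\bar{\Omega} + \bar{\Omega}^\top)\) follows from the chain rule for \(\Omega = R^\top R\). A numeric sketch of just this mapping (hypothetical helper, row-major storage assumed):

```cpp
#include <cassert>
#include <vector>

// Sketch: map the precision gradient Omegabar to the Cholesky gradient
// Rbar = R * (Omegabar + Omegabar^T), for Omega = R^T R with R upper
// triangular. All matrices are row-major q x q.
std::vector<double> cholesky_grad(const std::vector<double>& R,
                                  const std::vector<double>& Obar, int q) {
    std::vector<double> S(q * q), Rbar(q * q, 0.0);
    for (int i = 0; i < q; ++i)              // symmetrize the precision gradient
        for (int j = 0; j < q; ++j)
            S[i * q + j] = Obar[i * q + j] + Obar[j * q + i];
    for (int i = 0; i < q; ++i)              // Rbar = R * S
        for (int j = 0; j < q; ++j)
            for (int l = 0; l < q; ++l)
                Rbar[i * q + j] += R[i * q + l] * S[l * q + j];
    return Rbar;
}
```

For example, with \(\ell = \Omega_{00}\) and \(R_{00} = 1\), the mapping returns \(\bar{R}_{00} = 2\), matching \(\partial R_{00}^2 / \partial R_{00}\). The log-scale diagonal then picks up the extra factor \(R_{jj}\) plus the Jacobian term described above.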
Pseudolikelihood options
The pseudolikelihood argument controls how the discrete full conditionals are computed. It does not affect the continuous block, which always uses the GGM likelihood \(\log p(y \mid x) \propto \tfrac{n}{2}\log|\Omega| - \tfrac{1}{2}\operatorname{tr}(\Omega D^\top D)\).
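The continuous-block likelihood above can be evaluated from a Cholesky factorization of \(\Omega\). A self-contained sketch (hypothetical helper, not the bgms code), taking the Gram matrix \(D^\top D\) of the centered continuous data as precomputed input:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch: evaluate (n/2) log|Omega| - (1/2) tr(Omega * D^T D).
// log|Omega| comes from the Cholesky factorization Omega = L L^T,
// so log|Omega| = 2 * sum_j log L_jj. Row-major q x q arrays.
double ggm_loglik(const std::vector<double>& Omega,
                  const std::vector<double>& DtD, int q, int n) {
    std::vector<double> L(q * q, 0.0);
    for (int j = 0; j < q; ++j) {            // standard Cholesky, lower triangular
        double s = Omega[j * q + j];
        for (int k = 0; k < j; ++k) s -= L[j * q + k] * L[j * q + k];
        L[j * q + j] = std::sqrt(s);
        for (int i = j + 1; i < q; ++i) {
            double t = Omega[i * q + j];
            for (int k = 0; k < j; ++k) t -= L[i * q + k] * L[j * q + k];
            L[i * q + j] = t / L[j * q + j];
        }
    }
    double logdet = 0.0, trace = 0.0;
    for (int j = 0; j < q; ++j) logdet += 2.0 * std::log(L[j * q + j]);
    for (int i = 0; i < q; ++i)              // tr(Omega * DtD)
        for (int j = 0; j < q; ++j)
            trace += Omega[i * q + j] * DtD[j * q + i];
    return 0.5 * n * logdet - 0.5 * trace;
}
```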
For a cross-model overview, see Pseudolikelihood.
- "conditional" (default) — Computes discrete full conditionals \(\log p(x_s \mid x_{-s}, y_{\text{obs}})\), conditioning on the observed continuous values. The rest score uses pairwise_effects_discrete directly. Faster because updating the continuous precision elements does not require re-evaluating the discrete pseudolikelihood.
- "marginal" — Computes discrete full conditionals \(\log p(x_s \mid x_{-s})\) by integrating out the continuous variables. Uses the marginal interaction matrix \(\Theta = 2A_{xx} + 2A_{xy}\Sigma_{yy}A_{xy}^\top\), which must be recomputed whenever the precision or cross-type parameters change. This adds coupling terms to the gradient (the \(\partial\Theta/\partial\Sigma\) chain rule contributes to the precision gradient via \(-2\Sigma A_{xy}^\top\bar\Theta A_{xy}\Sigma\)).
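The marginal interaction matrix is a pair of plain matrix products; a sketch assuming dense row-major storage (marginal_theta is a hypothetical helper, not a bgms function):

```cpp
#include <cassert>
#include <vector>

// Sketch: assemble Theta = 2*Axx + 2*Axy * Sigma_yy * Axy^T used by the
// "marginal" pseudolikelihood. Axx is p x p, Axy is p x q, Sigma_yy is
// q x q (the inverse of the continuous precision block); all row-major.
std::vector<double> marginal_theta(const std::vector<double>& Axx,
                                   const std::vector<double>& Axy,
                                   const std::vector<double>& Syy,
                                   int p, int q) {
    std::vector<double> T(p * p, 0.0), AS(p * q, 0.0);
    for (int i = 0; i < p; ++i)              // AS = Axy * Sigma_yy
        for (int j = 0; j < q; ++j)
            for (int l = 0; l < q; ++l)
                AS[i * q + j] += Axy[i * q + l] * Syy[l * q + j];
    for (int i = 0; i < p; ++i)              // Theta = 2*Axx + 2*AS*Axy^T
        for (int j = 0; j < p; ++j) {
            double s = Axx[i * p + j];
            for (int l = 0; l < q; ++l)
                s += AS[i * q + l] * Axy[j * q + l];
            T[i * p + j] = 2.0 * s;
        }
    return T;
}
```

Because \(\Sigma_{yy}\) and \(A_{xy}\) both enter this product, any update to the precision or cross-type blocks invalidates \(\Theta\), which is the cost the "conditional" option avoids.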
Missing data
The mixed MRF tracks missing indices separately for discrete and continuous variables (missing_index_discrete, missing_index_continuous). During imputation:
- Missing discrete values are sampled from their full conditional categorical distributions
- Missing continuous values are sampled from their conditional normal distributions given the current precision matrix
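The continuous imputation step follows the standard Gaussian full-conditional formula. A sketch (hypothetical helpers, not the bgms code), exploiting the fact that the conditional precision of one coordinate is just the corresponding diagonal entry of \(\Omega\):

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Sketch: for a Gaussian with mean mu and precision Omega (row-major
// q x q), the full conditional of y[j] given the other coordinates is
//   N( mu_j - sum_{k != j} Omega_jk (y_k - mu_k) / Omega_jj , 1 / Omega_jj ).
double conditional_mean(int j, const std::vector<double>& y,
                        const std::vector<double>& mu,
                        const std::vector<double>& Omega, int q) {
    double s = 0.0;
    for (int k = 0; k < q; ++k)
        if (k != j) s += Omega[j * q + k] * (y[k] - mu[k]);
    return mu[j] - s / Omega[j * q + j];
}

double impute_continuous(int j, const std::vector<double>& y,
                         const std::vector<double>& mu,
                         const std::vector<double>& Omega, int q,
                         std::mt19937& rng) {
    std::normal_distribution<double> d(conditional_mean(j, y, mu, Omega, q),
                                       std::sqrt(1.0 / Omega[j * q + j]));
    return d(rng);
}
```

Note that the conditional mean here ignores the cross-type coupling with the discrete variables; in the full model the discrete contributions enter through the conditional mean as well.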
Output assembly
The R output builder (build_output_mixed_mrf() in build_output.R) maps the flat storage vector back to the original variable ordering using fill_mixed_symmetric(). This function places discrete-discrete, continuous-continuous, and cross-type interactions into the correct cells of the \((p+q) \times (p+q)\) output matrix, respecting the original column order (which may interleave discrete and continuous variables).
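The index bookkeeping behind this reassembly can be sketched in C++ (an illustrative simplification, not the R fill_mixed_symmetric(); the block layouts — dd upper-triangular row-wise, xy row-major \(p \times q\), Omega dense \(q \times q\) — are assumptions):

```cpp
#include <cassert>
#include <vector>

// Sketch: scatter the discrete-discrete, cross-type, and
// continuous-continuous blocks into a symmetric (p+q) x (p+q) matrix
// that follows the original (possibly interleaved) column order.
std::vector<double> fill_mixed(const std::vector<bool>& is_discrete,
                               const std::vector<double>& dd,
                               const std::vector<double>& xy,
                               const std::vector<double>& Omega) {
    int dim = static_cast<int>(is_discrete.size()), p = 0, q = 0;
    std::vector<int> idx(dim);                  // within-type index per column
    for (int c = 0; c < dim; ++c) idx[c] = is_discrete[c] ? p++ : q++;
    std::vector<double> M(dim * dim, 0.0);
    for (int a = 0; a < dim; ++a)
        for (int b = a; b < dim; ++b) {
            bool da = is_discrete[a], db = is_discrete[b];
            double v = 0.0;
            if (da && db) {
                if (a == b) continue;           // no discrete diagonal entry
                int i = idx[a], j = idx[b];     // i < j since indices follow column order
                v = dd[i * p - i * (i + 1) / 2 + (j - i - 1)];
            } else if (!da && !db) {
                v = Omega[idx[a] * q + idx[b]]; // precision block (incl. diagonal)
            } else {
                int di = da ? idx[a] : idx[b];  // discrete row of xy
                int cj = da ? idx[b] : idx[a];  // continuous column of xy
                v = xy[di * q + cj];
            }
            M[a * dim + b] = v;
            M[b * dim + a] = v;                 // symmetric fill
        }
    return M;
}
```

With columns ordered discrete, continuous, discrete, the helper routes each pair to the right block even though the two discrete columns are not adjacent.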