OMRF

Implementation details of the ordinal Markov random field (OMRF) in bgms. Source: src/models/omrf/omrf_model.h and omrf_model.cpp.

Ordinal threshold parameterization

For an ordinal variable \(X_i\) with \(K_i + 1\) categories \((0, 1, \ldots, K_i)\), the model assigns \(K_i\) threshold parameters \(\mu_i(1), \ldots, \mu_i(K_i)\). Category \(c = 0\) serves as the reference with \(\mu_i(0) = 0\). The conditional probability of observing category \(c\) for person \(n\) is:

\[ P(X_{ni} = c \mid \text{rest}) = \frac{\exp\!\bigl(\mu_i(c) + c \cdot r_{ni}\bigr)}{\sum_{k=0}^{K_i} \exp\!\bigl(\mu_i(k) + k \cdot r_{ni}\bigr)} \]

where \(r_{ni} = 2\sum_{j \neq i} x_{nj}\, \omega_{ij}\) is the residual score summarizing the pairwise interactions with all other variables.

Blume-Capel model

For Blume-Capel variables, the threshold structure is replaced by two parameters: a linear effect \(\alpha_i\) and a quadratic effect \(\beta_i\). The category score is:

\[ \mu_i(c) = \alpha_i (c - b_i) + \beta_i (c - b_i)^2 \]

where \(b_i\) is the reference (baseline) category. This allows the model to capture non-monotone category effects, such as preferences for moderate responses.

Residual matrix

The model maintains a precomputed \(N \times p\) residual matrix where entry \((n, i)\) stores:

\[ r_{ni} = 2\sum_{j \neq i} x_{nj}\, \omega_{ij} \]

When a single interaction \(\omega_{ij}\) changes by \(\delta\), the residual matrix is updated incrementally:

  • Column \(i\): \(r_{ni} \leftarrow r_{ni} + 2\delta \cdot x_{nj}\) for all \(n\)
  • Column \(j\): \(r_{nj} \leftarrow r_{nj} + 2\delta \cdot x_{ni}\) for all \(n\)

This avoids recomputing the full matrix product after each parameter update.

Pseudolikelihood

The OMRF uses a pseudolikelihood approximation: the product of full conditional distributions rather than the intractable joint distribution. For a cross-model overview, see Pseudolikelihood. The log-pseudolikelihood is:

\[ \text{PL}(\boldsymbol{\theta}) = \sum_{i=1}^{p} \sum_{n=1}^{N} \log P(X_{ni} = x_{ni} \mid \text{rest}) \]

The gradient of the log-pseudolikelihood with respect to each parameter involves the expected category values under the current conditionals, computed by the Variable Helpers.

Gradient computation

The logp_and_gradient() method computes both the log-pseudoposterior and its gradient in a single pass. The observed contribution to the gradient for a pairwise interaction \(\omega_{ij}\) is \(4 \sum_{n} x_{ni} x_{nj}\) (from the sufficient statistic \(\mathbf{X}^\top\mathbf{X}\)). The expected contribution accumulates over both variables:

\[ \frac{\partial \text{PL}}{\partial \omega_{ij}} = 4\sum_{n=1}^{N} x_{ni}\, x_{nj} - 2\sum_{n=1}^{N} \bigl(E[X_{ni} \mid \text{rest}]\, x_{nj} + E[X_{nj} \mid \text{rest}]\, x_{ni}\bigr) \]

The expected values \(E[X_i \mid \text{rest}]\) are computed from the category probabilities, which are obtained via the numerically stable FAST/SAFE paths in variable_helpers.cpp. The pairwise gradient is vectorized using BLAS matrix products: \(\mathbf{X}^\top \mathbf{E}\) where \(\mathbf{E}\) is the matrix of conditional expected values.

Edge indicator updates

Edge selection uses a Metropolis-Hastings sampler based on the mixtures of mutually singular (MoMS) distributions framework (Gottardo & Raftery, 2008; van den Bergh et al., 2026). All interaction parameters live in a fixed-dimensional space; excluded edges are represented as parameters fixed at zero rather than removed from the model.

When edge selection is active, the model performs a full scan over all \(p(p-1)/2\) edges after each gradient-based step. For each edge \((i, j)\):

Add move (\(\gamma_{ij} = 0 \to 1\)). Propose a new interaction value \(\omega_{ij}^\star \sim \mathcal{N}(\omega_{ij},\, \sigma_{ij}^2)\) and accept with probability \[ \alpha = \min\!\left\{1,\; \frac{p(\boldsymbol{\gamma}^\star, \boldsymbol{\omega}^\star \mid \mathbf{X})} {p(\boldsymbol{\gamma},\, \boldsymbol{\omega} \mid \mathbf{X})} \;\frac{1}{q(\omega_{ij}^\star \mid \omega_{ij})} \right\} \] where the posterior ratio includes the pseudolikelihood ratio, the Cauchy slab prior on \(\omega_{ij}^\star\), and the prior inclusion odds.

Delete move (\(\gamma_{ij} = 1 \to 0\)). Set \(\omega_{ij}^\star = 0\) and accept with probability \[ \alpha = \min\!\left\{1,\; \frac{p(\boldsymbol{\gamma}^\star, \boldsymbol{\omega}^\star \mid \mathbf{X})} {p(\boldsymbol{\gamma},\, \boldsymbol{\omega} \mid \mathbf{X})} \;\frac{q(\omega_{ij} \mid \omega_{ij}^\star)}{1} \right\} \]

The scan order is randomized at each iteration via prepare_iteration(), which calls arma_randperm().

Adaptive Metropolis fallback

Even when NUTS is the primary sampler, the OMRF implements element-wise adaptive Metropolis proposals for use during warmup stage 3b (proposal SD learning) and as a user-selectable alternative via update_method = "adaptive-metropolis". The proposal SDs are tuned by a Robbins-Monro recursion targeting a 0.44 acceptance rate, with the SDs clamped to \([0.001, 2.0]\).

References

Gottardo, R., & Raftery, A. E. (2008). Markov chain Monte Carlo with mixtures of mutually singular distributions. Journal of Computational and Graphical Statistics, 17(4), 949–975. https://doi.org/10.1198/106186008X386102

van den Bergh, D., Clyde, M. A., Raftery, A. E., & Marsman, M. (2026). Reversible jump MCMC with no regrets: Bayesian variable selection using mixtures of mutually singular distributions. Manuscript in preparation.