Pseudolikelihood

This page describes how pseudolikelihood approximations are used in bgms across the ordinal Markov random field (OMRF), mixed MRF, and group-comparison models.

Why pseudolikelihood

For discrete and mixed graphical models, the joint likelihood contains a normalizing constant that is expensive or intractable to evaluate for realistic network sizes. bgms therefore uses pseudolikelihood-based objectives built from full conditional distributions.

Generic form

Let \(\mathbf{x}_n\) denote observation \(n\), and let \(x_{ni}\) be variable \(i\) in that observation. The pseudolikelihood is

\[ \mathrm{PL}(\boldsymbol{\theta}) = \prod_{n=1}^{N} \prod_{i=1}^{p} p(x_{ni} \mid \mathbf{x}_{n,-i}, \boldsymbol{\theta}) \]

with log form

\[ \log \mathrm{PL}(\boldsymbol{\theta}) = \sum_{n=1}^{N} \sum_{i=1}^{p} \log p(x_{ni} \mid \mathbf{x}_{n,-i}, \boldsymbol{\theta}). \]

The MCMC samplers in bgms target the pseudoposterior distribution, which is proportional to the product of the pseudolikelihood and the prior.
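
The product of full conditionals above can be made concrete for the binary special case. The following is a minimal sketch, not bgms internals: a 0/1 MRF with main effects `mu` and a symmetric interaction matrix `theta` (zero diagonal), where each conditional is a logistic regression on the remaining variables.

```python
import numpy as np

def log_pseudolikelihood(X, theta, mu):
    """Log-pseudolikelihood of a binary (0/1) MRF.

    X     : (N, p) data matrix of 0/1 values
    theta : (p, p) symmetric interaction matrix with zero diagonal
    mu    : (p,) main effects
    """
    # Linear predictor of each variable given the others:
    # eta_ni = mu_i + sum_{j != i} theta_ij * x_nj
    # (the zero diagonal of theta enforces j != i)
    eta = mu + X @ theta
    # log p(x_ni | rest) = x_ni * eta_ni - log(1 + exp(eta_ni)),
    # with log1p-style stability via logaddexp
    return float(np.sum(X * eta - np.logaddexp(0.0, eta)))

rng = np.random.default_rng(1)
p = 4
theta = rng.normal(scale=0.2, size=(p, p))
theta = (theta + theta.T) / 2
np.fill_diagonal(theta, 0.0)
mu = rng.normal(size=p)
X = rng.integers(0, 2, size=(10, p)).astype(float)
print(log_pseudolikelihood(X, theta, mu))
```

Because every factor is a conditional probability below one, the log-pseudolikelihood is always negative; adding the log-prior to this quantity gives the log-pseudoposterior up to a constant.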

Statistical properties

The pseudolikelihood and the full likelihood share the same mode: pseudolikelihood estimation does not introduce bias beyond that already present in the full likelihood (Arena & Marsman, 2026; Keetelaar et al., 2024). However, the pseudolikelihood underestimates parameter variance, producing pseudoposterior distributions that are too narrow relative to the true posterior (Arena & Marsman, 2026; Keetelaar et al., 2024; Miller, 2021).

This variance underestimation has consequences for edge selection. Bayes factors derived from the pseudoposterior overstate the evidence for or against edge inclusion because the spike-and-slab comparison operates on a distribution that is artificially concentrated.

For the mixed MRF, the two pseudolikelihood options differ in how they handle this variance underestimation for the cross-type (discrete-continuous) interactions. The marginal pseudolikelihood does not appear to underestimate the variance of the cross-type interactions, whereas the conditional pseudolikelihood does.

OMRF pseudolikelihood

For OMRF, bgms uses the product of ordinal full conditionals. The conditional probabilities depend on category thresholds and residual scores that summarize the influence of all other variables.
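
As an illustration of the ordinal conditional form, the sketch below assumes a common parameterization in which category \(c\) gets probability proportional to \(\exp(\text{threshold}_c + c \cdot r)\), where \(r\) is the residual score from the other variables and category 0 serves as the reference; the function name and coding are illustrative, not bgms internals.

```python
import numpy as np

def ordinal_conditional(thresholds, rest_score):
    """p(x_i = c | x_{-i}) for c = 0..m under the assumed form
    p(c | rest) proportional to exp(threshold_c + c * rest_score),
    with category 0 as the reference (threshold_0 = 0)."""
    m = len(thresholds)                       # categories 1..m have free thresholds
    c = np.arange(m + 1)
    logits = np.concatenate(([0.0], thresholds)) + c * rest_score
    logits -= logits.max()                    # max-shift before exponentiation
    w = np.exp(logits)
    return w / w.sum()

# Three-category variable: thresholds for categories 1 and 2,
# residual score summarizing the other variables' influence.
probs = ordinal_conditional(np.array([0.5, -1.0]), rest_score=0.3)
print(probs)
```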

Implementation details and gradient expressions are documented in OMRF Internals.

Mixed MRF pseudolikelihood

For mixed models, the pseudolikelihood combines discrete and continuous parts. The continuous block always uses the Gaussian graphical model (GGM) likelihood. The pseudolikelihood argument controls how the discrete full conditionals are computed:

  • "conditional" (default): computes \(\log p(x_s \mid x_{-s}, y_{\text{obs}})\), conditioning on the observed continuous values. Faster because updating the continuous precision does not require re-evaluating the discrete pseudolikelihood.
  • "marginal": computes \(\log p(x_s \mid x_{-s})\) by integrating out the continuous variables. Uses the marginal interaction matrix \(\Theta = 2A_{xx} + 2A_{xy}\Sigma_{yy}A_{xy}^\top\), which couples the discrete pseudolikelihood to the precision parameters.
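
The marginal interaction matrix in the second option is a direct matrix computation. A minimal sketch, with block names \(A_{xx}\) (discrete-discrete), \(A_{xy}\) (discrete-continuous), and \(\Sigma_{yy}\) (continuous covariance) taken from the formula above:

```python
import numpy as np

def marginal_interaction_matrix(A_xx, A_xy, Sigma_yy):
    """Theta = 2*A_xx + 2*A_xy @ Sigma_yy @ A_xy.T.
    Integrating out y folds the continuous covariance into the
    effective discrete-discrete interactions, which is what couples
    the discrete pseudolikelihood to the precision parameters."""
    return 2.0 * A_xx + 2.0 * A_xy @ Sigma_yy @ A_xy.T

# Two discrete variables, one continuous variable (toy values).
A_xx = np.array([[0.0, 0.3], [0.3, 0.0]])
A_xy = np.array([[0.5], [-0.2]])
Sigma_yy = np.array([[1.5]])
Theta = marginal_interaction_matrix(A_xx, A_xy, Sigma_yy)
print(Theta)
```

Note that under "marginal", any update to the precision (and hence \(\Sigma_{yy}\)) changes \(\Theta\), so the discrete pseudolikelihood must be re-evaluated; under "conditional" it does not.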

Implementation details are documented in Mixed MRF Internals.

Group comparison (bgmCompare)

The group-comparison engine uses the same pseudolikelihood idea but evaluates it per group under the baseline-plus-differences parameterization. Gradient contributions are then projected through the contrast structure.
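
The projection step can be sketched under an assumed linear parameterization \(\theta_g = \text{baseline} + C_g \delta\), where \(C\) encodes the contrast structure and \(\delta\) collects the group differences; all names here are illustrative, not bgms internals. By the chain rule, per-group gradients map back to baseline and difference coordinates as sums weighted by the contrasts.

```python
import numpy as np

def group_params(baseline, delta, C):
    """Per-group parameters: theta_g = baseline + C[g] @ delta.
    baseline: (p,), delta: (K, p), C: (G, K) -> returns (G, p)."""
    return baseline + C @ delta

def project_gradients(grad_per_group, C):
    """Project per-group pseudolikelihood gradients (G, p) back to
    (baseline, difference) coordinates via the chain rule."""
    g_baseline = grad_per_group.sum(axis=0)   # baseline enters every group
    g_delta = C.T @ grad_per_group            # contrast-weighted combination
    return g_baseline, g_delta

# Two groups, one difference vector: group 1 is the baseline,
# group 2 adds the full difference.
C = np.array([[0.0], [1.0]])
baseline = np.array([1.0, 2.0])
delta = np.array([[3.0, 4.0]])
print(group_params(baseline, delta, C))
```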

Implementation details are documented in bgmCompare Engine.

Numerical and gradient considerations

In discrete blocks, both log-normalizers and category probabilities are needed for efficient gradient computation. bgms computes these together to avoid redundant work and uses numerically stable paths for exponentiation-sensitive regions.
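
The standard device here is the max-shift (log-sum-exp) trick, which yields the log-normalizer and the category probabilities in one pass without overflowing exp(). A minimal sketch of that pattern (not the bgms implementation):

```python
import numpy as np

def lognorm_and_probs(logits):
    """Return (log-normalizer, category probabilities) together.
    Shifting by the maximum logit keeps every exponent <= 0, so
    exp() cannot overflow even for large logits."""
    m = logits.max()
    w = np.exp(logits - m)
    s = w.sum()
    return m + np.log(s), w / s

logz, probs = lognorm_and_probs(np.array([1.0, 2.0, 3.0]))
print(logz, probs)
```

Computing both quantities from the same shifted exponentials is what avoids the redundant work: the gradient of the log-normalizer with respect to the logits is exactly the probability vector.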

See Variable Helpers and Numerical Considerations.

References

Arena, G., & Marsman, M. (2026). Bayesian inference for discrete Markov random fields through coordinate rescaling. arXiv Preprint. https://arxiv.org/abs/2601.17205
Keetelaar, S., Sekulovski, N., Borsboom, D., & Marsman, M. (2024). Comparing maximum likelihood and pseudo-maximum likelihood estimators for the Ising model. Advances.in/Psychology, 2, e25745. https://doi.org/10.56296/aip00013
Miller, J. W. (2021). Asymptotic normality, concentration, and coverage of generalized posteriors. Journal of Machine Learning Research, 22(168), 1–53.