Pseudolikelihood

This page describes how pseudolikelihood approximations are used in bgms across the ordinal Markov random field (OMRF), mixed MRF, and group-comparison models.

Why pseudolikelihood

For discrete and mixed graphical models, the joint likelihood contains a normalizing constant that is expensive or intractable to evaluate for realistic network sizes. bgms therefore uses pseudolikelihood-based objectives built from full conditional distributions.

Generic form

Let \(\mathbf{x}_n\) denote observation \(n\), and let \(x_{ni}\) be variable \(i\) in that observation. The pseudolikelihood is

\[ \mathrm{PL}(\boldsymbol{\theta}) = \prod_{n=1}^{N} \prod_{i=1}^{p} p(x_{ni} \mid \mathbf{x}_{n,-i}, \boldsymbol{\theta}) \]

with log form

\[ \log \mathrm{PL}(\boldsymbol{\theta}) = \sum_{n=1}^{N} \sum_{i=1}^{p} \log p(x_{ni} \mid \mathbf{x}_{n,-i}, \boldsymbol{\theta}). \]

The MCMC samplers in bgms target the pseudoposterior distribution, which is proportional to the product of the pseudolikelihood and the prior.
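
The product of full conditionals above can be made concrete for the binary special case. The following is a minimal sketch, not bgms internals: a 0/1 MRF with main effects `mu` and a symmetric interaction matrix `theta` (zero diagonal), where each conditional is a logistic regression on the remaining variables.

```python
import numpy as np

def log_pseudolikelihood(X, theta, mu):
    """Log-pseudolikelihood of a binary (0/1) MRF.

    X     : (N, p) data matrix of 0/1 values
    theta : (p, p) symmetric interaction matrix with zero diagonal
    mu    : (p,) main effects
    """
    # Linear predictor of each variable given the others:
    # eta_ni = mu_i + sum_{j != i} theta_ij * x_nj
    # (the zero diagonal of theta enforces j != i)
    eta = mu + X @ theta
    # log p(x_ni | rest) = x_ni * eta_ni - log(1 + exp(eta_ni)),
    # with log1p-style stability via logaddexp
    return float(np.sum(X * eta - np.logaddexp(0.0, eta)))

rng = np.random.default_rng(1)
p = 4
theta = rng.normal(scale=0.2, size=(p, p))
theta = (theta + theta.T) / 2
np.fill_diagonal(theta, 0.0)
mu = rng.normal(size=p)
X = rng.integers(0, 2, size=(10, p)).astype(float)
print(log_pseudolikelihood(X, theta, mu))
```

Because every factor is a conditional probability below one, the log-pseudolikelihood is always negative; adding the log-prior to this quantity gives the log-pseudoposterior up to a constant.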

Statistical properties

The pseudolikelihood and the full likelihood share the same mode: pseudolikelihood estimation does not introduce bias beyond that already present in the full likelihood (Arena & Marsman, 2026; Keetelaar et al., 2024). However, the pseudolikelihood underestimates parameter variance, producing pseudoposterior distributions that are too narrow relative to the true posterior (Arena & Marsman, 2026; Keetelaar et al., 2024; Miller, 2021).

This variance underestimation has consequences for edge selection. Bayes factors derived from the pseudoposterior overstate the evidence for or against edge inclusion because the spike-and-slab comparison operates on a distribution that is artificially concentrated.

For the mixed MRF, the two pseudolikelihood options differ in how they handle this variance underestimation for the cross-type (discrete-continuous) interactions. The marginal pseudolikelihood does not appear to underestimate the variance of the cross-type interactions, whereas the conditional pseudolikelihood does.

OMRF pseudolikelihood

For OMRF, bgms uses the product of ordinal full conditionals. The conditional probabilities depend on category thresholds and residual scores that summarize the influence of all other variables.
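
As an illustration of the ordinal conditional form, the sketch below assumes a common parameterization in which category \(c\) gets probability proportional to \(\exp(\text{threshold}_c + c \cdot r)\), where \(r\) is the residual score from the other variables and category 0 serves as the reference; the function name and coding are illustrative, not bgms internals.

```python
import numpy as np

def ordinal_conditional(thresholds, rest_score):
    """p(x_i = c | x_{-i}) for c = 0..m under the assumed form
    p(c | rest) proportional to exp(threshold_c + c * rest_score),
    with category 0 as the reference (threshold_0 = 0)."""
    m = len(thresholds)                       # categories 1..m have free thresholds
    c = np.arange(m + 1)
    logits = np.concatenate(([0.0], thresholds)) + c * rest_score
    logits -= logits.max()                    # max-shift before exponentiation
    w = np.exp(logits)
    return w / w.sum()

# Three-category variable: thresholds for categories 1 and 2,
# residual score summarizing the other variables' influence.
probs = ordinal_conditional(np.array([0.5, -1.0]), rest_score=0.3)
print(probs)
```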

Implementation details and gradient expressions are documented in OMRF Internals.

Mixed MRF pseudolikelihood

For mixed models, the pseudolikelihood combines discrete and continuous parts. The continuous block always uses the Gaussian graphical model (GGM) likelihood. The pseudolikelihood argument controls how the discrete full conditionals are computed:

  • "conditional" (default): computes \(\log p(x_s \mid x_{-s}, y_{\text{obs}})\), conditioning on the observed continuous values. Faster because updating the continuous precision does not require re-evaluating the discrete pseudolikelihood.
  • "marginal": computes \(\log p(x_s \mid x_{-s})\) by integrating out the continuous variables. Uses the marginal interaction matrix \(\Theta = 2A_{xx} + 2A_{xy}\Sigma_{yy}A_{xy}^\top\), which couples the discrete pseudolikelihood to the precision parameters.
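
The marginal interaction matrix in the second option is a direct matrix computation. A minimal sketch, with block names \(A_{xx}\) (discrete-discrete), \(A_{xy}\) (discrete-continuous), and \(\Sigma_{yy}\) (continuous covariance) taken from the formula above:

```python
import numpy as np

def marginal_interaction_matrix(A_xx, A_xy, Sigma_yy):
    """Theta = 2*A_xx + 2*A_xy @ Sigma_yy @ A_xy.T.
    Integrating out y folds the continuous covariance into the
    effective discrete-discrete interactions, which is what couples
    the discrete pseudolikelihood to the precision parameters."""
    return 2.0 * A_xx + 2.0 * A_xy @ Sigma_yy @ A_xy.T

# Two discrete variables, one continuous variable (toy values).
A_xx = np.array([[0.0, 0.3], [0.3, 0.0]])
A_xy = np.array([[0.5], [-0.2]])
Sigma_yy = np.array([[1.5]])
Theta = marginal_interaction_matrix(A_xx, A_xy, Sigma_yy)
print(Theta)
```

Note that under "marginal", any update to the precision (and hence \(\Sigma_{yy}\)) changes \(\Theta\), so the discrete pseudolikelihood must be re-evaluated; under "conditional" it does not.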

Implementation details are documented in Mixed MRF Internals.

Group comparison (bgmCompare)

The group-comparison engine uses the same pseudolikelihood idea but evaluates it per group under the baseline-plus-differences parameterization. Gradient contributions are then projected through the contrast structure.
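
The projection step can be sketched under an assumed linear parameterization \(\theta_g = \text{baseline} + C_g \delta\), where \(C\) encodes the contrast structure and \(\delta\) collects the group differences; all names here are illustrative, not bgms internals. By the chain rule, per-group gradients map back to baseline and difference coordinates as sums weighted by the contrasts.

```python
import numpy as np

def group_params(baseline, delta, C):
    """Per-group parameters: theta_g = baseline + C[g] @ delta.
    baseline: (p,), delta: (K, p), C: (G, K) -> returns (G, p)."""
    return baseline + C @ delta

def project_gradients(grad_per_group, C):
    """Project per-group pseudolikelihood gradients (G, p) back to
    (baseline, difference) coordinates via the chain rule."""
    g_baseline = grad_per_group.sum(axis=0)   # baseline enters every group
    g_delta = C.T @ grad_per_group            # contrast-weighted combination
    return g_baseline, g_delta

# Two groups, one difference vector: group 1 is the baseline,
# group 2 adds the full difference.
C = np.array([[0.0], [1.0]])
baseline = np.array([1.0, 2.0])
delta = np.array([[3.0, 4.0]])
print(group_params(baseline, delta, C))
```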

Implementation details are documented in bgmCompare Engine.

Numerical and gradient considerations

In discrete blocks, both log-normalizers and category probabilities are needed for efficient gradient computation. bgms computes these together to avoid redundant work and uses numerically stable paths for exponentiation-sensitive regions.
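
The standard device here is the max-shift (log-sum-exp) trick, which yields the log-normalizer and the category probabilities in one pass without overflowing exp(). A minimal sketch of that pattern (not the bgms implementation):

```python
import numpy as np

def lognorm_and_probs(logits):
    """Return (log-normalizer, category probabilities) together.
    Shifting by the maximum logit keeps every exponent <= 0, so
    exp() cannot overflow even for large logits."""
    m = logits.max()
    w = np.exp(logits - m)
    s = w.sum()
    return m + np.log(s), w / s

logz, probs = lognorm_and_probs(np.array([1.0, 2.0, 3.0]))
print(logz, probs)
```

Computing both quantities from the same shifted exponentials is what avoids the redundant work: the gradient of the log-normalizer with respect to the logits is exactly the probability vector.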

See Variable Helpers and Numerical Considerations.

References

Arena, G., & Marsman, M. (2026). Bayesian inference for discrete Markov random fields through coordinate rescaling. arXiv Preprint. https://arxiv.org/abs/2601.17205
Keetelaar, S., Sekulovski, N., Borsboom, D., & Marsman, M. (2024). Comparing maximum likelihood and pseudo-maximum likelihood estimators for the Ising model. Advances.in/Psychology, 2, e25745. https://doi.org/10.56296/aip00013
Miller, J. W. (2021). Asymptotic normality, concentration, and coverage of generalized posteriors. Journal of Machine Learning Research, 22(168), 1–53.