Getting Started

bgms implements Bayesian estimation and edge selection for graphical models of mixed binary, ordinal, and continuous variables. Networks are estimated at the level of conditional associations: an edge between two variables indicates a direct relationship that persists after accounting for all other variables in the model.

The package has two main entry points:

For background on graphical models and the Bayesian approach, see Graphical Models and The Bayesian Approach.

Installation

From CRAN

install.packages("bgms")

Development version

if(!requireNamespace("remotes")) {
  install.packages("remotes")
}
remotes::install_github("Bayesian-Graphical-Modelling-Lab/bgms")

Fitting a model

The main function is bgm(). It accepts a data frame or matrix of mixed binary, ordinal, and continuous variables and estimates a Markov random field.

The Wenchuan dataset is included in the package, and contains ordinal measures of symptoms of posttraumatic stress from 362 survivors of the 2008 Wenchuan earthquake.

library(bgms)

data(Wenchuan)
head(Wenchuan)
     intrusion dreams flash upset physior avoidth avoidact amnesia lossint
[1,]         2      2     2     2       3       2        3       2       3
[2,]         2      2     2     3       3       3        3       2       3
[3,]         2      4     4     4       3       3        3       5       4
[4,]         2      1     2     2       1       1        2       2       2
[5,]         2      2     2     2       2       2        2       2       3
[6,]         4      3     2     2       2       2        3       3       2
     distant numb future sleep anger concen hyper startle
[1,]       2    2      1     3     4      3     4       2
[2,]       3    2      2     3     3      2     3       3
[3,]       3    2      3     4     4      4     3       4
[4,]       1    1      2     2     1      2     3       3
[5,]       2    2      2     3     2      3     2       3
[6,]       2    2      3     2     3      2     3       2

For illustration, we select five symptoms from the Wenchuan dataset — intrusive memories, disturbing dreams, flashbacks, emotional upset, and physical reactions — so that the output fits on screen:

fit = bgm(Wenchuan[, 1:5], seed = 1234)

By default, bgm() runs four parallel MCMC chains using the No-U-Turn Sampler (NUTS), in combination with Bayesian edge selection. Here we use seed = 1234 for reproducibility. If NUTS diagnostics indicate persistent problems, refit with update_method = "adaptive-metropolis".

Posterior summaries

summary(fit)
Posterior summaries from Bayesian estimation:

Category thresholds: 
                mean  mcse    sd    n_eff  Rhat
intrusion (1)  0.470 0.007 0.244 1103.335 1.001
intrusion (2) -1.911 0.014 0.347  598.092 1.003
intrusion (3) -4.857 0.025 0.558  502.555 1.003
intrusion (4) -9.525 0.038 0.900  557.397 1.003
dreams (1)    -0.591 0.006 0.195 1159.771 1.007
dreams (2)    -3.780 0.012 0.359  850.323 1.012
... (use `summary(fit)$main` to see full output)

Pairwise interactions:
                   mean  mcse    sd    n_eff n_eff_mixt  Rhat
intrusion-dreams  0.314 0.001 0.035 1303.379            1.011
intrusion-flash   0.170 0.001 0.033 1345.315            1.006
intrusion-upset   0.097 0.003 0.037  437.558    196.221 1.029
intrusion-physior 0.100 0.003 0.032  375.970    138.389 1.010
dreams-flash      0.252 0.001 0.032 1327.208            1.003
dreams-upset      0.113 0.001 0.028 1093.191   1023.494 1.000
... (use `summary(fit)$pairwise` to see full output)
Note: NA values are suppressed in the print table. They occur here when an 
indicator was zero across all iterations, so mcse/n_eff/n_eff_mixt/Rhat are undefined;
`summary(fit)$pairwise` still contains the NA values.

Inclusion probabilities:
                   mean  mcse    sd n0->0 n0->1 n1->0 n1->1 n_eff_mixt  Rhat
intrusion-dreams  1.000       0.000     0     0     0  1999                 
intrusion-flash   1.000       0.000     0     0     0  1999                 
intrusion-upset   0.946 0.024 0.227   100     9     9  1881     91.318 1.156
intrusion-physior 0.975 0.025 0.158    49     2     2  1946            1.168
dreams-flash      1.000       0.000     0     0     0  1999                 
dreams-upset      0.999 0.001 0.032     1     1     1  1996            1.291
... (use `summary(fit)$indicator` to see full output)
Note: NA values are suppressed in the print table. They occur when an indicator
was constant or had fewer than 5 transitions, so n_eff_mixt is unreliable;
`summary(fit)$indicator` still contains all computed values.

Use `summary(fit)$<component>` to access full results.
Use `extract_log_odds(fit)` for log odds ratios.
See the `easybgm` package for other summary and plotting tools.

The summary shows posterior means, standard deviations, Monte Carlo standard errors (MCSE), effective sample sizes (ESS), and R-hat values for each parameter. For partial associations, it also reports posterior inclusion probabilities because we used edge selection. For a full description of the output structure, see MCMC Output; for guidance on interpreting the convergence diagnostics (ESS and R-hat), see MCMC Diagnostics.

Posterior means

coef(fit)
$main
             cat (1)   cat (2)   cat (3)    cat (4)
intrusion  0.4698311 -1.911163 -4.856581  -9.524646
dreams    -0.5905234 -3.780086 -7.101203 -11.531746
flash     -0.1100408 -2.586952 -5.400103  -9.741958
upset      0.4146327 -1.311763 -3.376583  -7.041663
physior   -0.6188232 -3.179556 -6.245554 -10.604088

$pairwise
           intrusion      dreams       flash       upset     physior
intrusion 0.00000000 0.313666440 0.170222621 0.097213345 0.099811155
dreams    0.31366644 0.000000000 0.251730425 0.112601006 0.003430553
flash     0.17022262 0.251730425 0.000000000 0.004192279 0.153782540
upset     0.09721334 0.112601006 0.004192279 0.000000000 0.355184144
physior   0.09981116 0.003430553 0.153782540 0.355184144 0.000000000

$indicator
          intrusion dreams  flash  upset physior
intrusion    0.0000  1.000 1.0000 0.9455  0.9745
dreams       1.0000  0.000 1.0000 0.9990  0.0760
flash        1.0000  1.000 0.0000 0.0865  1.0000
upset        0.9455  0.999 0.0865 0.0000  1.0000
physior      0.9745  0.076 1.0000 1.0000  0.0000

coef() returns posterior mean matrices: $pairwise contains the partial associations and $indicator contains edge inclusion probabilities.

Network plot

To visualize the estimated network, we select the edges with posterior inclusion probabilities 0.5 or up, and plot the result using the qgraph package (Epskamp et al., 2012).

library(qgraph)

median_probability_network = coef(fit)$pairwise
median_probability_network[coef(fit)$indicator < 0.5] = 0.0

qgraph(median_probability_network,
  theme = "TeamFortress",
  maximum = 1,
  fade = FALSE,
  color = c("#f0ae0e"), vsize = 10, repulsion = .9,
  label.cex = 1, label.scale = "FALSE",
  labels = colnames(Wenchuan)[1:5]
)

In this plot, edges with posterior inclusion probability below 0.5 are set to zero. Practically, this means that a shown edge indicates evidence for a direct conditional association between two symptoms after accounting for the others in the network, whereas a missing edge indicates evidence that this direct association is absent at the selected threshold. Because the cutoff is 0.5, this reflects minimum evidence for inclusion (i.e., an inclusion Bayes factor threshold of 1 when prior inclusion probability is 0.5); higher posterior inclusion probabilities indicate stronger evidence for inclusion.

Continuous data

For continuous variables, bgms fits a Gaussian graphical model. The interface is identical; only the variable_type argument changes.

As a continuous-data example, we use the classic examination marks dataset from Mardia et al. (1979). The data contain scores of 88 students on five examinations:

marks = data.frame(
  mechanics  = c(77,63,75,55,63,53,51,59,62,64,52,55,50,65,31,60,44,42,62,31,
                 44,49,12,49,54,54,44,18,46,32,30,46,40,31,36,56,46,45,42,40,
                 23,48,41,46,46,40,49,22,35,48,31,17,49,59,37,40,35,38,43,39,
                 62,48,34,18,35,59,41,31,17,34,46,10,46,30,13,49,18,8,23,30,
                 3,7,15,15,5,12,5,0),
  vectors    = c(82,78,73,72,63,61,67,70,60,72,64,67,50,63,55,64,69,69,46,49,
                 61,41,58,53,49,53,56,44,52,45,69,49,27,42,59,40,56,42,60,63,
                 55,48,63,52,61,57,49,58,60,56,57,53,57,50,56,43,35,44,43,46,
                 44,38,42,51,36,53,41,52,51,30,40,46,37,34,51,50,32,42,38,24,
                 9,51,40,38,30,30,26,40),
  algebra    = c(67,80,71,63,65,72,65,68,58,60,60,59,64,58,60,56,53,61,61,62,
                 52,61,61,49,56,46,55,50,65,49,50,53,54,48,51,56,57,55,54,53,
                 59,49,49,53,46,51,45,53,47,49,50,57,47,47,49,48,41,54,38,46,
                 36,41,50,40,46,37,43,37,52,50,47,36,45,43,50,38,31,48,36,43,
                 51,43,43,39,44,32,15,21),
  analysis   = c(67,70,66,70,70,64,65,62,62,62,63,62,55,56,57,54,53,55,57,63,
                 62,49,63,62,47,59,61,57,50,57,52,59,61,54,45,54,49,56,49,54,
                 53,51,46,41,38,52,48,56,54,42,54,43,39,15,28,21,51,47,34,32,
                 22,44,47,56,48,22,30,27,35,47,29,47,15,46,25,23,45,26,48,33,
                 47,17,23,28,36,35,20,9),
  statistics = c(81,81,81,68,63,73,68,56,70,45,54,44,63,37,73,40,53,45,45,62,
                 46,64,67,47,53,44,36,81,35,64,45,37,61,68,51,35,32,40,33,25,
                 44,37,34,40,41,31,39,41,33,32,34,51,26,46,45,61,50,24,49,43,
                 42,33,29,30,29,19,33,40,31,36,17,39,30,18,31,9,40,40,15,25,
                 40,22,18,17,18,21,20,14)
)
head(marks)
  mechanics vectors algebra analysis statistics
1        77      82      67       67         81
2        63      78      80       70         81
3        75      73      71       66         81
4        55      72      63       70         68
5        63      63      65       70         63
6        53      61      72       64         73
fit_ggm = bgm(marks, variable_type = "continuous", seed = 1234)
summary(fit_ggm)
summary(fit_ggm)
Posterior summaries from Bayesian estimation:

Residual variances:
                                  mean  mcse     sd    n_eff  Rhat
mechanics (residual variance)  199.690 0.624 29.353 2209.332 1.005
vectors (residual variance)     83.047 0.401 13.971 1213.725 1.085
algebra (residual variance)     31.538 0.119  4.204 1248.653 1.010
analysis (residual variance)   104.263 0.464 15.811 1160.907 1.001
statistics (residual variance) 162.436 0.535 25.163 2209.373 1.001

Pairwise interactions:
                       mean mcse    sd    n_eff n_eff_mixt  Rhat
mechanics-vectors    -0.003    0 0.001 1032.494     76.123 1.276
mechanics-algebra    -0.001    0 0.001              89.705 1.478
mechanics-analysis    0.000                2000            1.187
mechanics-statistics  0.000                2000            1.084
vectors-algebra      -0.007    0 0.002 2191.098            1.001
vectors-analysis      0.000            1537.282            1.173
... (use `summary(fit)$pairwise` to see full output)
Note: NA values are suppressed in the print table. They occur here when an 
indicator was zero across all iterations, so mcse/n_eff/n_eff_mixt/Rhat are undefined;
`summary(fit)$pairwise` still contains the NA values.

Inclusion probabilities:
                      mean  mcse    sd n0->0 n0->1 n1->0 n1->1 n_eff_mixt  Rhat
mechanics-vectors    0.904  0.04 0.294   182     9     9  1799     53.492 1.429
mechanics-algebra    0.126 0.035 0.331  1729    19    19   232     90.483 1.501
mechanics-analysis   0.002 0.001 0.050  1989     5     5     0   2010.055 1.019
mechanics-statistics 0.001 0.001 0.032  1995     2     2     0            0.999
vectors-algebra      1.000       0.000     0     0     0  1999                 
vectors-analysis     0.002 0.001 0.039  1994     2     2     1            1.051
... (use `summary(fit)$indicator` to see full output)
Note: NA values are suppressed in the print table. They occur when an indicator
was constant or had fewer than 5 transitions, so n_eff_mixt is unreliable;
`summary(fit)$indicator` still contains all computed values.

Use `summary(fit)$<component>` to access full results.
Use `extract_partial_correlations(fit)` for partial correlations,
and `extract_precision(fit)` for the precision matrix.
See the `easybgm` package for other summary and plotting tools.

GGM network plot

library(qgraph)

ggm_network = coef(fit_ggm)$pairwise
ggm_network[coef(fit_ggm)$indicator < 0.5] = 0.0

qgraph(ggm_network,
  theme = "TeamFortress",
  maximum = 1,
  fade = FALSE,
  color = c("#5DA5DA"), vsize = 10, repulsion = .9,
  label.cex = 1, label.scale = "FALSE",
  labels = colnames(marks)
)

This plot shows the estimated conditional association network for the continuous variables; edges are retained when their posterior inclusion probability is at least 0.5.

Next steps

References

Epskamp, S., Cramer, A. O. J., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48(4), 1–18. https://doi.org/10.18637/jss.v048.i04
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. Academic Press.