
Computes conditional probability distributions for one or more variables given the observed values of other variables in the data. Supports ordinal, Blume-Capel, continuous (GGM), and mixed MRF models.

Usage

# S3 method for class 'bgms'
predict(
  object,
  newdata,
  variables = NULL,
  type = c("probabilities", "response"),
  method = c("posterior-mean", "posterior-sample"),
  ndraws = NULL,
  seed = NULL,
  ...
)

Arguments

object

An object of class bgms.

newdata

A matrix or data frame with n rows and p columns containing the observed data. Must have the same variables (columns) as the original data used to fit the model.

variables

Which variables to predict. Can be:

  • A character vector of variable names

  • An integer vector of column indices

  • NULL (default) to predict all variables

type

Character string specifying the type of prediction:

"probabilities"

Return the full conditional probability distribution for each variable and observation.

"response"

Return the predicted category (the mode of the conditional distribution). For continuous variables, the conditional mean is returned instead.

method

Character string specifying which parameter estimates to use:

"posterior-mean"

Use posterior mean parameters.

"posterior-sample"

Average predictions over posterior draws.

ndraws

Number of posterior draws to use when method = "posterior-sample". If NULL, uses all available draws.

seed

Optional random seed for reproducibility when method = "posterior-sample".

...

Additional arguments (currently ignored).

Value

Ordinal models:

For type = "probabilities": A named list with one element per predicted variable. Each element is a matrix with n rows and num_categories + 1 columns containing \(P(X_j = c | X_{-j})\) for each observation and category.

For type = "response": A matrix with n rows and length(variables) columns containing predicted categories.
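The relationship between the two output types can be sketched in base R. The probability matrix below is made up for illustration, and the assumption that the columns correspond to categories 0, 1, ..., num_categories in that order is not stated on this page.

```r
# Hypothetical conditional probabilities for 3 observations of one
# ordinal variable with num_categories = 2 (hence 3 columns).
# Assumption: column k holds the probability of category k - 1.
probs_j <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.2, 0.5, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("0", "1", "2"))
)

# Each row is a probability distribution, so it sums to 1.
stopifnot(all(abs(rowSums(probs_j) - 1) < 1e-12))

# The predicted category is the mode of each row; subtract 1 to
# map column positions 1..3 back to categories 0..2.
pred_j <- max.col(probs_j, ties.method = "first") - 1L
pred_j
#> [1] 0 2 1
```

Taking the row-wise mode this way mirrors what type = "response" is described as returning, without claiming to reproduce the package's internal code.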

When method = "posterior-sample", probabilities are averaged over posterior draws, and an attribute "sd" is included containing the standard deviation across draws.
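As a rough illustration of what averaging over posterior draws means, the base R sketch below uses a simulated array of per-draw probabilities. The array, its dimensions, and all variable names are hypothetical and do not reflect how a bgms object stores its draws.

```r
set.seed(1)
n_draws <- 200
n_obs <- 4
n_cat <- 3

# Hypothetical per-draw conditional probabilities with dimensions
# observation x category x draw. Positive random weights are
# normalized so that each row within each draw sums to 1.
raw <- array(rexp(n_obs * n_cat * n_draws), dim = c(n_obs, n_cat, n_draws))
row_totals <- apply(raw, c(1, 3), sum)               # n_obs x n_draws
draw_probs <- sweep(raw, c(1, 3), row_totals, "/")

# Point prediction: average each probability over the draws.
prob_mean <- apply(draw_probs, c(1, 2), mean)

# Uncertainty: standard deviation of each probability across draws.
prob_sd <- apply(draw_probs, c(1, 2), sd)

# Averaging preserves normalization: rows still sum to 1.
stopifnot(all(abs(rowSums(prob_mean) - 1) < 1e-12))
```

Here prob_mean plays the role of the returned probability matrix and prob_sd the role of the "sd" attribute.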

GGM (continuous) models:

For type = "probabilities": A named list with one element per predicted variable. Each element is a matrix with n rows and 2 columns ("mean" and "sd") containing the conditional Gaussian parameters \(E(X_j | X_{-j})\) and \(\text{SD}(X_j | X_{-j})\).

For type = "response": A matrix with n rows and length(variables) columns containing conditional means.

When method = "posterior-sample", conditional parameters are averaged over posterior draws, and an attribute "sd" is included containing the standard deviation across draws.

Mixed MRF models:

For mixed models, the return list contains elements for both discrete and continuous predicted variables. Discrete variables return probability matrices (as in ordinal models); continuous variables return conditional mean and SD matrices (as in GGM models).

Details

For each observation, the function computes the conditional distribution of the target variable(s) given the observed values of all other variables. This is the same conditional distribution used internally by the Gibbs sampler.
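For a discrete variable, such a conditional distribution can be sketched as a softmax over categories. The parameterization below (category thresholds plus pairwise interactions, with category 0 as the reference) is one common form of an ordinal MRF and is an assumption here; this page does not give the exact formula used by bgms, and all parameter values and function names are hypothetical.

```r
# Hypothetical parameters for p = 3 ordinal variables with categories 0, 1, 2.
# thresholds[j, c] is the threshold of category c for variable j;
# category 0 is the reference category with threshold 0.
thresholds <- matrix(c(-0.5, -1.0,
                        0.2, -0.8,
                       -0.3, -1.5), nrow = 3, byrow = TRUE)

# Symmetric pairwise interactions with a zero diagonal.
interactions <- matrix(c(0.0, 0.4, 0.0,
                         0.4, 0.0, 0.3,
                         0.0, 0.3, 0.0), nrow = 3, byrow = TRUE)

# Assumed form: P(X_j = c | x_{-j}) is proportional to
# exp(threshold_{jc} + c * sum_k interaction_{jk} * x_k).
conditional_probs <- function(j, x, thresholds, interactions) {
  rest <- sum(interactions[j, -j] * x[-j])       # contribution of the other variables
  cats <- 0:ncol(thresholds)
  log_w <- c(0, thresholds[j, ]) + cats * rest   # unnormalized log probabilities
  w <- exp(log_w - max(log_w))                   # subtract the max for numerical stability
  w / sum(w)
}

x_obs <- c(2, 1, 0)
p1 <- conditional_probs(1, x_obs, thresholds, interactions)
```

The vector p1 is a distribution over the three categories of the first variable given the observed values of the other two.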

For GGM (continuous) models, the conditional distribution of \(X_j \mid X_{-j}\) is Gaussian with mean \(-\omega_{jj}^{-1} \sum_{k \neq j} \omega_{jk} x_k\) and variance \(\omega_{jj}^{-1}\), where \(\omega_{jk}\) denotes element \((j, k)\) of the precision matrix \(\Omega\).
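The Gaussian conditional can be checked numerically in base R. The sketch below uses a made-up precision matrix and observation, assumes zero-mean variables (as the formula above implies), and compares the precision-based result with the textbook covariance-based conditioning formula. The function name ggm_conditional is hypothetical and not part of bgms.

```r
# Hypothetical 3 x 3 precision matrix (symmetric, diagonally dominant,
# hence positive definite).
Omega <- matrix(c( 2.0, -0.5,  0.3,
                  -0.5,  1.5, -0.4,
                   0.3, -0.4,  1.0), nrow = 3, byrow = TRUE)

# Conditional mean and SD of X_j given the remaining variables,
# computed directly from the precision matrix.
ggm_conditional <- function(j, x, Omega) {
  cond_var <- 1 / Omega[j, j]
  cond_mean <- -cond_var * sum(Omega[j, -j] * x[-j])
  c(mean = cond_mean, sd = sqrt(cond_var))
}

x_obs <- c(0.8, -1.2, 0.5)
res <- ggm_conditional(1, x_obs, Omega)

# Cross-check against the covariance form of Gaussian conditioning.
Sigma <- solve(Omega)
alt_mean <- drop(Sigma[1, -1] %*% solve(Sigma[-1, -1], x_obs[-1]))
alt_var <- Sigma[1, 1] - drop(Sigma[1, -1] %*% solve(Sigma[-1, -1], Sigma[-1, 1]))
stopifnot(
  isTRUE(all.equal(unname(res["mean"]), alt_mean)),
  isTRUE(all.equal(unname(res["sd"]), sqrt(alt_var)))
)
```

The two entries of res correspond to the "mean" and "sd" columns described under Value for GGM models.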

See also

simulate.bgms for generating new data from the model.

Other prediction: predict.bgmCompare(), simulate.bgmCompare(), simulate.bgms(), simulate_mrf()

Examples

# \donttest{
# Fit a model
fit = bgm(x = Wenchuan[, 1:5], chains = 2)
#> 7 rows with missing values excluded (n = 355 remaining).
#> To impute missing values instead, use na_action = "impute".

# Compute conditional probabilities for all variables
probs = predict(fit, newdata = Wenchuan[1:10, 1:5])

# Predict the first variable only
probs_v1 = predict(fit, newdata = Wenchuan[1:10, 1:5], variables = 1)

# Get predicted categories
pred_class = predict(fit, newdata = Wenchuan[1:10, 1:5], type = "response")
# }