Predict Conditional Probabilities from a Fitted bgms Model
Source: R/simulate_predict.R
Computes conditional probability distributions for one or more variables given the observed values of other variables in the data. Supports ordinal, Blume-Capel, continuous (GGM), and mixed MRF models.
Arguments
- object
An object of class bgms.
- newdata
A matrix or data frame with n rows and p columns containing the observed data. Must have the same variables (columns) as the original data used to fit the model.
- variables
Which variables to predict. Can be:
A character vector of variable names
An integer vector of column indices
NULL (default) to predict all variables
- type
Character string specifying the type of prediction:
"probabilities": Return the full conditional probability distribution for each variable and observation.
"response": Return the predicted category (mode of the conditional distribution); for continuous variables, the conditional mean.
- method
Character string specifying which parameter estimates to use:
"posterior-mean": Use posterior mean parameters.
"posterior-sample": Average predictions over posterior draws.
- ndraws
Number of posterior draws to use when method = "posterior-sample". If NULL, uses all available draws.
- seed
Optional random seed for reproducibility when method = "posterior-sample".
- ...
Additional arguments (currently ignored).
Value
Ordinal models:
For type = "probabilities": A named list with one element per
predicted variable. Each element is a matrix with n rows and
num_categories + 1 columns containing
\(P(X_j = c | X_{-j})\)
for each observation and category.
For type = "response": A matrix with n rows and
length(variables) columns containing predicted categories.
When method = "posterior-sample", probabilities are averaged over
posterior draws, and an attribute "sd" is included containing the
standard deviation across draws.
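The averaging over draws and the mode-based "response" prediction described above can be illustrated with base R alone. This is a minimal sketch, not the package's implementation: the array draws, its dimensions, and its values are made up for illustration, and coding the categories from 0 is an assumption suggested by the num_categories + 1 columns.

```r
# Hypothetical per-draw conditional probabilities for one variable:
# 3 observations x 3 categories x 4 posterior draws (values are made up).
set.seed(1)
n_obs = 3
n_cat = 3
n_draws = 4
raw = array(runif(n_obs * n_cat * n_draws), dim = c(n_obs, n_cat, n_draws))

# Normalise so each observation's probabilities sum to 1 within every draw.
draws = sweep(raw, c(1, 3), apply(raw, c(1, 3), sum), "/")

# Average over draws, and keep the spread across draws as a separate matrix.
avg_probs = apply(draws, c(1, 2), mean)
sd_probs = apply(draws, c(1, 2), sd)

# Assumption: categories are coded 0, 1, ..., n_cat - 1.
colnames(avg_probs) = 0:(n_cat - 1)

# The "response" analogue: the most probable category per observation.
pred = max.col(avg_probs, ties.method = "first") - 1
```

Because every draw is a proper distribution, each row of avg_probs still sums to 1, while sd_probs indicates how much posterior uncertainty sits behind each averaged probability.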
GGM (continuous) models:
For type = "probabilities": A named list with one element per
predicted variable. Each element is a matrix with n rows and
2 columns ("mean" and "sd") containing the conditional
Gaussian parameters \(E(X_j | X_{-j})\) and
\(\text{SD}(X_j | X_{-j})\).
For type = "response": A matrix with n rows and
length(variables) columns containing conditional means.
When method = "posterior-sample", conditional parameters are
averaged over posterior draws, and an attribute "sd" is included.
Mixed MRF models:
For mixed models, the return list contains elements for both discrete and continuous predicted variables. Discrete variables return probability matrices (as in ordinal models); continuous variables return conditional mean and SD matrices (as in GGM models).
Details
For each observation, the function computes the conditional distribution of the target variable(s) given the observed values of all other variables. This is the same conditional distribution used internally by the Gibbs sampler.
For GGM (continuous) models, the conditional distribution of \(X_j | X_{-j}\) is Gaussian with mean \(-\omega_{jj}^{-1} \sum_{k \neq j} \omega_{jk} x_k\) and variance \(\omega_{jj}^{-1}\), where \(\omega_{jk}\) denotes the \((j, k)\) entry of the precision matrix \(\Omega\).
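The Gaussian conditional above can be checked numerically in base R. The sketch below is illustrative only: the precision matrix Omega and the observation x are made-up values, the variables are taken to be zero-mean as the formula implies, and none of this is the package's internal code. It evaluates the precision-matrix formula and compares it with the usual covariance-based conditioning formulas.

```r
# Hypothetical 3 x 3 precision matrix (symmetric, positive definite).
Omega = matrix(c( 2.0, -0.5,  0.3,
                 -0.5,  1.5, -0.4,
                  0.3, -0.4,  1.0), nrow = 3, byrow = TRUE)
x = c(0.2, -1.0, 0.7)  # one made-up, zero-mean observation
j = 1                  # index of the variable to predict

# Conditional mean and SD of X_j given the rest, from the precision matrix.
cond_mean = -sum(Omega[j, -j] * x[-j]) / Omega[j, j]
cond_sd = sqrt(1 / Omega[j, j])

# Cross-check against the covariance-based conditioning formulas.
Sigma = solve(Omega)
mean_check = drop(Sigma[j, -j] %*% solve(Sigma[-j, -j], x[-j]))
var_check = Sigma[j, j] -
  drop(Sigma[j, -j] %*% solve(Sigma[-j, -j], Sigma[-j, j]))

stopifnot(isTRUE(all.equal(cond_mean, mean_check)))
stopifnot(isTRUE(all.equal(cond_sd^2, var_check)))
```

The conditional variance depends only on \(\omega_{jj}\), so the "sd" column is the same for every observation under a fixed set of parameters; only the conditional mean changes with the observed values.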
See also
simulate.bgms for generating new data from the model.
Other prediction:
predict.bgmCompare(),
simulate.bgmCompare(),
simulate.bgms(),
simulate_mrf()
Examples
# \donttest{
# Fit a model
fit = bgm(x = Wenchuan[, 1:5], chains = 2)
#> 7 rows with missing values excluded (n = 355 remaining).
#> To impute missing values instead, use na_action = "impute".
# Compute conditional probabilities for all variables
probs = predict(fit, newdata = Wenchuan[1:10, 1:5])
# Predict the first variable only
probs_v1 = predict(fit, newdata = Wenchuan[1:10, 1:5], variables = 1)
# Get predicted categories
pred_class = predict(fit, newdata = Wenchuan[1:10, 1:5], type = "response")
# }