R Scaffolding

The R layer sits between user-facing functions (bgm(), bgmCompare()) and the C++ sampler. It validates inputs, constructs a specification object, dispatches to the appropriate C++ sampler, and assembles the returned output into an S3 fit object.

Data flow

flowchart TD
    A["<b>bgm(x, ...)</b>"]
    B["<b>bgm_spec()</b><br/>validates, assembles sub-lists"]
    C["<b>new_bgm_spec()</b><br/>asserts types and field presence"]
    D["<b>validate_bgm_spec()</b><br/>cross-field invariant checks"]
    E["<b>run_sampler(spec)</b><br/>dispatches to C++ via model_type"]
    F["<b>build_output(spec, raw)</b><br/>normalizes raw C++ output to S3"]
    G["<b>bgm fit object</b>"]
    A --> B --> C --> D --> E --> F --> G

Both bgm() and bgmCompare() follow this pipeline. The bgmCompare() path adds group-specific preprocessing (projection matrices, group indices, precomputed sufficient statistics) before the spec is constructed.
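
The pipeline above can be read as a chain of small functions, each consuming the previous stage's result. The following is a minimal sketch of that shape only: every function name is hypothetical, and the "sampler" stage is a placeholder that returns column means rather than calling C++.

```r
# Hypothetical stand-ins for the pipeline stages. None of these bodies
# reflect the package's actual implementation.
stage_spec <- function(fields) {
  # Type assertions, in the spirit of a low-level constructor.
  stopifnot(is.list(fields), is.character(fields$model_type))
  structure(fields, class = "spec_sketch")
}

stage_validate <- function(spec) {
  # Cross-field style check on the assembled object.
  stopifnot(inherits(spec, "spec_sketch"), nrow(spec$data) > 0)
  spec
}

stage_sample <- function(spec) {
  # Placeholder for the C++ call: returns column means as "raw" output.
  list(raw_means = colMeans(spec$data))
}

stage_output <- function(spec, raw) {
  structure(
    list(posterior_mean = raw$raw_means, model_type = spec$model_type),
    class = "fit_sketch"
  )
}

fit_pipeline_sketch <- function(x, model_type = "omrf") {
  spec <- stage_validate(
    stage_spec(list(model_type = model_type, data = as.matrix(x)))
  )
  raw <- stage_sample(spec)
  stage_output(spec, raw)
}

fit <- fit_pipeline_sketch(data.frame(a = c(0, 1, 2), b = c(1, 1, 0)))
```

Keeping each stage a pure function of its input makes it possible to test validation, dispatch, and output assembly separately.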

Validation pipeline

Validation is split across four files, each with a focused scope.

Data checks (`validate_data.R`)

data_check() is the entry point. It enforces:

- Input is a data frame or matrix (converts to integer matrix for ordinal data)
- No constant columns
- At least two variables
- Column names exist and are unique
- For ordinal data: all values are non-negative integers
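
As an illustration of the checks listed above, here is a minimal sketch in base R. The function name `data_check_sketch()` and its error messages are hypothetical; the package's own data_check() may order, word, or implement these checks differently.

```r
# Hypothetical sketch of the listed data checks for ordinal input.
data_check_sketch <- function(x) {
  if (is.data.frame(x)) x <- as.matrix(x)
  if (!is.matrix(x) || !is.numeric(x)) {
    stop("`x` must be a numeric matrix or data frame.")
  }
  if (ncol(x) < 2) stop("At least two variables are required.")

  nms <- colnames(x)
  if (is.null(nms) || anyDuplicated(nms) > 0) {
    stop("Column names must exist and be unique.")
  }

  # Only observed (non-NA) entries are checked for the ordinal constraint.
  obs <- x[!is.na(x)]
  if (any(obs < 0) || any(obs != round(obs))) {
    stop("Ordinal data must be non-negative integers.")
  }

  # A column is constant when it has fewer than two distinct observed values.
  is_constant <- apply(x, 2, function(col) length(unique(col[!is.na(col)])) < 2)
  if (any(is_constant)) {
    stop("Constant columns: ", paste(nms[is_constant], collapse = ", "))
  }

  storage.mode(x) <- "integer"
  x
}

checked <- data_check_sketch(data.frame(a = c(0, 1, 2), b = c(1, 0, 1)))
```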

Missing data handling branches on na_action:

- "listwise": rows with any NA are dropped
- "impute": missing indices are recorded for Bayesian imputation in C++

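The two na_action branches can be sketched as follows. This is an assumption-laden illustration: `handle_missing_sketch()` and the fields of its return value are hypothetical names, and the actual imputation happens in C++, not here.

```r
# Hypothetical sketch of the na_action branch.
handle_missing_sketch <- function(x, na_action = c("listwise", "impute")) {
  na_action <- match.arg(na_action)

  if (na_action == "listwise") {
    # Drop every row that contains at least one NA.
    keep <- stats::complete.cases(x)
    return(list(
      x = x[keep, , drop = FALSE],
      missing_index = NULL,
      n_dropped = sum(!keep)
    ))
  }

  # "impute": keep every row and record (row, col) positions of the NAs
  # so that a downstream sampler could fill them in.
  list(
    x = x,
    missing_index = which(is.na(x), arr.ind = TRUE),
    n_dropped = 0L
  )
}

x_na <- cbind(a = c(0, NA, 2), b = c(1, 0, 1))
listwise <- handle_missing_sketch(x_na, "listwise")
imputed <- handle_missing_sketch(x_na, "impute")
```
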
Variable type detection (`validate_model.R`)

validate_variable_types() classifies each column as "ordinal" or "blume-capel" (for discrete variables) based on the variable_type argument. When variable_type = "ordinal" (the default), all discrete variables are treated as ordinal. When a named vector is provided, each variable is classified individually.
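
The scalar-versus-named-vector behaviour described above could look roughly like this. `resolve_variable_types_sketch()` is a hypothetical helper, and it assumes the names of a vector-valued variable_type match the column names exactly.

```r
# Hypothetical sketch: expand `variable_type` to one label per variable.
resolve_variable_types_sketch <- function(variable_type, var_names) {
  allowed <- c("ordinal", "blume-capel")

  if (length(variable_type) == 1 && is.null(names(variable_type))) {
    # A single unnamed value applies to every variable.
    types <- rep(variable_type, length(var_names))
    names(types) <- var_names
  } else {
    # A named vector classifies each variable individually.
    if (!setequal(names(variable_type), var_names)) {
      stop("Names of `variable_type` must match the column names.")
    }
    types <- variable_type[var_names]
  }

  bad <- setdiff(types, allowed)
  if (length(bad) > 0) {
    stop("Unknown variable type(s): ", paste(bad, collapse = ", "))
  }
  types
}

all_ordinal <- resolve_variable_types_sketch("ordinal", c("a", "b"))
per_variable <- resolve_variable_types_sketch(
  c(b = "blume-capel", a = "ordinal"), c("a", "b")
)
```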

validate_baseline_category() sets the reference category for Blume-Capel variables. The default (0) places the reference at the lowest observed category.

Sampler validation (`validate_sampler.R`)

validate_sampler() resolves the update_method argument into a concrete sampler type ("nuts" or "adaptive_metropolis") and sets defaults for tuning parameters (target_accept, nuts_max_depth). It detects the number of available cores and resolves the progress display type.
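
A minimal sketch of this kind of resolution step is shown below. The user-facing labels, the default values (0.80, 0.44, depth 10), and the function name are all assumptions made for illustration; only the internal sampler names "nuts" and "adaptive_metropolis" come from the text above.

```r
# Hypothetical sketch of resolving sampler settings.
resolve_sampler_sketch <- function(update_method = "nuts",
                                   target_accept = NULL,
                                   nuts_max_depth = 10L,
                                   cores = NULL) {
  # Assumed mapping from user-facing labels to internal sampler names.
  sampler <- switch(update_method,
    "nuts" = "nuts",
    "adaptive-metropolis" = "adaptive_metropolis",
    stop("Unknown update_method: ", update_method)
  )

  # Assumed defaults; the package's real defaults may differ.
  if (is.null(target_accept)) {
    target_accept <- if (sampler == "nuts") 0.80 else 0.44
  }
  stopifnot(target_accept > 0, target_accept < 1, nuts_max_depth >= 1)

  if (is.null(cores)) {
    detected <- parallel::detectCores()
    cores <- if (is.na(detected)) 1L else detected
  }

  list(
    sampler = sampler,
    target_accept = target_accept,
    nuts_max_depth = as.integer(nuts_max_depth),
    cores = as.integer(cores)
  )
}

settings <- resolve_sampler_sketch("adaptive-metropolis", cores = 2)
```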

Prior validation (`R/priors.R`, `validate_model.R`)

Priors are user-facing S3 objects (bgms_parameter_prior, bgms_scale_prior, bgms_indicator_prior). Each prior constructor (cauchy_prior(), gamma_prior(), bernoulli_prior(), …) validates its own hyperparameters at the call site, so by the time the spec is built every prior object is already self-consistent.
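
To illustrate the "validate at the call site" pattern, here is a sketch of a self-validating S3 prior constructor. The name `cauchy_prior_sketch()`, its `scale` argument, and the class name are hypothetical and are not the signature of the package's cauchy_prior().

```r
# Hypothetical S3 prior constructor that rejects bad hyperparameters
# immediately, so downstream code can trust the object.
cauchy_prior_sketch <- function(scale = 1) {
  if (!is.numeric(scale) || length(scale) != 1 ||
      !is.finite(scale) || scale <= 0) {
    stop("`scale` must be a single positive, finite number.")
  }
  structure(
    list(family = "cauchy", scale = scale),
    class = "parameter_prior_sketch"
  )
}

prior <- cauchy_prior_sketch(2.5)
```

Because the error is raised where the prior is created, the message points at the user's own call rather than at a later spec-building step.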

unpack_parameter_prior(), unpack_scale_prior(), unpack_interaction_prior(), unpack_threshold_prior(), and unpack_indicator_prior() flatten each prior object into the (family, hyperparameters) representation that the C++ bridge expects. validate_edge_prior() and validate_difference_prior() accept either a prior object or a legacy character string ("Bernoulli", "Beta-Bernoulli", "Stochastic-Block"); strings are forwarded to the matching constructor with a lifecycle warning.
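
The flattening step and the legacy-string path might be sketched as below. All names here are hypothetical, only the "Bernoulli" string is handled, and base `warning()` stands in for the lifecycle warning mentioned above.

```r
# Hypothetical indicator prior object.
bernoulli_prior_sketch <- function(inclusion_probability = 0.5) {
  stopifnot(inclusion_probability > 0, inclusion_probability < 1)
  structure(
    list(family = "bernoulli", inclusion_probability = inclusion_probability),
    class = "indicator_prior_sketch"
  )
}

# Flatten a prior object into (family, hyperparameters).
unpack_prior_sketch <- function(prior) {
  fields <- unclass(prior)
  list(
    family = fields$family,
    hyperparameters = fields[setdiff(names(fields), "family")]
  )
}

# Accept a prior object or a legacy string; strings trigger a warning.
resolve_edge_prior_sketch <- function(edge_prior) {
  if (inherits(edge_prior, "indicator_prior_sketch")) {
    return(edge_prior)
  }
  if (is.character(edge_prior) && length(edge_prior) == 1) {
    warning("Character edge priors are deprecated; pass a prior object.",
            call. = FALSE)
    return(switch(edge_prior,
      "Bernoulli" = bernoulli_prior_sketch(),
      stop("Unsupported legacy prior: ", edge_prior)
    ))
  }
  stop("`edge_prior` must be a prior object or a character string.")
}

unpacked <- unpack_prior_sketch(bernoulli_prior_sketch(0.25))
legacy <- suppressWarnings(resolve_edge_prior_sketch("Bernoulli"))
```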

Spec construction (`bgm_spec.R`)

bgm_spec() is the user-facing constructor. It calls all validators, then delegates to one of four model-specific builders:

| Model type | Builder | Key additions |
|---|---|---|
| `"ggm"` | `build_spec_ggm()` | Sufficient statistics (`X'X`), precision scale |
| `"omrf"` | `build_spec_omrf()` | Category counts, ordinal/BC flags, scaling factors |
| `"mixed_mrf"` | `build_spec_mixed_mrf()` | Separate discrete/continuous matrices |
| `"compare"` | `build_spec_compare()` | Group indices, projection matrix, precomputed pairwise stats |

Each builder assembles a spec with seven components:

- $model_type: model class string ("ggm", "omrf", etc.)
- $data: observations, dimensions, variable names
- $variables: variable types, ordinal flags, baseline categories
- $missing: na_action, imputation flag, missing indices
- $prior: prior type, hyperparameters, scaling factors
- $sampler: algorithm, iterations, warmup, chains, seed
- $precomputed: sufficient statistics and derived quantities (e.g., cross-products for GGM, pairwise stats for compare)

The result passes through new_bgm_spec() (type assertions) and then validate_bgm_spec() (cross-field invariants; for example, edge_prior must be "Not applicable" when edge_selection = FALSE).
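
The split between a low-level constructor and a separate validator can be sketched as follows. The seven component names come from the list above; the function names, the placement of `edge_selection` and `edge_prior` inside `$prior`, and the example field values are assumptions for illustration.

```r
# Hypothetical low-level constructor: asserts types and field presence only.
new_bgm_spec_sketch <- function(model_type, data, variables, missing,
                                prior, sampler, precomputed) {
  stopifnot(
    is.character(model_type), length(model_type) == 1,
    is.list(data), is.list(variables), is.list(missing),
    is.list(prior), is.list(sampler), is.list(precomputed)
  )
  structure(
    list(
      model_type = model_type, data = data, variables = variables,
      missing = missing, prior = prior, sampler = sampler,
      precomputed = precomputed
    ),
    class = "bgm_spec_sketch"
  )
}

# Hypothetical validator: checks invariants that span several fields.
validate_bgm_spec_sketch <- function(spec) {
  if (!isTRUE(spec$prior$edge_selection) &&
      !identical(spec$prior$edge_prior, "Not applicable")) {
    stop("edge_prior must be \"Not applicable\" when edge_selection = FALSE.")
  }
  invisible(spec)
}

spec <- new_bgm_spec_sketch(
  model_type = "omrf",
  data = list(x = matrix(0:3, 2), num_variables = 2L),
  variables = list(types = c("ordinal", "ordinal")),
  missing = list(na_action = "listwise"),
  prior = list(edge_selection = FALSE, edge_prior = "Not applicable"),
  sampler = list(iter = 1000L, warmup = 500L, chains = 4L),
  precomputed = list()
)
validate_bgm_spec_sketch(spec)
```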

Sampler dispatch (`run_sampler.R`)

run_sampler() reads spec$model_type and calls the corresponding C++ entry point:

| `model_type` | C++ function | Entry file |
|---|---|---|
| `"ggm"` | `sample_ggm()` | `src/sample_ggm.cpp` |
| `"omrf"` | `sample_omrf()` | `src/sample_omrf.cpp` |
| `"mixed_mrf"` | `sample_mixed_mrf()` | `src/sample_mixed.cpp` |
| `"compare"` | `run_bgmCompare_parallel()` | `src/bgmCompare_interface.cpp` |

Each C++ function receives an R list of arguments, constructs model and prior objects, and calls run_mcmc_sampler() from the chain runner. The return value is a list of per-chain raw output.
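
Dispatch on model_type can be illustrated with a lookup table of backends. In this sketch the backends are plain R closures standing in for the C++ entry points, and both `run_sampler_sketch()` and `fake_backend()` are hypothetical names; the real run_sampler() may be structured differently.

```r
# Hypothetical dispatcher: look up a backend by model_type and call it.
run_sampler_sketch <- function(spec, backends) {
  backend <- backends[[spec$model_type]]
  if (is.null(backend)) {
    stop("No sampler registered for model_type '", spec$model_type, "'.")
  }
  backend(spec)
}

# R placeholder for a C++ entry point: returns one list element per chain.
fake_backend <- function(label) {
  function(spec) {
    lapply(seq_len(spec$sampler$chains), function(chain) {
      list(chain = chain, backend = label)
    })
  }
}

backends <- list(
  ggm = fake_backend("sample_ggm"),
  omrf = fake_backend("sample_omrf"),
  mixed_mrf = fake_backend("sample_mixed_mrf"),
  compare = fake_backend("run_bgmCompare_parallel")
)

raw <- run_sampler_sketch(
  list(model_type = "ggm", sampler = list(chains = 2)),
  backends
)
```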

Output assembly (`build_output.R`)

build_output() transforms raw C++ output into the S3 objects returned to the user ("bgms" or "bgmCompare" class).

GGM and OMRF (`build_output_bgm()`)

The GGM and OMRF paths share a unified builder. Key operations:

- Normalize raw samples: split the flat parameter vector into main-effect and pairwise-interaction matrices per chain
- Compute posterior means: average across iterations and chains
- Compute inclusion probabilities: average indicator samples (when edge selection is active)
- Assemble $raw_samples: per-chain lists of main, pairwise, indicator, and allocations matrices
- NUTS diagnostics: when available, collect tree depth, divergence, and energy samples into $nuts_diagnostics
- Warmup check: collect $warmup_check from C++
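
The first three operations can be sketched on toy data. This example assumes a raw layout in which each chain carries a `samples` matrix (iterations by parameters, main effects first) and an `indicators` matrix (iterations by edges); that layout and the function name are assumptions, not the actual C++ return format.

```r
# Hypothetical sketch: split, pool, and average per-chain samples.
summarise_chains_sketch <- function(raw, n_main) {
  stopifnot(n_main >= 1)

  # Split each chain's flat parameter columns into main and pairwise parts.
  split_chain <- function(chain) {
    list(
      main = chain$samples[, seq_len(n_main), drop = FALSE],
      pairwise = chain$samples[, -seq_len(n_main), drop = FALSE],
      indicator = chain$indicators
    )
  }
  chains <- lapply(raw, split_chain)

  # Stack the chains row-wise so that colMeans averages over
  # iterations and chains at once.
  pooled <- function(field) do.call(rbind, lapply(chains, `[[`, field))

  list(
    posterior_mean_main = colMeans(pooled("main")),
    posterior_mean_pairwise = colMeans(pooled("pairwise")),
    inclusion_probability = colMeans(pooled("indicator")),
    raw_samples = chains
  )
}

# Two chains, two iterations each, one main effect and two pairwise effects.
chain1 <- list(
  samples = matrix(c(1, 1, 2, 2, 3, 3), nrow = 2),
  indicators = matrix(c(1, 0), nrow = 2)
)
chain2 <- list(
  samples = matrix(c(3, 3, 4, 4, 5, 5), nrow = 2),
  indicators = matrix(c(1, 1), nrow = 2)
)
out <- summarise_chains_sketch(list(chain1, chain2), n_main = 1)
```

An inclusion probability here is simply the fraction of pooled draws in which the edge indicator equals 1.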

Mixed MRF (`build_output_mixed_mrf()`)

The mixed MRF builder handles the block structure: discrete-discrete, continuous-continuous, and cross-type interactions are stored in separate blocks by C++ and need to be mapped back to the original variable ordering.
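
Mapping the three blocks back to the original variable ordering amounts to index assignment into a full matrix. The sketch below assumes the blocks arrive as a discrete-discrete matrix `dd`, a continuous-continuous matrix `cc`, and a discrete-by-continuous matrix `dc`, together with the original column positions of each variable type; these names and shapes are hypothetical.

```r
# Hypothetical sketch: reassemble block-structured interactions into a
# single matrix in the original variable order.
assemble_blocks_sketch <- function(dd, cc, dc,
                                   discrete_idx, continuous_idx, var_names) {
  p <- length(var_names)
  out <- matrix(0, p, p, dimnames = list(var_names, var_names))
  out[discrete_idx, discrete_idx] <- dd
  out[continuous_idx, continuous_idx] <- cc
  # Cross-type block and its transpose keep the result symmetric.
  out[discrete_idx, continuous_idx] <- dc
  out[continuous_idx, discrete_idx] <- t(dc)
  out
}

# Original order: d1 (discrete), c1 (continuous), d2 (discrete).
interactions <- assemble_blocks_sketch(
  dd = matrix(c(0, 0.5, 0.5, 0), nrow = 2),
  cc = matrix(0, 1, 1),
  dc = matrix(c(0.2, -0.3), nrow = 2),
  discrete_idx = c(1, 3),
  continuous_idx = 2,
  var_names = c("d1", "c1", "d2")
)
```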

Compare (`build_output_compare()`)

The comparison builder splits posterior means into group-level ($group_posterior_means) and contrast ($difference_posterior_means) components, and computes contrast inclusion probabilities.
