Working with Posteriors

Summary statistics

We can easily customize the summary statistics reported by $summary() and $print().

fit <- cmdstanr::cmdstanr_example("schools", method = "sample")
fit$summary()

By default all variables are summaries with the follow functions:

posterior::default_summary_measures()

To change the variables summarized, we use the variables argument

fit$summary(variables = c("mu", "tau"))

We can additionally change which functions are used

fit$summary(variables = c("mu", "tau"), mean, sd)

To summarize all variables with non-default functions, it is necessary to set explicitly set the variables argument, either to NULL or the full vector of variable names.

fit$metadata()$model_params
fit$summary(variables = NULL, "mean", "median")

Summary functions can be specified by character string, function, or using a formula (or anything else supported by rlang::as_function()). If these arguments are named, those names will be used in the tibble output. If the summary results are named they will take precedence.

my_sd <- function(x) c(My_SD = sd(x))
fit$summary(
  c("mu", "tau"), 
  MEAN = mean, 
  "median",
  my_sd,
  ~quantile(.x, probs = c(0.1, 0.9)),
  Minimum = function(x) min(x)
)        

Arguments to all summary functions can also be specified with .args.

fit$summary(c("mu", "tau"), quantile, .args = list(probs = c(0.025, .05, .95, .975)))

The summary functions are applied to the array of sample values, with dimension iter_samplingxchains.

fit$summary(variables = NULL, dim, colMeans)

For this reason users may have unexpected results if they use stats::var() directly, as it will return a covariance matrix. An alternative is the distributional::variance() function, which can also be accessed via posterior::variance().

fit$summary(c("mu", "tau"), posterior::variance, ~var(as.vector(.x)))

Summary functions need not be numeric, but these won’t work with $print().

strict_pos <- function(x) if (all(x > 0)) "yes" else "no"
fit$summary(variables = NULL, "Strictly Positive" = strict_pos)
# fit$print(variables = NULL, "Strictly Positive" = strict_pos)

For more information, see posterior::summarise_draws(), which is called by $summary().

Extracting posterior draws/samples

The $draws() method can be used to extract the posterior draws in formats provided by the posterior package. Here we demonstrate only the draws_array and draws_df formats, but the posterior package supports other useful formats as well.

# default is a 3-D draws_array object from the posterior package
# iterations x chains x variables
draws_arr <- fit$draws() # or format="array"
str(draws_arr)

# draws x variables data frame
draws_df <- fit$draws(format = "df")
str(draws_df)
print(draws_df)

To convert an existing draws object to a different format use the posterior::as_draws_*() functions.

To manipulate the draws objects use the various methods described in the posterior package vignettes and documentation.

Structured draws similar to rstan::extract()

The posterior package’s rvar format provides a multidimensional, sample-based representation of random variables. See https://mc-stan.org/posterior/articles/rvar.html for details. In addition to being useful in its own right, this format also allows CmdStanR users to obtain draws in a similar format to rstan::extract().

Suppose we have a parameter matrix[2,3] x. The rvar format lets you interact with x as if it’s a 2 x 3 matrix and automatically applies operations over the many posterior draws of x. To instead directly access the draws of x while maintaining the structure of the matrix use posterior::draws_of(). For example:

draws <- posterior::as_draws_rvars(fit$draws())
x_rvar <- draws$x
x_array <- posterior::draws_of(draws$x)

The object x_rvar will be an rvar that can be used like a 2 x 3 matrix, with the draws handled behind the scenes. The object x_array will be a 4000 x 2 x 3 array (assuming 4000 posterior draws), which is the same as it would be after being extracted from the list returned by rstan::extract().