Bayesian Logistic Regression using brms, Part 1

This is the first of a series of posts on how to fit, interpret, and evaluate Bayesian logistic regression models using the brms package in R. This is a post written with Nils Karl Reimer.

This post provides a gentle introduction to fitting Bayesian logistic regression models using the brms package in R (Bürkner, 2017). Why use brms? Besides being an excellent package with lots of cool features, the specification of regression models in brms closely mirrors the syntax you may already use to fit regression models in R, such as with the lm() or glm() commands. This should make a first pass at fitting Bayesian logistic regression models easier.

Before jumping into the tutorial, a couple of caveats: We’re assuming that you have some familiarity with (1) the R statistical computing environment, (2) interpreting logistic regression models, and (3) key concepts in Bayesian statistical analyses. If you need a refresher on R, see here. For information on logistic regression, perhaps take a look at this introductory tutorial. If you’d like some background on Bayesian methods and are short on time, We’d suggest this article. For a more thorough coverage, see McElreath (2017), Lambert (2018), or—if you’re ready for the deep dive—Bayesian Data Analysis by Gelman and colleagues (2016).

OK, let’s dive in!

The Data

We’re going to use some data collected via Project Implicit—a non-profit organization that operates a virtual laboratory where people can take various versions of the Implicit Association Test (IAT). Project Implicit kindly makes the data from this site publicly available on the Open Science Framework. Anyone can use it (just acknowledge Project Implicit in any publications!) and there’s a lot of it. You should check it out!

This tutorial uses data from the Sexuality IAT collected during 2017. If you’re not familiar with this particular version of the IAT, it measures someone’s relative preference for straight people over gay people. Scores above zero indicate a preference for straight people over gay people, a score of zero indicates no preference for either group, and scores below zero indicate a preference for gay people over straight people. Specifically, we’re going to look at whether implicit straight/gay attitudes predicts attitude-relevant policy positions.

The raw data for the Sexuality IAT task is available here. However, Bayesian analyses can be quite computationally intensive. As there is data from over 300,000 respondents in the 2017 data set, we’ll be working with a random draw of 5,000 sessions so it’s less time intensive for you to try these analyses for yourself. We’ve also done some data cleaning and recoding. If you want to see how we derived the data for this example, see here.

First, we’re going to have a couple of packages handy to get started:

library(brms); library(tidyverse); library(tidybayes); library(ggplot2); library(LaplacesDemon)
## Warning: package 'LaplacesDemon' was built under R version 3.5.2

As you already know, we’re going to need the brms package to fit our logistic regression models. Critically, brms relies on Stan (Stan Development Team, 2017), which is a great tool for fitting Bayesian analyses. For details on installing Stan, see here. Once you’ve done these steps you’ll need to have the brms library loaded. Another option is to use the free RStudio cloud service—they already have Stan installed. For details, see here.

Now, let’s read the data into R:

lrsmall <- read_csv(url("https://raw.githubusercontent.com/jamesrrae/sexuality-iat-website-data/master/lrsmall.csv"))

Now, let’s get a sense of the data:

print(lrsmall)
## # A tibble: 5,000 x 2
##    Allowed     D.z
##      <int>   <dbl>
##  1      NA   1.48
##  2       1   0.976
##  3      NA  NA
##  4       1   0.568
##  5      NA  NA
##  6       1   1.08
##  7       1  -1.68
##  8       1  -0.507
##  9       1   0.487
## 10      NA   0.174
## # ... with 4,990 more rows

Ok, so we have two variables that we’ll use for our logistic regression analysis. Allowed indexes self-reported answers to the question “Do you think it should be legal for same-sex partners to adopt a child?”. We’ll just focus on two response options: 0 = “Should not be legal” and 1 = “Should be legal”. The variable D.z is standardized IAT scores calculated by subtracting off the mean and dividing by the standard deviation.

Next, let’s look at the correlation between our two variables. This will help us know what to expect before doing our logistic regression analysis:

cor.test(lrsmall$Allowed,lrsmall$D.z)
##
##  Pearson's product-moment correlation
##
## data:  lrsmall$Allowed and lrsmall$D.z
## t = -13.083, df = 2922, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2691916 -0.2007010
## sample estimates:
##        cor
## -0.2352383

The variables are negatively correlated at about .20. And this makes sense—people with stronger implicit preferences for straight people over gay people are less likely to report that it’s OK for same-sex couples to adopt.

Using brms for Logistic Regression Analysis

First, we must set some priors for the parameters in our logistic regression model—the intercept and slope. We follow common prior choice recommendations for the intercept and slope:

$\beta \sim \text{Student}(3, 0, 2.5)$

This prior is a Student-t distribution (bell-shaped but with fatter tails than a normal distribution) with 3 degrees of freedom, a mean of 0, and scale of 2.5. This is a pretty wide prior. To see why, let’s look at the distribution:

x_values <- seq(-15,15, length.out = 1000)
data.frame(x_values) %>%
ggplot(aes(x_values))+
stat_function(fun=dst, args=list(nu=3,mu=0,sigma=2.5))

As you can see from the figure, this prior roughly expresses an expectation most values to be between -10 and +10. However, as pointed out by Gelman and colleagues (2008), even a change from -5 to 0 correspondents to a change on a probability scale of .01 to .50. It would be pretty huge effect if a 1 unit change in D.z (i.e., a 1 standard deviation difference on the IAT) corresponded to nearly .50 change in the probability of supporting adoption rights for same-sex couples. For further discussion of these priors, see Gelman et al. 2008.

Now, let’s specify these priors in a format that brms can use when we run the model:

m1priors <- c(
prior(student_t(3, 0, 2.5), class = "Intercept"),
prior(student_t(3, 0, 2.5), class = "b")
)

Ok, now we’re finally ready to fit our model (be warned, this could take a bit of time!):

m1 <- brm(
Allowed ~ D.z,
data = lrsmall,
prior = m1priors,
family = "bernoulli",
seed = 123 # Adding a seed makes results reproducible.
) 
## Warning: Rows containing NAs were excluded from the model.
## Compiling the C++ model
## Start sampling
##
## SAMPLING FOR MODEL 'd59bd52f6c6c61f33001490da41f7746' NOW (CHAIN 1).
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1:  Elapsed Time: 1.872 seconds (Warm-up)
## Chain 1:                1.638 seconds (Sampling)
## Chain 1:                3.51 seconds (Total)
##
## SAMPLING FOR MODEL 'd59bd52f6c6c61f33001490da41f7746' NOW (CHAIN 2).
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2:  Elapsed Time: 1.859 seconds (Warm-up)
## Chain 2:                1.901 seconds (Sampling)
## Chain 2:                3.76 seconds (Total)
##
## SAMPLING FOR MODEL 'd59bd52f6c6c61f33001490da41f7746' NOW (CHAIN 3).
## Chain 3: Gradient evaluation took 0.001 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3:  Elapsed Time: 1.909 seconds (Warm-up)
## Chain 3:                1.804 seconds (Sampling)
## Chain 3:                3.713 seconds (Total)
##
## SAMPLING FOR MODEL 'd59bd52f6c6c61f33001490da41f7746' NOW (CHAIN 4).
## Chain 4: Gradient evaluation took 0.001 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4:  Elapsed Time: 1.848 seconds (Warm-up)
## Chain 4:                1.683 seconds (Sampling)
## Chain 4:                3.531 seconds (Total)

So in the model above, we’ve specified that we’re regressing Allowed on D.z using the priors in m1priors. You’ll also notice that we had to specify a likelihood, which is set by the family argument. As we had a dichotomous outcome, we set this to "bernoulli". Finally, we used a seed to make the results the same each time we run this particular model.

Ok, now it’s time to look at the results. First, let’s just look at the summary of out model:

summary(m1)
##  Family: bernoulli
## Formula: Allowed ~ D.z
##    Data: lrsmall (Number of observations: 2924)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Population-Level Effects:
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept     2.96      0.10     2.77     3.17       1346 1.00
## D.z          -1.07      0.09    -1.26    -0.89       1271 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

First of all, this summary provides some really useful information about the model, including the number of observations, chains, and iterations. For a more thorough description of this information, see the brms manual here.

If you’re just getting started with Bayesian data analysis, the most familiar information is going to be under the heading Population-Level Effects. Here you’ll find summary information about the regression slope and intercept. First of all, the Rhat values near 1.0 indicates that we have convergence. If Rhat values get too large (see here for a suggestion of 1.2 or 1.1) then we should be worried. Second, we see that the most credible values of the slope are negative, which is good because above we observed that D.z was negatively correlated with Allowed.

So while our logistic regression model indicates that there is a negative relationship between Allowed and D.z, the summary of the model has estimates on the log-odds scale, which can be difficult to interpret. So let’s use the tidybayes package (Kay, 2018) to put our results on the odds scale:

parameters <- m1 %>% gather_draws(b_D.z) %>% median_hdi()
print(exp(parameters[c(".value",".lower",".upper")])) # exp() converts log-odds to odds
## # A tibble: 1 x 3
##   .value .lower .upper
##    <dbl>  <dbl>  <dbl>
## 1  0.343  0.287  0.412

On the odds scale, we are comparing our estimate to 1.0. If the most credible value of the slope is mostly below 1.0, then we infer that there’s a negative association between Allowed and D.z. This is the case here as we are 95% sure (given the data and the model) that the slope is roughly between .28 and .40.

Finally, let’s make a basic plot that shows the model predictions. Conveniently, brms has a command that makes this quite easy:

plot(marginal_effects(m1), points = TRUE, rug = TRUE)

This plot shows the predicted probability of supporting adoption for same-sex couples at different levels of D.z. First, notice that for values below zero on the x-axis (i.e., below the mean IAT score) the support of this policy is quite high: near 1.0. Then, notice that the probability of supporting adoption right for same-sex couples drops, eventually bottoming out at about .50 for those respondents that have the strongest preference for straight people over gay people. Finally, while these are less useful when dealing with a large sample like we’re using here, the points and rug arguments plot the data data alongside the model predictions. You can see from the black dots show responses on the Allowed variable (either 0 or 1), while the black vertical lines at the bottom of the plot show scores on the D.z variable. These additional features are helpful for making sure that the model is providing a sensible fit to the data.

Ok, so that’s a basic logistic regression model using brms. Future posts might address some of the issues that we didn’t touch on here, such as what to do about missing data? How to create publication-worthy logistic regression plots? And what additional model assessments can we do with brms?