1.
Researchers who report results from an experiment often report estimates of the treatment effect at particular levels of a predictor. For example, a panel of Figure 2 in Barnes et al. 2018 plotted, across a range of hostile sexism, the estimated difference between two probabilities. The first was the probability of reporting being very unlikely to vote for a female target representative involved in a sex scandal. The second was the corresponding probability for a male target representative involved in a sex scandal. For another example, Chudy 2020 plotted, across a range of racial sympathy, estimated punishments for a Black target culprit and for a White target culprit. Both plots report estimates derived from a regression. However, as Hainmueller et al. 2020 indicated, regression can nontrivially misestimate a treatment effect at particular levels of a predictor.
This post presents another example of this phenomenon, based on data from the experiment in Costa et al. 2020, "How partisanship and sexism influence voters' reactions to political #MeToo scandals" (see also the correction to Costa et al. 2020).
---
2.
The Costa et al. 2020 experiment had a control condition, two treatment conditions, and multiple outcome variables, but my illustration will focus on only two conditions and only one outcome variable. Participants were asked to respond to four items measuring participant sexism and to rate a target male senator on a 0-to-10 scale. Participants who were then randomized to the "sexual assault" condition were provided a news story indicating that the senator had been accused of groping two women without consent. Participants who were instead randomized to the control condition were provided a news story about the senator visiting a county fair. The outcome variable of interest for this illustration is the percent change in the favorability of the senator, from the pretest to the posttest.
Estimates in the left panel of Figure 1 are based on a linear regression that predicts the outcome variable of interest from three predictors. The first is a pretest measure of participant sexism, ranging from 0 for low sexism to 16 for high sexism. The second is a dichotomous variable coded 1 for participants in the sexual assault condition and 0 for participants in the control condition. The third is the interaction of these two predictors. The panel plots point estimates and 95% confidence intervals for the estimated difference in the outcome variable between the sexual assault condition and the control condition (sexual assault minus control) at each observed level of the participant sexism index.
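For readers who want the mechanics, here is a minimal Python sketch (not the Stata or R code from the notes) of how such per-level point estimates come out of the regression. With treatment indicator D and sexism index X, the model is Y = b0 + b1·D + b2·X + b3·D·X, and the estimated effect at level x is b1 + b3·x. Because the model gives each condition its own intercept and slope, its fitted lines equal separate least-squares lines fit within each condition, and the sketch uses that equivalence to avoid matrix algebra. It omits standard errors, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ols_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interaction_effects(
    x: Sequence[float], d: Sequence[int], y: Sequence[float], levels: List[float]
) -> Dict[float, float]:
    """Estimated effect (d == 1 minus d == 0) at each level, i.e. b1 + b3 * level.

    The fully interacted model Y = b0 + b1*D + b2*X + b3*D*X has the same
    fitted lines as separate least-squares fits within each condition:
    (b0, b2) come from the control fit and (b0 + b1, b2 + b3) from the treated fit.
    """
    control = [(xi, yi) for xi, di, yi in zip(x, d, y) if di == 0]
    treated = [(xi, yi) for xi, di, yi in zip(x, d, y) if di == 1]
    a_c, s_c = ols_line([p[0] for p in control], [p[1] for p in control])
    a_t, s_t = ols_line([p[0] for p in treated], [p[1] for p in treated])
    b1, b3 = a_t - a_c, s_t - s_c  # treatment main effect and interaction term
    return {level: b1 + b3 * level for level in levels}


# Hypothetical toy data (not the Costa et al. 2020 data): five levels per condition.
x = [0, 4, 8, 12, 16] * 2
d = [0] * 5 + [1] * 5
y = [5 - 0.2 * v for v in x[:5]] + [-50 + 3 * v for v in x[5:]]
effects = interaction_effects(x, d, y, levels=list(range(17)))
```

In this toy example the effect is -55 at level 0 and -3.8 at level 16. Because every per-level estimate comes from the same two fitted lines, the estimate at any level depends on all participants, not only on participants at that level.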
The leftmost point indicates that the "least sexist" participants in the sexual assault condition were estimated to have a value of the outcome variable about 52 percentage points lower than the "least sexist" participants in the control condition. The "least sexist" participants in the control condition were estimated to have increased their rating of the senator by 4.6 percent, and the "least sexist" participants in the sexual assault condition were estimated to have reduced their rating of the senator by 47.6 percent.
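In the notation of the linear interaction model described above (the symbols are introduced here for illustration, with D = 1 for the sexual assault condition), the plotted difference at sexism level x, and the arithmetic behind the leftmost point, are:

```latex
\widehat{\Delta}(x) = \hat{Y}(x \mid D = 1) - \hat{Y}(x \mid D = 0) = \hat{\beta}_1 + \hat{\beta}_3 x,
\qquad
\widehat{\Delta}(0) = -47.6 - 4.6 = -52.2
```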
The rightmost point of the plot indicates that the "most sexist" participants in the sexual assault condition were estimated to have a value of the outcome variable about the same as the "most sexist" participants in the control condition, differing by only about 0.4 percentage points. The "most sexist" participants in the control condition were estimated to have increased their rating of the senator by 1.7 percent, and the "most sexist" participants in the sexual assault condition were estimated to have increased their rating of the senator by 2.1 percent. Based on this rightmost point, a reader could conclude about the sexual assault allegations, as Costa et al. 2020 suggested, that:
...the most sexist subjects react about the same way to sexual assault and sexist jokes allegations as they do to the control news story about the legislator attending a county fair.
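However, the numbers along the inside bottom of each Figure 1 panel indicate the sample size at each level of the sexism index, pooled across the control condition and the sexual assault condition. These numbers indicate that the regression-based estimate for the "most sexist" participants was nontrivially based on the behavior of other participants: the linear interaction model fits a single slope across the entire index, so participants at lower levels of the index help determine the estimate at the top of the index.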
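The dependence on other participants can be made concrete. For a least-squares line fit within one condition, the fitted value at a level x0 is a weighted sum of every participant's outcome in that condition, with weight 1/n + (x_i − x̄)(x0 − x̄)/S_xx, where S_xx is the sum of squared deviations of X from its mean. The regression estimate of the effect at x0 is the difference of two such fitted values, one per condition. This minimal Python sketch computes the weights; the function name and the example X values are hypothetical, not the Costa et al. 2020 data.

```python
from typing import List, Sequence


def fitted_value_weights(xs: Sequence[float], x0: float) -> List[float]:
    """Weight of each case's outcome in the least-squares fitted value at x0.

    For the line y = a + b*x fit to (xs, ys), the fitted value at x0 equals
    sum(w_i * y_i) with w_i = 1/n + (x_i - xbar) * (x0 - xbar) / Sxx.
    """
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    return [1 / n + (xi - xbar) * (x0 - xbar) / sxx for xi in xs]


# Hypothetical sexism-index values for one condition: many cases at 0 and 8,
# and only two cases at the top of the index.
xs = [0] * 10 + [8] * 10 + [16] * 2
w = fitted_value_weights(xs, x0=16)
weight_on_top = sum(wi for xi, wi in zip(xs, w) if xi == 16)
weight_on_others = sum(wi for xi, wi in zip(xs, w) if xi != 16)
print(round(weight_on_top, 3), round(weight_on_others, 3))
```

In this example the two cases at 16 carry a combined weight of 0.5, so half of the fitted value at 16 comes from the 20 cases at 0 and 8 (each case at 8 has weight 0.1, and each case at 0 has weight -0.05).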
Estimates in the right panel of Figure 1 are instead based on t-tests conducted using only participants at the indicated level of the sexism index. As in the left panel, the estimate for the "least sexist" participants falls between -50 and -60, and, over the next few higher observed values of the sexism index, estimates generally rise toward zero. But this tendency does not persist above the midpoint of the sexism index. Moreover, the point estimates in the right panel for the three highest values of the sexism index do not fall within the corresponding 95% confidence intervals in the left panel.
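The per-level comparison in the right panel can be sketched in Python as follows, using only participants at each observed level of the index. The sketch assumes Welch's unequal-variance t statistic, which may not be the t-test variant used in the Stata code for this post. If SciPy is available, scipy.stats.ttest_ind(treated, control, equal_var=False) returns the same statistic along with a p-value. Function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, Sequence, Tuple


def welch_t(treated: Sequence[float], control: Sequence[float]) -> Tuple[float, float, float]:
    """Return (treated mean - control mean, Welch t statistic, Welch-Satterthwaite df)."""
    n1, n0 = len(treated), len(control)
    v1 = variance(treated) / n1  # squared standard error of the treated mean
    v0 = variance(control) / n0  # squared standard error of the control mean
    diff = mean(treated) - mean(control)
    se = math.sqrt(v1 + v0)
    df = (v1 + v0) ** 2 / (v1 ** 2 / (n1 - 1) + v0 ** 2 / (n0 - 1))
    return diff, diff / se, df


def per_level_tests(
    x: Sequence[float], d: Sequence[int], y: Sequence[float]
) -> Dict[float, Tuple[float, float, float]]:
    """Run welch_t separately at each observed level of x (d == 1 is treated)."""
    results = {}
    for level in sorted(set(x)):
        treated = [yi for xi, di, yi in zip(x, d, y) if xi == level and di == 1]
        control = [yi for xi, di, yi in zip(x, d, y) if xi == level and di == 0]
        # A sample variance needs at least two cases per condition.
        if len(treated) < 2 or len(control) < 2:
            continue
        # Skip levels where neither condition varies (standard error of zero).
        if variance(treated) + variance(control) == 0:
            continue
        results[level] = welch_t(treated, control)
    return results
```

Each estimate uses only the participants at its own level, so a level with few participants yields a noisy estimate, but that estimate is not pulled toward the pattern at other levels.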
For the 28 participants at 15 or 16 on the sexism index, the point estimate was -22, with a p-value below 0.05. The sample size across these two conditions was 1,888, so participants at 15 or 16 on the sexism index represent roughly the top 1.5% of participants on the sexism index across these two conditions. Therefore, the sexual assault treatment appears to have had an effect on these "very sexist" participants.
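The 1.5% figure follows from the two sample sizes:

```latex
\frac{28}{1888} \approx 0.0148 \approx 1.5\%
```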
---
3.
Regression can reveal patterns in data. For example, linear regression estimates correctly indicated that, in the Costa et al. 2020 experiment, the effect of the sexual assault treatment relative to the control was closer to zero for participants at higher levels of the sexism index than for participants at lower levels. However, as the illustration above shows, regression can misestimate an effect at particular levels of a predictor. Therefore, inferences about an estimated effect at a particular level of a predictor should be based only on cases at or around that level of the predictor, and should not be influenced by other cases.
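One way to operationalize "at or around that level" is a difference in means within a window of half-width h around the level of interest, similar to pooling the participants at 15 or 16 on the sexism index. This minimal Python sketch is written under that assumption; the function name and the window rule are illustrative choices, not a method taken from Costa et al. 2020 or from the Stata code for this post.

```python
from statistics import mean
from typing import Sequence, Tuple


def local_effect(
    x: Sequence[float], d: Sequence[int], y: Sequence[float], x0: float, h: float = 0.0
) -> Tuple[float, int, int]:
    """Treated-minus-control difference in means among cases with |x - x0| <= h.

    h = 0 uses only cases at x0; on an integer index, h = 1 also includes x0 - 1
    and x0 + 1. Returns (estimate, n_treated, n_control).
    """
    window = [(di, yi) for xi, di, yi in zip(x, d, y) if abs(xi - x0) <= h]
    treated = [yi for di, yi in window if di == 1]
    control = [yi for di, yi in window if di == 0]
    if not treated or not control:
        raise ValueError("window contains no treated cases or no control cases")
    return mean(treated) - mean(control), len(treated), len(control)
```

A wider window adds cases and precision but moves the estimate away from the level of interest. At the top of a 0-to-16 index, local_effect(x, d, y, x0=16, h=1) pools the participants at 15 and 16.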
---
NOTES
1. Costa et al. 2020 data.
2. Stata code for the analysis.
3. R code for the plot. CSV file for the R plot.
4. The interflex R package (Hainmueller et al. 2020) produced the plot below, using six bins. The leveling off at higher values of the sexism index also appears in this interflex plot:
R code to add to the corrected Costa et al. 2020 code:
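library(interflex)  # provides inter.binning()
# Rescale the pretest sexism measure to the 0-to-16 index used above
# (a pre_sexism value of 1 becomes 0, and a value of 5 becomes 16).
dat$sexism16 <- (dat$pre_sexism-1)*4
summary(dat$sexism16)
# Binning estimator over six bins of the sexism index, with the control condition as the baseline
p1 <- inter.binning(data=dat, Y="perchange_vote", D="condition2", X="sexism16", nbins=6, base="Control")
plot(p1)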