1.

In May, I published a blog post about deviations from the pre-analysis plan for the Stephens-Dougan 2022 APSR letter, and I tweeted a link to the blog post that tagged @LaFleurPhD and asked her directly about the deviations from the pre-analysis plan. I don't recall receiving a response from Stephens-Dougan, and, a few days later, on May 31, I emailed the APSR about my post, listing three concerns:

* The Stephens-Dougan 2022 description of racially prejudiced Whites not matching how the code for Stephens-Dougan 2022 calculated estimates for racially prejudiced Whites.

* The substantial deviations from the pre-analysis plan.

* Figure 1 of the APSR letter reporting weighted estimates, but the evidence being much weaker in unweighted analyses.

Six months later, on December 5, the APSR published a correction to Stephens-Dougan 2022. The correction addresses each of my three concerns, though not perfectly, as I'll discuss below, along with other observations about Stephens-Dougan 2022 and its correction. I'll refer to the original APSR letter as "Stephens-Dougan 2022" and the correction as "the correction".

---

2.

The pre-analysis plan associated with Stephens-Dougan 2022 listed four outcomes at the top of its page 4, but only one of these outcomes (referred to as "Individual rights and freedom threatened") was reported on in Stephens-Dougan 2022. However, Table 1 of Stephens-Dougan 2022 reported results for three outcomes that were not mentioned in the pre-analysis plan.

The t-statistics for the key interaction term for the three outcomes included in Table 1 of Stephens-Dougan 2022 but not mentioned in the pre-analysis plan were 2.6, 2.0, and 2.1, all of which clear the conventional p<0.05 threshold. The t-statistics for the key interaction term for the three outcomes mentioned in the pre-analysis plan but omitted from Stephens-Dougan 2022 were 0.6, 0.6, and 0.6, none of which come close to that threshold.

I calculated the t-statistics of 2.6, 2.0, and 2.1 from Table 1 of Stephens-Dougan 2022 by dividing each coefficient by its standard error. I wasn't able to use the correction to calculate the t-statistics of 0.6, 0.6, and 0.6, because the relevant data for these three omitted pre-analysis plan outcomes are not in the correction but instead are in Table A12 of a "replication-final.pdf" file hosted at the Dataverse.
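
For concreteness, the calculation is just a coefficient divided by its standard error. A minimal sketch in R, with made-up numbers rather than the actual Table 1 values:

b <- 0.52    # hypothetical coefficient, not from Table 1
se <- 0.20   # hypothetical standard error
b / se       # t-statistic: 2.6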

That's part of what I meant about an imperfect correction: a reader cannot use information published in the APSR itself to calculate the evidence provided by the outcomes that were planned to be reported on in the pre-analysis plan, or, for that matter, to see how there is substantially less evidence in the unweighted analysis. Instead, a reader needs to go to the Dataverse and dig through table after table of results.

The correction refers to deviations from the pre-analysis plan, but doesn't identify the particular deviations and doesn't indicate what the results look like when these deviations are not made. The "Supplementary Materials Correction-Final.docx" file at the Dataverse for Stephens-Dougan 2022 has a discussion of deviations from the pre-analysis plan, but, as far as I can tell, the discussion does not provide a reason why results should not be reported for the three omitted outcomes, which were labeled in Table A12 as "Slow the Spread", "Stay Home", and "Too Long to Loosen Restrictions".

It seems to me to be a bad policy to permit researchers to deviate from a pre-analysis plan without justification and to merely report results from a planned analysis on, say, page 46 of a 68-page file on the Dataverse. But a bigger problem might be that, as far as I can tell, many journals don't even attempt to prevent misleading selective reporting for survey research for which there is no pre-analysis plan. Journals could require researchers reporting on surveys to submit or link to the full questionnaire for the surveys or at least to declare that the main text reports on results for all plausible measured outcomes and moderators.

---

3.

Next, let me discuss a method used in Stephens-Dougan 2022 and the correction, which I think is a bad method.

The code for Stephens-Dougan 2022 used measures of stereotypes about Whites and Blacks on the traits of being hardworking and intelligent to create a variable called "negstereotype_endorsement". The code divided respondents into three categories, coded 0 for respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for respondents who endorsed both negative stereotypes about Blacks relative to Whites. In both Stephens-Dougan 2022 and the correction, Figure 3 reported, for each reported outcome, an estimate of how much the average treatment effect among prejudiced Whites (defined as those coded 1) differed from the average treatment effect among unprejudiced Whites (defined as those coded 0).

The most straightforward way to estimate this difference in treatment effects is to [1] calculate the treatment effect for prejudiced Whites coded 1, [2] calculate the treatment effect for unprejudiced Whites coded 0, and [3] calculate the difference between these treatment effects. The code for Stephens-Dougan 2022 instead estimated this difference using a logit regression that had three predictors: the treatment, the 0/0.5/1 measure of prejudice, and an interaction of the prior two predictors. But, by this method, the estimated difference in treatment effect between the 1 respondents and the 0 respondents depends on the 0.5 respondents. I can't think of a valid reason why responses from the 0.5 respondents should influence an estimated difference between the 0 respondents and the 1 respondents.
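
To illustrate the dependence on the 0.5 respondents, here is a minimal sketch in R with simulated data. The variable names and effect sizes are arbitrary (this is not the TESS data); the point is that dropping the 0.5 respondents changes the estimated interaction even though the 0 respondents and 1 respondents are unchanged:

set.seed(1)
n <- 900
prejudice <- sample(c(0, 0.5, 1), n, replace = TRUE, prob = c(0.74, 0.16, 0.10))
treatment <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, plogis(-1 + 0.5 * treatment + 1.0 * prejudice + 0.8 * treatment * prejudice))
d <- data.frame(y, treatment, prejudice)
# logit with the 0/0.5/1 moderator, as in the Stephens-Dougan 2022 code:
# the 0.5 respondents help determine the fitted interaction
fit_all <- glm(y ~ treatment * prejudice, family = binomial, data = d)
# straightforward alternative: estimate the 0-versus-1 contrast from
# only the 0 respondents and the 1 respondents
fit_01 <- glm(y ~ treatment * prejudice, family = binomial, data = subset(d, prejudice != 0.5))
coef(fit_all)["treatment:prejudice"]  # generally differs from the line below
coef(fit_01)["treatment:prejudice"]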

See my Stata output file for more on that. The influence of the 0.5 respondents might not be major in most or all cases, but an APSR reader won't know, based on Stephens-Dougan 2022 or its correction, the extent to which the 0.5 respondents influenced the estimates for the comparison of the 0 respondents to the 1 respondents.

Now about those 0.5 respondents…

---

4.

Remember that the Stephens-Dougan 2022 "negative stereotype endorsement" variable has three levels: 0 for the 74% of respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for the 16% of respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for the 10% of respondents who endorsed both negative stereotypes about Blacks relative to Whites.

The correction indicates that "I discovered an error in the description of the variable, negative stereotype endorsement" and that "there was no error in the code used to create the variable". So was the intent for Stephens-Dougan 2022 to measure racial prejudice so that only the 1 respondents are considered prejudiced? Or was the intent to consider the 0.5 respondents and the 1 respondents to be prejudiced?

The pre-analysis plan seems to indicate a different method for measuring the moderator of negative stereotype endorsement:

The difference between the rating of Blacks and Whites is taken on both dimensions (intelligence and hard work) and then averaged.

But the pre-analysis plan also indicates that:

For racial predispositions, we will use two or three bins, depending on their distributions.

So, even ignoring the plan to average the stereotype ratings, the pre-analysis plan is inconclusive about whether the intent was to use two or three bins. Let's try this passage from Stephens-Dougan 2022:

A nontrivial fraction of the nationally representative sample—26%—endorsed either the stereotype that African Americans are less hardworking than whites or that African Americans are less intelligent than whites.

So that puts the 16% of respondents at the 0.5 level of negative stereotype endorsement into the same bin as the 10% at the 1 level of negative stereotype endorsement. Stephens-Dougan 2022 doesn't report the percentage that endorsed both negative stereotypes about Blacks. Reporting the percentage of 26% is what would be expected if the intent was to place into one bin any respondent who endorsed at least one of the negative stereotypes about Blacks, so I'm a bit skeptical of the claim in the correction that the description is in error and the code was correct. Maybe I'm missing something, but I don't see how someone who intends to have three bins reports the 26% and does not report the 10%.

Moreover, Stephens-Dougan 2022 has only three figures: Figure 1 reports results for racially prejudiced Whites, Figure 2 reports results for non-racially prejudiced Whites, and Figure 3 reports on the difference between racially prejudiced Whites and non-racially prejudiced Whites. Did Stephens-Dougan 2022 intend to not report results for the group of respondents who endorsed exactly one of the negative stereotypes about Blacks? Did Stephens-Dougan 2022 intend to suggest that respondents who rate Blacks as lazier in general than Whites aren't racially prejudiced as long as they rate Blacks equal to or higher than Whites in general on intelligence?

---

5.

Stephens-Dougan 2022 and the correction depict 84% confidence intervals in all figures. Stephens-Dougan 2022 indicated (footnote omitted) that:

For ease of interpretation, I plotted the predicted probability of agreeing with each pandemic measure in Figure 1, with 84% confidence intervals, the graphical equivalent to p < 0.05.

The 84% confidence interval is good for assessing a p=0.05 difference between estimates, but not for assessing at p=0.05 whether an estimate differs from a particular number such as zero. So 84% confidence intervals make sense for Figures 1 and 2, in which the key comparisons are of the control estimate to the treatment estimate. But 84% confidence intervals don't make as much sense for Figure 3, which plots only one estimate per outcome and for which the key assessment is whether the estimate differs from zero (in Figure 3 of Stephens-Dougan 2022) or from 1 (in Figure 3 of the correction).
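
The arithmetic behind the 84% convention, as a quick R check (this is the standard rationale for the convention, not anything specific to Stephens-Dougan 2022): for two independent estimates with similar standard errors, the intervals just touch when the estimates are about 2.8 standard errors apart, close to the roughly 2.77 required for p=0.05 on the difference.

z84 <- qnorm(1 - (1 - 0.84) / 2)  # about 1.41
2 * z84                           # intervals just touch at about 2.81 SEs apart
1.96 * sqrt(2)                    # p = 0.05 for a difference of independent estimates: about 2.77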

---

6.

I didn’t immediately realize why, in Figure 3 in Stephens-Dougan 2022, two of the four estimates cross zero, but in Figure 3 in the correction, none of the four estimates cross zero. Then I realized that the estimates plotted in Figure 3 of the correction (but not Figure 3 in Stephens-Dougan 2022) are odds ratios.

The y-axis for odds ratios for Figure 3 of the correction ranges from 0 to 30-something, using a linear scale. The odds ratio that indicates no effect is 1, and an odds ratio can't be negative, so that is why none of the four estimates cross zero in the corrected Figure 3.

It seems like a good idea for a plot of odds ratios to have a guideline at 1, so that readers can assess whether an odds ratio indicating no effect is a plausible value. And a log scale seems like a good idea for odds ratios, too. A relevant prior post mentions that Fenton and Stephens-Dougan 2021 described a "very small" 0.01 odds ratio as "not substantively meaningful".
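
To illustrate, here is a minimal base R sketch with made-up odds ratios and 84% confidence bounds (not the values from the correction), showing the log scale and the guideline at 1:

or <- c(3.2, 0.8, 12.5, 1.6)   # hypothetical odds ratio point estimates
lo <- c(1.1, 0.3, 2.4, 0.6)    # hypothetical 84% CI lower bounds
hi <- c(9.3, 2.1, 64.0, 4.3)   # hypothetical 84% CI upper bounds
plot(seq_along(or), or, log = "y", ylim = range(lo, hi), pch = 19,
     xlab = "Outcome", ylab = "Odds ratio (log scale)")
arrows(seq_along(or), lo, seq_along(or), hi, angle = 90, code = 3, length = 0.04)
abline(h = 1, lty = 2)         # guideline: an odds ratio of 1 indicates no effect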

None of the 84% confidence intervals in Figure 3 of the correction include 1, but an 84% confidence interval in Figure A3 of "Supplementary Materials Correction-Final.docx" does.

---

7.

Often, when I alert an author or journal to an error in a publication, the subsequent correction doesn't credit me for my work. Sometimes the correction even suggests that the authors themselves caught the error, like the correction to Stephens-Dougan 2022 seems to do:

After reviewing my code, I discovered an error in the description of the variable, negative stereotype endorsement.

I guess it's possible that Stephens-Dougan "discovered" the error. For instance, maybe after she submitted page proofs, for some reason she decided to review her code, and just happened to catch the error that she had missed before, and it's a big coincidence that this was the same error that I blogged about and alerted the APSR to.

And maybe Stephens-Dougan also discovered that her APSR letter misleadingly deviated from the relevant pre-analysis plan, so that I don't deserve credit for alerting the APSR to that.

---

The American Political Science Review recently published a letter: Stephens-Dougan 2022 "White Americans' reactions to racial disparities in COVID-19".

Figure 1 of the Stephens-Dougan 2022 APSR letter reports results for four outcomes among racially prejudiced Whites, with the 84% confidence interval in the control overlapping with the 84% confidence interval in the treatment for only one of the four reported outcomes (zooming in on Figure 1, the confidence intervals for the parks outcome don't seem to overlap, and the code returns 0.1795327 for the upper bound for the control and 0.18800818 for the lower bound for the treatment). And results for the most obviously overlapping 84% confidence intervals seem to be interpreted as sufficient evidence of an effect, with all four reported outcomes discussed in the passage below:

When racially prejudiced white Americans were exposed to the racial disparities information, there was an increase in the predicted probability of indicating that they were less supportive of wearing face masks, more likely to feel their individual rights were being threatened, more likely to support visiting parks without any restrictions, and less likely to think African Americans adhere to social distancing guidelines.

---

There are at least three things to keep track of: [1] the APSR letter; [2] the survey questionnaire, located at the OSF site for the Time-sharing Experiments for the Social Sciences project; and [3] the pre-analysis plan, located at the OSF and in the appendix of the APSR letter. I'll use the PDF of the pre-analysis plan. The TESS site also has the proposal for the survey experiment, but I won't discuss that in this post.

---

The pre-analysis plan does not mention all potential outcome variables that are in the questionnaire, but the pre-analysis plan section labeled "Hypotheses" includes the passage below:

Specifically, I hypothesize that White Americans with anti-Black attitudes and those White Americans who attribute racial disparities in health to individual behavior (as opposed to structural factors), will be more likely to disagree with the following statements:

The United States should take measures aimed at slowing the spread of the coronavirus while more widespread testing becomes available, even if that means many businesses will have to stay closed.

It is important that people stay home rather than participating in protests and rallies to pressure their governors to reopen their states.

I also hypothesize that White Americans with anti-Black attitudes and who attribute racial health disparities to individual behavior will be more likely to agree with the following statements:

State and local directives that ask people to "shelter in place" or to be "safer at home" are a threat to individual rights and freedom.

The United States will take too long in loosening restrictions and the economic impact will be worse with more jobs being lost

The four outcomes mentioned in the passage above correspond to items Q15, Q18, Q16, and Q21 in the survey questionnaire, but, of these four outcomes, the APSR letter reported on only Q16.

The outcome variables in the APSR letter are described as: "Wearing facemasks is not important", "Individual rights and freedom threatened", "Visit parks without any restrictions", and "Black people rarely follow social distancing guidelines". These outcome variables correspond to survey questionnaire items Q20, Q16, Q23A, and Q22A.

---

The pre-analysis plan PDF mentions moderators, including three about racial predispositions: racial resentment, negative stereotype endorsement, and attributions for health disparities. The plan indicates that:

For racial predispositions, we will use two or three bins, depending on their distributions. For ideology and party, we will use three bins. We will include each bin as a dummy variable, omitting one category as a baseline.

The APSR letter reported on only one racial predispositions moderator: negative stereotype endorsement.

---

I'll post a link in the notes below to some of my analyses of the "Specifically, I hypothesize" outcomes, but I don't want to focus on the results, because this post is about deviations from the pre-analysis plan. Regardless of whether the estimates from the analyses in the APSR letter are similar to the estimates from the planned analyses in the pre-analysis plan, I think that it's bad that readers can't trust the APSR to ensure that a pre-analysis plan is followed, or at least to provide an explanation of why a pre-analysis plan was not followed, especially given that this APSR letter described itself as reporting on "a preregistered survey experiment" and included the pre-analysis plan in the appendix.

---

NOTES

1. The Stephens-Dougan 2022 APSR letter suggests that the negative stereotype endorsement variable was coded dichotomously ("a variable indicating whether the respondent either endorsed the stereotype that African Americans are less hardworking than whites or the stereotype that African Americans are less intelligent than whites"), but the code and the appendix of the APSR letter indicate that the negative stereotype endorsement variable was measured so that the highest level is for respondents who reported a negative relative stereotype about Blacks for both stereotypes. From Table A7:

(unintelligentstereotype2 + lazystereotype2)/2

In the data after running the code for the APSR letter, the negative stereotype endorsement variable is a three-level variable coded 0 for respondents who did not report a negative relative stereotype about Blacks for either stereotype, 0.5 for respondents who reported a negative relative stereotype about Blacks for exactly one stereotype, and 1 for respondents who reported a negative relative stereotype about Blacks for both stereotypes.

2. The APSR letter indicated that:

The likelihood of racially prejudiced respondents in the control condition agreeing that shelter-in-place orders threatened their individual rights and freedom was 27%, compared with a likelihood of 55% in the treatment condition (p < 0.05 for a one-tailed test).

My analysis using survey weights got 55% in the treatment condition and 26% in the control condition among participants who reported negative relative stereotypes about Blacks for both stereotype items, with a trivial overlap in 84% confidence intervals; among participants who reported a negative relative stereotype about Blacks for at least one of the two stereotype items, my analysis got 44% and 29%.

But the 55% and 26% in a weighted analysis were 43% and 37% in an unweighted analysis with a large overlap in 84% confidence intervals, suggesting that at least some of the results in the APSR letter might be limited to the weighted analysis. I ran the code for the APSR letter removing the weights from the glm command and got the revised Figure 1 plot below. The error bars in the APSR letter are described as 84% confidence intervals.
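
For readers who want to check robustness this way, below is a minimal sketch of the comparison in R. The names d, y, treatment, and wt are hypothetical stand-ins, not the names in the replication code, and glm() will warn about non-integer weights when fed survey weights this way:

fit_w <- glm(y ~ treatment * negstereotype_endorsement, family = binomial, data = d, weights = wt)  # weighted
fit_u <- glm(y ~ treatment * negstereotype_endorsement, family = binomial, data = d)                # unweighted
cbind(weighted = coef(fit_w), unweighted = coef(fit_u))  # compare side by side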

I think that it's fine to favor the weighted analysis, but I'd prefer that publications indicate when results from an experiment are not robust to the application or non-application of weights. Relevant publication.

3. Given the results in my notes [1] and [2], maybe the APSR letter's Figure 1 estimates are for only respondents who reported negative relative stereotype about Blacks for both stereotypes. If so, the APSR letter's suggestion that this population is the 26% that reported anti-Black stereotypes for either stereotype might be misleading, if the Figure 1 analyses were estimated for only the 10% that reported negative relative stereotype about Blacks for both stereotypes.

For what it's worth, the R code for the APSR letter has code that doesn't use the 0.5 level of the negative stereotype endorsement variable, such as:

# Below are code for predicted probabilities using logit model

# Predicted probability "individualrights_dichotomous"

# Treatment group, negstereotype_endorsement = 1

p1.1 <- invlogit(coef(glm1)[1] + coef(glm1)[2] * 1 + coef(glm1)[3] * 1 + coef(glm1)[4] * 1)
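
The quoted line appears to evaluate the fitted logit at treatment = 1 and negstereotype_endorsement = 1: the intercept plus the treatment, moderator, and interaction coefficients (invlogit comes from whatever package the replication code loads, presumably arm). A hypothetical analogue for the 0.5 respondents, which as far as I can tell appears nowhere in the replication code, would look like this:

p1.5 <- invlogit(coef(glm1)[1] + coef(glm1)[2] * 1 + coef(glm1)[3] * 0.5 + coef(glm1)[4] * 0.5)  # hypothetical; not in the replication code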

It's possible to see what happens to the Figure 1 results when the negative stereotype endorsement variable is coded 1 for respondents who endorsed at least one of the stereotypes. Run this at the end of the Stata code for the APSR letter:

replace negstereotype_endorsement = ceil((unintelligentstereotype2 + lazystereotype2)/2)
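
To see what this recode does: (unintelligentstereotype2 + lazystereotype2)/2 takes the values 0, 0.5, and 1, and ceil() maps those to 0, 1, and 1, so every respondent who endorsed at least one of the two stereotypes ends up coded 1.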

Then run the R code for the APSR letter. Below is the plot I got for a revised Figure 1, with weights applied and the sample limited to respondents who endorsed at least one of the stereotypes:

Estimates in the figure above were close to estimates in my analysis using these Stata commands after running the Stata code from the APSR letter. Stata output.

4. Data, Stata code, and Stata output for my analysis about the "Specifically, I hypothesize" passage of the Stephens-Dougan pre-analysis plan.

My analysis in the Stata output had seven outcomes:

* the four outcomes mentioned in the "Specifically, I hypothesize" part of the pre-analysis plan as initially measured (corresponding to questionnaire items Q15, Q18, Q16, and Q21), with no dichotomization of the five-point response scales for Q15, Q18, and Q16;

* two of these outcomes (Q15 and Q16) dichotomized as mentioned in the pre-analysis plan (e.g., "more likely to disagree" was split into disagree / not disagree categories, with the not disagree category including respondent skips); and

* one outcome (Q18) dichotomized so that one category has "Not Very Important" and "Not At All Important" and the other category has the other responses and skips, given that the pre-analysis plan had this outcome dichotomized as disagree but the response options in the survey were not on an agree-to-disagree scale.

Q21 was measured as a dichotomous variable.

The analysis was limited to presumed racially prejudiced Whites, because I think that that's what the pre-analysis plan hypotheses quoted above focused on. Moreover, that analysis seems more important than a mere difference between groups of Whites.

Note that, for at least some results, a p<0.05 treatment effect might be in the unintuitive direction, so be careful before interpreting a p<0.05 result as evidence for the hypotheses.

My analyses aren't the only analyses that can be conducted, and it might be a good idea to combine results across outcomes mentioned in the pre-analysis plan or across all outcomes in the questionnaire, given that the questionnaire had at least 12 items that could serve as outcome variables.

For what it's worth, I wouldn't be surprised if a lot of people who respond to survey items in an unfavorable way about Blacks backlashed against a message about how Blacks were more likely than Whites to die from covid-19.

5. The pre-analysis plan included a footnote that:

Given the results from my pilot data, it is also my expectation that partisanship will moderate the effect of the treatment or that the treatment effects will be concentrated among Republican respondents.

Moreover, the pre-analysis plan indicated that:

The condition and treatment will be blocked by party identification so that there are roughly equal numbers of Republicans and Democrats in each condition.

But the lone mention of "Repub-" in the APSR letter is:

The sample was 39% self-identified Democrats (including leaners) and 46% self-identified Republicans (including leaners).

6. Link to tweets about the APSR letter.

---

The British Journal of Political Science published Jardina and Piston 2021 "The Effects of Dehumanizing Attitudes about Black People on Whites' Voting Decisions".

---

1.

Jardina and Piston 2021 used the "Ascent of Man" measure of dehumanization, which I have discussed previously. Jardina and Piston 2021 subtracted participant responses to the 0-to-100 measure of perceptions of how evolved Blacks are from participant responses to the 0-to-100 measure of perceptions of how evolved Whites are, and placed this difference on a 0-to-1 scale.

Jardina and Piston 2021 placed this 0-to-1 measure of dehumanization into an OLS regression with controls and took the resulting coefficient, such as the 0.60 in Table 1 (for which the p-value is less than 0.001). Because the neutral point is 0.5 on the scale, halving that coefficient gives the estimated effect of moving from the neutral point to the highest measured dehumanizing about Blacks relative to Whites: for the 0.60 coefficient, 0.30 points on the outcome variable scale, which for this estimate was a 0-to-100 feeling thermometer rating about Donald Trump placed on a 0-to-1 scale.

However, this research design means that 0.30 points on a 0-to-1 scale is also the corresponding estimate of how much dehumanizing about Whites relative to Blacks affected feeling thermometer ratings about Donald Trump. Jardina and Piston 2021 thus did not permit the estimate of the marginal effect of dehumanizing Blacks to differ from the estimate of the marginal effect of dehumanizing Whites.
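
The linearity assumption in arithmetic, using the 0.60 coefficient from Table 1:

b <- 0.60        # Table 1 coefficient on the 0-to-1 dehumanization scale
b * (1.0 - 0.5)  # neutral point to most dehumanizing about Blacks: +0.30
b * (0.0 - 0.5)  # neutral point to most dehumanizing about Whites: -0.30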

I discussed this before (1, 2), but it's often better to not assume a linear association for continuous predictors (see Hainmueller et al. 2019).

Figure 3 of Jardina and Piston 2021 censors the estimated effect of dehumanizing Whites by plotting predicted probabilities of a Trump vote among Whites while restricting the range of dehumanization to run from neutral (0.5 on the dehumanization measure) to most dehumanization about Blacks (1.0 on the measure).

---

2.

Jardina and Piston 2021 claimed that "Finally, our findings serve as a warning about the nature of Whites' racial attitudes in the contemporary United States" (p. 20). But Jardina and Piston 2021 did not report any evidence that Whites' attitudes in this area differ from non-Whites' attitudes in this area. That seems like a relevant question for researchers interested in understanding racial attitudes.

If I'm reading page 9 correctly, Jardina and Piston 2021 reported on original survey data from a 2016 YouGov two-wave panel of 600 non-Hispanic Whites, a 2016 Qualtrics survey of 500 non-Hispanic Whites, a 2016 GfK survey of 2,000 non-Hispanic Whites, and another 2016 YouGov two-wave panel of 600 non-Hispanic Whites.

The funding statement in Jardina and Piston 2021 acknowledges only Duke University and Boston University. That's a lot of internal resources for surveys conducted in a single year, and I don't think that Jardina and Piston 2021 limiting the analysis to Whites can be reasonably attributed to a lack of resources.

The Qualtrics_BJPS.dta dataset at the Dataverse page for Jardina and Piston 2021 has cases for 1,125 Whites, 242 Blacks, 88 Asians, 45 Native Americans, and 173 coded Other, with respective non-Latino cases of 980, 213, 83, 31, and 38. The Dataverse page doesn't have a codebook for that dataset, and the relevant variable names in that dataset aren't clear to me, but I'll plan to post a follow-up here if I get sufficient information to analyze responses from non-White participants.

---

3.

Jardina and Piston 2021 suggested (p. 4) that:

We also suspect that recent trends in the social and natural sciences are legitimizing beliefs about biological differences between racial groups in ways that reinforce a propensity to dehumanize Black people.

This passage did not mention the Jardina and Piston 2015/6 TESS experiment in which participants were assigned to a control condition, or a condition with a reading entitled "Genes May Cause Racial Difference in Heart Disease", or a condition with a reading entitled "Social Conditions May Cause Racial Difference in Heart Disease".

My analysis of data for that experiment found a p<0.01 difference between treatment groups in mean responses to an item about whether there are biological differences between Blacks and Whites, which suggests that the treatment worked. But the treatment didn't produce a detectable effect on key outcomes, according to the description of results on the page for the Jardina and Piston 2015/6 TESS experiment, which indicates that "Experimental conditions are not statistically associated with the distribution of responses to the outcome variables". This null result seems to be relevant for the above quoted suspicion from Jardina and Piston 2021.

---

4.

Jardina and Piston 2021 indicated that "Dehumanization has serious consequences. It places the targets of these attitudes outside of moral consideration, ..." (p. 6). But the Jardina and Piston proposal for the 2015/6 TESS experiment had proposed that some participants be exposed to a treatment that Jardina and Piston hypothesized would increase participant "biological racism", to use a term from their proposal.

Selected passages from the proposal are below:

We hypothesize that the proportion of respondents rating whites as more evolved than blacks is highest in the Race as Genetics Condition, lower in the Control Condition, and lowest in the Race as Social Construction Condition.

...Our study will also inform scholarship on news media communication, demonstrating that ostensibly innocuous messages about race, health, and genetics can have pernicious consequences.

Exposing some participants to a treatment that the researchers hypothesized as having "pernicious consequences" seems like an interesting ethical issue that the proposal didn't discuss.

Moreover, like some other research that uses the Ascent of Man measure of dehumanization, the Jardina and Piston 2015/6 TESS experiment included the statement that "People can vary in how human-like they seem". I wonder which people this statement is meant to refer to. Nothing in the debriefing indicated that this statement was deception.

---

5.

The dataset for the Jardina and Piston 2015/6 TESS experiment includes comments from participants. I thought that comments from participants with IDs 499 and 769 were worth highlighting (the statements were cut off in the dataset):

I disliked this survey as you should ask the same questions about whites. I was not willing to say blacks were not rational but whites are not rational either. But to avoid thinking I was prejudice I had to give a higher rating. All humans a

Black people are not less evolved, 'less evolved' is a meaningless term as evolution is a constant process and the only difference is what particular adaptations a group has. I don't like to claim certainty about things of which I am unsure, a

---

NOTES

1. The Table 3 header for Jardina and Piston 2021 indicates that vote choice is the outcome for that table, but the corresponding note indicates that "Higher values of the dependent variable indicate greater warmth toward Trump on the 101-point feeling thermometer". Moreover, Figure 3 of Jardina and Piston 2021 plots predicted probabilities of a vote for Trump, but the figure note indicates that the figure was "based on Table 4, Model 4", which is instead about warmth toward Obama.

2. Jardina and Piston 2021 reported results for participant responses about the "dehumanizing characteristics" of "savage", "barbaric", and "lacking self-restraint, like animals", so I checked how responses to the "violent" stereotype item associated with two-party presidential vote choice in data from the ANES 2020 Time Series Study.

Results indicated that, compared to White participants who rated Whites as being as violent on average as Blacks, White participants who rated Blacks as being more violent on average than Whites were more likely to vote for Donald Trump net of controls (p<0.05). But results also indicated that, compared to White participants who rated Whites as being as violent on average as Blacks, White participants who rated Whites as being more violent on average than Blacks were less likely to vote for Donald Trump net of controls (p<0.05). See lines 105 through 107 in the output.

3. Jardina and Piston 2021 reported that, in their 2016b YouGov survey, 42% of Whites rated Whites as more evolved than Blacks (pp. 9-10). For a comparison, the Martherus et al. 2019 study about Democrat and Republican dehumanization of outparty members reported dehumanization percentages of "nearly 77%" (2018 SSI study) and "just over 80%" (2018 CCES study).

4. Data sources:

American National Election Studies. 2021. ANES 2020 Time Series Study Preliminary Release: Combined Pre-Election and Post-Election Data [dataset and documentation]. March 24, 2021 version. www.electionstudies.org.

Ashley Jardina and Spencer Piston. 2015/6. Data for: "Explaining the Prevalence of White Biological Racism against Blacks". Time-sharing Experiments for the Social Sciences. https://www.tessexperiments.org/study/pistonBR61

Ashley Jardina and Spencer Piston. 2021. Replication Data for: The Effects of Dehumanizing Attitudes about Black People on Whites' Voting Decisions, https://doi.org/10.7910/DVN/A3XIFC, Harvard Dataverse, V1, UNF:6:nNg371BCnGaWtRyNdu0Lvg== [fileUNF]
---

One notable finding in the racial discrimination literature is the boomerang/backlash effect reported in Peffley and Hurwitz 2007:

"...whereas 36% of whites strongly favor the death penalty in the baseline condition, 52% strongly favor it when presented with the argument that the policy is racially unfair" (p. 1001).

The racially-unfair argument shown to participants was: "[Some people say/FBI statistics show] that the death penalty is unfair because most of the people who are executed are African Americans" (p. 1002). Statistics reported in Peffley and Hurwitz 2007 Table 1 indicate that responses differed at p<=0.05 for Whites in the baseline no-argument condition compared to Whites in the argument condition.

However, the boomerang/backlash effect did not appear at p<=0.05 in large-N MTurk direct and conceptual replication attempts reported on in Butler et al. 2017 or in my analysis of a nearly-direct replication attempt using a large-N sample of non-Hispanic Whites in a TESS study by Spencer Piston and Ashley Jardina with data collection by GfK, with a similar null result for a similar racial-bias-argument experiment regarding three strikes laws.

For the weighted TESS data, on a scale from 0 for strongly oppose to 1 for strongly favor, support for the death penalty for persons convicted of murder was 0.015 units lower (p=0.313, n=2,018) in the condition in which participants were told "Some people say that the death penalty is unfair because most of the people who are executed are black", compared to the condition in which participants did not receive that statement, with controls for the main experimental conditions for the TESS study, which appeared earlier in the survey. This lack of statistical significance remained when the weighted sample was limited to liberals and extreme liberals; slight liberals, liberals, and extreme liberals; conservatives and extreme conservatives; and slight conservatives, conservatives, and extreme conservatives. There was also no statistically-significant difference between conditions in my analysis of the unweighted data. Regarding missing data, 7 of 1,034 participants in the control condition and 9 of 1,000 participants in the experimental condition did not provide a response.

Moreover, in the prior item on the survey, on a 0-to-1 scale, responses were 0.013 units higher (p=0.403, n=2,025) for favoring three strikes laws in the condition in which participants were told that "...critics argue that these laws are unfair because they are especially likely to affect black people", compared to the condition in which participants did not receive that statement, with controls for the main experimental conditions for the TESS study, which appeared earlier in the survey. This lack of statistical significance remained when the weighted sample was limited to liberals and extreme liberals; slight liberals, liberals, and extreme liberals; conservatives and extreme conservatives; and slight conservatives, conservatives, and extreme conservatives. There was also no statistically-significant difference between conditions in my analysis of the unweighted data. Regarding missing data, 6 of 986 participants in the control condition and 3 of 1,048 participants in the experimental condition did not provide a response.

Null results might be attributable to participants not paying attention, so it is worth noting that the main treatment in the TESS experiment was that participants in one of the three conditions were given a passage to read entitled "Genes May Cause Racial Difference in Heart Disease" and participants in another of the three conditions were given a passage to read entitled "Social Conditions May Cause Racial Difference in Heart Disease". There was a statistically-significant difference between these conditions in responses to an item about whether there are biological differences between blacks and whites (p=0.008, n=2,006), with responses in the Genes condition indicating greater estimates of biological differences between blacks and whites.

---

NOTE:

Data for the TESS study are available here. My Stata code is available here.

---

Here's part of the abstract from Rios Morrison and Chung 2011, published in the Journal of Experimental Social Psychology:

In both studies, nonminority participants were randomly assigned to mark their race/ethnicity as either "White" or "European American" on a demographic survey, before answering questions about their interethnic attitudes. Results demonstrated that nonminorities primed to think of themselves as White (versus European American) were subsequently less supportive of multiculturalism and more racially prejudiced, due to decreases in identification with ethnic minorities.

So asking white respondents to select their race/ethnicity as "European American" instead of "White" influenced whites' attitudes toward and about ethnic minorities. The final sample for study 1 was a convenience sample of 77 self-identified whites and 52 non-whites, and the final sample for study 2 was 111 white undergraduates.

Like I wrote before, if you're thinking that it would be interesting to see whether results hold in a nationally representative sample with a large sample size, well, that was tried, with a survey experiment conducted as part of the Time-sharing Experiments for the Social Sciences. Here are the results:

[Figure: mc2011reanalysis]

I'm mentioning these results again because in October 2014 the journal that published Rios Morrison and Chung 2011 desk rejected the manuscript that I submitted describing these results. So you can read in the Journal of Experimental Social Psychology about results for the low-powered test on convenience samples for the "European American" versus "White" self-identification hypothesis, but you won't be able to read in the JESP about results when that hypothesis was tested with a higher-powered test on a nationally-representative sample with data collected by a disinterested third party.

I submitted a revision of the manuscript to Social Psychological and Personality Science, which extended a revise-and-resubmit offer conditional on inclusion of a replication of the TESS experiment. I planned to conduct an experiment with an MTurk sample, but I eventually declined the revise-and-resubmit opportunity for various reasons.

The most recent version of the manuscript is here. Links to data and code.

---

In the Political Behavior article, "The Public's Anger: White Racial Attitudes and Opinions Toward Health Care Reform", Antoine J. Banks presented evidence that "anger uniquely pushes racial conservatives to be more opposing of health care reform while it triggers more support among racial liberals" (p. 493). Here is how the outcome variable was measured in the article's reported analysis (p. 511):

Health Care Reform is a dummy variable recoded 0-1 with 1 equals opposition to reform. The specific item is "As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill". The response options were yes = I favor the health care bill or no = I oppose the health care bill.

However, the questionnaire for the study indicates that there were multiple items used to measure opinions of health care reform:

W2_1. Do you approve or disapprove of the way Barack Obama is handling Health Care? Please indicate whether you approve strongly, approve somewhat, neither approve nor disapprove, disapprove somewhat, or disapprove strongly.

W2_2. As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill?

[if "favor" on W2_2] W2_2a. Do you favor Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

[if "oppose" on W2_2] W2_2b. Do you oppose Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

The bold item above is the only item reported on as an outcome variable in the article. The reported analysis omitted results for one outcome variable (W2_1) and reported dichotomous results for the other outcome variable (W2_2), for which the apparent intention was a four-category outcome variable running from oppose strongly to favor strongly, as sketched below.
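
Here is a minimal R sketch of folding the W2_2 branches into that four-category outcome; the response strings are hypothetical stand-ins for however the data actually code them:

w2_2  <- c("favor", "oppose", "favor", "oppose")        # made-up responses
w2_2a <- c("very strongly", NA, "not so strongly", NA)  # asked only if "favor"
w2_2b <- c(NA, "very strongly", NA, "not so strongly")  # asked only if "oppose"
hc4 <- ifelse(w2_2 == "favor",
              ifelse(w2_2a == "very strongly", 1, 2),   # 1 = favor strongly
              ifelse(w2_2b == "very strongly", 4, 3))   # 4 = oppose strongly
hc4  # 1, 4, 2, 3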

---

Here is the manuscript that I submitted to Political Behavior in March 2015 describing the results using the presumed intended outcome variables and a straightforward research design (e.g., no political discussion control, no exclusion of cases, cases from all conditions analyzed at the same time). Here's the main part of the main figure:

[Figure: Banks2014Reproduction]

The takeaway is that, with regard to opposition to health care reform, the effect of the fear condition on symbolic racism differed at a statistically significant level from the effect of the baseline relaxed condition on symbolic racism; however, contra Banks 2014, the effect of anger on symbolic racism did not differ at a statistically significant level from the effect of the relaxed condition on symbolic racism. The anger condition had a positive effect on symbolic racism, but it was not a unique influence.

The submission to Political Behavior was rejected after peer review. Comments suggested analyzing the presumed intended outcome variables while using the research design choices in Banks 2014. Using the model in Table 2 column 1 of Banks 2014, the fear interaction term and the fear condition term are statistically significant at p<0.05 for predicting the two previously-unreported non-dichotomous outcome variables and for predicting the scale of these two variables; the anger interaction term and the anger condition term are statistically significant at p<0.05 for predicting two of these three outcome variables, with p-values for the residual "Obama handling" outcome variable at roughly 0.10. The revised manuscript describing these results is here.

---

Data are here, and code for the initial submission is here.

---

Antoine Banks has published several studies on anger and racial politics (here, for example) that should be considered when making inferences about the substance of the effect of anger on racial attitudes. Banks had a similar article published in the AJPS, with Nicholas Valentino. Data for that article are here. I did not see any problems with that analysis, but I didn't look very hard, because the posted data were not the raw data: the posted data that I checked omitted, for example, the variables used to construct the outcome variable.

---

Here is the manuscript that I plan to present at the 2015 American Political Science Association conference in September: revised version here. The manuscript contains links to locations of the data; a file of the reproduction code for the revised manuscript is here.

Comments are welcome!

Abstract and the key figure are below:

Racial bias is a persistent concern in the United States, but polls have indicated that whites and blacks on average report very different perceptions of the extent and aggregate direction of this bias. Meta-analyses of results from a population of sixteen federally-funded survey experiments, many of which have never been reported on in a journal or academic book, indicate the presence of a moderate aggregate black bias against whites but no aggregate white bias against blacks.

[Figure: meta-analysis forest plot]

NOTE:

I made a few changes since submitting the manuscript: [1] removing all cases in which the target was not black or white (e.g., Hispanics, Asians, control conditions in which the target did not have a race); [2] estimating meta-analyses without removing cases based on a racial manipulation check; and [3] estimating meta-analyses without the Cottrell and Neuberg 2004 survey experiment, given that that survey experiment was more about perceptions of racial groups instead of a test for racial bias against particular targets.

Numeric values in the figure are for a meta-analysis that reflects [1] above:

* For white respondents: the effect size point estimate was 0.039 (p=0.375), with a 95% confidence interval of [-0.047, 0.124].
* For black respondents: the effect size point estimate was 0.281 (p=0.016), with a 95% confidence interval of [0.053, 0.509].
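
For readers who want to see how such pooled estimates are computed, here is a minimal sketch of a random-effects meta-analysis in R using the metafor package; the effect sizes and variances below are made up, not the estimates from the sixteen experiments (the figure itself was apparently produced with Stata's metan command):

library(metafor)                   # assumes the metafor package is installed
yi <- c(0.10, -0.05, 0.08, 0.02)   # hypothetical study-level effect sizes
vi <- c(0.02, 0.03, 0.01, 0.04)    # hypothetical sampling variances
res <- rma(yi, vi)                 # random-effects model (REML by default)
c(estimate = res$b, ci.lb = res$ci.lb, ci.ub = res$ci.ub, p = res$pval)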

---

The meta-analysis graph includes five studies for which a racial manipulation check was used to restrict the sample: Pager 2006, Rattan 2010, Stephens 2011, Pedulla 2011, and Powroznik 2014. Inferences from the meta-analysis were the same when these five studies included respondents who failed the racial manipulation checks:

* For white respondents: the effect size point estimate was 0.027 (p=0.499), with a 95% confidence interval of [-0.051, 0.105].
* For black respondents: the effect size point estimate was 0.268 (p=0.017), with a 95% confidence interval of [0.047, 0.488].

---

Inferences from the meta-analysis were the same when the Cottrell and Neuberg 2004 survey experiment was removed from the meta-analysis. For the residual 15 studies using the racial manipulation check restriction:

* For white respondents: the effect size point estimate was 0.063 (p=0.114), with a 95% confidence interval of [-0.015, 0.142].
* For black respondents: the effect size point estimate was 0.210 (p=0.010), with a 95% confidence interval of [0.050, 0.369].

---

For the residual 15 studies not using the racial manipulation check restriction:

* For white respondents: the effect size point estimate was 0.049 (p=0.174), with a 95% confidence interval of [-0.022, 0.121].
* For black respondents: the effect size point estimate was 0.194 (p=0.012), with a 95% confidence interval of [0.044, 0.345].
