1.

In May, I published a blog post about deviations from the pre-analysis plan for the Stephens-Dougan 2022 APSR letter, and I tweeted a link to the post that tagged @LaFleurPhD and asked her directly about the deviations. I don't recall receiving a response from Stephens-Dougan, and a few days later, on May 31, I emailed the APSR about my post, listing three concerns:

* The description of racially prejudiced Whites in Stephens-Dougan 2022 not matching how the code calculated estimates for racially prejudiced Whites.

* The substantial deviations from the pre-analysis plan.

* Figure 1 of the APSR letter reporting weighted estimates, but the evidence being much weaker in unweighted analyses.

Six months later, on December 5, the APSR published a correction to Stephens-Dougan 2022. The correction addresses each of my three concerns, though not perfectly, as I'll discuss below along with other aspects of Stephens-Dougan 2022 and its correction. I'll refer to the original APSR letter as "Stephens-Dougan 2022" and to the correction as "the correction".

---

2.

The pre-analysis plan associated with Stephens-Dougan 2022 listed four outcomes at the top of its page 4, but only one of these outcomes (referred to as "Individual rights and freedom threatened") was reported on in Stephens-Dougan 2022. However, Table 1 of Stephens-Dougan 2022 reported results for three outcomes that were not mentioned in the pre-analysis plan.

The t-statistics for the key interaction term for the three outcomes included in Table 1 of Stephens-Dougan 2022 but not mentioned in the pre-analysis plan were 2.6, 2.0, and 2.1, all large enough to indicate statistical significance at the conventional p<0.05 threshold. The t-statistics for the key interaction term for the three outcomes mentioned in the pre-analysis plan but omitted from Stephens-Dougan 2022 were 0.6, 0.6, and 0.6, none of which comes close to that threshold.

I calculated the t-statistics of 2.6, 2.0, and 2.1 from Table 1 of Stephens-Dougan 2022 by dividing each coefficient by its standard error. I wasn't able to use the correction to calculate the t-statistics of 0.6, 0.6, and 0.6, because the relevant data for these three omitted pre-analysis plan outcomes are not in the correction but instead are in Table A12 of a "replication-final.pdf" file hosted at the Dataverse.
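The arithmetic is simple enough to sketch (the numbers below are hypothetical, chosen only to illustrate the division; they are not the published Table 1 values):

```python
# A t-statistic for a regression coefficient is the coefficient divided
# by its standard error; magnitudes near 2 or larger indicate p < 0.05
# in large samples.
def t_statistic(coef: float, se: float) -> float:
    return coef / se

# Hypothetical coefficient and standard error, not values from Table 1.
print(round(t_statistic(1.3, 0.5), 1))  # 2.6
```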

That's part of what I meant about an imperfect correction: a reader cannot use information published in the APSR itself to calculate the evidence provided by the outcomes that were planned to be reported on in the pre-analysis plan, or, for that matter, to see how there is substantially less evidence in the unweighted analysis. Instead, a reader needs to go to the Dataverse and dig through table after table of results.

The correction refers to deviations from the pre-analysis plan, but doesn't indicate the particular deviations and doesn't indicate what the results look like when these deviations are not made. The "Supplementary Materials Correction-Final.docx" file at the Dataverse for Stephens-Dougan 2022 has a discussion of deviations from the pre-analysis plan, but, as far as I can tell, the discussion does not provide a reason why the results should not be reported for the three omitted outcomes, which were labeled in Table A12 as "Slow the Spread", "Stay Home", and "Too Long to Loosen Restrictions".

It seems to me to be a bad policy to permit researchers to deviate from a pre-analysis plan without justification and to merely report results from a planned analysis on, say, page 46 of a 68-page file on the Dataverse. But a bigger problem might be that, as far as I can tell, many journals don't even attempt to prevent misleading selective reporting for survey research for which there is no pre-analysis plan. Journals could require researchers reporting on surveys to submit or link to the full questionnaire for the surveys or at least to declare that the main text reports on results for all plausible measured outcomes and moderators.

---

3.

Next, let me discuss a method used in Stephens-Dougan 2022 and the correction, which I think is a bad method.

The code for Stephens-Dougan 2022 used measures of stereotypes about Whites and Blacks on the traits of hard working and intelligent, to create a variable called "negstereotype_endorsement". The code divided respondents into three categories, coded 0 for respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for respondents who endorsed both negative stereotypes about Blacks relative to Whites. For both Stephens-Dougan 2022 and the correction, Figure 3 reported for each reported outcome an estimate of how much the average treatment effect among prejudiced Whites (defined as those coded 1) differed from the average treatment effect among unprejudiced Whites (defined as those coded 0).

The most straightforward way to estimate this difference in treatment effects is to [1] calculate the treatment effect for prejudiced Whites coded 1, [2] calculate the treatment effect for unprejudiced Whites coded 0, and [3] calculate the difference between these treatment effects. The code for Stephens-Dougan 2022 instead estimated this difference using a logit regression that had three predictors: the treatment, the 0/0.5/1 measure of prejudice, and an interaction of the prior two predictors. But, by this method, the estimated difference in treatment effect between the 1 respondents and the 0 respondents depends on the 0.5 respondents. I can't think of a valid reason why responses from the 0.5 respondents should influence an estimated difference between the 0 respondents and the 1 respondents.
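The three-step approach can be sketched with toy data (hypothetical respondents, not the Stephens-Dougan 2022 data; simple differences in mean outcomes stand in for the logit-based estimates):

```python
# Toy respondents: (prejudice_score, treated, outcome), illustrating the
# direct three-step estimate of the difference in treatment effects.
respondents = [
    (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 1),  # coded 0
    (0.5, 0, 0), (0.5, 1, 1),                    # coded 0.5 (never used below)
    (1, 0, 0), (1, 0, 0), (1, 1, 1), (1, 1, 0),  # coded 1
]

def mean_outcome(score, treated):
    ys = [y for s, t, y in respondents if s == score and t == treated]
    return sum(ys) / len(ys)

# [1] treatment effect among respondents coded 1
effect_1 = mean_outcome(1, 1) - mean_outcome(1, 0)
# [2] treatment effect among respondents coded 0
effect_0 = mean_outcome(0, 1) - mean_outcome(0, 0)
# [3] difference in treatment effects
print(effect_1 - effect_0)
```

The point is that the 0.5 respondents never enter steps [1] through [3], whereas they do help determine the interaction coefficient in a pooled logit that treats the 0/0.5/1 measure as a numeric predictor.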

See my Stata output file for more on that. The influence of the 0.5 respondents might not be major in most or all cases, but an APSR reader won't know, based on Stephens-Dougan 2022 or its correction, the extent to which the 0.5 respondents influenced the estimates for the comparison of the 0 respondents to the 1 respondents.

Now about those 0.5 respondents…

---

4.

Remember that the Stephens-Dougan 2022 "negative stereotype endorsement" variable has three levels: 0 for the 74% of respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for the 16% of respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for the 10% of respondents who endorsed both negative stereotypes about Blacks relative to Whites.

The correction indicates that "I discovered an error in the description of the variable, negative stereotype endorsement" and that "there was no error in the code used to create the variable". So was the intent for Stephens-Dougan 2022 to measure racial prejudice so that only the 1 respondents are considered prejudiced? Or was the intent to consider the 0.5 respondents and the 1 respondents to be prejudiced?

The pre-analysis plan seems to indicate a different method for measuring the moderator of negative stereotype endorsement:

The difference between the rating of Blacks and Whites is taken on both dimensions (intelligence and hard work) and then averaged.

But the pre-analysis plan also indicates that:

For racial predispositions, we will use two or three bins, depending on their distributions.

So, even ignoring the plan to average the stereotype ratings, the pre-analysis plan is inconclusive about whether the intent was to use two or three bins. Let's try this passage from Stephens-Dougan 2022:

A nontrivial fraction of the nationally representative sample—26%—endorsed either the stereotype that African Americans are less hardworking than whites or that African Americans are less intelligent than whites.

So that puts the 16% of respondents at the 0.5 level of negative stereotype endorsement into the same bin as the 10% at the 1 level of negative stereotype endorsement. Stephens-Dougan 2022 doesn't report the percentage that endorsed both negative stereotypes about Blacks. Reporting the percentage of 26% is what would be expected if the intent was to place into one bin any respondent who endorsed at least one of the negative stereotypes about Blacks, so I'm a bit skeptical of the claim in the correction that the description is in error and the code was correct. Maybe I'm missing something, but I don't see how someone who intends to have three bins reports the 26% and does not report the 10%.

For another thing, Stephens-Dougan 2022 has only three figures: Figure 1 reports results for racially prejudiced Whites, Figure 2 reports results for non-racially prejudiced Whites, and Figure 3 reports on the difference between racially prejudiced Whites and non-racially prejudiced Whites. Did Stephens-Dougan 2022 intend to not report results for the group of respondents who endorsed exactly one of the negative stereotypes about Blacks? Did Stephens-Dougan 2022 intend to suggest that respondents who rate Blacks as lazier in general than Whites aren't racially prejudiced as long as they rate Blacks equal to or higher than Whites in general on intelligence?

---

5.

Stephens-Dougan 2022 and the correction depict 84% confidence intervals in all figures. Stephens-Dougan 2022 indicated (footnote omitted) that:

For ease of interpretation, I plotted the predicted probability of agreeing with each pandemic measure in Figure 1, with 84% confidence intervals, the graphical equivalent to p < 0.05.

The 84% confidence interval is good for assessing a p=0.05 difference between estimates, but not for assessing at p=0.05 whether an estimate differs from a particular number such as zero. So 84% confidence intervals make sense for Figures 1 and 2, in which the key comparisons are of the control estimate to the treatment estimate. But 84% confidence intervals don't make as much sense for Figure 3, which plots only one estimate per outcome and for which the key assessment is whether the estimate differs from zero (Figure 3 in Stephens-Dougan 2022) or from 1 (Figure 3 in the correction).
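The rule of thumb works because, for two independent estimates with equal standard errors, just-touching 84% intervals correspond to a difference of about 1.96 standard errors. A quick check (a generic illustration, not code from the letter):

```python
from statistics import NormalDist

# Half-width multiplier for an 84% confidence interval: about 1.405.
z84 = NormalDist().inv_cdf(1 - (1 - 0.84) / 2)

# If two independent estimates with equal standard errors have 84%
# intervals that just touch, the estimates differ by 2 * z84 * se, and
# the z-statistic for the difference is that gap over sqrt(2) * se,
# which lands near the 1.96 cutoff for p = 0.05.
se = 1.0
z_diff = (2 * z84 * se) / ((2 ** 0.5) * se)
print(round(z84, 3), round(z_diff, 2))  # 1.405 1.99
```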

---

6.

I didn’t immediately realize why, in Figure 3 in Stephens-Dougan 2022, two of the four estimates cross zero, but in Figure 3 in the correction, none of the four estimates cross zero. Then I realized that the estimates plotted in Figure 3 of the correction (but not Figure 3 in Stephens-Dougan 2022) are odds ratios.

The y-axis for odds ratios for Figure 3 of the correction ranges from 0 to 30-something, using a linear scale. The odds ratio that indicates no effect is 1, and an odds ratio can't be negative, so that is why none of the four estimates cross zero in the corrected Figure 3.

It seems like a good idea for a plot of odds ratios to have a guideline for 1, so that readers can assess whether an odds ratio indicating no effect is a plausible value. And a log scale seems like a good idea for odds ratios, too. Relevant prior post that mentions that Fenton and Stephens-Dougan 2021 described a "very small" 0.01 odds ratio as "not substantively meaningful".
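The case for a log scale is easy to see numerically (a generic illustration):

```python
import math

# On a log scale, odds ratios representing equal-sized effects in
# opposite directions (such as 2.0 and 0.5) sit symmetrically around
# the no-effect value of 1, whose log is 0. On a linear axis running
# from 0 into the 30s, the entire below-1 range is squeezed into a
# sliver near the axis.
for odds_ratio in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(odds_ratio, round(math.log(odds_ratio), 3))
```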

None of the 84% confidence intervals in Figure 3 of the correction capture the no-effect odds ratio of 1, but an 84% confidence interval in Figure A3 of "Supplementary Materials Correction-Final.docx" does.

---

7.

Often, when I alert an author or journal to an error in a publication, the subsequent correction doesn't credit me for my work. Sometimes the correction even suggests that the authors themselves caught the error, like the correction to Stephens-Dougan 2022 seems to do:

After reviewing my code, I discovered an error in the description of the variable, negative stereotype endorsement.

I guess it's possible that Stephens-Dougan "discovered" the error. For instance, maybe after she submitted page proofs, for some reason she decided to review her code, and just happened to catch the error that she had missed before, and it's a big coincidence that this was the same error that I blogged about and alerted the APSR to.

And maybe Stephens-Dougan also discovered that her APSR letter misleadingly deviated from the relevant pre-analysis plan, so that I don't deserve credit for alerting the APSR to that.


PS: Political Science & Politics recently published Hartnett and Haver 2022 "Unconditional support for Trump's resistance prior to Election Day".

Hartnett and Haver 2022 reported on an experiment conducted in October 2020 in which likely Trump voters were asked to consider the hypothetical of a Biden win in the Electoral College and in the popular vote, with a Biden popular vote percentage point win randomly assigned to be from 1 percentage point through 15 percentage points. These likely Trump voters were then asked whether the Trump campaign should resist or concede.

Data were collected before the election, but Hartnett and Haver 2022 did not report anything about a corresponding experiment involving likely Biden voters. Hartnett and Haver 2022 discussed a Reuters/Ipsos poll that "found that 41% of likely Trump voters would not accept a Biden victory and 16% of all likely Trump voters 'would engage in street protests or even violence' (Kahn 2020)". The Kahn 2020 source indicates that the corresponding percentages for Biden voters for a Trump victory were 43% and 22%, so it didn't seem like there was a good reason to not include a parallel experiment for Biden voters, especially because data on only Trump voters wouldn't permit valid inferences about the characteristics on which Trump voters were distinctive.

---

But text for a somewhat corresponding experiment involving likely Biden voters is hidden in the Hartnett and Haver 2022 codebook under white boxes or something like that. The text of the hidden items can be highlighted, copied, and pasted from the bottom of pages 19 and 20 of the codebook PDF (or more hidden text can be copied by selecting all with Ctrl+A, copying with Ctrl+C, and pasting with Ctrl+V).

The hidden codebook text indicates that the hartnett_haver block of the survey had a "bidenlose" item that asked likely Biden voters whether, if Biden wins the popular vote by the randomized percentage points and Trump wins the electoral college, the Biden campaign should "Resist the results of the election in any way possible" or "Concede defeat".

There might be an innocent explanation for Hartnett and Haver 2022 not reporting the results for those items, but that innocent explanation hasn't been shared with me yet on Twitter. Maybe Hartnett and Haver 2022 have a manuscript in progress about the "bidenlose" item.

---

NOTES

1. Hartnett and Haver 2022 seems to be the survey that Emily Badger at the New York Times referred to as "another recent survey experiment conducted by Brian Schaffner, Alexandra Haver and Brendan Hartnett at Tufts". The copied-and-pasted codebook text indicates that this was for the "2020 Tufts Class Survey".

2. On page 18 of the Hartnett and Haver 2022 codebook, above the hidden item about socialism, part of the text of the "certain advantages" item is missing, which seems to be a should-be-obvious indication that text has been covered.

3. The codebook seems to be missing pages of the full survey: in the copied-and-pasted text, page numbers jump from "Page 21 of 43" to "Page 24 of 43" to "Page 31 of 43" to "Page 33 of 43". Presumably at least some missing items were for other members of the Tufts class, although I'm not sure what happened to page 32, which seems to be part of the hartnett_haver block that started on page 31 and ended on page 33.

4. The dataset for Hartnett and Haver 2022 includes a popular vote percentage point win from 1 percentage point through 15 percentage points assigned to likely Biden voters, but the dataset has no data on a resist-or-concede outcome or on a follow-up open-ended item.


The American Political Science Review recently published a letter: Stephens-Dougan 2022 "White Americans' reactions to racial disparities in COVID-19".

Figure 1 of the Stephens-Dougan 2022 APSR letter reports results for four outcomes among racially prejudiced Whites, with the 84% confidence interval in the control overlapping with the 84% confidence interval in the treatment for only one of the four reported outcomes (zooming in on Figure 1, the confidence intervals for the parks outcome don't seem to overlap: the code returns 0.1795327 for the upper bound for the control and 0.18800818 for the lower bound for the treatment). And results for the most obviously overlapping 84% confidence intervals seem to be interpreted as sufficient evidence of an effect, with all four reported outcomes discussed in the passage below:

When racially prejudiced white Americans were exposed to the racial disparities information, there was an increase in the predicted probability of indicating that they were less supportive of wearing face masks, more likely to feel their individual rights were being threatened, more likely to support visiting parks without any restrictions, and less likely to think African Americans adhere to social distancing guidelines.

---

There are at least three things to keep track of: [1] the APSR letter; [2] the survey questionnaire, located at the OSF site for the Time-sharing Experiments for the Social Sciences project; and [3] the pre-analysis plan, located at the OSF and in the appendix of the APSR letter. I'll use the PDF of the pre-analysis plan. The TESS site also has the proposal for the survey experiment, but I won't discuss that in this post.

---

The pre-analysis plan does not mention all potential outcome variables that are in the questionnaire, but the pre-analysis plan section labeled "Hypotheses" includes the passage below:

Specifically, I hypothesize that White Americans with anti-Black attitudes and those White Americans who attribute racial disparities in health to individual behavior (as opposed to structural factors), will be more likely to disagree with the following statements:

The United States should take measures aimed at slowing the spread of the coronavirus while more widespread testing becomes available, even if that means many businesses will have to stay closed.

It is important that people stay home rather than participating in protests and rallies to pressure their governors to reopen their states.

I also hypothesize that White Americans with anti-Black attitudes and who attribute racial health disparities to individual behavior will be more likely to agree with the following statements:

State and local directives that ask people to "shelter in place" or to be "safer at home" are a threat to individual rights and freedom.

The United States will take too long in loosening restrictions and the economic impact will be worse with more jobs being lost

The four outcomes mentioned in the passage above correspond to items Q15, Q18, Q16, and Q21 in the survey questionnaire, but, of these four outcomes, the APSR letter reported on only Q16.

The outcome variables in the APSR letter are described as: "Wearing facemasks is not important", "Individual rights and freedom threatened", "Visit parks without any restrictions", and "Black people rarely follow social distancing guidelines". These outcome variables correspond to survey questionnaire items Q20, Q16, Q23A, and Q22A.

---

The pre-analysis plan PDF mentions moderators, with three moderators about racial dispositions: racial resentment, negative stereotype endorsement, and attributions for health disparities. The plan indicates that:

For racial predispositions, we will use two or three bins, depending on their distributions. For ideology and party, we will use three bins. We will include each bin as a dummy variable, omitting one category as a baseline.

The APSR letter reported on only one racial predispositions moderator: negative stereotype endorsement.

---

I'll post a link in the notes below to some of my analyses about the "Specifically, I hypothesize" outcomes, but I don't want to focus on the results, because this post is about deviations from the pre-analysis plan. Regardless of whether the estimates from the analyses in the APSR letter are similar to the estimates from the planned analyses in the pre-analysis plan, I think that it's bad that readers can't trust the APSR to ensure that a pre-analysis plan is followed, or at least to provide an explanation about why a pre-analysis plan was not followed, especially given that this APSR letter described itself as reporting on "a preregistered survey experiment" and included the pre-analysis plan in the appendix.

---

NOTES

1. The Stephens-Dougan 2022 APSR letter suggests that the negative stereotype endorsement variable was coded dichotomously ("a variable indicating whether the respondent either endorsed the stereotype that African Americans are less hardworking than whites or the stereotype that African Americans are less intelligent than whites"), but the code and the appendix of the APSR letter indicate that the negative stereotype endorsement variable was measured so that the highest level is for respondents who reported a negative relative stereotype about Blacks for both stereotypes. From Table A7:

(unintelligentstereotype2 + lazystereotype2)/2

In the data after running the code for the APSR letter, the negative stereotype endorsement variable is a three-level variable coded 0 for respondents who did not report a negative relative stereotype about Blacks for either stereotype, 0.5 for respondents who reported a negative stereotype about Blacks for one stereotype, and 1 for respondents who reported a negative relative stereotype about Blacks for both stereotypes.
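Assuming two 0/1 indicators for endorsing each negative stereotype (the indicator names below are hypothetical stand-ins for the dataset's variables), the three-level coding and the two-bin collapse that matches the reported 26% can be sketched as:

```python
def negstereotype_endorsement(unintelligent: int, lazy: int) -> float:
    # Three-level coding per (indicator1 + indicator2)/2: 0 if neither
    # stereotype was endorsed, 0.5 if exactly one, 1 if both.
    return (unintelligent + lazy) / 2

def endorsed_either(unintelligent: int, lazy: int) -> int:
    # Two-bin collapse: 1 if at least one stereotype was endorsed,
    # which is the grouping behind the reported 26%.
    return 1 if unintelligent + lazy >= 1 else 0

print(negstereotype_endorsement(1, 0), endorsed_either(1, 0))  # 0.5 1
```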

2. The APSR letter indicated that:

The likelihood of racially prejudiced respondents in the control condition agreeing that shelter-in-place orders threatened their individual rights and freedom was 27%, compared with a likelihood of 55% in the treatment condition (p < 0.05 for a one-tailed test).

My analysis using survey weights got 44% and 29% among participants who reported a negative relative stereotype about Blacks for at least one of the two stereotype items, and my analysis got 55% and 26% among participants who reported negative relative stereotypes about Blacks for both stereotype items, with a trivial overlap in 84% confidence intervals.

But the 55% and 26% in a weighted analysis were 43% and 37% in an unweighted analysis with a large overlap in 84% confidence intervals, suggesting that at least some of the results in the APSR letter might be limited to the weighted analysis. I ran the code for the APSR letter removing the weights from the glm command and got the revised Figure 1 plot below. The error bars in the APSR letter are described as 84% confidence intervals.

I think that it's fine to favor the weighted analysis, but I'd prefer that publications indicate when results from an experiment are not robust to the application or non-application of weights. Relevant publication.

3. Given the results in my notes [1] and [2], maybe the APSR letter's Figure 1 estimates are only for respondents who reported a negative relative stereotype about Blacks for both stereotypes. If so, the APSR letter's suggestion that this population is the 26% that reported an anti-Black stereotype for either stereotype might be misleading, if the Figure 1 analyses were estimated for only the 10% that reported a negative relative stereotype about Blacks for both stereotypes.

For what it's worth, the R code for the APSR letter has code that doesn't use the 0.5 level of the negative stereotype endorsement variable, such as:

# Below are code for predicted probabilities using logit model

# Predicted probability "individualrights_dichotomous"

# Treatment group, negstereotype_endorsement = 1

p1.1 <- invlogit(coef(glm1)[1] + coef(glm1)[2] * 1 + coef(glm1)[3] * 1 + coef(glm1)[4] * 1)

It's possible to see what happens to the Figure 1 results when the negative stereotype endorsement variable is coded 1 for respondents who endorsed at least one of the stereotypes. Run this at the end of the Stata code for the APSR letter:

replace negstereotype_endorsement = ceil((unintelligentstereotype2 + lazystereotype2)/2)

Then run the R code for the APSR letter. Below is the plot I got for a revised Figure 1, with weights applied and the sample limited to respondents who endorsed at least one of the stereotypes:

Estimates in the figure above were close to estimates in my analysis using these Stata commands after running the Stata code from the APSR letter. Stata output.

4. Data, Stata code, and Stata output for my analysis about the "Specifically, I hypothesize" passage of the Stephens-Dougan pre-analysis plan.

My analysis in the Stata output had seven outcomes. The first four were the outcomes mentioned in the "Specifically, I hypothesize" part of the pre-analysis plan as initially measured (corresponding to questionnaire items Q15, Q18, Q16, and Q21), with no dichotomization of the five-point response scales for Q15, Q18, and Q16. The next two were Q15 and Q16 dichotomized as mentioned in the pre-analysis plan (e.g., "more likely to disagree" was split into disagree / not disagree categories, with the not disagree category including respondent skips). The last was Q18 dichotomized so that one category has "Not Very Important" and "Not At All Important" and the other category has the other responses and skips, given that the pre-analysis plan had this outcome dichotomized as disagree but the response options in the survey were not on an agree-to-disagree scale. Q21 was measured as a dichotomous variable.

The analysis was limited to presumed racially prejudiced Whites, because I think that that's what the pre-analysis plan hypotheses quoted above focused on. Moreover, that analysis seems more important than a mere difference between groups of Whites.

Note that, for at least some results, a p<0.05 treatment effect might be in the unintuitive direction, so be careful before interpreting a p<0.05 result as evidence for the hypotheses.

My analyses aren't the only analyses that can be conducted, and it might be a good idea to combine results across outcomes mentioned in the pre-analysis plan or across all outcomes in the questionnaire, given that the questionnaire had at least 12 items that could serve as outcome variables.

For what it's worth, I wouldn't be surprised if a lot of people who respond to survey items in an unfavorable way about Blacks backlashed against a message about how Blacks were more likely than Whites to die from covid-19.

5. The pre-analysis plan included a footnote that:

Given the results from my pilot data, it is also my expectation that partisanship will moderate the effect of the treatment or that the treatment effects will be concentrated among Republican respondents.

Moreover, the pre-analysis plan indicated that:

The condition and treatment will be blocked by party identification so that there are roughly equal numbers of Republicans and Democrats in each condition.

But the lone mention of "Repub-" in the APSR letter is:

The sample was 39% self-identified Democrats (including leaners) and 46% self-identified Republicans (including leaners).

6. Link to tweets about the APSR letter.


PLOS ONE recently published Gillooly et al 2021 "Having female role models correlates with PhD students' attitudes toward their own academic success".

Colleen Flaherty at Inside Higher Ed quoted Gillooly et al 2021 co-author Amy Erica Smith discussing results from the article. From the Flaherty story, with "she" being Amy Erica Smith:

"When we showed students a syllabus with a low percentage of women authors, men expressed greater confidence than women in their ability to do well in the class" she said. "When we showed students syllabi with more equal gender representation, men's self-confidence declined, but women and men still expressed equal confidence in their ability to do well. So making the curriculum more fair doesn't actually hurt men relative to women."

Figure 1 of Gillooly et al 2021 presented evidence of this male student backlash, with the figure note indicating that the analysis controlled for "orientations toward quantitative and qualitative methods". Gillooly et al 2021 indicated that these "orientation" measures incorporate respondent ratings of their interest and ability in quantitative methods and qualitative methods.

But the "Grad_Experiences_Final Qualtrics Survey" file indicates that these "orientation" measures appeared on the survey after respondents received the treatment. And controlling for such post-treatment "orientation" measures is a bad idea, as discussed in Montgomery et al 2018 "How Conditioning on Posttreatment Variables Can Ruin Your Experiment and What to Do about It".

The "orientation" items were located on the same Qualtrics block as the treatment and the self-confidence/self-efficacy item, so it seems possible that these "orientation" items might have been intended as outcomes and not as controls. I didn't find any preregistration that indicates the Gillooly et al plan for the analysis.

---

I used the Gillooly et al 2021 data to assess whether there is sufficient evidence that this "male backlash" effect occurs in straightforward analyses that omit the post-treatment controls. The p-value is about p=0.20 for the command...

ologit q14recode treatment2 if female==0, robust

...which tests the null hypothesis that male students' course-related self-confidence/self-efficacy as measured on the five-point scale did not differ by the difference in percentage of women authors on the syllabus.

See the output file below for more analysis. For what it's worth, the data provided sufficient evidence at p<0.05 that, among male students, the treatment affected responses to three of the four items that Gillooly et al 2021 used to construct the "orientation" controls.

---

NOTES

1. Data. Stata code. Output file.

2. Prior post discussing a biased benchmark in research by two of the Gillooly et al 2021 co-authors.

3. Figure 1 of Gillooly et al 2021 reports 76% confidence intervals to help assess a p<0.10 difference between estimates, and Figure 2 of Gillooly et al 2021 reports 84% confidence intervals to help assess a p<0.05 difference between estimates. I would be amazed if this p=0.05 / p=0.10 variation was planned before Gillooly et al analyzed the data.


PS: Political Science & Politics published Utych 2020 "Powerless Conservatives or Powerless Findings?", which responded to arguments in my 2019 "Left Unchecked" PS symposium entry. From Utych 2020:

Zigerell (2019) presented arguments that research supporting a conservative ideology is less likely to be published than research supporting a liberal ideology, focusing on the most serious accusations of ideological bias and research malfeasance. This article considers another less sinister explanation—that research about issues such as anti-man bias may not be published because it is difficult to show conclusive evidence that it exists or has an effect on the political world.

I wasn't aware of the Utych 2020 PS article until I saw a tweet that it was published, but the PS editors kindly permitted me to publish a reply, which discussed evidence that anti-man bias exists and has an effect on the political world.

---

One of the pieces of evidence for anti-man bias mentioned in my PS reply was the Schwarz and Coppock meta-analysis of candidate choice experiments involving male candidates and female candidates. This meta-analysis was accepted at the Journal of Politics, and Steve Utych indicated on Twitter that it was a "great article" and that he was a reviewer of the article. The meta-analysis detected a bias favoring female candidates over male candidates, so I asked Steve Utych whether it is reasonable to characterize the results from the meta-analysis as reasonably good evidence that anti-man bias exists and has an effect in the political realm.

I thought that the exchange that I had with Steve Utych was worth saving (archived: https://archive.is/xFQvh). According to Steve Utych, this great meta-analysis of candidate choice experiments "doesn't present information about discrimination or biases". In the thread, Steve Utych wouldn't describe what he would accept as evidence of anti-man bias in the political realm, but he was willing to equate anti-man bias with alien abduction.

---

Suzanne Schwarz, who is the lead author of the Schwarz and Coppock meta-analysis, issued a series of tweets (archived: https://archive.is/pFSJ0). The thread was locked before I could respond, so I thought that I would blog about my comments on her points, which she labeled "first" through "third".

Her first point, about majority preference, doesn't seem to be relevant to whether anti-man bias exists and has an effect in the political realm.

For her second point, that voting in candidate choice experiments might differ from voting in real elections, I think that it's within reason to dismiss results from survey experiments, and I think that it's within reason to interpret results from survey experiments as offering evidence about the real world. But I think that each person should hold no more than one of those positions at a given time.

So if Suzanne Schwarz doesn't think that the meta-analysis provides evidence about voter behavior in real elections, there might still be time for her and her co-author to remove language from their JOP article that suggests that results from the meta-analysis provide evidence about voter behavior in real elections, such as:

Overall, our findings offer evidence against demand-side explanations of the gender gap in politics. Rather than discriminating against women who run for office, voters on average appear to reward women.

And instead of starting the article with "Do voters discriminate against women running for office?", maybe the article could start by quoting language from Suzanne Schwarz's tweets. Something such as:

Do "voters support women more in experiments that simulate hypothetical elections with hypothetical candidates"? And should anyone care, given that this "does not necessarily mean that those voters would support female politicians in real elections that involve real candidates and real stakes"?

I think that Suzanne Schwarz's third point is that a person's preference for A relative to B cannot be interpreted as an "anti" bias against B, without information about that person's attitudinal bias, stereotypes, or animus regarding B.

Suzanne Schwarz claimed that we would not interpret a preference for orange packaging over green packaging as evidence of an "anti-green" bias, but let's use a hypothetical involving people: an employer who always hires White applicants over equally qualified Black applicants. I think that it would be at least as reasonable to describe that employer as having an anti-Black bias as it would be to apply the Schwarz and Coppock language quoted above and describe that employer as "appear[ing] to reward" White applicants.

---

The Schwarz and Coppock meta-analysis of 67 survey experiments seems like it took a lot of work. It was published in one of the top political science journals, and, according to its abstract, it was based on an experimental methodology that "[has] become a standard part of the political science toolkit for understanding the effects of candidate characteristics on vote choice", with results that add to the evidence that "voter preferences are not a major factor explaining the persistently low rates of women in elected office".

So it's interesting to see the "doesn't present information about discrimination or biases" and "does not necessarily mean that those voters would support female politicians in real elections that involve real candidates and real stakes" reactions on Twitter archived above, respectively from a peer reviewer who described the work as "great" and from one of the co-authors.

---

NOTES

1. Zach Goldberg and I have a manuscript presenting evidence that anti-man bias exists and has a political effect, based on participant feeling thermometer ratings about men and about women in data from the 2019 wave of the Democracy Fund Voter Study Group VOTER survey. Zach tweeted about a prior version of the manuscript. The idea for the manuscript goes back at least to a Twitter exchange from March 2020 (Zach, me).

Steve Utych reported on the 2019 wave of this VOTER survey in his 2021 Electoral Studies article about sexism against women, but neither his 2021 Electoral Studies article nor his PS article questioning the idea of anti-man bias reported results from the feeling thermometer ratings about men and about women.
