1.

Politics, Groups, and Identities recently published Cravens 2022 "Christian nationalism: A stained-glass ceiling for LGBT candidates?". The key predictor is a Christian nationalism index that ranges from 0 to 1, and a key result is that:

In both cases, a one-point increase in the Christian nationalism index is associated with about a 40 percent decrease in support for both lesbian/gay and transgender candidates in this study.

But the 40 percent estimates are based on Christian nationalism coefficients from models in which Christian nationalism is interacted with partisanship, race, and religion, and I don't think that these coefficients can be interpreted as associations across the sample. The estimates across the sample should instead come from models in which Christian nationalism is not included in an interaction: -0.167 for lesbian and gay political candidates and -0.216 for transgender political candidates. So about half of 40 percent.

Check Cravens 2022 Figure 2, which reports results for support for lesbian and gay candidates: eyeballing from the figure, the drop across the range of Christian nationalism is about 14 percent for Whites, about 18 percent for Blacks, about 9 percent for AAPI, and about 15 percent for persons of another race. No matter how you weight these four categories, the weighted average can't exceed the largest category (about 18 percent), so it doesn't get close to 40 percent.

---

2.

And I think that the constitutive terms in the interactions are not always correctly described, either. From Cravens 2022:

As the figure shows, Christian nationalism is negatively associated with support for lesbian and gay candidates across all partisan identities in the sample. Christian nationalist Democrats and Independents are more supportive than Christian nationalist Republicans by about 23 and 17 percent, respectively, but the effects of Christian nationalism on support for lesbian and gay candidates are statistically indistinguishable between Republicans and third-party identifiers.

Table 2 coefficients are 0.231 for Democrats and 0.170 for Independents, with Republicans as the omitted category and with these partisan predictors interacted with Christian nationalism. But I don't think that these coefficients indicate the difference between Christian nationalist Democrats/Independents and Christian nationalist Republicans. In Figure 1, Christian nationalist Democrats are at about 0.90 and Christian nationalist Republicans are at about 0.74, a gap of about 0.16 rather than 0.231.

---

3.

From Cravens 2022:

Christian nationalism is associated with opposition to LGBT candidates even among the most politically supportive groups (i.e., Democrats).

For support for lesbian and gay candidates and support for transgender candidates, the Democrat predictor interacted with Christian nationalism has a p-value less than p=0.05. But that doesn't indicate whether there is sufficient evidence that the slope for Christian nationalism is non-zero among Democrats. In Figure 1, for example, the point estimate for Democrats at the lowest level of Christian nationalism looks to be within the 95% confidence interval for Democrats at the highest level of Christian nationalism.

---

4.

From Cravens 2022:

In other words, a one-point increase in the Christian nationalism index is associated with a 40 percent decrease in support for lesbian and gay candidates. For comparison, an ideologically very progressive respondent is only about four percent more likely to support a lesbian or gay candidate than an ideologically moderate respondent; while, a one-unit increase in church attendance is only associated with a one percent decrease in support for lesbian and gay candidates. Compared to every other measure, Christian nationalism is associated with the largest and most negative change in support for lesbian and gay candidates.

The Christian nationalism index ranges from 0 to 1, so the one-point increase discussed in the passage is the full estimated effect of Christian nationalism. The church attendance predictor runs from 0 to 6, so the one-unit increase in church attendance discussed in the passage is one-sixth the estimated effect of church attendance. The estimated effect of Christian nationalism is still larger than the estimated effect of church attendance when both predictors are put on a 0-to-1 scale, but I don't know of a good reason to compare a one-unit increase on the 0-to-1 Christian nationalism predictor to a one-unit increase on the 0-to-6 church attendance predictor.
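Putting both predictors on the same footing is simple arithmetic: multiply the per-unit coefficient by the predictor's range. A sketch using the approximate numbers quoted above (treating the reported 40 percent and 1 percent as per-unit estimates):

```python
# Approximate per-unit estimates from the passage above
cn_per_unit, cn_range = -0.40, 1          # Christian nationalism index runs 0 to 1
church_per_unit, church_range = -0.01, 6  # church attendance runs 0 to 6

# Full-range estimated effect = per-unit coefficient * predictor range
cn_full = cn_per_unit * cn_range
church_full = church_per_unit * church_range
print(cn_full, round(church_full, 2))  # -0.4 versus -0.06
```

So the comparison implied by the passage pits the full range of one predictor against one-sixth of the range of the other.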

The other problem is that the Christian nationalism index combines three five-point items, so it might be a better measure of Christian nationalism than, say, the progressive predictor is a measure of political ideology. This matters because, all else equal, poorer measures of a concept are biased toward zero. Or maybe the ends of the Christian nationalism index represent more distance than the ends of the political ideology measure. Or maybe not. But I think that it's a good idea to discuss these concerns when comparing predictors to each other.

---

5.

Returning to the estimates for Christian nationalism, I'm not even sure that -0.167 for lesbian and gay political candidates and -0.216 for transgender political candidates are good estimates. For one thing, these estimates are extrapolations from linear regression lines, instead of comparisons of observed outcomes at low and high levels of Christian nationalism. For each Christian nationalist statement, the majority of the sample falls on the side opposing the statement, so the estimated effect of Christian nationalism might be more influenced by opponents of Christian nationalism than by supporters of Christian nationalism, and it's not clear whether the linear regression line correctly estimates the outcome at high levels of Christian nationalism.

For another thing, I think that the effect of Christian nationalism should be conceptualized as being caused by a change from indifference to Christian nationalism to support for Christian nationalism, which means that including observations from opponents of Christian nationalism might bias the estimated effect of Christian nationalism.

For an analogy, imagine that we are interested in the effect of being a fan of the Beatles. I think that it would be preferable to compare, net of controls, outcomes for fans of the Beatles to outcomes for people indifferent to the Beatles, instead of comparing, net of controls, outcomes for fans of the Beatles to outcomes for people who hate the Beatles. The fan/hate comparison means that the estimated effect of being a fan of the Beatles is *necessarily* the exact same size as the estimated effect of hating the Beatles, but I think that these are different phenomena. Similarly, I think that supporting Christian nationalism is a different phenomenon than opposing Christian nationalism.

---

NOTES

1. Cravens 2022 model 2 regressions in Tables 2 and 3 include controls plus a predictor for Christian nationalism; three included partisanship categories (Republican omitted); three included race categories (White omitted); five included religion categories (Protestant omitted); and interactions of Christian nationalism with each of the included partisanship, race, and religion categories.

It might be tempting to interpret the Christian nationalism coefficient in these regressions as indicating the association of Christian nationalism with the outcome net of controls among the omitted interactions category of White Protestant Republicans, but I don't think that's correct because of the absence of higher-order interactions. Let me discuss a simplified simulation to illustrate this.

The simulation had participants who were either male (male=1) or female (male=0) and either Republican (gop=1) or Democrat (gop=0). In the simulation, I set the association of a predictor X with the outcome Y to be -1 among female Democrats, to be -3 among male Democrats, to be -6 among female Republicans, and to be -20 among male Republicans. So the association of X with the outcome was negative for all four combinations of gender and partisanship. But the coefficient on X was +2 in a linear regression with predictors only for X, the gender predictor, the partisanship predictor, an interaction of X and the gender predictor, and an interaction of X and the partisanship predictor.

Code for the simulation, in Stata and in R.
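A minimal Python re-implementation of that simulation setup (the original code is in Stata and R; the exact coefficient on X depends on group sizes and the distribution of X, so this sketch assumes equal-sized groups and standard-normal X):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # observations per gender-by-partisanship group

# Within-group slopes of Y on X: negative for all four groups,
# matching the setup described above
slopes = {(0, 0): -1.0, (1, 0): -3.0, (0, 1): -6.0, (1, 1): -20.0}

blocks = []
for (male, gop), slope in slopes.items():
    x = rng.normal(0, 1, n)
    y = slope * x + rng.normal(0, 1, n)
    blocks.append(np.column_stack([
        np.ones(n),        # intercept
        x,                 # X
        np.full(n, male),  # gender predictor
        np.full(n, gop),   # partisanship predictor
        x * male,          # X * gender interaction
        x * gop,           # X * partisanship interaction
        y,
    ]))
data = np.vstack(blocks)
X_design, outcome = data[:, :6], data[:, 6]

# OLS without the X * gender * partisanship triple interaction
coefs, *_ = np.linalg.lstsq(X_design, outcome, rcond=None)
print(round(coefs[1], 1))  # coefficient on X is about +2, despite all-negative subgroup slopes
```

Without the triple interaction, the model can't fit four different subgroup slopes with three slope parameters, so the coefficient on X is a least-squares compromise that need not match any subgroup's slope, or even its sign.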

2. Cravens 2022 indicated about Table 2 that "Model 2 is estimated with three interaction terms". But I'm not sure that's correct, given the interaction coefficients in the table and given that the Figure 1 slopes for Republican, Democrat, Independent, and Something Else are all negative and differ from each other and the Other Christian slope in Figure 3 is positive, which presumably means that there were more than three interaction terms.

3. Appendix C has data that I suspect is incorrectly labeled: 98 percent of atheists agreed or strongly agreed that "The federal government should declare the United States a Christian nation", 94 percent of atheists agreed or strongly agreed that "The federal government should advocate Christian values", and 94 percent of atheists agreed or strongly agreed that "The success of the United States is part of God's plan".

4. I guess that it's not an error per se, but Appendix 2 reports means and standard deviations for nominal variables such as race and party identification, even though these means and standard deviations depend on how the nominal categories are numbered. For example, party identification has a standard deviation of 0.781 when coded from 1 to 4 for Republican, Democrat, Independent, and Other, but the standard deviation would presumably change if the numbers were swapped for Democrat and Republican, and, as far as I can tell, there is no reason to prefer the order of Republican, Democrat, Independent, and Other.


I posted earlier about Filindra et al 2022 "Beyond Performance: Racial Prejudice and Whites' Mistrust of Government". This post discusses part of the code for Filindra et al 2022.

---

Tables in Filindra et al 2022 have a pair of variables called "conservatism (ideology)" and "conservatism not known" and a pair of variables called "income" and "income not known". For an example of what the "not known" variables are for, if a respondent in the 2016 data did not provide a substantive response to the ideology item, Filindra et al 2022 coded that respondent as 1 in the dichotomous 0-or-1 "conservatism not known" variable and imputed a value of zero for the seven-level "conservatism (ideology)" variable, with zero indicating "extremely liberal".

I don't recall seeing that method before, so I figured I would post about it. I reproduced the Filindra et al. 2022 Table 1 results for the 2016 data and then changed the imputed value for "conservatism (ideology)" from 0 (extremely liberal) to 1 (extremely conservative). That changed the coefficient and t-statistic for the "conservatism not known" predictor but not the coefficient or t-statistic for the "conservatism (ideology)" predictor or for any other predictor (log of the Stata output).
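That invariance is expected algebraically: the predictor column with a different imputed constant equals the original column plus a constant multiple of the "not known" indicator, which is already in the model, so only the indicator's coefficient absorbs the change. A small simulation with hypothetical data (numpy, not the actual Filindra et al 2022 variables) illustrates:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

# Hypothetical data: a seven-level ideology item with 20 percent nonresponse
conservatism = rng.integers(0, 7, n).astype(float)
not_known = rng.random(n) < 0.2
y = 0.5 * conservatism + rng.normal(0, 1, n)

def fit(imputed_value):
    # Impute a constant for nonrespondents and add a "not known" indicator
    cons = np.where(not_known, imputed_value, conservatism)
    X = np.column_stack([np.ones(n), cons, not_known.astype(float)])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs  # [intercept, conservatism, not known]

b_zero = fit(0.0)  # impute the lowest category
b_one = fit(1.0)   # impute one unit higher

# The conservatism coefficient (and the intercept) is unchanged;
# only the "not known" coefficient absorbs the shift
print(round(b_zero[1], 3), round(b_one[1], 3))
```

The "not known" coefficient changes by exactly (imputed-value difference) times the conservatism coefficient, which is consistent with the Stata result described above.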

---

I think that it might have been from Schaffner et al 2018 that I picked up the use of categories as a way to not lose observations from an analysis merely because the observation has a missing value for a predictor. For example, if a respondent doesn't indicate their income, then income can be coded as a series of categories with non-response as a category (such as income $20,000 or lower; income $20,001 to $40,000; ...; income $200,001 and higher; and income missing). Thus, in a regression with this categorical predictor for income, observations are not lost merely because of not having a substantive value for income. Another nice feature of this categorical approach is permitting nonuniform associations, in which, for example, the association of income might level off at higher categories.
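A sketch of that categorical approach in Python with pandas (hypothetical income values; the `dummy_na=True` parameter adds the nonresponse category):

```python
import pandas as pd

# Hypothetical income responses, including nonresponse (None)
df = pd.DataFrame({"income": ["low", "mid", None, "high", "mid", None]})

# One dummy per income category plus a dummy for nonresponse, so no
# observation is dropped merely for missing income
dummies = pd.get_dummies(df["income"], prefix="income", dummy_na=True)
print(list(dummies.columns))
```

Every row gets exactly one dummy set to 1, including the rows with missing income, so the regression keeps all six observations.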

But dealing with missing values on a control by using categorical predictors can produce long regression output, with, for example, fifteen categories of income, eight categories of ideology, ten categories of age, etc. The Filindra et al 2022 method seems like a reasonable shortcut, as long as it's understood that results for the "not known" predictors depend on the choice of imputed value. But these "not known" predictors aren't common in the research that I read, so maybe there is another flaw in that method that I'm not aware of.

---

NOTE

1. I needed to edit line 1977 in the Filindra et al 2022 code to:

recode V162345 V162346 V162347 V162348 V162349 V162350 V162351 V162352 (-9/-5=.)


Broockman 2013 "Black politicians are more intrinsically motivated to advance Blacks' interests: A field experiment manipulating political incentives" reported results from an experiment in which U.S. state legislators were sent an email from "Tyrone Washington", which is a name that suggests that the email sender is Black. The experimental manipulation was that "Tyrone" indicated that the city that he lived in was a city in the legislator's district or was a well-known city far from the legislator's district.

Based on Table 2 column 2, response percentages were:

  • 56.1% from in-district non-Black legislators
  • 46.4% from in-district Black legislators (= 0.561 - 0.097)
  • 28.6% from out-of-district non-Black legislators (= 0.561 - 0.275)
  • 41.4% from out-of-district Black legislators (= 0.561 - 0.275 + 0.128)

---

Broockman 2013 lacked another emailer to serve as comparison for response rates to Tyrone, such as an emailer with a stereotypical White name. Broockman 2013 discusses this:

One challenge in designing the experiment was that there were so few black legislators in the United States (as of November 2010) that a set of white letter placebo conditions could not be implemented due to a lack of adequate sample size.

So all emails in the Broockman 2013 experiment were signed "Tyrone Washington".

---

But here is how Broockman 2013 was cited by Rhinehar 2020 in American Politics Research:

A majority of this work has explored legislator responsiveness by varying the race or ethnicity of the email sender (Broockman, 2013;...

---

Costa 2017 in the Journal of Experimental Political Science:

As for variables that do have a statistically significant effect, minority constituents are almost 10 percentage points less likely to receive a response than non-minority constituents (p < 0.05). This is consistent with many individual studies that have shown requests from racial and ethnic minorities are given less attention overall, and particularly when the recipient official does not share their race (Broockman, 2013;...

But Broockman 2013 didn't vary the race of the requester, so I'm not sure of the basis for the suggestion that Broockman 2013 provided evidence that requests from racial and ethnic minorities are given less attention overall.

---

Mendez and Grose 2018 in Legislative Studies Quarterly:

Others argue or show, through experimental audit studies, that political elites have biases toward minority constituents when engaging in nonpolicy representation (e.g., Broockman 2013...

I'm not sure how Broockman 2013 permits an inference of political elite bias toward minority constituents, when the only constituent was Tyrone.

---

Lajevardi 2018 in Politics, Groups, and Identities:

Audit studies have previously found that public officials are racially biased in whether and how they respond to constituent communications (e.g., Butler and Broockman 2011; Butler, Karpowitz, and Pope 2012; Broockman 2013;...

---

Dinesen et al 2021 in the American Political Science Review:

In the absence of any extrinsic motivations, legislators still favor in-group constituents (Broockman 2013), thereby indicating a role for intrinsic motivations in unequal responsiveness.

Again, Tyrone was the only constituent in Broockman 2013.

---

Hemker and Rink 2017 in the American Journal of Political Science:

White officials in both the United States and South Africa are more likely to respond to requests from putative whites, whereas black politicians favor putative blacks (Broockman 2013, ...

---

McClendon 2016 in the Journal of Experimental Political Science:

Politicians may seek to favor members of their own group and to discriminate against members of out-groups (Broockman, 2013...

---

Gell-Redman et al 2018 in American Politics Research:

Studies that explore other means of citizen and legislator interaction have found more consistent evidence of bias against minority constituents. Notably, Broockman (2013) finds that white legislators are significantly less likely to respond to black constituents when the political benefits of doing so were diminished.

But the only constituent was Tyrone, so you can't properly infer bias against Tyrone or minority constituents more generally, because the experiment didn't indicate whether the out-of-district drop-off for Tyrone differed from the out-of-district drop-off for a putative non-Black emailer.

---

Broockman 2014 in the American Journal of Political Science:

Outright racial favoritism among politicians themselves is no doubt real (e.g., Broockman 2013b;...

But who was Tyrone favored more than or less than?

---

Driscoll et al 2018 in the American Journal of Political Science:

Broockman (2013) finds that African American state legislators expend more effort to improve the welfare of black voters than white state legislators, irrespective of whether said voters reside in their districts.

Even ignoring the added description of the emailer as a "voter", response rates to Tyrone were not "irrespective" of district residence. Broockman 2013 even plotted data for the matched case analysis, in which the bar for in-district Black legislators was not longer than the bar for in-district non-Black legislators.

---

Shoub et al 2020 in the Journal of Race, Ethnicity, and Politics:

Black politicians are more likely to listen and respond to black constituents (Broockman 2013),...

The prior context in Shoub et al 2020 suggests that the "more likely" comparison is to non-Black politicians, but this description loses the complication in which Black legislators were not more likely than non-Black legislators to respond to in-district Tyrone, which is especially important if we reasonably assume that in-district Tyrone was perceived to be a constituent and out-of-district Tyrone wasn't. Same problem with Christiani et al 2021 in Politics, Groups, and Identities:

Black politicians are more likely to listen and respond to black constituents than white politicians (Broockman 2013)...

The similar phrasing for the above two passages might be due to the publications having the same group of authors: Shoub, Epp, Baumgartner, Christiani, and Roach; and Christiani, Shoub, Baumgartner, Epp, and Roach.

---

Gleason and Stout 2014 in the Journal of Black Studies:

Recent experimental studies conducted by Butler and Broockman (2011) and Broockman (2013) confirm these findings. These studies show that Black elected officials are more likely to help co-racial constituents in and outside of their districts gain access to the ballot more than White elected officials.

This passage, from what I can tell, describes both citations incorrectly: in Broockman 2013, Tyrone was asking for help getting unemployment benefits, and I'm not sure what the basis is for the "in...their districts" claim: in-district response rates were 56.1% from non-Black legislators and 46.4% from Black legislators. The Butler and Broockman 2011 appendix reports results such as DeShawn receiving responses from 41.9%, 22.4%, and 44.0% of Black Democrat legislators when DeShawn respectively asked about a primary, a Republican primary, and a Democratic primary and, respectively, from 54.3%, 56.1%, and 62.1% of White Democrat legislators.

But checking citations to Butler and Broockman 2011 would be another post.

---

NOTES

1. The above isn't a systematic analysis of citations of Broockman 2013, so no strong inferences should be made about the percentage of times Broockman 2013 was cited incorrectly, other than maybe too often, especially in these journals.

2. I think that, for the Broockman 2013 experiment, a different email could have been sent from a putative White person, without sample size concerns. Imagine that "Billy Bob" emailed each legislator asking for help with, say, welfare benefits. If, like with Tyrone, Black legislator response rates were similar for in-district Billy Bob and for out-of-district Billy Bob, that would provide a strong signal to not attribute the similar rates to an intrinsic motivation to advance Blacks' interests. But if the out-of-district drop off in Black legislator response rates was much larger for Billy Bob than for Tyrone, that would provide a strong signal to attribute the similar Black legislator response rates for in-district Tyrone and out-of-district Tyrone to an intrinsic motivation to advance Blacks' interests.
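The comparison described in this note is a difference-in-differences in response rates: the out-of-district drop-off for Tyrone versus the drop-off for the putative White emailer. A sketch with purely hypothetical numbers (not data from any study):

```python
# Hypothetical Black-legislator response rates (illustrative only)
tyrone_in, tyrone_out = 0.46, 0.41  # small out-of-district drop-off
billy_in, billy_out = 0.50, 0.30    # larger drop-off for the White emailer

tyrone_drop = tyrone_in - tyrone_out
billy_drop = billy_in - billy_out

# A large positive gap would support the intrinsic-motivation interpretation
diff_in_diff = billy_drop - tyrone_drop
print(round(diff_in_diff, 2))  # 0.15
```

This is the "letter writer putative race X in vs. out of district" contrast, which sidesteps the underpowered triple interaction with legislator race that Broockman describes below.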

3. I think that the error bars in Figure 1 above might be 50% confidence intervals, given that the error bars seem to match the Stata command "reg code_some treat_out treatXblack leg_black [iweight=cem_weights], level(50)" that I ran on the Broockman 2013 data after line 17 in the Stata do file.

4. I shared this post with David Broockman, who provided the following comments:

Hi LJ,

I think you're right that some of these citations are describing my paper incorrectly and probably meant to cite my 2011 paper with Butler. (FWIW, in that study, we find legislators of all races seem to just discriminate in favor of their race, across both parties, so some of the citations don't really capture that either....)

The experiment would definitely be better with a white control, there was just a bias-variance trade-off here -- adding a putative race of constituent factor in the experiment would mean less bias but more variance. I did the power calculations and didn't think the experiment would be well-powered enough if I made the cells that small and were looking for a triple interaction between legislator race X letter writer putative race X in vs. out of district. In the paper I discuss a few alternative explanations that the lack of a white letter introduces and do some tests for them (see the 3 or 4 paragraphs starting with "One challenge..."). Essentially, I didn't see any reason why we should expect black legislators to just be generically less sensitive to whether a person is in their district, especially given in our previous paper we found they reacted pretty strongly to the race of the email sender (so it's not like the black legislators who do respond to emails just don't read emails carefully). Still, I definitely still agree with what I wrote then that this is a weakness of the study. It would be nice for someone to replicate this study, and I like the idea you have in footnote 2 for doing this. Someone should do that study!


Research involves a lot of decisions, which in turn provides a lot of opportunities for research to be incorrect or substandard, such as mistakes in recoding a variable, not using the proper statistical method, or not knowing unintuitive elements of statistical software such as how Stata treats missing values in logical expressions.

Peer and editorial review provides opportunities to catch flaws in research, but some journals that publish political science don't seem to be consistently doing a good enough job at this. Below, I'll provide a few examples that I happened upon recently and then discuss potential ways to help address this.

---

Feinberg et al 2022

PS: Political Science & Politics published Feinberg et al 2022 "The Trump Effect: How 2016 campaign rallies explain spikes in hate", which claims that:

Specifically, we established that the words of Donald Trump, as measured by the occurrence and location of his campaign rallies, significantly increased the level of hateful actions directed toward marginalized groups in the counties where his rallies were held.

After Feinberg et al published a similar claim in the Monkey Cage in 2019, I asked the lead author about the results when the predictor of hosting a Trump rally is replaced with a predictor of hosting a Hillary Clinton rally.

I didn't get a response from Ayal Feinberg, but Lilley and Wheaton 2019 reported that the point estimate for the effect on the count of hate-motivated events is larger for hosting a Hillary Clinton rally than for hosting a Donald Trump rally. Remarkably, the Feinberg et al 2022 PS article does not address the Lilley and Wheaton 2019 claim about Clinton rallies, even though the supplemental file for the Feinberg et al 2022 PS article discusses a different criticism from Lilley and Wheaton 2019.

The Clinton rally counterfactual is an obvious way to assess the claim that something about Trump increased hate events. Even if the reviewers and editors for PS didn't think to ask about the Clinton rally counterfactual, that counterfactual analysis appears in the Reason magazine criticism that Feinberg et al 2022 discusses in its supplemental files, so the analysis was presumably available to the reviewers and editors.

Will May has published a PubPeer comment discussing other flaws of the Feinberg et al 2022 PS article.

---

Christley 2021

The impossible "p < .000" appears eight times in Christley 2021 "Traditional gender attitudes, nativism, and support for the Radical Right", published in Politics & Gender.

Moreover, Christley 2021 indicates that (emphasis added):

It is also worth mentioning that in these data, respondent sex does not moderate the relationship between gender attitudes and radical right support. In the full model (Appendix B, Table B1), respondent sex is correlated with a higher likelihood of supporting the radical right. However, this finding disappears when respondent sex is interacted with the gender attitudes scale (Table B2). Although the average marginal effect of gender attitudes on support is 1.4 percentage points higher for men (7.3) than it is for women (5.9), there is no significant difference between the two (Figure 5).

Table B2 of Christley 2021 has 0.64 and 0.250 for the logit coefficient and standard error for the "Male*Gender Scale" interaction term, with no statistical significance asterisks; the 0.64 is the only table estimate not reported to three decimal places, so it's not clear to me from the table if the asterisks are missing or if the estimate should be, say, 0.064 instead of 0.64. The sample size for the Table B2 regression is 19,587, so a statistically significant 1.4-percentage-point difference isn't obviously out of the question, from what I can tell.

---

Hua and Jamieson 2022

Politics, Groups, and Identities published Hua and Jamieson 2022 "Whose lives matter? Race, public opinion, and military conflict".

Participants were assigned to a control condition with no treatment, to a placebo condition with an article about baseball gloves, or to an article about a U.S. service member being killed in combat. The experimental manipulation was the name of the service member, intended to signal race: Connor Miller, Tyrone Washington, Javier Juarez, Duc Nguyen, and Misbah Ul-Haq.

Inferences from Hua and Jamieson 2022 include:

When faced with a decision about whether to escalate a conflict that would potentially risk even more US casualties, our findings suggest that participants are more supportive of escalation when the casualties are of Pakistani and African American soldiers than they are when the deaths are soldiers from other racial–ethnic groups.

But, from what I can tell, this inference of participants being "more supportive" depending on the race of the casualties is based on differences in statistical significance when each racial condition is compared to the control condition. Figure 5 indicates a large enough overlap between confidence intervals for the racial conditions for this escalation outcome to prevent a confident claim of "more supportive" when comparing racial conditions to each other.

Figure 5 seems to plot estimates from the first column in Table C.7. The largest racial gap in estimates is between the Duc Nguyen condition (0.196 estimate and 0.133 standard error) and the Tyrone Washington condition (0.348 estimate and 0.137 standard error). So this difference in means is 0.152, and I don't think that there is sufficient evidence to infer that these estimates differ from each other. 83.4% confidence intervals would be about [0.01, 0.38] and [0.15, 0.54].
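Under standard assumptions (independent estimates with roughly normal sampling distributions), the difference between those two conditions can be tested directly from the Table C.7 numbers; a sketch:

```python
import math

# Estimates and standard errors from Table C.7, as reported above
b_nguyen, se_nguyen = 0.196, 0.133
b_washington, se_washington = 0.348, 0.137

# z-test for the difference between two independent estimates
diff = b_washington - b_nguyen
se_diff = math.sqrt(se_nguyen**2 + se_washington**2)
z = diff / se_diff
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # two-sided normal p-value
print(round(z, 2), round(p, 2))  # about z = 0.80, p = 0.43

# Approximate 83.4% confidence intervals, whose overlap roughly
# corresponds to a p = 0.05 test of the difference
z_834 = 1.39  # approximate normal critical value for 83.4% coverage
for b, se in [(b_nguyen, se_nguyen), (b_washington, se_washington)]:
    print(round(b - z_834 * se, 2), round(b + z_834 * se, 2))
```

The two 83.4% intervals overlap substantially, consistent with the p-value of about 0.43 for the difference.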

---

Walker et al 2022

PS: Political Science & Politics published Walker et al 2022 "Choosing reviewers: Predictors of undergraduate manuscript evaluations", which, for the regression predicting reviewer ratings of manuscript originality, interpreted a statistically significant -0.288 OLS coefficient for "White" as indicating that "nonwhite reviewers gave significantly higher originality ratings than white reviewers". But the table note indicates that the "originality" outcome variable is coded 1 for yes, 2 for maybe, and 3 for no, so that the "higher" originality ratings actually indicate lower ratings of originality.

Moreover, Walker et al 2022 claims that:

There is no empirical linkage between reviewers' year in school and major and their assessment of originality.

But Table 2 indicates p<0.01 evidence that reviewer major associates with assessments of originality.

And the "a", "b", and "c" notes for Table 2 are incorrectly matched to the descriptions; for example, the "b" note about the coding of the originality outcome is attached to the other outcome.

The "higher originality ratings" error has been corrected, but not the other errors. I mentioned only the "higher" error in this tweet, so maybe that explains that. It'll be interesting to see if PS issues anything like a corrigendum about "Trump rally / hate" Feinberg et al 2022, given that the flaw in Feinberg et al 2022 seems a lot more important.

---

Fattore et al 2022

Social Science Quarterly published Fattore et al 2022 "'Post-election stress disorder?' Examining the increased stress of sexual harassment survivors after the 2016 election". For a sample of women participants, the analysis uses reported experience being sexually harassed to predict a dichotomous measure of stress due to the 2016 election, net of controls.

Fattore et al 2022 Table 1 reports the standard deviation for a presumably multilevel categorical race variable that ranges from 0 to 4 and for a presumably multilevel categorical marital status variable that ranges from 0 to 2. Fattore et al 2022 elsewhere indicates that the race variable was coded 0 for white and 1 for minority, but indicates that the marital status variable is coded 0 for single, 1 for married/coupled, and 2 for separated/divorced/widowed, so I'm not sure how to interpret regression results for the marital status predictor.

And Fattore et al 2022 has this passage:

With 95 percent confidence, the sample mean for women who experienced sexual harassment is between 0.554 and 0.559, based on 228 observations. Since the dependent variable is dichotomous, the probability of a survivor experiencing increased stress symptoms in the post-election period is almost certain.

I'm not sure how to interpret that passage: Is the 95% confidence interval that thin (0.554, 0.559) based on 228 observations? Is the mean estimate of about 0.554 to 0.559 being interpreted as almost certain? Here is the paragraph that that passage is from.
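For comparison, a conventional 95% confidence interval for a proportion near 0.556 with 228 observations would be far wider than (0.554, 0.559); a sketch assuming a simple binomial proportion:

```python
import math

p_hat, n = 0.556, 228
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of a sample proportion
half_width = 1.96 * se
print(round(p_hat - half_width, 3), round(p_hat + half_width, 3))
# roughly (0.49, 0.62): about 25 times wider than the reported interval
```

So the reported (0.554, 0.559) interval is much too narrow for 228 observations on a dichotomous variable, which adds to the confusion about that passage.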

---

Hansen and Dolan 2022

Political Behavior published Hansen and Dolan 2022 "Cross‑pressures on political attitudes: Gender, party, and the #MeToo movement in the United States".

Table 1 of Hansen and Dolan 2022 reported results from a regression limited to 694 Republican respondents in a 2018 ANES survey, which indicated that the predicted feeling thermometer rating about the #MeToo movement was 5.44 units higher among women than among men, net of controls, with a corresponding standard error of 2.31 and a statistical significance asterisk. However, Hansen and Dolan 2022 interpreted this to not provide sufficient evidence of a gender gap:

In 2018, we see evidence that women Democrats are more supportive of #MeToo than their male co-partisans. However, there was no significant gender gap among Republicans, which could signal that both women and men Republican identifiers were moved to stand with their party on this issue in the aftermath of the Kavanaugh hearings.

Hansen and Dolan 2022 indicated that this inference of no significant gender gap is because, in Figure 1, the relevant 95% confidence interval for Republican men overlapped with the corresponding 95% confidence interval for Republican women.

Footnote 9 of Hansen and Dolan 2022 noted that assessing statistical significance using overlap of 95% confidence intervals is a "more rigorous standard" than using a p-value threshold of p=0.05 in a regression model. But Footnote 9 also claimed that "Research suggests that using non-overlapping 95% confidence intervals is equivalent to using a p < .06 standard in the regression model (Schenker & Gentleman, 2001)", and I don't think that this "p < .06" claim is correct, or at least I think that it is misleading.

My Stata analysis of the data for Hansen and Dolan 2022 indicated that the p-value for the gender gap among Republicans on this item is p=0.019, which is about what would be expected given the Table 1 data: a t-statistic of 5.44/2.31 and more than 600 degrees of freedom. From what I can tell, the key evidence from Schenker and Gentleman 2001 is its Figure 3, which indicates that the probability of a Type 1 error using the overlap method is approximately equivalent to p=0.06 only when the ratio of the two standard errors is about 20 or higher.
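The p-value can be checked from the reported coefficient and standard error alone; with more than 600 degrees of freedom, the t distribution is close enough to normal for a sketch:

```python
from statistics import NormalDist

# Two-sided p-value for the reported gender gap among Republicans:
# coefficient 5.44, standard error 2.31 (from Table 1 of Hansen and Dolan 2022).
t_stat = 5.44 / 2.31                         # about 2.355
p_value = 2 * (1 - NormalDist().cdf(t_stat)) # normal approximation to t with 600+ df
print(round(t_stat, 3), round(p_value, 3))   # about 2.355 and 0.019
```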

This discrepancy in inferences might have been avoided if 83.4% confidence intervals were more commonly taught and recommended by editors and reviewers, for visualizations in which the key comparison is between two estimates.
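The 83.4% figure comes from a simple property: for two independent estimates with roughly equal standard errors, non-overlap of 83.4% intervals corresponds to about p=0.05 for the difference. A sketch of the arithmetic:

```python
import math
from statistics import NormalDist

# Two independent estimates with equal SEs whose 83.4% CIs just touch differ
# by 2 * z * SE, while the SE of the difference is sqrt(2) * SE. The implied
# z-statistic for the difference is therefore 2 * z / sqrt(2).
z = NormalDist().inv_cdf(1 - 0.166 / 2)      # half-width multiplier for an 83.4% CI
implied_z_for_difference = 2 * z / math.sqrt(2)
print(round(z, 3), round(implied_z_for_difference, 3))  # about 1.385 and 1.959
```

So just-touching 83.4% intervals imply a z of about 1.96 for the difference, i.e., p of about 0.05.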

---

Footnote 10 of Hansen and Dolan 2022 states:

While Fig. 1 appears to show that Republicans have become more positive towards #MeToo in 2020 when compared to 2018, the confidence bounds overlap when comparing the 2 years.

I'm not sure what that refers to. Figure 1 of Hansen and Dolan 2022 reports estimates for Republican men in 2018, Republican women in 2018, Republican men in 2020, and Republican women in 2020, with point estimates increasing in that order. Neither 95% confidence interval for Republicans in 2020 overlaps with either 95% confidence interval for Republicans in 2018.

---

Other potential errors in Hansen and Dolan 2022:

[1] The code for the 2020 analysis uses V200010a, which is a weight variable for the pre-election survey, even though the key outcome variable (V202183) was on the post-election survey.

[2] Appendix B Table 3 indicates that 47.3% of the 2018 sample was Republican and 35.3% was Democrat, but the sample sizes for the 2018 analysis in Table 1 are 694 for the Republican only analysis and 1001 for the Democrat only analysis.

[3] Hansen and Dolan 2022 refers multiple times to predictions of feeling thermometer ratings as predicted probabilities, and notes for Tables 1 and 2 indicate that the statistical significance asterisk is for "statistical significance at p > 0.05".

---

Conclusion

I sometimes make mistakes, such as misspelling an author's name in a prior post. In 2017, I preregistered an analysis that used overlap of 95% confidence intervals to assess evidence for the difference between estimates, instead of a preferable direct test for a difference. So some of the flaws discussed above are understandable. But I'm not sure why all of these flaws got past review at respectable journals.

Some of the flaws discussed above are, I think, substantial, such as the political bias in Feinberg et al 2022 not reporting a parallel analysis for Hillary Clinton rallies, especially with the Trump rally result being prominent enough to get a fact check from PolitiFact in 2019. Some of the flaws discussed above are trivial, such as "p < .000". But even trivial flaws might justifiably be interpreted as reflecting a review process that is less rigorous than it should be.

---

I think that peer review is valuable at least for its potential to correct errors in analyses and to get researchers to report results that they otherwise wouldn't report, such as a robustness check suggested by a reviewer that undercuts the manuscript's claims. But peer review as currently practiced doesn't seem to do that well enough.

Part of the problem might be that peer review at a lot of political science journals combines [1] assessment of the contribution of the manuscript and [2] assessment of the quality of the analyses, often for manuscripts that are likely to be rejected. Some journals might benefit from having a (or having another) "final boss" who carefully reads conditionally accepted manuscripts only for assessment [2], to catch minor "p < .000" types of flaws, to catch more important "no Clinton rally analysis" types of flaws, and to suggest robustness checks and additional analyses.

But even better might be opening peer review to volunteers, who collectively could plausibly do a better job than a final boss could do alone. I discussed the peer review volunteer idea in this symposium entry. The idea isn't original to me; for example, Meta-Psychology offers open peer review. The modal number of peer review volunteers for a publication might be zero, but there is a good chance that I would have raised the "no Clinton rally analysis" criticism had PS posted a conditionally accepted version of Feinberg et al 2022.

---

Another potentially good idea would be for journals or an organization such as APSA to post at least a small set of generally useful advice, such as reporting results from a test for the difference between estimates whenever the manuscript suggests a difference between estimates. More specific advice could be posted by topic: for count analyses, for example, advice about predicting counts in which the opportunity for the count varies by observation. Lilley and Wheaton 2019 discussed this page, but I think that this page has an explanation that is easier to understand.
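The varying-opportunity issue can be illustrated without any modeling library (the rates and exposures below are hypothetical): two units with the same underlying event rate but different exposure produce different raw counts, which is what including log(exposure) as an offset in a count model corrects for.

```python
import math

# Hypothetical: identical event rate, different exposure (opportunity).
rate = 2.0                              # events per unit of exposure (assumed)
exposure = {"A": 10.0, "B": 500.0}      # hypothetical exposures
counts = {k: rate * v for k, v in exposure.items()}  # expected raw counts differ a lot

# Subtracting log(exposure) from log(count) recovers the common log rate,
# which is what an offset of log(exposure) accomplishes in a count model.
for k in exposure:
    log_rate = math.log(counts[k]) - math.log(exposure[k])
    assert math.isclose(log_rate, math.log(rate))
```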

---

NOTES

1. It might be debatable whether this is a flaw per se, but Long 2022 "White identity, Donald Trump, and the mobilization of extremism" reported correlational results from a survey experiment but, from what I can tell, didn't indicate whether any outcomes differed by treatment.

2. Data for Hansen and Dolan 2022. Stata code for my analysis:

desc V200010a V202183

svyset [pw=weight]

svy: reg metoo education age Gender race income ideology2 interest media if partyid2=="Republican"

svy: mean metoo if partyid2=="Republican" & women==1

3. The journal Psychological Science is now publishing peer reviews. Peer reviews are also available for the journal Meta-Psychology.

4. Regarding the prior post about Lacina 2022 "Nearly all NFL head coaches are White. What are the odds?", Bethany Lacina discussed that with me on Twitter. I have published an update at that post.

5. I emailed or tweeted to at least some authors of the aforementioned publications discussing the planned comments or indicating at least some of the criticism. I received some feedback from one of the authors, but the author didn't indicate that I had permission to acknowledge the author.


1.

In 2003, Melissa V. Harris-Lacewell wrote (p. 222):

The defining works of White racial attitudes fail to grapple with the complexities of African American political thought and life. In these studies, Black people are a static object about which White people form opinions.

Researchers still sometimes make it difficult to analyze data from Black participants or don't report interesting data on Black participants. Helping to address this, Darren W. Davis and David C. Wilson have a new book Racial Resentment in the Political Mind (RRPM), with an entire chapter on African Americans' resentment toward Whites.

RRPM is a contribution to research on Black political attitudes, and its discussion of measurement of Whites' resentment toward Blacks is nice, especially for people who don't realize that standard measures of "racial resentment" aren't good measures of resentment. But let me discuss some elements of the book that I consider flawed.

---

2.

RRPM draws, at a high level, a parallel between Whites' resentment toward Blacks and Blacks' resentment toward Whites (p. 242):

In essence, the same model of a just world and appraisal of deservingness that guides Whites' racial resentment also guides African Americans' racial resentment.

That seems reasonable: the same model for resentment toward Whites and resentment toward Blacks. But RRPM proposes different items for the battery measuring resentment toward Blacks and the battery measuring resentment toward Whites, and I think that using different batteries will undercut comparisons of the size of the effects of these two resentments, because one battery might capture true resentment better than the other.

Thus, especially for general surveys such as the ANES that presumably can't or won't devote space to resentment batteries tailored to each racial group, it might be better to measure resentment toward various groups with generalizable items, such as agreement or disagreement with parallel statements such as "Whites have gotten more than they deserve" and "Blacks have gotten more than they deserve". That approach would hopefully produce more valid comparisons of the estimated effects of resentments toward different groups than comparing batteries of different items would.

---

3.

RRPM suggests that all resentment batteries not be given to all respondents (p. 241):

A clear outcome of this chapter is that African Americans should not be presented the same classic racial resentment survey items that Whites would answer (and perhaps vice versa)...

And from page 30:

African Americans and Whites have different reasons to be resentful toward each other, and each group requires a unique set of measurement items to capture resentment.

But not giving participants items measuring resentment of their own racial group doesn't seem like a good idea. A White participant could think that Whites have received more than they deserve on average, and a Black participant could think that Blacks have received more than they deserve on average. Omitting measures such as Whites' resentment of Whites could therefore plausibly bias estimates of the effect of resentment, if resentment of one's own racial group influences a participant's attitudes about political phenomena.

---

RRPM discusses asking Blacks to respond to racial resentment items toward Blacks: "No groups other than African Americans seem to be asked questions about self-hate" (p. 249). RRPM elsewhere qualifies this with "rarely": "That is, asking African Americans to answer questions about disaffection toward their own group is a task rarely asked of other groups" (p. 215).

The ANES 2016 pilot study did ask White participants about White guilt (e.g., "How guilty do you feel about the privileges and benefits you receive as a white American?") without asking any other racial groups about parallel guilt. Moreover, the CCES had (in 2016 and 2018 at least) an agree/disagree item asked of Whites and others that "White people in the U.S. have certain advantages because of the color of their skin", with no equivalent item about color-of-skin advantages for people who are not White.

But even if Black participants disproportionately receive resentment items directed at Blacks, the better way to address this inequality and to understand racial attitudes is to add resentment items directed at other groups.

---

4.

RRPM seems to suggest an asymmetry in that only Whites' resentment is normatively bad (p. 25):

In the end, African Americans' quest for civil rights and social justice is resented by Whites, and Whites' maintenance of their group dominance is resented by African Americans.

Davis and Wilson discussed RRPM in a video on the UC Public Policy Channel, with Davis suggesting that "a broader swath of citizens need to be held accountable for what they believe" (at 6:10) and that "...the important conversation we need to have is not about racists. Okay. We need to understand how ordinary American citizens approach race, approach values that place them in the same bucket as racists. They're not racists, but they support the same thing that racists support" (at 53:37).

But, from what I can tell, the ordinary American citizens in the same bucket as racists don't seem to be, say, people who support hiring preferences for Blacks for normatively good reasons and who just happen to have the same policy preferences as people who support hiring preferences for Blacks because of racism against Whites. Instead, my sense is that the racism in question is limited to racism that causes racial inequality. From David C. Wilson, at 3:24 in the UC video:

And so, even if one is not racist, they can still exacerbate racial injustice and racial inequality by focusing on their values rather than the actual problem and any solutions that might be at bay to try and solve them.

---

Another apparent asymmetry is that RRPM mentions legitimizing racial myths throughout the book (vii, 3, 8, 21, 23, 28, 35, 47, 48, 50, 126, 129, 130, 190, 243, 244, 247, 261, 337, and 342), but legitimizing racial myths are not mentioned in the chapter on African Americans' resentment toward Whites (pp. 214-242). Figure 1.1 on page 8 of RRPM is a model of resentment that has an arrow from legitimizing racial myths to resentment, but RRPM doesn't indicate what, if any, legitimizing racial myths inform resentment toward Whites.

Legitimizing myths are conceptualized on page 8 as follows:

Appraisals of deservingness are shaped by legitimizing racial myths, which are widely shared beliefs and stereotypes about African Americans and other minorities that justify their mistreatment and low status. Legitimizing myths are any coherent set of socially accepted attitudes, beliefs, values, and opinions that provide moral and intellectual legitimacy to the unequal distribution of social value (Sidanius, Devereux, and Pratto 1992).

But I don't see why legitimizing myths couldn't add legitimacy to unequal *treatment*. Presumably resentment flows from beliefs about the causes of inequality, so Whites as a/the main/the only cause of Black/White inequality could serve as a belief that legitimizes resentment toward Whites and, consequently, discrimination against Whites.

---

5.

The 1991 National Race and Politics Survey had a survey experiment, asking for agreement/disagreement to the item:

In the past, the Irish, the Italians, the Jews and many other minorities overcame prejudice and worked their way up.

Version 1: Blacks...
Version 2: New immigrants from Europe...

...should do the same without any special favors?

This experiment reflects the fact that responses to items measuring general phenomena applied to a group might be influenced by the general phenomena and/or the group.

Remarkably, the RRPM measurement of racial schadenfreude (Chapter 7) does not address this ambiguity, with items measuring participant feelings about only President Obama, such as schadenfreude about "Barack Obama's being identified as one of the worst presidents in history". At least RRPM acknowledges this (p. 206):

Without a more elaborate research design, we cannot really determine whether the schadenfreude experienced by Republicans is due to his race or to some other issue.

---

6.

For an analysis of racial resentment in the political mind, RRPM remarkably doesn't substantively consider Asians, even if only as a target of resentment that could help test alternative explanations about the causes of resentment. Like Whites, Asians on average have relatively positive outcomes in income and related measures, but Asians do not seem to be blamed for U.S. racial inequality as much as Whites are.

---

NOTES

1. From RRPM (p. 241):

When items designed on one race are automatically applied to another race under the assumption of equal meaning, it creates measurement invariance.

Maybe the intended meaning is something such as "When items designed on one race are automatically applied to another race, it assumes measurement invariance".

2. RRPM Figure 2.1 (p. 68) reports how resentment correlates with feeling thermometer ratings about Blacks and with feeling thermometer ratings about Whites, but not with the more intuitive measure of the *difference* in feeling thermometer ratings about Blacks and about Whites.


I posted earlier about Jardina and Piston 2021 "The Effects of Dehumanizing Attitudes about Black People on Whites' Voting Decisions".

Jardina and Piston 2021 limited the analysis to White respondents, even though the Qualtrics_BJPS dataset at the Dataverse page for Jardina and Piston 2021 contained observations for non-White respondents. The Qualtrics_BJPS dataset had variables such as aofmanpic_1 and aofmanpic_6, and I didn't know which of these variables corresponded to which target groups.

My post indicated a plan to follow up if I got sufficient data to analyze responses from non-White participants. Replication code has now been posted at version 2 of the Dataverse page for Jardina and Piston 2021, so this is that planned post.

---

Version 2 of the Jardina and Piston 2021 Dataverse page has a Qualtrics dataset (Qualtrics_2016_BJPS_raw) that differs from the version 1 Qualtrics dataset (Qualtrics_BJPS): for example, the version 2 Qualtrics dataset doesn't contain data for non-White respondents, doesn't contain respondent ID variables V1 and uid, and doesn't contain variables such as aofmanpic_2.

I ran the Jardina and Piston 2021 "aofman" replication code on the Qualtrics_BJPS dataset to get a variable named "aofmanwb". In the version 2 dataset, this produced the output for the Trump analysis in Table 1 of Jardina and Piston 2021, so this aofmanwb variable is the "Ascent of man" dehumanization measure, coded so that rating Blacks as equally evolved as Whites is 0.5, rating Whites as more evolved than Blacks runs from just above 0.5 to 1, and rating Blacks more evolved than Whites runs from just under 0.5 down to zero.

The version 2 replication code for Jardina and Piston 2021 suggests that aofmanpic_1 is for rating how evolved Blacks are and aofmanpic_4 is for rating how evolved Whites are. So unless these variable names were changed between versions of the dataset, the version 2 replication code should produce the "Ascent of man" dehumanization measure when applied to the version 1 dataset, which is still available at the Jardina and Piston 2021 Dataverse page.

To check, I ran commands such as "reg aofmanwb ib4.ideology if race==1 & latino==2" in both datasets, and got similar but not exact results, with the difference presumably due to the differences between datasets discussed in the notes below.

---

The version 1 Qualtrics dataset didn't contain a variable that I thought was a weight variable, so my analyses below are unweighted.

In the version 1 dataset, the medians of aofmanwb were 0.50 among non-Latino Whites in the sample (N=450), 0.50 among non-Latino Blacks in the sample (N=98), and 0.50 among respondents coded Asian, Native American, or Other (N=125). Respective means were 0.53, 0.48, and 0.51.

Figure 1 of Jardina and Piston 2021 mentions the use of sliders to select responses to the items about how evolved target groups are. I think that some unequal ratings might be due to respondent imprecision instead of an intent to dehumanize: for example, a respondent might have intended to select 85 for each group in a pair, but moved the slider to 85 for one group and 84 for the other, and then figured that this was close enough. So I'll report percentages below with a strict definition that codes any rating pair differing from 0.5 on the 0-to-1 scale as dehumanization, but I'll also report percentages with a tolerance for potentially unintentional dehumanization.
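As a sketch of the two coding schemes (the function names and the 3-unit tolerance below are mine for illustration; my actual recoding was done in Stata on the 0-to-1 aofmanwb scale):

```python
# Trichotomize a pair of 0-100 "evolved" ratings for Blacks and Whites.
def strict_code(black_rating, white_rating):
    # strict: any inequality counts as dehumanization of one group
    if black_rating > white_rating:
        return "Blacks rated more evolved"
    if white_rating > black_rating:
        return "Whites rated more evolved"
    return "equal ratings"

def tolerant_code(black_rating, white_rating, tolerance=3):
    # tolerant: differences smaller than the tolerance are treated as
    # slider imprecision rather than intentional dehumanization
    diff = black_rating - white_rating
    if diff >= tolerance:
        return "Blacks rated more evolved"
    if diff <= -tolerance:
        return "Whites rated more evolved"
    return "equal ratings"

print(strict_code(85, 84))    # "Blacks rated more evolved"
print(tolerant_code(85, 84))  # "equal ratings"
```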

---

For the strict coding of dehumanization, I recoded aofmanwb into a variable that had levels for [1] rating Blacks as more evolved than Whites, [2] equal ratings of how evolved Blacks and Whites are, and [3] rating Whites as more evolved than Blacks.

In the version 1 dataset, 13% of non-Latino Whites in the sample rated Blacks more evolved than Whites, with an 83.4% confidence interval of [11%, 16%], and 39% rated Whites more evolved than Blacks [36%, 43%]. 42% of non-Latino Blacks in the sample rated Blacks more evolved than Whites [35%, 49%], and 23% rated Whites more evolved than Blacks [18%, 30%]. 19% of respondents not coded Black or White in the sample rated Blacks more evolved than Whites [15%, 25%], and 38% rated Whites more evolved than Blacks [32%, 45%].
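For what it's worth, these intervals are roughly what a normal-approximation 83.4% interval gives; my reported intervals may come from a different method, so small discrepancies are expected. For the 13% of 450 non-Latino Whites, for example:

```python
import math
from statistics import NormalDist

# Normal-approximation 83.4% CI for a proportion: 13% of 450 (illustrative).
p_hat, n = 0.13, 450
z = NormalDist().inv_cdf(1 - 0.166 / 2)  # about 1.385
se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - z * se, p_hat + z * se
print(round(lo, 2), round(hi, 2))  # roughly 0.11 and 0.15
```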

---

For the non-strict coding of dehumanization, I recoded aofmanwb into a variable that had levels that included [1] rating Blacks at least 3 units more evolved than Whites on a 0-to-100 scale, and [5] rating Whites at least 3 units more evolved than Blacks on a 0-to-100 scale.

In the version 1 dataset, 8% of non-Latino Whites in the sample rated Blacks more evolved than Whites [7%, 10%], and 30% rated Whites more evolved than Blacks [27%, 34%]. 34% of non-Latino Blacks in the sample rated Blacks more evolved than Whites [27%, 41%], and 21% rated Whites more evolved than Blacks [16%, 28%]. 13% of respondents not coded Black or White in the sample rated Blacks more evolved than Whites [9%, 18%], and 31% rated Whites more evolved than Blacks [26%, 37%].

---

NOTES

1. Variable labels in the Qualtrics dataset ("male" coded 0 for "Male" and 1 for "Female") and associated replication commands suggest that Jardina and Piston 2021 might have reported results for a "Female" variable coded 1 for male and 0 for female. That would explain why Table 1 Model 1 of Jardina and Piston 2021 indicates that females were predicted to have higher ratings about Trump net of controls at p<0.01 compared to males, even though the statistically significant coefficients for "Female" in the analyses of other datasets in Jardina and Piston 2021 are negative when predicting positive outcomes for Trump.

The "Female" variable in Jardina and Piston 2021 Table 1 Model 1 is right above the statistically significant coefficient and standard error for age, of "0.00" and "0.00". The table note indicates that "All variables are transformed onto a 0 to 1 scale.", but that isn't correct for the age predictor, which ranges from 19 to 86.

2. I produced a plot like Jardina and Piston 2021 Figure 3, but with a range from most dehumanization of Whites relative to Blacks to most dehumanization of Blacks relative to Whites. The 95% confidence interval for Trump ratings at most dehumanization of Whites relative to Blacks did not overlap with the 95% confidence interval for Trump ratings at no / equal dehumanization of Whites and Blacks. But, as indicated in my later analyses, that might merely be due to the Jardina and Piston 2021 use of aofmanwb as a continuous predictor: the aforementioned inference wasn't supported using 83.4% confidence intervals when the aofmanwb predictor was trichotomized as described above.

3. Regarding differences between Qualtrics datasets posted to the Jardina and Piston 2021 Dataverse page, the Stata command "tab race latino, mi" returns 980 respondents who selected "White" for the race item and "No" for the Latino item in the version 1 Qualtrics dataset, but returns 992 respondents who selected "White" for the race item and "No" for the Latino item in the version 2 Qualtrics dataset.

Both version 1 and version 2 of the Qualtrics datasets contain exactly one observation with a 1949 birth year and a state of Missouri. In both datasets, this observation has codes that indicate a White non-Latino neither-liberal-nor-conservative male Democrat with some college but no degree who has an income of $35,000 to $39,999. That observation has values of 100 for aofmanvinc_1 and 100 for aofmanvinc_4 in the version 2 Qualtrics dataset, but, in the version 1 Qualtrics dataset, that observation has no numeric values for aofmanvinc_1, aofmanvinc_4, or any other variable starting with "aofman".

I haven't yet received an explanation about this from Jardina and/or Piston.

4. Below is a description of more checking about whether aofmanwb is correctly interpreted above, given that the Dataverse page for Jardina and Piston 2021 doesn't have a codebook.

I dropped all cases in the original dataset not coded race==1 and latino==2. Case 7 in the version 2 dataset is from New York, born in 1979, has an aofmanpic_1 of 84, and an aofmanpic_4 of 92; this matches Case 7 in the version 1 dataset when dropping the aforementioned cases. Case 21 in the version 1 dataset is from South Carolina, born in 1966, has an aofmanvinc_1 of 79, and an aofmanvinc_4 of 75; this matches Case 21 in the version 2 dataset when dropping the aforementioned cases. Case 951 in the version 1 dataset is from Georgia, born in 1992, has an aofmannopi_1 of 77, and an aofmannopi_4 of 65; this matches Case *964* in the version 2 dataset when dropping the aforementioned cases.

5. From what I can tell, for anyone interested in analyzing the data, thermind_2 in the version 2 dataset is the feeling thermometer about Donald Trump, and thermind_4 is the feeling thermometer about Barack Obama.

6. Stata code and output from my analysis.


The Monkey Cage recently published "Nearly all NFL head coaches are White. What are the odds?" [archived], by Bethany Lacina.

Lacina reported analyses that compared observed racial percentages of NFL head coaches to benchmark percentages that are presumably intended to represent what racial percentages of NFL head coaches would occur absent racial bias. For example, Lacina compared the percentage of Whites among NFL head coaches hired since February 2021 (8 of 10, or 80%) to the percentage of Whites among the set of NFL offensive coordinators, defensive coordinators, and recently fired head coaches (which was between 70% and 80% White).

Lacina indicated that:

If the hiring process did not favor White candidates, the chances of hiring eight White people from that pool is only about one in four — or plus-322 in sportsbook terms.

I think that Lacina might have reported the probability that *exactly* eight of the ten recent NFL coach hires were White. But for assessing unfair bias favoring White candidates, it makes more sense to report the probability that *at least* eight of the ten recent NFL coach hires were White: that probability is 38% using a 70% White pool and is 67% using an 80% White pool. See Notes 1 through 3 below.
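These probabilities can also be computed exactly with the hypergeometric distribution, rather than by simulation. Assuming a pool of 64 candidates (32 offensive plus 32 defensive coordinators) of whom 45 are White, about 70%:

```python
from math import comb

# Exact probability of k White hires among 10 draws without replacement
# from a pool of 64 candidates, 45 of them White (hypergeometric).
def p_white_hires(k, pool=64, white=45, hires=10):
    return comb(white, k) * comb(pool - white, hires - k) / comb(pool, hires)

p_exactly_8 = p_white_hires(8)
p_at_least_8 = sum(p_white_hires(k) for k in range(8, 11))
print(round(p_exactly_8, 3), round(p_at_least_8, 3))  # about 0.243 and 0.376
```

So "exactly eight" is about one in four, but "at least eight" is closer to two in five.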

---

Lacina also conducted an analysis for the one Black NFL head coach among the 14 NFL head coaches in 2021 to 2022 who were young enough to have played in the NCAA between 1999 and 2007, given that demographic data from her source were available starting in 1999. Benchmark percentages were 30% Black from NCAA football players and 44% Black from NCAA Division I football players.

The correctness of Lacina's calculations for this analysis doesn't seem to matter, because the benchmark does not seem to be a reasonable representation of how NFL head coaches are selected. For example, quarterback is the most important player position, and quarterbacks presumably need to know football strategy relatively well compared to players at most or all other positions. So I think that the per capita probability of a college quarterback becoming an NFL head coach is likely nontrivially higher than the per capita probability for players at other positions; however, Lacina's benchmark doesn't adjust for player position.

---

None of the above analysis should be interpreted to suggest that selection of NFL head coaches has been free from racial bias. But I think that it's reasonable to suggest that the Lacina analysis isn't very informative either way.

---

NOTES

1. Below is R code for a simulation that returns a probability of about 24%, for the probability that *exactly* eight of ten candidates are White, drawn without replacement from a candidate pool of 32 offensive coordinators and 32 defensive coordinators that is overall 70% White:

SET  <- c(rep_len(1,45), rep_len(0,19))  # pool of 64 candidates: 45 White (1), 19 non-White (0)
LIST <- c()
for (i in 1:100000){
   LIST[i] <- sum(sample(SET,10,replace=F))  # draw 10 hires without replacement; count White hires
}
table(LIST)
length(LIST[LIST==8])/length(LIST)  # share of simulations with exactly 8 White hires

The probability is about 32% if the pool of 64 is 80% White. Adding in a few recently fired head coaches doesn't change the percentage much.
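Regarding the "plus-322" sportsbook framing: American moneyline odds for an underdog are just a transformation of the probability, so the conversion can be sketched directly (the function name is mine; it applies to probabilities below one half):

```python
# Convert a probability (< 0.5) to American "plus" moneyline odds.
def prob_to_plus_odds(p):
    return round(100 * (1 - p) / p)

print(prob_to_plus_odds(0.25))  # +300: exactly "one in four"
print(prob_to_plus_odds(0.24))  # roughly +317
```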

2. In reality, 8 White candidates were hired for the 10 NFL head coaching positions. So how do we assess the extent to which this observed result suggests unfair bias in favor of White candidates? Let's first get results from the simulation...

For my 100,000-run simulation using the above code and a random seed of 123, the counts of simulated White head coaches were:

exactly 0 White: 0 times
exactly 1 White: 5 times
exactly 2 White: 52 times
exactly 3 White: 461 times
exactly 4 White: 2,654 times
exactly 5 White: 9,255 times
exactly 6 White: 20,987 times
exactly 7 White: 29,307 times
exactly 8 White: 24,246 times
exactly 9 White: 10,978 times
exactly 10 White: 2,055 times

The simulation indicated that, if candidates were randomly drawn from a 70% White pool, exactly 8 of 10 coaches would be White about 24% of the time (24,246/100,000). This 8-of-10 result represents a selection of candidates from the pool that is perfectly fair with no evidence of bias for *or against* White candidates.

The 8-of-10 result would be the proper focus if our interest were bias for *or against* White candidates. But the Lacina post didn't seem concerned about evidence of bias against White candidates, so the 9-of-10 and 10-of-10 simulation results should be added to the 8-of-10 total to get 37%: the 9-of-10 and 10-of-10 results represent simulated outcomes in which White candidates were underrepresented in reality relative to the simulation. Under this coding, 8 of 10 represents no bias, 9 of 10 and 10 of 10 represent bias against Whites, and everything else represents bias favoring Whites.

3. Below is R code for a simulation that returns a probability of about 37%, for the probability that *at least* eight of ten candidates are White, drawn without replacement from a candidate pool of 32 offensive coordinators and 32 defensive coordinators that is overall 70% White:

SET  <- c(rep_len(1,45), rep_len(0,19))  # pool of 64 candidates: 45 White (1), 19 non-White (0)
LIST <- c()
for (i in 1:100000){
   LIST[i] <- sum(sample(SET,10,replace=F))  # draw 10 hires without replacement; count White hires
}
table(LIST)
length(LIST[LIST>=8])/length(LIST)  # share of simulations with at least 8 White hires

---

UPDATE

I corrected some misspellings of "Lacinda" to "Lacina" in the post.

---

UPDATE 2 (March 18, 2022)

Bethany Lacina discussed with me her calculation. She indicated that she did calculate at least eight of ten, but she used a joint probability method that I don't think is correct because random error would bias the inference toward unfair selection of coaches by race. Given the extra information that Bethany provided, here is a revised calculation that produces a probability of about 60%:

# In 2021: 2 non-Whites hired of 6 hires.
# In 2022: 0 non-Whites hired of 4 hires (up to the point of the calculation).
# The simulation below is for the probability that at least 8 of the 10 hires are White.

SET.2021 <- c(rep_len(0,12),rep_len(1,53)) ## 1=White candidate
SET.2022 <- c(rep_len(0,20),rep_len(1,51)) ## 1=White candidate
LIST <- c()

for (i in 1:100000){
DRAW.2021 <- sum(sample(SET.2021,6,replace=F)) 
DRAW.2022 <- sum(sample(SET.2022,4,replace=F)) 
LIST[i] <- DRAW.2021 + DRAW.2022
}

table(LIST)
length(LIST[LIST>=8])/length(LIST)