The plot below is based on data from the ANES 2022 Pilot Study, plotting the percentage of particular populations that rated the in-general intelligence of Whites higher than the in-general intelligence of Blacks (black dots) and the percentage of these populations that rated the in-general intelligence of Asians higher than the in-general intelligence of Whites (white dots). For the item wording, see the notes below or page 44 of the questionnaire.

My understanding is that, based on a straightforward / naïve interpretation of educational data such as NAEP scores as good-enough measures of intelligence [*], there isn't much reason to be in the white dot and not in the black dot or vice versa. But, nonetheless, there is a gap between dots in the overall population and in certain populations.

In the plot above, estimated percentages are similar among very conservative Whites and among U.S. residents who attributed to biological differences at least some of the Black-American/Hispanic-American-vs-White-American difference in outcomes in things such as jobs and income. But similar percentages can mask inconsistencies.

For example, among U.S. residents who attributed to biological differences at least some of the Black-American/Hispanic-American-vs-White-American difference in outcomes in things such as jobs and income, about 37% rated Asians' intelligence higher than Whites' intelligence, about 34% rated Whites' intelligence higher than Blacks' intelligence, but only about 14% fell into both of these groups, as illustrated in the second panel below:
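For anyone who wants to tabulate this sort of overlap, here is a minimal sketch of the calculation, using a made-up data frame and hypothetical variable names (wt, asian_gt_white, white_gt_black) rather than the actual ANES data or my actual code:

library(dplyr)

d <- tibble::tibble(
  wt             = c(1.2, 0.8, 1.0, 1.0),  # hypothetical survey weights
  asian_gt_white = c(1, 0, 1, 0),          # 1 = rated Asians' intelligence above Whites'
  white_gt_black = c(1, 1, 0, 0)           # 1 = rated Whites' intelligence above Blacks'
)

d %>%
  summarise(
    pct_asian_gt_white = 100 * weighted.mean(asian_gt_white == 1, wt),
    pct_white_gt_black = 100 * weighted.mean(white_gt_black == 1, wt),
    pct_both           = 100 * weighted.mean(asian_gt_white == 1 & white_gt_black == 1, wt)
  )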

The plot below reports corresponding comparisons: the estimated percentage of each population that rated the in-general intelligence of Whites higher than the in-general intelligence of Blacks (black dots) and the estimated percentage that rated the in-general intelligence of Asians higher than the in-general intelligence of Blacks (white dots).

---

[*] I can imagine reasons to not be in one or both dots, such as perceptions about the influence of past or present racial discrimination, the relative size of the gaps, flaws in the use of educational data as measures of intelligence, and imperfections in the wording of the ANES item. But I nonetheless thought that it would be interesting to check respondent ratings about racial group intelligence.

---

NOTES

1. Relevant item wording from the ANES 2022 Pilot Study:

Next, we're going to show you a seven-point scale on which the characteristics of the people in a group can be rated. In the first statement a score of '1' means that you think almost all of the people in that group tend to be 'intelligent.' A score of '7' means that you think most people in the group are 'unintelligent.' A score of '4' means that you think that most people in the group are not closer to one end or the other, and of course, you may choose any number in between. Where would you rate each group in general on this scale?

2. The ANES 2022 Pilot Study had a parallel item about Hispanic-Americans that I didn't analyze, to avoid complicating the presentation.

3. In the full sample, weighted, 13% rated in-general Black intelligence higher than in-general White intelligence (compared to 25% the other way), 8% rated in-general Black intelligence higher than in-general Asian intelligence (compared to 38% the other way), and 10% rated in-general White intelligence higher than in-general Asian intelligence (compared to 35% the other way). Respective equal ratings of in-general intelligence were 62% White/Black, 54% Asian/Black, and 55% Asian/White.

Respondents were coded into a separate category if the respondent didn't provide a rating of intelligence for at least one of the racial groups in a comparison, but almost all respondents provided a rating of intelligence for each racial group.

4. Plots created with R packages: tidyverse, waffle, and patchwork.
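In case it's useful, here is a minimal sketch of the general approach, and not the actual code for the plots above: two waffle charts combined into side-by-side panels with patchwork, using the rounded estimates mentioned earlier in the post as illustrative numbers.

library(ggplot2)
library(waffle)      # waffle()
library(patchwork)   # lays the two panels out side by side

p1 <- waffle(c("Rated Whites higher" = 34, "Did not" = 66), rows = 5) +
  ggtitle("White/Black comparison")
p2 <- waffle(c("Rated Asians higher" = 37, "Did not" = 63), rows = 5) +
  ggtitle("Asian/White comparison")

p1 + p2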

5. Data for the ANES 2022 Pilot Study. Stata code and output for my analysis.

6. Below is an earlier draft of the first plot, which I didn't like as much because I thought that it was too wide and not as visually attractive:

7. The shading in the plot below is intended to emphasize the size of the gaps between the estimates within a population, with red indicating reversal of the typical pattern:

8. Plot replacing the legend with direct labels:

9. Bonus plot, while I'm working on visualizations: this plot compares ratings about men and women on 0-to-100 feeling thermometers, with a confidence interval for each category, as if each category were plotted as its own percentage:


Political Research Quarterly published Garcia and Sadhwani 2022 "¿Quien importa? State legislators and their responsiveness to undocumented immigrants", about an experiment in which state legislators were sent messages, purportedly from a Latinx person such as Juana Martinez or from an Eastern European person such as Anastasia Popov, with message senders describing themselves as "residents", "citizens", or "undocumented immigrants".

I'm not sure of the extent to which the lower response rates to the purported undocumented immigrants were due to state legislative offices suspecting that this was yet another audit study. Or maybe it's common for state legislators to receive messages from senders who invoke their undocumented status, as in this experiment ("As undocumented immigrants in your area we are hoping you can help us").

But that's not what this post is about.

---

1.

Garcia and Sadhwani 2022 Table 1 Model 2 reports estimates from a logit regression predicting whether a response was received from the state legislator, with predictors such as legislative professionalism. The coefficient was positive for legislative professionalism, indicating that, on average and other model variables held constant, legislators from states with higher levels of legislative professionalism were more likely to respond, compared to legislators from states with lower levels of legislative professionalism.

Another Model 2 predictor was "state", which had a coefficient of 0.007, a standard error of 0.002, and three statistical significance asterisks indicating that, on average and other model variables held constant -- what? -- legislators from states with more "state-ness" were more likely to respond? I'm pretty sure that this "state" predictor was coded with states later in the alphabet such as Wyoming assigned a higher number than states earlier in the alphabet such as Alabama. I don't think that makes any sense as a predictor of response rates, but the predictor was statistically significant, so that's interesting.

The "state" variable was presumably meant to be included as a categorical predictor, based on the Garcia and Sadhwani 2022 text (emphasis added):

For example, we include the Squire index for legislative professionalism (Squire 2007), the chamber in which the legislator serves, and a fixed effects variable for states.
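Here is a minimal sketch of the difference, using hypothetical names (d, responded, professionalism, state_num) and not the Garcia and Sadhwani 2022 replication code: the first model treats the state code as a single numeric predictor, which is what the reported 0.007 coefficient implies, and the second enters state as fixed effects, with one indicator per state and one state omitted as the baseline.

# State entered as a numeric code (e.g., 1 = Alabama ... 50 = Wyoming):
m_numeric <- glm(responded ~ professionalism + state_num,
                 family = binomial, data = d)

# State entered as fixed effects (one indicator per state):
m_fixed   <- glm(responded ~ professionalism + factor(state_num),
                 family = binomial, data = d)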

I think this is something that a peer reviewer or editor should catch, especially because Garcia and Sadhwani 2022 doesn't report that many results in tables or figures.

---

2.

Garcia and Sadhwani 2022 Table 1 Model 2 omits the sender category of undocumented Latinx, so that results for the five included sender categories can be interpreted relative to the omitted sender category of undocumented Latinx. So far so good.

But then Garcia and Sadhwani 2022 interprets the other predictors as applying to only the omitted sender category of undocumented Latinx, such as (sic for "respond do a request"):

To further examine the potential impact of sentiments toward immigrants and immigration at the state level, we included a variable ("2012 Romney states") to examine if legislators in states that went to Romney in the 2012 presidential election were less likely to respond do a request from an undocumented immigrant. We found no such relationship in the data.

This apparent misinterpretation appears in the abstract (emphasis added):

We found that legislators respond less to undocumented constituents regardless of their ethnicity and are more responsive to both the Latinx and Eastern European-origin citizen treatments, with Republicans being more biased in their responsiveness to undocumented residents.

I'm interpreting that emphasized part to mean that the Republican legislator gap in responsiveness to undocumented constituents compared to citizen constituents was larger than the non-Republican legislator gap in responsiveness to undocumented constituents compared to citizen constituents. And I don't think that's correct based on the data for Garcia and Sadhwani 2022.

My analysis used an OLS regression to predict whether a legislator responded, with only a predictor for "undocCITIZ" coded 1 for undocumented senders and 0 for citizen senders. Coefficients were -0.07 among Republican legislators and -0.11 among non-Republican legislators, so the undocumented/citizen gap was not larger among Republican legislators compared to non-Republican legislators. Percentage responses are in the table below:

SENDER         GOP NON-GOP 
Citizen EEurop 21  23
Citizen Latina 26  29
Control EEurop 25  33
Control Latina 18  20
Undocum EEurop 18  12
Undocum Latina 15  17
OVERALL        20  22
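Here is a minimal sketch of that comparison, with undocCITIZ as described above but with hypothetical names for the data frame (d), the response indicator (responded), and the Republican indicator (gop); it is not the exact code from my analysis.

# Undocumented-vs-citizen gap among Republican legislators:
coef(lm(responded ~ undocCITIZ, data = subset(d, gop == 1)))["undocCITIZ"]  # about -0.07

# Undocumented-vs-citizen gap among non-Republican legislators:
coef(lm(responded ~ undocCITIZ, data = subset(d, gop == 0)))["undocCITIZ"]  # about -0.11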

---

NOTE

1. No response yet to my Nov 17 tweet to a co-author of Garcia and Sadhwani 2022.


The Journal of Politics recently published Butler et al 2022 "Constituents ask female legislators to do more".

---

1. PREREGISTRATION

The relevant preregistration plan for Butler et al 2022 has an outcome that the main article does not mention, for the "Lower Approval for Women" hypothesis. Believe it or not, the Butler et al 2022 analysis didn’t find sufficient evidence in its "Lower Approval for Women" tests. So instead of reporting that in the JOP article or its abstract or its title, Butler et al mentioned the insufficient evidence in appendix C of the online supplement to Butler et al 2022.

---

2. POSSIBLE ERROR FOR THE APPROVAL HYPOTHESIS

The Butler et al 2022 online appendix indicates that the dependent variable for Table C2 is a four-point scale that was predicted using ordered probit. Table C2 reports results for four cut points, even though a four-point dependent variable should have only three cut points. The dependent variable was drawn from a 5-point scale in which the fifth point was "Not sure", so I think that someone forgot to recode the "Not sure" responses to missing.
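Here is a minimal simulated-data sketch of the cut-point arithmetic, using MASS::polr rather than whatever software Butler et al 2022 used: an ordered probit with a four-category outcome returns three cut points.

library(MASS)

set.seed(1)
d <- data.frame(
  y = factor(sample(1:4, 500, replace = TRUE), ordered = TRUE),  # four ordered categories
  x = rnorm(500)
)

m <- polr(y ~ x, data = d, method = "probit")
m$zeta  # three cut points: 1|2, 2|3, 3|4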

Butler et al 2022 online appendix C indicates that:

Constituents chose among 5 response options for the question: Strongly approve, Somewhat approve, Somewhat disapprove, Strongly disapprove, Not sure.

So I think that the "Not sure" responses were coded as if being not sure was super strongly disapprove.

---

3. PREREGISTRATION + RESEARCH METHOD

The image below has a tabulation of the dependent variable for the preregistered hypothesis of Butler et al 2022 that is reported in the main text, the abstract, and the title:

That's a very large percentage of zeros.

The Butler et al 2022 experiment involved male legislators and female legislators sending letters to constituents asking the constituents to complete an online survey, and, in that online survey, the legislator asked "What policy issues do you think I should work on during the current session?".

Here is a relevant passage from the Butler et al 2022 preregistration reported in the online appendix, with my emphasis added and [sic] for "...condition the code...":

Coding the Dependent Variable. This would be an open-ended question where voters could list multiple issues. We will have RAs who are blind to the hypothesis and treatment condition the code the number of issues given in the open response. We will use that number as the dependent variable. We will then an OLS regression where the DV is the number of issues and the IV is the gender treatment.

That passage seems to indicate that the dependent variable was preregistered to be a measure about what constituents provided in the open response. From what I can tell based on the original coding of the "NumberIssues" dependent variable, the RAs coded 14 zeros based on what respondents provided in the open response, out of a total of 1,203 observations. I ran the analysis on only these 1,203 observations, and the p-value for the coefficient on the gender of the legislator (fem_treatment) was 0.29 without controls and 0.29 with controls.

But Butler et al 2022 coded the dependent variable to be zero for the 29,386 people who didn't respond to the survey at all or at least didn't respond in the open response. Converting these 29,386 observations to zero policy issues asked about produces corresponding p-values of p=0.06 and p=0.09. But it seems potentially misleading to focus on a dependent variable that conflates [1] the number of issues that a constituent asked about and [2] the probability that the constituent responded to the survey.
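Here is a minimal sketch of the two codings, assuming a hypothetical data frame d in which NumberIssues is missing for the recipients who provided no open response and fem_treatment is the gender-of-legislator treatment; this is not the Butler et al 2022 replication code.

# [1] Only the observations that the RAs coded from the open responses:
m_respondents <- lm(NumberIssues ~ fem_treatment, data = subset(d, !is.na(NumberIssues)))

# [2] Non-respondents converted to zero issues, which conflates the number of
#     issues asked about with the probability of responding to the survey:
d$NumberIssues_all <- ifelse(is.na(d$NumberIssues), 0, d$NumberIssues)
m_all <- lm(NumberIssues_all ~ fem_treatment, data = d)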

Table D2 of Butler et al 2022 indicates that constituents were more likely to respond to the female legislators' request to respond to the online survey (p<0.05). Butler et al 2022 indicates that "Women are thus contacted more often but do not receive more requests per contact" (p. 2281). But it doesn't seem correct to describe a higher chance of responding to a female legislator's request to complete a survey as contacting female legislators more, especially if the suggestion is that the experimental results about contact initiated by the legislator applies to contact that is not initiated by the legislator.

If anything, constituents being more likely to respond to female legislator requests than male legislator requests seems like a constituent bias in favor of female legislators.

---

NOTE

1. To date, no responses to tweets about the potential error or the research method.


1.

In May, I published a blog post about deviations from the pre-analysis plan for the Stephens-Dougan 2022 APSR letter, and I tweeted a link to the blog post that tagged @LaFleurPhD and asked her directly about the deviations from the pre-analysis plan. I don't recall receiving a response from Stephens-Dougan, and, a few days later, on May 31, I emailed the APSR about my post, listing three concerns:

* The Stephens-Dougan 2022 description of racially prejudiced Whites not matching how the code for Stephens-Dougan 2022 calculated estimates for racially prejudiced Whites.

* The substantial deviations from the pre-analysis plan.

* Figure 1 of the APSR letter reporting weighted estimates, but the evidence being much weaker in unweighted analyses.

Six months later (December 5), the APSR published a correction to Stephens-Dougan 2022. The correction addresses each of my three concerns, though not perfectly, as I discuss below, along with some other comments about Stephens-Dougan 2022 and its correction. I'll refer to the original APSR letter as "Stephens-Dougan 2022" and the correction as "the correction".

---

2.

The pre-analysis plan associated with Stephens-Dougan 2022 listed four outcomes at the top of its page 4, but only one of these outcomes (referred to as "Individual rights and freedom threatened") was reported on in Stephens-Dougan 2022. However, Table 1 of Stephens-Dougan 2022 reported results for three outcomes that were not mentioned in the pre-analysis plan.

The t-statistics for the key interaction term for the three outcomes included in Table 1 of Stephens-Dougan 2022 but not mentioned in the pre-analysis plan were 2.6, 2.0, and 2.1, all of which indicate sufficient evidence. The t-statistics for the key interaction term for the three outcomes mentioned in the pre-analysis plan but omitted from Stephens-Dougan 2022 were 0.6, 0.6, and 0.6, none of which indicate sufficient evidence.

I calculated the t-statistics of 2.6, 2.0, and 2.1 from Table 1 of Stephens-Dougan 2022, by dividing a coefficient by its standard error. I wasn't able to use the correction to calculate the t-statistics of 0.6, 0.6, and 0.6, because the relevant data for these three omitted pre-analysis plan outcomes are not in the correction but instead are in Table A12 of a "replication-final.pdf" file hosted at the Dataverse.

That's part of what I meant about an imperfect correction: a reader cannot use information published in the APSR itself to calculate the evidence provided by the outcomes that were planned to be reported on in the pre-analysis plan, or, for that matter, to see how there is substantially less evidence in the unweighted analysis. Instead, a reader needs to go to the Dataverse and dig through table after table of results.

The correction refers to deviations from the pre-analysis plan, but doesn't indicate the particular deviations and doesn't indicate what happens when these deviations are not made.  The "Supplementary Materials Correction-Final.docx" file at the Dataverse for Stephens-Dougan 2022 has a discussion of deviations from the pre-analysis plan, but, as far as I can tell, the discussion does not provide a reason why the results should not be reported for the three omitted outcomes, which were labeled in Table A12 as "Slow the Spread", "Stay Home", and "Too Long to Loosen Restrictions".

It seems to me to be a bad policy to permit researchers to deviate from a pre-analysis plan without justification and to merely report results from a planned analysis on, say, page 46 of a 68-page file on the Dataverse. But a bigger problem might be that, as far as I can tell, many journals don't even attempt to prevent misleading selective reporting for survey research for which there is no pre-analysis plan. Journals could require researchers reporting on surveys to submit or link to the full questionnaire for the surveys or at least to declare that the main text reports on results for all plausible measured outcomes and moderators.

---

3.

Next, let me discuss a method used in Stephens-Dougan 2022 and the correction, which I think is a bad method.

The code for Stephens-Dougan 2022 used measures of stereotypes about Whites and Blacks on the traits of hard working and intelligent, to create a variable called "negstereotype_endorsement". The code divided respondents into three categories, coded 0 for respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for respondents who endorsed both negative stereotypes about Blacks relative to Whites. For both Stephens-Dougan 2022 and the correction, Figure 3 reported for each reported outcome an estimate of how much the average treatment effect among prejudiced Whites (defined as those coded 1) differed from the average treatment effect among unprejudiced Whites (defined as those coded 0).

The most straightforward way to estimate this difference in treatment effects is to [1] calculate the treatment effect for prejudiced Whites coded 1, [2] calculate the treatment effect for unprejudiced Whites coded 0, and [3] calculate the difference between these treatment effects. The code for Stephens-Dougan 2022 instead estimated this difference using a logit regression that had three predictors: the treatment, the 0/0.5/1 measure of prejudice, and an interaction of the prior two predictors. But, by this method, the estimated difference in treatment effect between the 1 respondents and the 0 respondents depends on the 0.5 respondents. I can't think of a valid reason why responses from the 0.5 respondents should influence an estimated difference between the 0 respondents and the 1 respondents.
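Here is a minimal sketch of the two approaches, with hypothetical names (d for the data frame, y for a dichotomous outcome, treat for the treatment, prejudice for the 0/0.5/1 moderator): one way to keep the 0.5 respondents out of the 0-versus-1 comparison is to fit the interaction model on only the 0 and 1 respondents.

# Interaction model fit on only the 0 and 1 respondents, so the 0.5 respondents
# cannot influence the estimated difference in treatment effects:
m_01  <- glm(y ~ treat * prejudice, family = binomial,
             data = subset(d, prejudice %in% c(0, 1)))

# Interaction model fit on all respondents, as in the replication code, so the
# 0.5 respondents can influence the treat:prejudice interaction estimate:
m_all <- glm(y ~ treat * prejudice, family = binomial, data = d)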

See my Stata output file for more on that. The influence of the 0.5 respondents might not be major in most or all cases, but an APSR reader won't know, based on Stephens-Dougan 2022 or its correction, the extent to which the 0.5 respondents influenced the estimates for the comparison of the 0 respondents to the 1 respondents.

Now about those 0.5 respondents…

---

4.

Remember that the Stephens-Dougan 2022 "negative stereotype endorsement" variable has three levels: 0 for the 74% of respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for the 16% of respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for the 10% of respondents who endorsed both negative stereotypes about Blacks relative to Whites.

The correction indicates that "I discovered an error in the description of the variable, negative stereotype endorsement" and that "there was no error in the code used to create the variable". So was the intent for Stephens-Dougan 2022 to measure racial prejudice so that only the 1 respondents are considered prejudiced? Or was the intent to consider the 0.5 respondents and the 1 respondents to be prejudiced?

The pre-analysis plan seems to indicate a different method for measuring the moderator of negative stereotype endorsement:

The difference between the rating of Blacks and Whites is taken on both dimensions (intelligence and hard work) and then averaged.

But the pre-analysis plan also indicates that:

For racial predispositions, we will use two or three bins, depending on their distributions.

So, even ignoring the plan to average the stereotype ratings, the pre-analysis plan is inconclusive about whether the intent was to use two or three bins. Let's try this passage from Stephens-Dougan 2022:

A nontrivial fraction of the nationally representative sample—26%—endorsed either the stereotype that African Americans are less hardworking than whites or that African Americans are less intelligent than whites.

So that puts the 16% of respondents at the 0.5 level of negative stereotype endorsement into the same bin as the 10% at the 1 level of negative stereotype endorsement. Stephens-Dougan 2022 doesn't report the percentage that endorsed both negative stereotypes about Blacks. Reporting the percentage of 26% is what would be expected if the intent was to place into one bin any respondent who endorsed at least one of the negative stereotypes about Blacks, so I'm a bit skeptical of the claim in the correction that the description is in error and the code was correct. Maybe I'm missing something, but I don't see how someone who intends to have three bins reports the 26% and does not report the 10%.

For another thing, Stephens-Dougan 2022 has only three figures: Figure 1 reports results for racially prejudiced Whites, Figure 2 reports results for non-racially prejudiced Whites, and Figure 3 reports on the difference between racially prejudiced Whites and non-racially prejudiced Whites. Did Stephens-Dougan 2022 intend to not report results for the group of respondents who endorsed exactly one of the negative stereotypes about Blacks? Did Stephens-Dougan 2022 intend to suggest that respondents who rate Blacks as lazier in general than Whites aren't racially prejudiced as long as they rate Blacks equal to or higher than Whites in general on intelligence?

---

5.

Stephens-Dougan 2022 and the correction depict 84% confidence intervals in all figures. Stephens-Dougan 2022 indicated (footnote omitted) that:

For ease of interpretation, I plotted the predicted probability of agreeing with each pandemic measure in Figure 1, with 84% confidence intervals, the graphical equivalent to p < 0.05.

The 84% confidence interval is good for assessing a p=0.05 difference between estimates, but not for assessing at p=0.05 whether an estimate differs from a particular number such as zero. So 84% confidence intervals make sense for Figures 1 and 2, in which the key comparisons are of the control estimate to the treatment estimate. But 84% confidence intervals don't make as much sense for Figure 3, which plots only one estimate per outcome and for which the key assessment is whether the estimate differs from zero (Figure 3 in Stephens-Dougan 2022) or from 1 (the correction).
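Here is a minimal sketch of the arithmetic behind that: with two independent estimates that have roughly equal standard errors, non-overlapping 84% intervals require a difference of about 2.81 standard errors, close to the 2.77 standard errors needed for a p<0.05 difference, whereas a p<0.05 comparison of a single estimate to a fixed number corresponds to a 95% interval.

z84 <- qnorm(1 - (1 - 0.84) / 2)   # about 1.405
c(two_interval_rule = 2 * z84,                 # about 2.81
  p05_difference    = qnorm(0.975) * sqrt(2))  # about 2.77

qnorm(0.975)  # about 1.96: the multiplier for testing one estimate against a fixed value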

---

6.

I didn’t immediately realize why, in Figure 3 in Stephens-Dougan 2022, two of the four estimates cross zero, but in Figure 3 in the correction, none of the four estimates cross zero. Then I realized that the estimates plotted in Figure 3 of the correction (but not Figure 3 in Stephens-Dougan 2022) are odds ratios.

The y-axis for odds ratios for Figure 3 of the correction ranges from 0 to 30-something, using a linear scale. The odds ratio that indicates no effect is 1, and an odds ratio can't be negative, so that is why none of the four estimates cross zero in the corrected Figure 3.

It seems like a good idea for a plot of odds ratios to have a guideline for 1, so that readers can assess whether an odds ratio indicating no effect is a plausible value. And a log scale seems like a good idea for odds ratios, too. Relevant prior post that mentions that Fenton and Stephens-Dougan 2021 described a "very small" 0.01 odds ratio as "not substantively meaningful".
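Here is a minimal sketch of that plotting suggestion, with made-up odds ratios and intervals rather than the estimates from the correction: point ranges on a log scale with a dashed guideline at 1, the value that indicates no effect.

library(ggplot2)

d <- data.frame(outcome = c("A", "B", "C", "D"),
                or = c(0.8, 2.5, 6.0, 30),
                lo = c(0.4, 1.2, 2.0, 8),
                hi = c(1.6, 5.0, 18.0, 110))

ggplot(d, aes(x = outcome, y = or, ymin = lo, ymax = hi)) +
  geom_pointrange() +
  geom_hline(yintercept = 1, linetype = "dashed") +  # no-effect guideline
  scale_y_log10()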

None of the 84% confidence intervals for Figure 3 capture an odds ratio that crosses 1, but an 84% confidence interval for Figure A3 in "Supplementary Materials Correction-Final.docx" does.

---

7.

Often, when I alert an author or journal to an error in a publication, the subsequent correction doesn't credit me for my work. Sometimes the correction even suggests that the authors themselves caught the error, like the correction to Stephens-Dougan 2022 seems to do:

After reviewing my code, I discovered an error in the description of the variable, negative stereotype endorsement.

I guess it's possible that Stephens-Dougan "discovered" the error. For instance, maybe after she submitted page proofs, for some reason she decided to review her code, and just happened to catch the error that she had missed before, and it's a big coincidence that this was the same error that I blogged about and alerted the APSR to.

And maybe Stephens-Dougan also discovered that her APSR letter misleadingly deviated from the relevant pre-analysis plan, so that I don't deserve credit for alerting the APSR to that.


I reached ten new publications to comment on that I didn't think were worth a separate blog post, so here goes:

---

1.

The Twitter account for the journal Politics, Groups, and Identities retweeted R.G. Cravens linking to two of his articles in Politics, Groups, and Identities. I blogged about one of these articles, discussing, among other things, the article's erroneous interpretation of interaction terms. The other article that R.G. Cravens linked to in that tweet ("The view from the top: Social acceptance and ideological conservatism among sexual minorities") also misinterpreted an interaction term:

However, the coefficient estimate for the interaction term between racial minority identity and racial identity group consciousness (β = −.312, p = .000), showing the effect of racial identity group consciousness only among racial minority respondents, indicates a negative relationship between racial minority group consciousness and conservatism at the 99% confidence level.

The corresponding Table 1 coefficient for RI Consciousness is 0.117, indicating the estimated effect of racial identity consciousness when the "Racial minority" variable is set to zero. The -0.312 interaction term indicates how much the estimated effect of racial identity consciousness *differs* between non-racial minorities and racial minorities, so that the estimated effect of racial identity consciousness among racial minorities is 0.117 plus -0.312, which is -0.195.
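Here is a minimal sketch of recovering the conditional estimate from a fitted model, with a hypothetical model object m and hypothetical coefficient names (ri_consciousness, racial_minority): the point estimate among racial minorities is the lower-order coefficient plus the interaction coefficient, and its standard error uses the covariance of the two coefficients.

b <- coef(m)
b["ri_consciousness"] + b["racial_minority:ri_consciousness"]  # 0.117 + (-0.312) = -0.195

V <- vcov(m)
sqrt(V["ri_consciousness", "ri_consciousness"] +
     V["racial_minority:ri_consciousness", "racial_minority:ri_consciousness"] +
     2 * V["ri_consciousness", "racial_minority:ri_consciousness"])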

Two articles by one author in the same journal within three years, and each article misinterpreted an interaction term.

---

2.

PS: Political Science & Politics published another article about student evaluations of teaching: Foster 2022 "Instructor name preference and student evaluations of instruction". The key finding seems plausible, that "SEIs were higher for instructors who preferred going by their first name...than for instructors who preferred going by 'Dr. Moore'" (p. 4).

But here are a few shortcomings in the reporting on the experiment in Study 2, which manipulated the race of an instructor, the gender of the instructor, and the instructor's stated preference for using his/her first name versus using his/her title and last name:

* Hypothesis 5 is about conservative Republicans:

Moderated mediation: We predict that female instructors who express a preference for going by "Dr. Moore" will have lower teacher ratings through decreased perceived immediacy, but only for students who identify as conservative and Republican.

But, as far as I can tell, the article doesn't report any data about Hypothesis 5.

* Table 2 indicates a positive p<0.05 correlation between the race of the instructor and SEIs (student evaluations of instruction) and a positive p<0.05 correlation between the race of the instructor and course evaluations. But, as far as I can tell, the article doesn't report how the race variable was coded, so it's not clear whether the White instructors or the Black instructors had the higher SEIs and course evaluations.

* The abstract indicates that:

Study 2 found the highest SEIs for Black male instructors when instructors asked students to call them by their first name, but there was a decrease in SEI scores if they went by their professional title.

But, as far as I can tell, the article doesn't report sufficient evidence about whether the estimated influence of the name preference among the Black male instructor targets differed from the estimated influence of the name preference among any of the comparison instructors. The p-value being under p=0.05 for the Black male instructor targets and not being under p=0.05 for the other instructor targets isn't enough evidence to infer at p<0.05 that participants treated the Black male instructor targets differently than participants treated the comparison instructor targets, so that the article doesn't report sufficient evidence to permit an inference of racial discrimination.
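A minimal sketch of the kind of test that would support that claim, with hypothetical variable names (d, sei, name_preference, and instructor as a combined race/gender factor): the inference that Black male instructor targets were treated differently requires the interaction estimate, not a comparison of which per-cell p-values fall below 0.05.

m <- lm(sei ~ name_preference * instructor, data = d)
summary(m)  # the name_preference:instructor coefficients test whether the
            # name-preference effect differs across instructor race/gender cells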

---

---

5.

I wasn't the only person to notice this next one (see tweets from Tom Pepinsky and Brendan Nyhan), but Politics & Gender recently published Forman-Rabinovici and Mandel 2022 "The prevalence and implications of gender blindness in quantitative political science research", which indicated that:

Our findings show that gender-sensitive analysis yields more accurate and useful results. In two out of the three articles we tested, gender-sensitive analysis indeed led to different outcomes that changed the ramifications for theory building as a result.

But the inferential technique in the analysis reflected a common error.

For the first of the three aforementioned articles (Gandhi and Ong 2019), Table 1a of Forman-Rabinovici and Mandel 2022 reported results with a key coefficient that was -.308 across the sample, was -.294 (p=.003) among men in the sample, and was -.334 (p=.154) among women in the sample. These estimates are from a linear probability model predicting a dichotomous "Support PH" outcome, so the point estimates were decreases of about 29 percentage points among men and about 33 percentage points among women.

The estimate was more extreme among women than among men, but the estimate was less precise among women than among men, at least partly because the sample size among men (N=1902) was about three times the sample size among women (N=652).

Figure 1 of Forman-Rabinovici and Mandel 2022 described these results as:

Male voters leave PH coalition

Female voters continue to vote for PH coalition

But, in my analysis of the data, the ends of the 95% confidence interval for the estimate among women indicated an 82 percentage point decrease and a 15 percentage point increase [-0.82, +0.15], so that's not nearly enough evidence to infer a lack of an effect among women.

---

6.

Politics & Gender published another article that has at least a misleading interpretation of interaction terms: Kreutzer 2022 "Women's support shaken: A study of women's political trust after natural disasters".

Table 1 reports results for three multilevel mixed-effects linear regressions, with coefficients on a "Number of Disasters Present" predictor of 0.017, 0.009, and 0.022. The models have a predictor for "Female" and an interaction of "Female" and "Number of Disasters Present" with interaction coefficients of –0.001, –0.002, and –0.001. So the combination of coefficients indicates that the associations of "Number of Disasters Present" and the "trust" outcomes are positive among women, but not as positive as the associations are among men.

Kreutzer 2022 discusses this correctly in some places, such as indicating that the interaction term "allows a comparison of how disasters influence women's political trust compared with men's trust" (p. 15). But in other places the interpretation is, I think, incorrect or at least misleading, such as in the abstract (emphasis added):

I investigate women's trust in government institutions when natural disasters have recently occurred and argue that because of their unique experiences and typical government responses, women's political trust will decline when there is a natural disaster more than men's. I find that when there is a high number of disasters and when a larger percentage of the population is affected by disasters, women's political trust decreases significantly, especially institutional trust.

Or from page 23:

I have demonstrated that natural disasters create unique and vulnerable situations for women that cause their trust in government to decline.

And discussing Figure 5, referring to a different set of three regressions (reference to footnote 12 omitted):

The figure shows a small decline in women's trust (overall, institutional, organizational) as the percentage of the population affected by disasters in the country increases. The effect is significantly different from 0, but the percentage affected seems not to make a difference.

That seems to say that the percentage of the population affected has an effect that is simultaneously not zero and does not seem to make a difference. I think the Figure 5 marginal effects plots indicate that women have lower trust than men (which is why each point estimate line falls in the negative range), but that this gender difference in trust does not vary much by the percentage of the population affected (which is why each point estimate line is pretty much flat).

---

The "Women's Political Empowerment Index" coefficient and standard error are –0.017 and 0.108 in Model 4, so maybe the ** indicating a two-tailed p<0.01 is an error.

Tweet to the author (Oct 3). No reply yet.

---

7, 8.

Let's return to Politics, Groups, and Identities, for Ditonto 2019 "Direct and indirect effects of prejudice: sexism, information, and voting behavior in political campaigns". From the abstract:

I also find that subjects high in sexism search for less information about women candidates...

At least in the reported analyses, the comparison for "less" is to participants low in sexism instead of to male candidates. So we get this result discussing Table 2 (pp. 598-599):

Those who scored lowest in sexism are predicted to look at approximately 13 unique information boxes for the female candidate, while those who scored highest are predicted to access about 10 items, or almost 1/3 less.

It should be obvious to peer reviewers and any editors that a comparison to the male candidates in the experiment would be a more useful comparison for assessing the effect of sexism, because, for all we know, respondents high in sexism might search for less information than respondents low in sexism search for, no matter the gender of the candidate.

Ditonto has another 2019 article in a different journal (Political Psychology) based on the same experiment: "The mediating role of information search in the relationship between prejudice and voting behavior". From that abstract:

I also find that subjects high in prejudice search for less information about minority candidates...

But, again, Table 2 in that article merely indicates that symbolic racism negatively associates with information search for a minority candidate, with no information provided about information search for a non-minority candidate.

---

And I think that the Ditonto 2019 abstracts include claims that aren't supported by results reported in the article. The PGI abstract claims that "I find that subjects with higher scores on items measuring modern sexism...rate female candidates more negatively than their male counterparts", and the PP abstract claims that "I find that subjects higher in symbolic racism...rate minority candidates more negatively than their white counterparts".

By the way, claims about respondents high in sexism or racism should be assessed using data only from respondents high in sexism or racism, because the association of a sexism or racism measure with an outcome might be completely due to respondents low in sexism or racism.

Tweet to the author (Oct 9). No reply yet.

---

9.

Below is a passage from "Lower test scores from wildfire smoke exposure", by Jeff Wen and Marshall Burke, published in 2022 in Nature Sustainability:

When we consider the cumulative losses over all study years and across subgroups (Fig. 4b), we estimate the net present value of lost future income to be roughly $544 million (95% CI: −$999 million to −$100 million) from smoke PM2.5 exposure in 2016 for districts with low economic disadvantage and low proportion of non-White students. For districts with high economic disadvantage and high proportion of non-White students, we estimate cumulative impacts to be $1.4 billion (95% CI: −$2.3 billion to −$477 million) from cumulative smoke PM2.5 exposure in 2016. Thus, of the roughly $1.7 billion in total costs during the smokiest year in our sample, 82% of the costs we estimate were borne by economically disadvantaged communities of colour.

So, in 2016, the lost future income was about $0.5 billion for low economic disadvantage / low non-White districts and $1.4 billion for high economic disadvantage / high non-White districts; that gets us to $1.9 billion, without even including the costs from low/high districts and high/low districts. But total costs were cited as roughly $1.7 billion.
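The arithmetic check is short (amounts in billions of dollars):

0.544 + 1.4   # about 1.94: the two quoted categories alone exceed the quoted 1.7 total
1.4 / 1.7     # about 0.82: the quoted 82% share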

From what I can tell from Figure 4b, the percentage of total costs attributed to economically disadvantaged communities of color (the high / high category) is 59%. It's not a large inferential difference from 82%, in that both estimates are a majority, but it's another example of an error that could have been caught by careful reading.

Tweet to the authors about this (Oct 17). No reply yet.

---

10.

Political Research Quarterly published "Opening the Attitudinal Black Box: Three Dimensions of Latin American Elites' Attitudes about Gender Equality", by Amy Alexander, Asbel Bohigues, and Jennifer M. Piscopo.

I was curious about the study's measurement of attitudes about gender equality, and, not unexpectedly, the measurement was not good, using items such as "In general, men make better political leaders than women", in which respondents can agree that men make better political leaders, can disagree that men make better political leaders, and can be neutral about the claim that men make better political leaders...but respondents cannot report the belief that, in general, women make better political leaders than men do.

I checked the data, in case almost no respondent disagreed with the statement that "In general, men make better political leaders than women", in which case presumably no respondent would think that women make better political leaders than men do. But disagreement with the statement was pretty high, with 69% strongly disagreeing, another 15% disagreeing, and another 11% selecting neither agree nor disagree.

I tweeted a question about this to some of the authors (Oct 21). No reply yet.


The American Political Science Review recently published a letter: Stephens-Dougan 2022 "White Americans' reactions to racial disparities in COVID-19".

Figure 1 of the Stephens-Dougan 2022 APSR letter reports results for four outcomes among racially prejudiced Whites, with the 84% confidence interval in the control overlapping with the 84% confidence interval in the treatment for only one of the four reported outcomes (zooming in on Figure 1, the confidence intervals for the parks outcome don't seem to overlap, and the code returns 0.1795327 for the upper bound for the control and 0.18800818 for the lower bound for the treatment). And results for the most obviously overlapping 84% confidence intervals seem to be interpreted as sufficient evidence of an effect, with all four reported outcomes discussed in the passage below:

When racially prejudiced white Americans were exposed to the racial disparities information, there was an increase in the predicted probability of indicating that they were less supportive of wearing face masks, more likely to feel their individual rights were being threatened, more likely to support visiting parks without any restrictions, and less likely to think African Americans adhere to social distancing guidelines.

---

There are at least three things to keep track of: [1] the APSR letter; [2] the survey questionnaire, located at the OSF site for the Time-sharing Experiments for the Social Sciences project; and [3] the pre-analysis plan, located at the OSF and in the appendix of the APSR article. I'll use the PDF of the pre-analysis plan. The TESS site also has the proposal for the survey experiment, but I won't discuss that in this post.

---

The pre-analysis plan does not mention all potential outcome variables that are in the questionnaire, but the pre-analysis plan section labeled "Hypotheses" includes the passage below:

Specifically, I hypothesize that White Americans with anti-Black attitudes and those White Americans who attribute racial disparities in health to individual behavior (as opposed to structural factors), will be more likely to disagree with the following statements:

The United States should take measures aimed at slowing the spread of the coronavirus while more widespread testing becomes available, even if that means many businesses will have to stay closed.

It is important that people stay home rather than participating in protests and rallies to pressure their governors to reopen their states.

I also hypothesize that White Americans with anti-Black attitudes and who attribute racial health disparities to individual behavior will be more likely to agree with the following statements:

State and local directives that ask people to "shelter in place" or to be "safer at home" are a threat to individual rights and freedom.

The United States will take too long in loosening restrictions and the economic impact will be worse with more jobs being lost

The four outcomes mentioned in the passage above correspond to items Q15, Q18, Q16, and Q21 in the survey questionnaire, but, of these four outcomes, the APSR letter reported on only Q16.

The outcome variables in the APSR letter are described as: "Wearing facemasks is not important", "Individual rights and freedom threatened", "Visit parks without any restrictions", and "Black people rarely follow social distancing guidelines". These outcome variables correspond to survey questionnaire items Q20, Q16, Q23A, and Q22A.

---

The pre-analysis plan PDF mentions moderators, with three moderators about racial dispositions: racial resentment, negative stereotype endorsement, and attributions for health disparities. The plan indicates that:

For racial predispositions, we will use two or three bins, depending on their distributions. For ideology and party, we will use three bins. We will include each bin as a dummy variable, omitting one category as a baseline.

The APSR letter reported on only one racial predispositions moderator: negative stereotype endorsement.

---

I'll post a link in the notes below to some of my analyses of the "Specifically, I hypothesize" outcomes, but I don't want to focus on those results, because I wanted this post to focus on deviations from the pre-analysis plan: regardless of whether the estimates from the analyses in the APSR letter are similar to the estimates from the planned analyses in the pre-analysis plan, I think that it's bad that readers can't trust the APSR to ensure that a pre-analysis plan is followed, or at least to provide an explanation about why a pre-analysis plan was not followed, especially given that this APSR letter described itself as reporting on "a preregistered survey experiment" and included the pre-analysis plan in the appendix.

---

NOTES

1. The Stephens-Dougan 2022 APSR letter suggests that the negative stereotype endorsement variable was coded dichotomously ("a variable indicating whether the respondent either endorsed the stereotype that African Americans are less hardworking than whites or the stereotype that African Americans are less intelligent than whites"), but the code and the appendix of the APSR letter indicate that the negative stereotype endorsement variable was measured so that the highest level is for respondents who reported a negative relative stereotype about Blacks for both stereotypes. From Table A7:

(unintelligentstereotype2 + lazystereotype2)/2

In the data after running the code for the APSR letter, the negative stereotype endorsement variable is a three-level variable coded 0 for respondents who did not report a negative relative stereotype about Blacks for either stereotype, 0.5 for respondents who reported a negative stereotype about Blacks for one stereotype, and 1 for respondents who reported a negative relative stereotype about Blacks for both stereotypes.
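Here is a minimal sketch, with made-up 0/1 stereotype indicators, contrasting the three-level coding produced by the replication code with a dichotomous coding for endorsing at least one stereotype (the Stata version of the dichotomous coding appears in note 3 below):

unintelligentstereotype2 <- c(0, 1, 1, 0)
lazystereotype2          <- c(0, 0, 1, 1)

(unintelligentstereotype2 + lazystereotype2) / 2           # 0, 0.5, 1, 0.5
ceiling((unintelligentstereotype2 + lazystereotype2) / 2)  # 0, 1, 1, 1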

2. The APSR letter indicated that:

The likelihood of racially prejudiced respondents in the control condition agreeing that shelter-in-place orders threatened their individual rights and freedom was 27%, compared with a likelihood of 55% in the treatment condition (p < 0.05 for a one-tailed test).

My analysis using survey weights got 44% and 29% among participants who reported a negative relative stereotype about Blacks for at least one of the two stereotype items, and my analysis got 55% and 26% among participants who reported negative relative stereotypes about Blacks for both stereotype items, with a trivial overlap in 84% confidence intervals.

But the 55% and 26% in a weighted analysis were 43% and 37% in an unweighted analysis with a large overlap in 84% confidence intervals, suggesting that at least some of the results in the APSR letter might be limited to the weighted analysis. I ran the code for the APSR letter removing the weights from the glm command and got the revised Figure 1 plot below. The error bars in the APSR letter are described as 84% confidence intervals.

I think that it's fine to favor the weighted analysis, but I'd prefer that publications indicate when results from an experiment are not robust to the application or non-application of weights. Relevant publication.

3. Given the results in my notes [1] and [2], maybe the APSR letter's Figure 1 estimates are for only respondents who reported a negative relative stereotype about Blacks for both stereotypes. If so, the APSR letter's suggestion that this population is the 26% that reported anti-Black stereotypes for either stereotype might be misleading, if the Figure 1 analyses were estimated for only the 10% that reported a negative relative stereotype about Blacks for both stereotypes.

For what it's worth, the R code for the APSR letter has code that doesn't use the 0.5 level of the negative stereotype endorsement variable, such as:

# Below are code for predicted probabilities using logit model

# Predicted probability "individualrights_dichotomous"

# Treatment group, negstereotype_endorsement = 1

p1.1 <- invlogit(coef(glm1)[1] + coef(glm1)[2] * 1 + coef(glm1)[3] * 1 + coef(glm1)[4] * 1)

It's possible to see what happens to the Figure 1 results when the negative stereotype endorsement variable is coded 1 for respondents who endorsed at least one of the stereotypes. Run this at the end of the Stata code for the APSR letter:

replace negstereotype_endorsement = ceil((unintelligentstereotype2 + lazystereotype2)/2)

Then run the R code for the APSR letter. Below is the plot I got for a revised Figure 1, with weights applied and the sample limited to respondents who endorsed at least one of the stereotypes:

Estimates in the figure above were close to estimates in my analysis using these Stata commands after running the Stata code from the APSR letter. Stata output.

4. Data, Stata code, and Stata output for my analysis about the "Specifically, I hypothesize" passage of the Stephens-Dougan pre-analysis plan.

My analysis in the Stata output had seven outcomes: the four outcomes mentioned in the "Specifically, I hypothesize" part of the pre-analysis plan as initially measured (corresponding to questionnaire items Q15, Q18, Q16, and Q21), with no dichotomization of five-point response scales for Q15, Q18, and Q16; two of these outcomes (Q15 and Q16) dichotomized as mentioned in the pre-analysis plan (e.g., "more likely to disagree" was split into disagree / not disagree categories, with the not disagree category including respondent skips); and one outcome (Q18) dichotomized so that one category has "Not Very Important" and "Not At All Important" and the other category has the other responses and skips, given that the pre-analysis plan had this outcome dichotomized as disagree but response options in the survey were not on an agree-to-disagree scale. Q21 was measured as a dichotomous variable.

The analysis was limited to presumed racially prejudiced Whites, because I think that that's what the pre-analysis plan hypotheses quoted above focused on. Moreover, that analysis seems more important than a mere difference between groups of Whites.

Note that, for at least some results, a p<0.05 treatment effect might be in the unintuitive direction, so be careful before interpreting a p<0.05 result as evidence for the hypotheses.

My analyses aren't the only analyses that can be conducted, and it might be a good idea to combine results across outcomes mentioned in the pre-analysis plan or across all outcomes in the questionnaire, given that the questionnaire had at least 12 items that could serve as outcome variables.

For what it's worth, I wouldn't be surprised if a lot of people who respond to survey items in an unfavorable way about Blacks backlashed against a message about how Blacks were more likely than Whites to die from covid-19.

5. The pre-analysis plan included a footnote that:

Given the results from my pilot data, it is also my expectation that partisanship will moderate the effect of the treatment or that the treatment effects will be concentrated among Republican respondents.

Moreover, the pre-analysis plan indicated that:

The condition and treatment will be blocked by party identification so that there are roughly equal numbers of Republicans and Democrats in each condition.

But the lone mention of "Repub-" in the APSR letter is:

The sample was 39% self-identified Democrats (including leaners) and 46% self-identified Republicans (including leaners).

6. Link to tweets about the APSR letter.


Broockman 2013 "Black politicians are more intrinsically motivated to advance Blacks' interests: A field experiment manipulating political incentives" reported results from an experiment in which U.S. state legislators were sent an email from "Tyrone Washington", which is a name that suggests that the email sender is Black. The experimental manipulation was that "Tyrone" indicated that the city that he lived in was a city in the legislator's district or was a well-known city far from the legislator's district.

Based on Table 2 column 2, response percentages were:

  • 56.1% from in-district non-Black legislators
  • 46.4% from in-district Black legislators (= 0.561 - 0.097)
  • 28.6% from out-of-district non-Black legislators (= 0.561 - 0.275)
  • 41.4% from out-of-district Black legislators (= 0.561 - 0.275 + 0.128)

---

Broockman 2013 lacked another emailer to serve as a comparison for response rates to Tyrone, such as an emailer with a stereotypical White name. Broockman 2013 discusses this:

One challenge in designing the experiment was that there were so few black legislators in the United States (as of November 2010) that a set of white letter placebo conditions could not be implemented due to a lack of adequate sample size.

So all emails in the Broockman 2013 experiment were signed "Tyrone Washington".

---

But here is how Broockman 2013 was cited by Rhinehar 2020 in American Politics Research:

A majority of this work has explored legislator responsiveness by varying the race or ethnicity of the email sender (Broockman, 2013;...

---

Costa 2017 in the Journal of Experimental Political Science:

As for variables that do have a statistically significant effect, minority constituents are almost 10 percentage points less likely to receive a response than non-minority constituents (p < 0.05). This is consistent with many individual studies that have shown requests from racial and ethnic minorities are given less attention overall, and particularly when the recipient official does not share their race (Broockman, 2013;...

But Broockman 2013 didn't vary the race of the requester, so I'm not sure of the basis for the suggestion that Broockman 2013 provided evidence that requests from racial and ethnic minorities are given less attention overall.

---

Mendez and Grose 2018 in Legislative Studies Quarterly:

Others argue or show, through experimental audit studies, that political elites have biases toward minority constituents when engaging in nonpolicy representation (e.g.,Broockman 2013...

I'm not sure how Broockman 2013 permits an inference of political elite bias toward minority constituents, when the only constituent was Tyrone.

---

Lajevardi 2018 in Politics, Groups, and Identities:

Audit studies have previously found that public officials are racially biased in whether and how they respond to constituent communications (e.g., Butler and Broockman 2011; Butler, Karpowitz, and Pope 2012; Broockman 2013;...

---

Dinesen et al 2021 in the American Political Science Review:

In the absence of any extrinsic motivations, legislators still favor in-group constituents (Broockman 2013), thereby indicating a role for intrinsic motivations in unequal responsiveness.

Again, Tyrone was the only constituent in Broockman 2013.

---

Hemker and Rink 2017 in the American Journal of Political Science:

White officials in both the United States and South Africa are more likely to respond to requests from putative whites, whereas black politicians favor putative blacks (Broockman 2013, ...

---

McClendon 2016 in the Journal of Experimental Political Science:

Politicians may seek to favor members of their own group and to discriminate against members of out-groups (Broockman, 2013...

---

Gell-Redman et al 2018 in American Politics Research:

Studies that explore other means of citizen and legislator interaction have found more consistent evidence of bias against minority constituents. Notably, Broockman (2013) finds that white legislators are significantly less likely to respond to black constituents when the political benefits of doing so were diminished.

But the only constituent was Tyrone, so you can't properly infer bias against Tyrone or minority constituents more generally, because the experiment didn't indicate whether the out-of-district drop-off for Tyrone differed from the out-of-district drop-off for a putative non-Black emailer.

---

Broockman 2014 in the American Journal of Political Science:

Outright racial favoritism among politicians themselves is no doubt real (e.g., Broockman 2013b;...

But who was Tyrone favored more than or less than?

---

Driscoll et al 2018 in the American Journal of Political Science:

Broockman (2013) finds that African American state legislators expend more effort to improve the welfare of black voters than white state legislators, irrespective of whether said voters reside in their districts.

Even ignoring the added description of the emailer as a "voter", response rates to Tyrone were not "irrespective" of district residence. Broockman 2013 even plotted data for the matched case analysis, in which the bar for in-district Black legislators was not longer than the bar for in-district non-Black legislators:

---

Shoub et al 2020 in the Journal of Race, Ethnicity, and Politics:

Black politicians are more likely to listen and respond to black constituents (Broockman 2013),...

The prior context in Shoub et al 2020 suggests that the "more likely" comparison is to non-Black politicians, but this description loses the complication in which Black legislators were not more likely than non-Black legislators to respond to in-district Tyrone, which is especially important if we reasonably assume that in-district Tyrone was perceived to be a constituent and out-of-district Tyrone wasn't. Same problem with Christiani et al 2021 in Politics, Groups, and Identities:

Black politicians are more likely to listen and respond to black constituents than white politicians (Broockman 2013)...

The similar phrasing for the above two passages might be due to the publications having the same group of authors: Shoub, Epp, Baumgartner, Christiani, and Roach; and Christiani, Shoub, Baumgartner, Epp, and Roach.

---

Gleason and Stout 2014 in the Journal of Black Studies:

Recent experimental studies conducted by Butler and Broockman (2011) and Broockman (2013) confirm these findings. These studies show that Black elected officials are more likely to help co-racial constituents in and outside of their districts gain access to the ballot more than White elected officials.

This passage, from what I can tell, describes both citations incorrectly: in Broockman 2013, Tyrone was asking for help getting unemployment benefits, and I'm not sure what the basis is for the "in...their districts" claim: in-district response rates were 56.1% from non-Black legislators and 46.4% from Black legislators. The Butler and Broockman 2011 appendix reports results such as DeShawn receiving responses from 41.9%, 22.4%, and 44.0% of Black Democrat legislators when DeShawn respectively asked about a primary, a Republican primary, and a Democratic primary and, respectively, from 54.3%, 56.1%, and 62.1% of White Democrat legislators.

But checking citations to Butler and Broockman 2011 would be another post.

---

NOTES

1. The above isn't a systematic analysis of citations of Broockman 2013, so no strong inferences should be made about the percentage of times Broockman 2013 was cited incorrectly, other than maybe too often, especially in these journals.

2. I think that, for the Broockman 2013 experiment, a different email could have been sent from a putative White person, without sample size concerns. Imagine that "Billy Bob" emailed each legislator asking for help with, say, welfare benefits. If, like with Tyrone, Black legislator response rates were similar for in-district Billy Bob and for out-of-district Billy Bob, that would provide a strong signal to not attribute the similar rates to an intrinsic motivation to advance Blacks' interests. But if the out-of-district drop-off in Black legislator response rates was much larger for Billy Bob than for Tyrone, that would provide a strong signal to attribute the similar Black legislator response rates for in-district Tyrone and out-of-district Tyrone to an intrinsic motivation to advance Blacks' interests.

3. I think that the error bars in Figure 1 above might be 50% confidence intervals, given that the error bars seems to match the Stata command "reg code_some treat_out treatXblack leg_black [iweight=cem_weights], level(50)" that I ran on the Broockman 2013 data after line 17 in the Stata do file.

4. I shared this post with David Broockman, who provided the following comments:

Hi LJ,

I think you're right that some of these citations are describing my paper incorrectly and probably meant to cite my 2011 paper with Butler. (FWIW, in that study, we find legislators of all races seem to just discriminate in favor of their race, across both parties, so some of the citations don't really capture that either....)

The experiment would definitely be better with a white control, there was just a bias-variance trade-off here -- adding a putative race of constituent factor in the experiment would mean less bias but more variance. I did the power calculations and didn't think the experiment would be well-powered enough if I made the cells that small and were looking for a triple interaction between legislator race X letter writer putative race X in vs. out of district. In the paper I discuss a few alternative explanations that the lack of a white letter introduces and do some tests for them (see the 3 or 4 paragraphs starting with "One challenge..."). Essentially, I didn't see any reason why we should expect black legislators to just be generically less sensitive to whether a person is in their district, especially given in our previous paper we found they reacted pretty strongly to the race of the email sender (so it's not like the black legislators who do respond to emails just don't read emails carefully). Still, I definitely still agree with what I wrote then that this is a weakness of the study. It would be nice for someone to replicate this study, and I like the idea you have in footnote 2 for doing this. Someone should do that study!
