This year, I have discussed several errors or flaws in recent journal articles (e.g., 1, 2, 3, 4). For some new examples, I think that Figure 2 of Cargile 2021 reported estimates for the feminine factor instead of, as labeled, the masculine factor, and Fenton and Stephens-Dougan 2021 described a "very small" 0.01 odds ratio as "not substantively meaningful":

Finally, the percent Black population in the state was also associated with a statistically significant decline in responsiveness. However, it is worth noting that this decline was not substantively meaningful, given that the odds ratio associated with this variable was very small (.01).

I'll discuss more errors or flaws in the notes below, with more blog posts planned.

---

Given that peer review and/or the editing process will miss errors that readers can catch, it seems like it would be a good idea for journal editors to get more feedback before an article is published.

For example, the Journal of Politics has been posting "Just Accepted" manuscripts before the final formatted version of the manuscript is published, which I think permits the journal to correct errors that readers catch in the posted manuscripts.

The Journal of Politics recently posted the manuscript for Baum et al. "Sensitive Questions, Spillover Effects, and Asking About Citizenship on the U.S. Census". I think that some of the results reported in the text do not match the corresponding results reported in Table 1. For example, the text (numbered p. 4) indicates that:

Consistent with expectations, we also find this effect was more pronounced for Hispanics, who skipped 4.21 points more of the questions after the Citizenship Treatment was introduced (t-statistic = 3.494, p-value is less than 0.001).

However, from what I can tell, the corresponding Table 1 result indicates a 4.49 difference, with a t-statistic of 3.674.

---

Another potential flaw in the above statement is that, from what I can tell, the t-statistic for the "more pronounced for Hispanics" claim is based on a test of whether the estimate among Hispanics differs from zero. However, the t-statistic for the "more pronounced for Hispanics" claim should instead be from a test of whether the estimate among Hispanics differs from the estimate among non-Hispanics or whatever comparison category the "more pronounced" refers to.

---

So, to the extent that these aforementioned issues are errors or flaws, maybe these can be addressed before the Journal of Politics publishes the final formatted version of the Baum et al. manuscript.

---

NOTES

1. I think that this is an error, from Lucas and Silber Mohamed 2021, with emphasis added:

Moreover, while racial sympathy may lead to some respondents viewing non-white candidates more favorably, Chudy finds no relationship between racial sympathy and gender sympathy, nor between racial sympathy and attitudes about gendered policies.

That seemed a bit unlikely to me when I read it, and, sure enough, Chudy 2020 footnote 20 indicates that:

The raw correlation of the gender sympathy index and racial sympathy index was .3 for the entire sample (n = 1,000) and .28 for whites alone (n = 751).

2. Some [sic]-worthy errors in Jardina and Stephens-Dougan 2021. Footnote 25:

The Stereotype items were note included on the 2020 ANES Time Series study.

...and the Section 4 heading:

Are Muslim's part of a "band of others?"

... and the Table 2 note:

2016 ANES Time Serie Study

Moreover, the note for Jardina and Stephens-Dougan 2021 Figure 1 describes the data source as: "ANES Cumulative File (face-to-face respondents only) & 2012 ANES Times Series (all modes)". But, based on the text and the other figure notes, I think that this might refer to 2020 instead of 2012.

These things happen, but I think that it's worth noting, at least as evidence against the idea that peer reviews shouldn't note grammar-type errors.

3. I discussed conditional-acceptance comments in my PS symposium entry "Left Unchecked".


1.

Abrajano and Lajevardi 2021 "(Mis)Informed: What Americans Know About Social Groups and Why it Matters for Politics" reported (p. 34) that:

We find that White Americans, men, the racially resentful, Republicans, and those who turn to Fox and Breitbart for news strongly predict misinformation about [socially marginalized] social groups.

But their research design is biased toward many or all of these results, given their selection of items for their 14-item set of misinformation items. I'll focus below on left/right political bias, and then discuss apparent errors in the publication.

---

2.

Item #7 is a true/false item:

Most terrorist incidents on US soil have been conducted by Muslims.

This item will code as misinformed some participants who overestimate the percentage of U.S.-based terror attacks committed by Muslims, but won't code as misinformed any participants who underestimate that percentage.

It seems reasonable to me that persons on the political Left will be more likely than persons on the Right to underestimate the percentage of U.S.-based terror attacks committed by Muslims, and that persons on the Right will be more likely than persons on the Left to overestimate that percentage, so I'll code this item as favoring the political Left.

---

Four items (#11 to #14) ask about Black/White differences in receipt of federal assistance, phrased so that the response coded as correct is that Whites are the "primary recipients" of food stamps, welfare, and social security.

But none of these items measured misinformation about receipt of federal assistance as a percentage. So participants who report that the *number* of Blacks who receive food stamps is higher than the number of Whites who receive food stamps get coded as misinformed. But participants who mistakenly think that the *percentage* of Whites who receive food stamps is higher than the percentage of Blacks who receive food stamps do not get coded as misinformed.

Table 2 of this U.S. government report indicates that, in 2018, non-Hispanic Whites were 67% of households, 45% of households receiving SNAP (food stamps), and 70% of households not receiving SNAP. Respective percentages for Blacks were 12%, 27%, and 11% and for Hispanics were 13.5%, 22%, and 12%. So, based on this, it's correct that Whites are the largest racial/ethnic group that receives food stamps on a total population basis...but it's also true that Whites are the largest racial/ethnic group that does NOT receive food stamps on a total population basis.
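
Since the point here is only about totals, a minimal R sketch that restates the Table 2 shares quoted above makes it concrete: Whites are the plurality group among households receiving SNAP and also among households not receiving SNAP.

# Household shares quoted above from Table 2 of the government report (2018):
snap     <- c(White = 0.45, Black = 0.27, Hispanic = 0.22)   # share of households receiving SNAP
non_snap <- c(White = 0.70, Black = 0.11, Hispanic = 0.12)   # share of households not receiving SNAP
names(which.max(snap))       # largest group among households receiving SNAP: "White"
names(which.max(non_snap))   # largest group among households not receiving SNAP: "White"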

It seems reasonable to me that the omission of percentage versions of these three public assistance items favors the political Left: persons on the political Left are more likely than persons on the political Right (or, for that matter, Independents and moderates) to rate Blacks higher than Whites, so persons on the Left would presumably be more likely than persons on the Right to prefer (and thus guess) that Whites and not Blacks are the primary recipients of federal assistance. So, by my count, that's at least four items that favor the political Left.

---

As far as I can tell, Abrajano and Lajevardi 2021 didn't provide citations to justify their coding of correct responses. But it seems to me that such citations should be a basic requirement for research that codes responses as correct, except for obvious items such as, say, who the current Vice President is. A potential problem with this lack of citation is that it's not clear that some of the responses that Abrajano and Lajevardi 2021 coded as correct are truly correct, or at least are the only responses that should be coded as correct.

Abrajano and Lajevardi 2021 coded "Whites" as the only correct response for the "primary recipients" item about welfare, but this government document indicates that, for 2018, the distribution of TANF recipients was 37.8% Hispanic, 28.9% Black, 27.2% White, 2.1% multi-racial, 1.9% Asian, 1.5% AIAN, and 0.6% NHOPI.

And "about the same" is coded as the only correct response for the item about the "primary recipients" of public housing (item #14), but Table 14 of this CRS Report indicates that, in 2017, 33% of public housing had a non-Hispanic White head of household and 43% had a non-Hispanic Black head of household. This webpage permits searching for "public housing" for different years (screenshot below), which, for 2016, indicates percentages of 45% for non-Hispanic Blacks and 29% for non-Hispanic Whites.

Moreover, it seems suboptimal to have the imprecise "about the same" response be the only correct response. Unless outcomes for Blacks and Whites are exactly the same, presumably selection of one or the other group should count as the correct response.

---

Does a political bias in the Abrajano and Lajevardi 2021 research design matter? I think that the misinformation rates are close enough so that it matters: Figure A2 indicates that the Republican/Democrat misinformation gap is less than a point, with misinformed means of 6.51 for Republicans and 5.83 for Democrats.

Ironically, Abrajano and Lajevardi 2021 Table A1 indicates that their sample was 52% Democrat and 21% Republican, so -- on the "total" basis that Abrajano and Lajevardi 2021 used for the federal assistance items -- Democrats were the "primary" partisan source of misinformation about socially marginalized groups.

---

NOTES

1. Abrajano and Lajevardi 2021 (pp. 24-25) refers to a figure that isn't in the main text, and I'm not sure where it is:

When we compare the misinformation rates across the five social groups, a number of notable patterns emerge (see Figure 2)...At the same time, we recognize that the magnitude of difference between White and Asian American's [sic] average level of misinformation (3.4) is not considerably larger than it is for Blacks (3.2), nor for Muslim American respondents, who report the lowest levels of misinformation.

Table A5 in the appendix indicates that Blacks had a lower misinformation mean than Muslims did, 5.583 compared to 5.914, so I'm not sure what the aforementioned passage refers to. The passage phrasing refers to a "magnitude of difference", but 3.4 doesn't seem to refer to a social group gap or to an absolute score for any of the social groups.

2. Abrajano and Lajevardi 2021 footnote 13 is:

Recall that question #11 is actually four separate questions, which brings us to a total of thirteen questions that comprise this aggregate measure of political misinformation.

If question #11 is actually four separate questions, then the total is fourteen questions, not thirteen, and Abrajano and Lajevardi 2021 refers to "fourteen" questions elsewhere (pp. 6, 17).

Abrajano and Lajevardi 2021 indicated that "...we also observe about 11% of individuals who provided inaccurate answers to all or nearly all of the information questions" (p. 24, emphasis in the original), and it seems a bit misleading to italicize "all" if no one provided inaccurate responses to all 14 items.

3. Below, I'll discuss the full set of 14 "misinformation" items. Feel free to disagree with my count, but I would be interested in an argument that the 14 items do not on net bias results toward the Abrajano and Lajevardi 2021 claim that Republicans are more misinformed than Democrats about socially marginalized groups.

For the aforementioned items, I'm coding items #7 (Muslim terror %), #11 (food stamps), #12 (welfare), and #14 (public housing) as biased in favor of the political Left, because I think that these items are phrased so that the items will catch more misinformation among the political Right than among the political Left, even though the items could be phrased to catch more misinformation among the Left than among the Right.

I'm not sure about the item about social security (#13), so I won't code that item as politically biased. So by my count that's 4 in favor of the Left, plus 1 neutral.

Item #5 seems to be a good item, measuring whether participants know that Blacks and Latinos are more likely to live in regions with environmental problems. But it's worth noting that this item is phrased in terms of rates and not, as for the federal assistance items, as the total number of persons by racial/ethnic group. So by my count that's 4 in favor of the Left, plus 2 neutral.

Item #1 is about the number of undocumented immigrants in the United States. I won't code that item as politically biased. So by my count that's 4 in favor of the Left, plus 3 neutral.

The correct response for item #2 is that most immigrants in the United States are here legally. I'll code this item as favoring the political Left for the same reason as the Muslim terror % item: the item catches participants who overestimate the percentage of immigrants here illegally, but the item doesn't catch participants who underestimate that percentage, and I think these errors are more likely on the Right and Left, respectively. So by my count that's 5 in favor of the Left, plus 3 neutral.

Item #6 is about whether *all* (my emphasis) U.S. universities are legally permitted to consider race in admissions. It's not clear to me why it's more important that this item be about *all* U.S. universities instead of about *some* or *most* U.S. universities. I think that it's reasonable to suspect that persons on the political Right will overestimate the prevalence of affirmative action and that persons on the political Left will underestimate the prevalence of affirmative action, so by my count that's 6 in favor of the Left, plus 3 neutral.

I'm not sure that items #9 and #10 have much of a bias (number of Muslims in the United States, and the country that has the largest number of Muslims), other than to potentially favor Muslims, given that the items measure knowledge of neutral facts about Muslims. So by my count that's 6 in favor of the Left, plus 5 neutral.

I'm not sure what "social group" item #8, which is about whether Barack Obama was born in the United States, is supposed to be about. I'm guessing that a good percentage of "misinformed" responses for this item are insincere. Even if it were a good idea to measure insincere responses to test a hypothesis about misinformation, I'm not sure why it would be a good idea to not also include a corresponding item about a false claim that, parallel to the Obama item, is known to be more likely to be accepted among the political Left, such as items about race and killings by police. So I'll up the count to 7 in favor of the Left, plus 5 neutral.

Item #4 might reasonably be described as favoring the political Right, in the sense that I think that persons on the Right would be more likely to prefer that Whites have a lower imprisonment rate than Blacks and Hispanics. But the item has this unusual element of precision ("six times", "more than twice") that isn't present in items about hazardous waste and about federal assistance, so that, even if persons on the Right stereotypically guess correctly that Blacks and Hispanics have higher imprisonment rates than Whites, these persons still might not be sure that the "six times" and "more than twice" are correct.

So even though I think that this item (#4) can reasonably be described as favoring the political Right, I'm not sure that it's as easy for the Right to use political preferences to correctly guess this item as it is for the Left to use political preferences to correctly guess the hazardous waste item and the federal assistance items. But I'll count this item as favoring the Right, so by my count that's 7 in favor of the Left, 1 in favor of the Right, plus 5 neutral.

Item #3 is about whether the U.S. Census Bureau projects ethnic and racial minorities to be a majority in the United States by 2042. I think that it's reasonable that a higher percentage of persons on the political Left than the political Right would prefer this projection to be true, but maybe fear that the projection is true might bias this item in favor of the Right. So let's be conservative and count this item as favoring the Right, so that my coding of the overall distribution for the 14 misinformation items is: seven items favoring the Left, two items favoring the Right, and five politically neutral items.

4. The ANES 2020 Time Series Study has similar biases in its set of misinformation items.


Social Science Quarterly recently published Cooper et al. 2021 "Heritage Versus Hate: Assessing Opinions in the Debate over Confederate Monuments and Memorials". The conclusion of the article notes that:

...we uncover significant evidence that the debate over Confederate monuments can be resoundingly summarized as "hate" over "heritage"

---

In a prior post, I noted that:

...when comparing the estimated effect of predictors, inferences can depend on how well each predictor is measured, so such analyses should discuss the quality of the predictors.

Cooper et al. 2021 measured "heritage" with a dichotomous predictor and measured "hate" with a five-level predictor, and this difference in the precision of the measurements could have biased their research design toward a larger estimate for hate than for heritage. [See note 3 below for a discussion].

I'm not suggesting that the entire difference between their estimates for heritage and hate is due to the number of levels of the predictors, but I think that a better peer review would have helped eliminate that flaw in the research design, maybe by requiring the measure of hate to be dichotomized as close as possible to 70/30 like the measure of heritage was.

---

Here is the lone measure of heritage used in Cooper et al. 2021:

"Do you consider yourself a Southerner, or not?"

Table 1 of the article indicates that 70% identified as a Southerner, so even if this were a face-valid measure of Southern heritage, the measure places persons at the 35th percentile of Southern heritage into its highest level.

Maybe there is more recent data that undercuts this, but data from the Spring 2001 Southern Focus Poll indicated that only about 1 in 3 respondents who identified as a Southerner indicated that being a Southerner was "very important" to them. About 1 in 3 respondents who identified as a Southerner in that 2001 poll indicated that being a Southerner was "not at all important" or "not very important" to them, and I can't think of a good reason why, without other evidence, these participants belong in the highest level of a measure of Southern heritage.

---

Wright and Esses 2017 had a more precise measure for heritage and found sufficient evidence to conclude that (p. 232):

Positive attitudes toward the Confederate battle flag were more strongly associated with Southern pride than with racial attitudes when accounting for these covariates.

How does Cooper et al. 2021 address the Wright and Esses 2017 result, which conflicts with the result from Cooper et al. 2021 and which used a related outcome variable and a better measure of heritage? The Cooper et al. 2021 article doesn't even mention Wright and Esses 2017.

---

A better peer review might have caught the minimum age of zero years old in Table 1 and objected to the description of "White people are currently under attack in this country" as operationalizing "racial resentment toward blacks" (pp. 8-9), given that this item doesn't even mention or refer to Blacks. I suppose that respondents who hate White people would be reluctant to agree that White people are under attack regardless of whether that is true. But that's not the "hate" that is supposed to be measured.

Estimating the effect of "hate" for this type of research should involve comparing estimates net of controls for respondents who have a high degree of hate for Blacks to respondents who are indifferent to Blacks. Such estimates can be biased if the estimates instead include data from respondents who have more negative feelings about Whites than about Blacks. In a prior post, I discussed Carrington and Strother 2020, which measured hate with a Black/White feeling thermometer difference and thus permitted estimation of how much of the effect of hate is due to respondents rating Blacks higher than Whites on the feeling thermometers.

---

Did Cooper et al. have access to better measures of hate than the item "White people are currently under attack in this country"? The Winthrop Poll site didn't list the Nov 2017 survey on its archived poll page for 2017. But, from what I can tell, this Winthrop University post discusses the survey, which included a better measure of racial resentment toward blacks. I don't know what information the peer reviewers of Cooper et al. 2021 had access to, but, generally, a journal reform that I would like to see for manuscripts reporting on a survey is for peer reviewers to be given access to the entire set of items for a survey.

---

In conclusion, for a study that compares the estimated effects of heritage and hate, I think that at least three things are needed: a good measure of heritage, a good measure of hate, and the good measure of heritage being of similar quality to the good measure of hate. I don't think that Cooper et al. 2021 has any of those things.

---

NOTES

1. The Spring 2001 Southern Focus Poll study was conducted by the Odum Institute for Research in Social Science of the University of North Carolina at Chapel Hill. Citation: Center for the Study of the American South, 2001, "Southern Focus Poll, Spring 2001", https://hdl.handle.net/1902.29/D-31552, UNC Dataverse, V1.

2. Stata output.

3. Suppose that mean support for leaving Confederate monuments as they are were 70% among the top 20 percent of respondents by Southern pride, 60% among the next 20 percent of respondents by Southern pride, 50% among the middle 20 percent, 40% among the next 20 percent, and 30% among the bottom 20 percent of respondents by Southern pride. And let's assume that these bottom 20 percent are indifferent about Southern pride and don't hate Southerners.

The effect of Southern pride could be estimated at 40 percentage points, which is the difference in support among the top 20 percent and bottom 20 percent by Southern pride. However, if we grouped the top 60 percent together and the bottom 40 percent together, the mean percentage support would respectively be 60% and 35%, for an estimated effect of 25 percentage points. In this illustration, the estimated effect for the five-level predictor is larger than the estimate for the dichotomous predictor, even with the same data.
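
To make the arithmetic concrete, here is a minimal R sketch of the same hypothetical, using the made-up quintile means above rather than anything estimated from the Cooper et al. 2021 data:

support <- c(30, 40, 50, 60, 70)          # mean support by Southern-pride quintile, bottom to top
support[5] - support[1]                   # five-level contrast, top versus bottom quintile: 40 points
mean(support[3:5]) - mean(support[1:2])   # dichotomized contrast, top 60% versus bottom 40%: 60 - 35 = 25 points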

Here is a visual illustration:

The above is a hypothetical to illustrate the potential bias in measuring one predictor with five levels and another predictor with two levels. I have no idea whether this had any effect on the results reported in Cooper et al. 2021. But, with a better peer review, readers would not need to worry about this type of bias in the Cooper et al. 2021 research design.


I received a few questions and comments about my use of 83.4% confidence intervals on the plot in my prior post, so I thought I would post an explanation that I can refer to later.

---

Often, political scientists use a p-value of p=0.05 as a threshold for sufficient evidence of an association, such that only p-values under p=0.05 indicate sufficient evidence. Plotting 95% confidence intervals can help readers assess whether the evidence indicates that a given estimate differs from a given value.

For example, in unweighted data from the ANES 2020 Time Series Study, the 95% confidence interval for Black respondents' mean rating about Whites is [63.0, 67.0]. The number 62 falls outside the 95% confidence interval, so that indicates that there is sufficient evidence at p=0.05 that Black respondents' mean rating about Whites is not 62. However, the number 64 falls inside the 95% confidence interval, so that indicates that there is not sufficient evidence at p=0.05 that the mean rating about Whites among Black respondents is not 64.

---

But suppose that we wanted to assess whether two estimates differ *from each other*. Below is a plot of 95% confidence intervals for Black respondents' mean rating about Whites and about Asians, in unweighted data. For a test of the null hypothesis that the two means are equal, the p-value is p=0.04, indicating sufficient evidence of a difference. However, the 95% confidence intervals overlap quite a bit.

The 95% confidence intervals in this case don't do a good job of permitting readers to assess differences between estimates at the p=0.05 level.

But below is a plot that instead uses 83.4% confidence intervals. The ends of the 83.4% confidence intervals come close to each other but do not overlap. If using confidence interval touching as an approximation to p=0.05 evidence of a difference, that closeness without overlapping is what we would expect from a p-value of p=0.04.

Based on whether 83.4% confidence intervals overlap, readers can often get a good sense whether estimates differ at p=0.05. So my current practice is to plot 95% confidence intervals when the comparison of interest is of an estimate to a given number and to plot 83.4% confidence intervals when the comparison of interest is of one estimate to another estimate.
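
For readers who want the arithmetic behind the 83.4% figure: a difference between two estimates with roughly equal standard errors reaches p=0.05 when the estimates are about 1.96 × √2 ≈ 2.77 standard errors apart, and an 83.4% confidence interval extends about 2.77/2 ≈ 1.39 standard errors on each side of its estimate, so two such intervals just stop overlapping at about that threshold. Below is a minimal R sketch of computing both types of interval for a mean, using made-up ratings rather than the ANES 2020 data discussed above:

# Made-up 0-to-100 thermometer ratings (not the ANES 2020 data):
set.seed(123)
ratings <- sample(0:100, 500, replace = TRUE)
t.test(ratings, conf.level = 0.95)$conf.int    # for comparing an estimate to a fixed number
t.test(ratings, conf.level = 0.834)$conf.int   # for eyeballing estimate-versus-estimate comparisons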

---

NOTES

1. Data source: American National Election Studies. 2021. ANES 2020 Time Series Study Preliminary Release: Combined Pre-Election and Post-Election Data [dataset and documentation]. March 24, 2021 version. www.electionstudies.org.

2. R code for the plots.


The ANES (American National Election Studies) has released the pre- and post-election questionnaires for its 2020 Time Series Study. I thought that it would be useful or at least interesting to review the survey for political bias. I think that the survey is remarkably well done on net, but I do think that ANES 2020 contains unnecessary political bias.

---

1

ANES 2020 has two gender resentment items on the pre-election survey and two modern sexism items on the post-election survey. These four items are phrased to measure negative attitudes about women, but ANES 2020 has no parallels to these four items regarding negative attitudes about men.

Even if researchers cared about only sexism against women, parallel measures of attitudes about men would still be necessary. Evidence indicates and theory suggests that participants sexist against men would cluster at the low end of a measure of sexism against women, so that the effect of sexism against women can't properly be estimated as the change from the lowest level to the highest level of these measures.

This lack of parallel items about men will plausibly produce a political bias in research that uses these four items as measures of sexism, because, while a higher percentage of Republicans than of Democrats is biased against women, a higher percentage of Democrats than of Republicans is biased against men (evidence about partisanship is in in-progress research, but check here about patterns in the 2016 presidential vote).

ANES 2020 has a feeling thermometer for several racial groups, so hopefully future ANES surveys include feeling thermometers about men and women.

---

2

Another type of political bias involves inclusion of response options so that the item can detect only errors more common on the political right. Consider this post-election item labeled "misinfo":

1. Russia tried to interfere in the 2016 presidential election

2. Russia did not try to interfere in the 2016 presidential election

So the large percentage of Hillary Clinton voters who reported the belief that Russia tampered with vote tallies to help Donald Trump don't get coded as misinformed on this misinformation item about Russian interference. The only error that the item can detect is underestimating Russian interference.

Another "misinfo" example:

Which of these two statements do you think is most likely to be true?

1. World temperatures have risen on average over the last 100 years.

2. World temperatures have not risen on average over the last 100 years.

The item permits climate change "deniers" to be coded as misinformed, but does not permit coding as misinformed "alarmists" who drastically overestimate how much the climate has changed over the past 100 years.

Yet another "misinfo" example:

1. There is clear scientific evidence that the anti-malarial drug hydroxychloroquine is a safe and effective treatment for COVID-19.

2. There is not clear scientific evidence that the anti-malarial drug hydroxychloroquine is a safe and effective treatment for COVID-19.

In April 2020, the FDA indicated that "Hydroxychloroquine and chloroquine...have not been shown to be safe and effective for treating or preventing COVID-19", so the "deniers" who think that there is zero evidence available to support HCQ as a covid-19 treatment will presumably not be coded as "misinformed".

One more example (not labeled "misinfo"), from the pre-election survey:

During the past few months, would you say that most of the actions taken by protestors to get the things they want have been violent, or have most of these actions by protesters been peaceful, or have these actions been equally violent and peaceful?

[If the response is "mostly violent" or "mostly peaceful":]

Have the actions of protestors been a lot more or only a little more [violent/peaceful]?

I think that this item might refer to the well-publicized finding that "about 93% of racial justice protests in the US have been peaceful", so that the correct response combination is "mostly peaceful"/"a lot more peaceful" and, thus, the only error that the item permits is overestimating how violent the protests were.

For the above items, I think that the response options disfavor the political right, because I expect that a higher percentage of persons on the political right than the political left will deny Russian interference in the 2016 presidential election, deny climate change, overestimate the evidence for HCQ as a covid-19 treatment, and overestimate how violent recent pre-election protests were.

But I also think that persons on the political left will be more likely than persons on the political right to make the types of errors that the items do not permit to be measured, such as overestimating climate change over the past 100 years.

Other items marked "misinfo" involved vaccines causing autism, covid-19 being developed intentionally in a lab, and whether the Obama administration or the Trump administration deported more unauthorized immigrants during its first three years.

I didn't see an ANES 2020 item about whether the Obama administration or the Trump administration built the temporary holding enclosures ("cages") for migrant children, which I think would be similar to the deportations item, in that people not paying close attention to the news might get the item incorrect.

Maybe a convincing case could be made that ANES 2020 contains an equivalent number of items with limited response options disfavoring the political left as disfavoring the political right, but I don't think that it matters whether political bias in individual items cancels out, because any political bias in individual items is worth eliminating, if possible.

---

3

ANES 2020 has an item that I think alludes to President's Trump's phone call with the Ukrainian president. Here is a key passage from the transcript of the call:

The other thing, There's a lot of talk about Biden's son, that Biden stopped the prosecution and a lot of people want to find out about that so whatever you can do with the Attorney General would be great. Biden went around bragging that he stopped the prosecution so if you can look into it...It sounds horrible to me.

Here is an ANES 2020 item:

As far as you know, did President Trump ask the Ukrainian president to investigate President Trump's political rivals, did he not ask for an investigation, or are you not sure?

I'm presuming that the intent of the item is that a correct response is that Trump did ask for such an investigation. But, if this item refers to only Trump asking the Ukrainian president to look into a specific thing that Joe Biden did, it's inaccurate to phrase the item as if Trump asked the Ukrainian president to investigate Trump's political rivals *in general*, which is what the plural "rivals" indicates.

---

4

I think that the best available evidence indicates that immigrants do not increase the crime rate in the United States (pre-2020 citation) and that illegal immigration reduces the crime rate in the United States (pre-2020 citation). Here is an "agree strongly" to "disagree strongly" item from ANES 2020:

Immigrants increase crime rates in the United States.

Another ANES 2020 item:

Does illegal immigration increase, decrease, or have no effect on the crime rate in the U.S.?

I think that the correct responses to these items are the responses that a stereotypical liberal would be more likely to *want* to be true, compared to a stereotypical Trump supporter.

But I don't think that the U.S. violent crime statistics by race reflect the patterns that a stereotypical liberal would be more likely to want to be true, compared to a stereotypical Trump supporter.

Perhaps coincidentally, instead of an item about racial differences in violent crime rates for which responses could be correctly described as consistent or inconsistent with available mainstream research, ANES 2020 has stereotype items about how "violent" different racial groups are in general, which I think survey researchers will be much less likely to perceive to be addressed in mainstream research and will instead use to measure racism.

---

The above examples of what I think are political biases are relatively minor in comparison to the value that ANES 2020 looks like it will provide. For what it's worth, I think that the ANES is preferable to the CCES Common Content.


The Journal of Race, Ethnicity, and Politics published Buyuker et al 2020: "Race politics research and the American presidency: thinking about white attitudes, identities and vote choice in the Trump era and beyond".

Table 2 of Buyuker et al 2020 reported regressions predicting Whites' projected and recalled vote for Donald Trump over Hillary Clinton in the 2016 U.S. presidential election, using predictors such as White identity, racial resentment, xenophobia, and sexism. Xenophobia placed into the top tier of predictors, with an estimated maximum effect of 88 percentage points going from the lowest to the highest value of the predictor, and racial resentment placed into the second tier, with an estimated maximum effect of 58 percentage points.

I was interested in whether this difference is at least partly due to how well each predictor was measured. Here are characteristics of the predictors among Whites, which indicate that xenophobia was measured at a much more granular level than racial resentment was:

RACIAL RESENTMENT
4 items
White participants fell into 22 unique levels
4% of Whites at the lowest level of racial resentment
9% of Whites at the highest level of racial resentment

XENOPHOBIA
10 items
White participants fell into 1,096 unique levels
1% of Whites at the lowest level of xenophobia
1% of Whites at the highest level of xenophobia

So it's at least plausible from the above results that xenophobia might have outperformed racial resentment merely because the measurement of xenophobia was better than the measurement of racial resentment.

---

Racial resentment was measured with four items that each had five response options, so I created a reduced xenophobia predictor using the four xenophobia items that each had exactly five response options; these items were about desired immigration levels and agreement or disagreement with statements that "Immigrants are generally good for America's economy", "America's culture is generally harmed by immigrants", and "Immigrants increase crime rates in the United States".

I re-estimated the Buyuker et al 2020 Table 2 model replacing the original xenophobia predictor with the reduced xenophobia predictor: the maximum effect for xenophobia (66 percentage points) was similar to the maximum effect for racial resentment (66 percentage points).
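
Here is a rough sketch of that reduced predictor in R, with hypothetical names for the four five-option items; the sketch assumes each item has already been oriented so that higher values are more anti-immigrant and rescaled to run from 0 to 1 (my actual analyses used Stata; see the notes below):

# Hypothetical item names standing in for the four five-option items described above:
items <- c("imm_levels", "imm_econ", "imm_culture", "imm_crime")
dat$xeno_reduced <- rowMeans(dat[, items], na.rm = TRUE)
# The Table 2 model is then re-estimated with xeno_reduced substituted for the
# original ten-item xenophobia predictor.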

---

Among Whites, vote choice correlated between r=0.50 and r=0.58 with each of the four racial resentment items and between r=0.39 and r=0.56 with nine of the ten xenophobia items. The exception was the seven-point item that measured attitudes about building a wall on the U.S. border with Mexico, which correlated with vote choice at r=0.72.

Replacing the desired immigration levels item in the reduced xenophobia predictor with the border wall item produced a larger estimated maximum effect for xenophobia (85 percentage points) than for racial resentment (60 percentage points). Removing all predictors from the model except for xenophobia and racial resentment, the reduced xenophobia predictor with the border wall item still produced a larger estimated maximum effect than did racial resentment: 90 percentage points, compared to 74 percentage points.

But the larger effect for xenophobia is not completely attributable to the border wall item: using a predictor that combined the other nine xenophobia items produced a maximum effect for xenophobia (80 percentage points) that was larger than the maximum effect for racial resentment (63 percentage points).

---

I think that the main takeaway from this post is that, when comparing the estimated effect of predictors, inferences can depend on how well each predictor is measured, so such analyses should discuss the quality of the predictors. Imbalances in which participants fall into 22 levels for one predictor and 1,096 levels for another predictor seem to be biased in favor of the more granular predictor, all else equal.

Moreover, I think that, for predicting 2016 U.S. presidential vote choice, it's at least debatable whether a xenophobia predictor should include an item about a border wall with Mexico, because including that item means that, instead of xenophobia measuring attitudes about immigrants per se, the xenophobia predictor conflates these attitudes with attitudes about a policy proposal that is very closely connected with Donald Trump.

---

It's not ideal to use regression to predict maximum effects, so I estimated a model using only the racial resentment predictor and the reduced four-item xenophobia predictor with the border wall item, but including a predictor for each level of the predictors. That model predicted failure perfectly for some levels of the predictors, so I recoded the predictors until those errors were eliminated, which involved combining the three lowest racial resentment levels (so that racial resentment ran from 2 through 16) and combining the 21st and 22nd levels of the xenophobia predictor (so that xenophobia ran from 0 through 23). In a model with only those two recoded predictors, the estimated maximum effects were 81 percentage points for xenophobia and 76 percentage points for racial resentment. Using all Buyuker et al 2020 predictors, the respective percentage points were 65 and 63.

---

I then predicted Trump/Clinton vote choice using only the 22-level racial resentment predictor and the full 1,096-level xenophobia predictor, but placing the values of the predictors into ten levels; the original scale for the predictors ran from 0 through 1, and, for the 10-level predictors, the first level for each predictor was from 0 to 0.1, the second level was from above 0.1 to 0.2, and so on up to the tenth level, from above 0.9 to 1. Using these predictors as regular predictors without "factor" notation, the gap in maximum effects was about 24 percentage points, favoring xenophobia. But using these predictors with "factor" notation, the gap favoring xenophobia fell to about 9.5 percentage points.
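
Below is a minimal R sketch of that binning comparison, using simulated data and a single predictor rather than the Buyuker et al 2020 replication data with its full set of predictors:

# Simulated stand-ins for a 0-to-1 xenophobia score and a binary vote choice:
set.seed(1)
xeno <- runif(2000)
vote <- rbinom(2000, 1, plogis(-3 + 6 * xeno))

# Place the 0-to-1 score into ten levels: [0, 0.1], (0.1, 0.2], ..., (0.9, 1].
xeno10 <- cut(xeno, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE, labels = FALSE)

# Ten-level score used as a regular (linear) predictor...
m_linear <- glm(vote ~ xeno10, family = binomial)
# ...versus each of the ten levels used as a separate predictor ("factor" notation).
m_factor <- glm(vote ~ factor(xeno10), family = binomial)

# Maximum effect: change in predicted probability from the lowest to the highest level.
newdat <- data.frame(xeno10 = c(1, 10))
diff(predict(m_linear, newdat, type = "response"))
diff(predict(m_factor, newdat, type = "response"))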

Plots below illustrate the difference in predictions for xenophobia: the left panel uses a regular 10-level xenophobia predictor, and the right panel uses each of the 10 levels of that predictor as a separate predictor.

---

So I'm not sure that these data support the inference that xenophobia is in a higher tier than racial resentment, for predicting Trump/Clinton vote in 2016. The above analyses seem to suggest that much or all of the advantage for xenophobia over racial resentment in the Buyuker et al 2020 analyses was due to model assumptions and/or better measurement of xenophobia.

---

Another concern about Buyuker et al 2020 is with the measurement of predictors such as xenophobia. The xenophobia predictor is more accurately described as something like "attitudes about immigrants". If some participants are more favorable toward immigrants than toward natives, and if these participants locate themselves at low levels of the xenophobia predictor, then the effect of xenophilia among these participants is possibly being added to the estimated effect of xenophobia.

Concerns are similar for predictors such as racial resentment and sexism. See here and here for evidence that low levels of similar predictors associate with bias in the opposite direction.

---

NOTES

1. Thanks to Beyza Buyuker for sending me replication materials for Buyuker et al 2020.

2. Stata code for my analyses. Stata output for my analyses.

3. ANES 2016 citations:

The American National Election Studies (ANES). 2016. ANES 2012 Time Series Study. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2016-05-17. https://doi.org/10.3886/ICPSR35157.v1.

ANES. 2017. "User's Guide and Codebook for the ANES 2016 Time Series Study". Ann Arbor, MI, and Palo Alto, CA: The University of Michigan and Stanford University.


1.

Researchers reporting results from an experiment often report estimates of the treatment effect at particular levels of a predictor. For example, a panel of Figure 2 in Barnes et al. 2018 plotted, over a range of hostile sexism, the estimated difference in the probability of reporting being very unlikely to vote for a female representative target involved in a sex scandal relative to the probability of reporting being very unlikely to vote for a male representative target involved in a sex scandal. For another example, Chudy 2020 plotted, over a range of racial sympathy, estimated punishments for a Black culprit target and a White culprit target. Both of these plots report estimates derived from a regression. However, as indicated in Hainmueller et al. 2020, regression can nontrivially misestimate a treatment effect at particular levels of a predictor.

This post presents another example of this phenomenon, based on data from the experiment in Costa et al. 2020 "How partisanship and sexism influence voters' reactions to political #MeToo scandals" (link to a correction to Costa et al. 2020).

---

2.

The Costa et al. 2020 experiment had a control condition, two treatment conditions, and multiple outcome variables, but my illustration will focus on only two conditions and only one outcome variable. Participants were asked to respond to four items measuring participant sexism and to rate a target male senator on a 0-to-10 scale. Participants who were then randomized to the "sexual assault" condition were provided a news story indicating that the senator had been accused of groping two women without consent. Participants who were instead randomized to the control condition were provided a news story about the senator visiting a county fair. The outcome variable of interest for this illustration is the percent change in the favorability of the senator, from the pretest to the posttest.

Estimates in the left panel of Figure 1 are based on a linear regression predicting the outcome variable of interest, using predictors of a pretest measure of participant sexism from 0 for low sexism to 16 for high sexism, a dichotomous variable coded 1 for participants in the sexual assault condition and 0 for participants in the control, and an interaction of these predictors. The panel plots the point estimates and 95% confidence intervals for the estimated difference in the outcome variable, between the control condition and the sexual assault condition, at each observed level of the participant sexism index.
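
Here is a minimal sketch in R of the regression behind the left panel, with hypothetical variable names (the actual analysis was conducted in Stata; see the notes below):

# Hypothetical variable names:
#   outcome: percent change in the senator's rating from pretest to posttest
#   sexism:  participant sexism index, 0 (low) through 16 (high)
#   assault: 1 = sexual assault condition, 0 = control condition
m <- lm(outcome ~ sexism * assault, data = dat)

# Regression-based estimate of the treatment effect at a given level of sexism:
effect_at <- function(s) unname(coef(m)["assault"] + s * coef(m)["sexism:assault"])
effect_at(0)    # estimate for the "least sexist" participants
effect_at(16)   # estimate for the "most sexist" participants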

The leftmost point indicates that the "least sexist" participants in the sexual assault condition were estimated to have a value of the outcome variable that was about 52 units less than that of the "least sexist" participants in the control condition; the "least sexist" participants in the control were estimated to have increased their rating of the senator by 4.6 percent, and the "least sexist" participants in the sexual assault condition were estimated to have reduced their rating of the senator by 47.6 percent.

The rightmost point of the plot indicates that the "most sexist" participants in the sexual assault condition were estimated to have a value of the outcome variable that differed by about 0 units from that of the "most sexist" participants in the control condition; the "most sexist" participants in the control were estimated to have increased their rating of the senator by 1.7 percent, and the "most sexist" participants in the sexual assault condition were estimated to have increased their rating of the senator by 2.1 percent. Based on this rightmost point, a reader could conclude about the sexual assault allegations, as Costa et al. 2020 suggested, that:

...the most sexist subjects react about the same way to sexual assault and sexist jokes allegations as they do to the control news story about the legislator attending a county fair.

However, the numbers at the inside bottom of the Figure 1 panels indicate the sample size at that level of the sexism index, across the control condition and the sexual assault condition. These numbers indicate that the regression-based estimate for the "most sexist" participants was nontrivially based on the behavior of other participants.

Estimates in the right panel of Figure 1 are instead based on t-tests conducted for participants at only the indicated level of the sexism index. As in the left panel, the estimate for the "least sexist" participants falls between -50 and -60, and, for the next few higher observed values of the sexism index, estimates tend to rise and/or tend to get closer to zero. But the tendency does not persist above the midpoint of the sexism index. Moreover, the point estimates in the right panel for the three highest values of the sexism index do not fall within the corresponding 95% confidence intervals in the left panel.

The p-value fell below p=0.05 for the 28 participants at 15 or 16 on the sexism index, with a point estimate of -22. The sample size was 1,888 across these two conditions, so participants at 15 or 16 on the sexism index represent the top 1.5% of participants on the sexism index across these two conditions. Therefore, the sexual assault treatment appears to have had an effect on these "very sexist" participants.
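
And here is a sketch of the per-level comparisons behind the right panel, using the same hypothetical variable names as in the sketch above: at each observed value of the sexism index, the outcome is compared across the two conditions with a t-test that uses only participants at that level.

# t-test at each observed level of the sexism index, sexual assault condition versus control:
sexism_levels <- sort(unique(dat$sexism))
per_level <- lapply(sexism_levels, function(s) {
  t.test(outcome ~ assault, data = subset(dat, sexism == s))
})
# Point estimate (sexual assault minus control) at each level of the sexism index:
sapply(per_level, function(tt) unname(diff(tt$estimate)))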

---

3.

Regression can reveal patterns in data. For example, linear regression estimates correctly indicated that, in the Costa et al. 2020 experiment, the effect of the sexual assault treatment relative to the control was closer to zero for participants at higher levels of the sexism index than for participants at lower levels of the sexism index. However, as indicated in the illustration above, regression can produce misestimates of an effect at particular levels of a predictor. Therefore, inferences about an estimated effect at a particular level of a predictor should be based only on cases at or around that level of the predictor and should not be influenced by other cases.

---

NOTES

1. Costa et al. 2020 data.

2. Stata code for the analysis.

3. R code for the plot. CSV file for the R plot.

4. The interflex R package (Hainmueller et al. 2020) produced the plot below, using six bins. The leveling off at higher values of the sexism index also appears in this interflex plot:

R code to add to the corrected Costa et al. 2020 code:

library(interflex)   # package that provides the binning estimator used below

# Rescale the pretest sexism measure to the 0-to-16 index used in this post.
dat$sexism16 <- (dat$pre_sexism - 1) * 4
summary(dat$sexism16)

# Binning estimate (six bins) of the treatment effect across the sexism index.
p1 <- inter.binning(data = dat, Y = "perchange_vote", D = "condition2", X = "sexism16", nbins = 6, base = "Control")
plot(p1)
