I drafted a manuscript entitled "Six Things Peer Reviewers Can Do To Improve Political Science". It was rejected once in peer review, so I'll post at least some of the ideas to my blog. This first blog post is about comments on the Valentino et al. 2018 "Mobilizing Sexism" Public Opinion Quarterly article. I sent this draft of the manuscript to Valentino et al. on June 11, 2018, limited to the introduction and parts that focus on Valentino et al. 2018; the authors emailed me back comments on June 12, 2018, which Dr. Valentino asked me to post and that I will post after my discussion.

1. Unreported tests for claims about group differences

Valentino et al. (2018) report four hypotheses, the second of which is:

Second, compared to recent elections, the impact of sexism should be larger in 2016 because an outwardly feminist, female candidate was running against a male who had espoused disdain for women and the feminist project (pp. 219-220).

Here is the discussion of their Study 2 results in relation to that expectation:

The pattern of results is consistent with expectations, as displayed in table 2. Controlling for the same set of predispositions and demographic variables as in the June 2016 online study, sexism was significantly associated with voting for the Republican candidate only in 2016 (b = 1.69, p < .05) (p.225).

However, as Gelman and Stern 2006 observed, "comparisons of the sort, 'X is statistically significant but Y is not,' can be misleading" (p. 331). In Table 2 of Valentino et al. 2018, the sexism predictor in the 2016 model had a logit coefficient of 1.69 and a standard error of 0.81, and the p-value under .05 for this predictor indicates only that the 2016 sexism coefficient differs from zero; it does not indicate whether the 2016 sexism coefficient differs at p<.05 from the imprecisely estimated sexism coefficients of 0.23, 0.94, and 0.34 for 2012, 2008, and 2004, respectively. A test of the difference between the 2016 sexism coefficient and the sexism coefficients for the other years is what would be needed to assess the second hypothesis that the impact of sexism was larger in 2016.
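
For illustration, here is a minimal R sketch of the kind of coefficient comparison that Gelman and Stern describe, assuming the yearly estimates come from independent samples. The 2016 coefficient and standard error are from Valentino et al. 2018 Table 2, but the 2012 standard error below is a placeholder for illustration only, because I have not reproduced the other years' standard errors here:

# Sketch of a z-test for the difference between two logit coefficients
# estimated from independent samples
b_2016 <- 1.69; se_2016 <- 0.81  # Valentino et al. 2018 Table 2
b_2012 <- 0.23; se_2012 <- 0.80  # placeholder standard error, for illustration only
z <- (b_2016 - b_2012) / sqrt(se_2016^2 + se_2012^2)
p <- 2 * pnorm(-abs(z))          # two-tailed p-value for the difference
round(c(z = z, p = p), 3)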

2. No summary statistics reported for a regression-based inference about groups

Valentino et al. 2018 Table 2 indicates that higher levels of participant modern sexism, compared to lower levels, are associated with a greater probability of a reported vote for Donald Trump in 2016. But the article does not report the absolute mean levels of modern sexism among Trump voters or Clinton voters. These absolute mean levels are in the figure below, limited to participants in face-to-face interviews (per Valentino et al. 2018 footnote 8):

Results in the above image indicate that the mean response across Trump voters represented beliefs:

  • that the news media should pay the same amount of attention to discrimination against women that they have been paying lately;
  • that, when women complain about discrimination, they cause more problems than they solve less than half the time;
  • and that, when women demand equality these days, less than half of the time they are actually seeking special favors.

These don't appear to be obviously sexist beliefs, in the sense that I am not aware of evidence that the beliefs incorrectly or unfairly disadvantage or disparage women or men, but comments are open below if you know of evidence or have an argument that the mean Trump voter response is sexist for any of these three items. Moreover, it's not clear to me that sexism can be inferred based on measures about only one sex; if, for instance, a participant believes that, when women complain about discrimination, they cause more problems than they solve, and the participant also believes that, when men complain about discrimination, they cause more problems than they solve, then it does not seem reasonable to code that person as sexist without more information.

---

Response from Valentino et al.

Here is the response that I received from Valentino et al.

1) Your first concern was that we did not discuss one of the conditions in our MTurk study, focusing on disgust. The TESS reference is indeed the same study. However, we did not report results from the disgust condition because we did not theorize about disgust in this paper. Our theory focuses on the differential effects of fear vs. anger. We are in fact quite transparent throughout, indicating where predicted effects are non-significant. We also include a lengthy appendix with several robustness checks, etc. 

2) We never claim all Trump voters are sexist. We do claim that in 2016 gender attitudes are a powerful force, and more conservative scores on these measures significantly increase the likelihood of voting for Trump. The evidence from our work and several other studies supports this simple claim handsomely. Here is a sample of other work that replicates the basic finding in regarding the power of sexism in the 2016 election. Many of these studies use ANES data, as we do, but there are also several independent replications using different datasets. You might want to reference them in your paper. 

Blair, K. L. (2017). Did Secretary Clinton lose to a ‘basket of deplorables’? An examination of Islamophobia, homophobia, sexism and conservative ideology in the 2016 US presidential election. Psychology & Sexuality, 8(4), 334-355. 

Bock, J., Byrd-Craven, J., & Burkley, M. (2017). The role of sexism in voting in the 2016 presidential election. Personality and Individual Differences, 119, 189-193. 

Bracic, A., Israel-Trummel, M., & Shortle, A. F. (2018). Is sexism for white people? Gender stereotypes, race, and the 2016 presidential election. Political Behavior, 1-27.

Cassese, E. C., & Barnes, T. D. (2018). Reconciling Sexism and Women's Support for Republican Candidates: A Look at Gender, Class, and Whiteness in the 2012 and 2016 Presidential Races. Political Behavior, 1-24. 

Cassese, E., & Holman, M. R. Playing the woman card: Ambivalent sexism in the 2016 US presidential race. Political Psychology

Frasure-Yokley, L. (2018). Choosing the Velvet Glove: Women Voters, Ambivalent Sexism, and Vote Choice in 2016. Journal of Race, Ethnicity and Politics, 3(1), 3-25. 

Ratliff, K. A., Redford, L., Conway, J., & Smith, C. T. (2017). Engendering support: Hostile sexism predicts voting for Donald Trump over Hillary Clinton in the 2016 US presidential election. Group Processes & Intergroup Relations, 1368430217741203. 

Schaffner, B. F., MacWilliams, M., & Nteta, T. (2018). Understanding white polarization in the 2016 vote for president: The sobering role of racism and sexism. Political Science Quarterly, 133(1), 9-34. 

3) We do not statistically compare the coefficients across years, but neither do we claim to do so. We claim the following:

"Controlling for the same set of predispositions and demographic variables as in the June 2016 online study, sexism was significantly associated with voting for the Republican candidate only in 2016 (b = 1.69, p < .05). ...In conclusion, evidence from two nationally representative surveys demonstrates sexism to be powerfully associated with the vote in the 2016 election, for the first time in at least several elections, above and beyond the impact of other typically influential political predispositions and demographic characteristics."

Therefore, we predict (and show) sexism was a strong predictor in 2016 but not in other years. Our test is also quite conservative, since we include in these models all manner of predispositions that are known to be correlated with sexism. In Table 2, the confidence interval around our 2016 estimate for sexism in these most conservative models contains the estimate for 2008 in that analysis, and is borderline for 2004 and 2012, where the impact of sexism was very close to zero. However, the bivariate logit relationships between sexism and Trump voting are much more distinct, with 2016 demonstrating a significantly larger effect than the other years. These results are easy to produce with ANES data.

---

Regarding the response from Valentino et al.:

1. My concern is that the decision about what to focus on in a paper is influenced by the results of the study. If a study has a disgust condition, then a description of the results of that disgust condition should be reported when results of that study are reported; otherwise, selective reporting of conditions could bias the literature.

2. I'm not sure that anything in their point 2 addresses anything in my manuscript.

3. I realize that Valentino et al. 2018 did not report or claim to report results for a statistical test comparing the sexism coefficient in 2016 to the sexism coefficients in prior years. But that is precisely my criticism: for the hypothesis that "compared to recent elections, the impact of sexism should be larger in 2016…" (Valentino et al. 2018: 219-220), the article should have reported a statistical test to assess the evidence that the sexism coefficient in 2016 was different from the sexism coefficients in prior recent elections.

---

NOTE

Code for the figure.

Tagged with:

The 2018 CCES (Cooperative Congressional Election Study, Schaffner et al. 2019) has two items to measure respondent sexism and, in the same grid, two items to measure respondent racism, with responses measured on a five-point scale from strongly agree to strongly disagree:

  • White people in the U.S. have certain advantages because of the color of their skin.
  • Racial problems in the U.S. are rare, isolated situations.
  • When women lose to men in a fair competition, they typically complain about being discriminated against.
  • Feminists are making entirely reasonable demands of men.

The figure below reports the predicted probability of selecting the more liberal policy preference (support or oppose) on the CCES's four environmental policy items, weighted, limited to White respondents, and controlling for respondents' reported sex, age, education, partisan identification, ideological identification, and family income. Blue columns indicate predicted probabilities when controls are set to their means and respondent sexism and racism are set to their minimum values, and black columns indicate predicted probabilities when controls are set to their means and respondent sexism and racism are set to their maximum values.

Rplot01
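
For readers who want to see the mechanics, here is a rough R sketch of how predicted probabilities like the ones plotted above could be computed, assuming a recoded data frame named cces_white (White respondents only) with hypothetical variable names; none of these names come from the linked Stata code, which contains the post's actual analysis:

# Weighted logit of a liberal policy response on the sexism and racism scales
# plus the controls, then predicted probabilities with controls at their means
# and sexism/racism at their minimum versus maximum values
m <- glm(liberal_policy ~ sexism + racism + female + age + educ +
           pid7 + ideo5 + faminc,
         family = quasibinomial, weights = weight, data = cces_white)
controls <- c("female", "age", "educ", "pid7", "ideo5", "faminc")
at_means <- as.data.frame(lapply(cces_white[controls], mean, na.rm = TRUE))
lo <- cbind(at_means, sexism = min(cces_white$sexism, na.rm = TRUE),
                      racism = min(cces_white$racism, na.rm = TRUE))
hi <- cbind(at_means, sexism = max(cces_white$sexism, na.rm = TRUE),
                      racism = max(cces_white$racism, na.rm = TRUE))
predict(m, newdata = rbind(lo, hi), type = "response")  # blue vs. black columns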

Below are results replacing the two-item racism measure with the traditional four-item racial resentment measure:

rresent

One possibility is that these strong associations are flukes; but similar patterns appear for the racism items on the 2016 CCES (the 2016 CCES did not have sexism items).

If the strong associations above are not flukes, then I think three possibilities remain: [1] sexism and racism combine to be a powerful *cause* of environmental policy preferences among Whites, [2] this type of associational research design with these items cannot be used to infer causality generally speaking, and [3] this type of associational research design with these items cannot be used to infer causality about environmental policy preferences but could be used to infer causality about other outcome variables, such as approval of the way that Donald Trump is handling his job as president.

If you believe [1], please post in a comment below a theory about how sexism and racism cause substantial changes in these environmental policy preferences. If you believe [3], please post in a comment an explanation why this type of associational research design with these items can be used to make causal inferences for only certain outcome variables and, if possible, a way to determine for which outcome variables a causal inference could be made. If I have omitted a possibility, please also post a comment with that omitted possibility.

NOTES

Stata code.

Tagged with: ,

Gronke et al. (2018) reported in Table 6 that "Gender Bias in Student Evaluations" (Mitchell and Martin 2018, hereafter MM) was, as of 25 July 2018, the PS: Political Science & Politics article with the highest Altmetric score, described as "a measure of attention an article receives" (p. 906, emphasis removed).

The MM research design compared student evaluations of and comments on Mitchell (a woman) to student evaluations of and comments on Martin (a man) in official university course evaluations and on the Rate My Professors website. MM reported evidence that "the language students use in evaluations regarding male professors is significantly different than language used in evaluating female professors" and that "a male instructor administering an identical online course as a female instructor receives higher ordinal scores in teaching evaluations, even when questions are not instructor-specific" (p. 648).

I think that there are errors in the MM article that warrant a correction. I mention or at least allude to some or all of these things in a forthcoming symposium piece in PS: Political Science & Politics, but I elaborate below. Comments are open if you see an error in my analyses or inferences.

---

1.

MM Table 1 reports on comparisons of official university course evaluations for Mitchell and for Martin. The table indicates that the sample size was 68, and the file that Dr. Mitchell sent me upon my request has 23 of these comments for Martin and 45 of these comments for Mitchell. Table 1's "Personality" row indicates 4.3% for Martin and 15.6% for Mitchell, which correspond to 1 personality-related comment of 23 comments for Martin and 7 personality-related comments of 45 comments for Mitchell. The table has three asterisks to indicate a p-value less than 0.01 for the comparison of the 4.3% and the 15.6%, but it is not clear how such a low p-value was derived.

I conducted a simulation in R to estimate, given 8 personality-related comments across 68 comments, how often a random distribution of these 8 personality-related comments would result in Martin's 23 comments having 1 or fewer personality-related comments. For the simulation, for 10 million trials, I started with eight 1s and sixty 0s, drew 23 of these 68 numbers to represent comments on Martin, and calculated the difference between the proportion of 1s for Martin and the proportion of 1s in the residual numbers (representing comments on Mitchell):

# Simulation: randomly allocate the 8 personality-related comments across the
# 68 comments and record the Martin-minus-Mitchell difference in proportions
list <- rep_len(NA,10000000)
for (i in 1:10000000){
   comments <- c(rep_len(1,8),rep_len(0,60))   # 8 personality-related comments, 60 other comments
   martin <- sample(comments,23,replace=FALSE) # 23 comments drawn at random for Martin
   diff.prop <- sum(martin)/23 - (8-sum(martin))/45 # Martin proportion minus Mitchell proportion
   list[i] <- diff.prop
}
stack(table(list)) # tally how often each possible difference occurred

Here are results from the simulation:

   values                 ind
1  290952  -0.177777777777778
2 1412204   -0.11207729468599
3 2788608 -0.0463768115942029
4 2927564  0.0193236714975845
5 1782937   0.085024154589372
6  646247   0.150724637681159
7  135850   0.216425120772947
8   14975   0.282125603864734
9     663   0.347826086956522

The -0.1778 in line 1 represents 0 personality-related comments of 23 comments for Martin and 8 personality-related comments of 45 comments for Mitchell (0% to 17.78%), which occurred 290,952 times in the 10 million simulations (2.9 percent of the time). The -0.1121 in line 2 represents 1 personality-related comment of 23 comments for Martin and 7 personality-related comments of 45 comments for Mitchell (4.3% to 15.6%), which occurred 1,412,204 times in the 10 million simulations (14.1 percent of the time). So the simulation indicated that Martin receiving only 1 or fewer of the 8 personality-related comments would be expected to occur about 17 percent of the time if the 8 personality-related comments were distributed randomly. But recall that the MM Table 1 asterisks for this comparison indicate a p-value less than 0.01.
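
As a check on the simulation, the same quantity can be computed exactly from the hypergeometric distribution, under the same assumption that the 8 personality-related comments are allocated at random across the 68 comments:

# Exact probability that Martin's 23 of the 68 comments contain 1 or fewer
# of the 8 personality-related comments, under random allocation
phyper(1, m=8, n=60, k=23)  # roughly 0.17, matching the simulation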

MM Table 2 reports on comparisons of Rate My Professors comments for Mitchell and for Martin, with a reported sample size of N=54, which is split into sample sizes of 9 for Martin and 45 for Mitchell in the file that Dr. Mitchell sent me upon my request; the nine comments for Martin are still available at the Rate My Professors website. I conducted another simulation in R for the incompetency-related comments, in which corresponding proportions were 0 of 9 for Martin and 3 of 45 for Mitchell (0% to 6.67%).

# Same simulation for the Rate My Professors incompetency-related comments:
# 3 such comments among 54, with 9 comments drawn at random for Martin
list <- rep_len(NA,10000000)
for (i in 1:10000000){
   comments <- c(rep_len(1,3),rep_len(0,51))
   martin <- sample(comments,9,replace=FALSE)
   diff.prop <- sum(martin)/9 - (3-sum(martin))/45
   list[i] <- diff.prop
}
stack(table(list))

Here are results from the simulation:

   values                 ind
1 5716882 -0.0666666666666667
2 3595302  0.0666666666666667
3  653505                 0.2
4   34311   0.333333333333333

The -0.0667 in line 1 represents 0 incompetency-related comments of 9 comments for Martin and 3 incompetency-related comments of 45 comments for Mitchell (0% to 6.67%), which occurred 5,716,882 times in 10 million simulations (57 percent of the time). So the simulation indicated that Martin's 9 comments having zero of the 3 incompetency-related comments would be expected to occur about 57 percent of the time if the 3 incompetency-related comments were distributed randomly. The MM Table 2 asterisk for this comparison indicates a p-value less than 0.1.
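
The corresponding exact hypergeometric calculation for this comparison:

# Exact probability that Martin's 9 of the 54 comments contain none of the
# 3 incompetency-related comments, under random allocation
phyper(0, m=3, n=51, k=9)  # roughly 0.57, matching the simulation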

I have concerns about other p-value asterisks in MM Table 1 and MM Table 2, but I will not report simulations for those comparisons here.

---

2.

MM Table 4 inferential statistics appear to be unadjusted for the lack of independence of some observations. Click here, then Search by Course > Spring 2015 > College of Arts and Sciences > Political Science > POLS 2302 (or click here). Each "Total Summary" row at the bottom has 218 evaluations; for example, the first item of "Overall the instructor(s) was (were) effective" has 43 strongly agrees, 55 agrees, 75 neutrals, 24 disagrees, and 21 strongly disagrees, which suggests that 218 students completed these evaluations. But the total Ns reported in MM Table 4 are greater than 218. For example, the "Course" line in MM Table 4 has an N of 357 for Martin and an N of 1,169 for Mitchell, which is a total N of 1,526. That 1,526 is exactly seven times 218, and the MM appendix indicates that the student evaluations had 7 "Course" items.

Using this code, I reproduced MM Table 4 t-scores closely or exactly by treating each observation as independent and conducting a t-test assuming equal variances, suggesting that MM Table 4 inferential statistics were not adjusted for the lack of independence of some observations. However, for the purpose of calculating inferential statistics, multiple ratings from the same student cannot be treated as if these were independent ratings.

The aforementioned code reports p-values for individual-item comparisons of evaluations for Mitchell and for Martin, which avoids the problem of a lack of independence for some student responses. But I'm not sure that much should be made of any differences detected or not detected between evaluations for Mitchell and evaluations for Martin, given the lack of randomization of students to instructors, the absence of evidence that the students in Mitchell's sections were sufficiently similar before the course to the students in Martin's sections, and the possibility that students in these sections might have already had courses or interactions with Mitchell and/or Martin and that the evaluations reflected these prior experiences.
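
To illustrate the independence problem, here is a small R example with simulated data; the ratings below are invented, and only the group sizes mirror the MM data:

# Simulated data: 7 "Course" item ratings per student, clustered within students
set.seed(1)
martin_means   <- rnorm(51,  mean = 3.6, sd = 0.8)   # 51 students rating Martin
mitchell_means <- rnorm(167, mean = 3.5, sd = 0.8)   # 167 students rating Mitchell
martin_items   <- rep(martin_means,   each = 7) + rnorm(51 * 7,  sd = 0.3)
mitchell_items <- rep(mitchell_means, each = 7) + rnorm(167 * 7, sd = 0.3)
# Item-level t-test (N = 357 vs. 1,169), treating every item rating as independent:
t.test(martin_items, mitchell_items, var.equal = TRUE)$p.value
# Student-level t-test (N = 51 vs. 167), which respects the clustering of items within students:
t.test(martin_means, mitchell_means, var.equal = TRUE)$p.value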

---

3.

Corrected inferential statistics for MM Table 1 and MM Table 2 would ideally reflect consideration of whether non-integer counts of comments should be used, as MM appears to have done. Multiplying proportions in MM Table 1 and MM Table 2 by sample sizes from the MM data produces some non-integer counts of comments. For example, the 15.2% for Martin in the MM Table 1 "Referred to as 'Teacher'" row corresponds to 3.5 of 23 comments, and the 20.9% for Mitchell in the MM Table 2 "Personality" row corresponds to 9.4 of 45 comments. Based on the data that Dr. Mitchell sent me, it seems that a comment might have been discounted by the number of sentences in the comment; for example, four of the official university course evaluations comments for Martin contain the word "Teacher", but the percentage for Martin is not 4 of 23 comments (17.4%) but is instead 3.5 of 23 comments (15.2%), presumably because one of the "teacher" comments had two sentences, only one of which referred to Martin as a teacher; the other three comments that referred to Martin as a teacher did not have multiple sentences.

Corrected inferential statistics for MM Table 1 and MM Table 2 for the frequency of references to the instructors as a professor should reflect consideration of the instructors' titles and job titles. For instance, for MM Table 1, the course numbers in the MM data match course listings for the five courses that Mitchell or Martin taught face-to-face at Texas Tech University in Fall 2015 or Spring 2015 (see here):

Mitchell
POLS 3312 Game Theory [Fall 2015]
POLS 3361 International Politics: Honors [Spring 2015]
POLS 3366 International Political Economy [Spring 2015]

Martin
POLS 3371 Comparative Politics [Fall 2015]
POLS 3373 Governments of Western Europe [Spring 2015]

Online CVs indicated that Mitchell's CV listed her Texas Tech title in 2015 as Instructor and that Martin's CV listed his Texas Tech title in 2015 as Visiting Professor.

A correction could also discuss the fact that, while Mitchell is referred to as "Dr." 19 times across all MM Table 1 and MM Table 2 comments, none of these comments refer to Martin as "Dr.". Martin's CV indicated that he earned his Ph.D. in 2014, so I do not see how the non-reporting of references to Mitchell and Martin as "Dr." in the official student evaluations in MM Table 1 can be attributed to some comments being made before Martin received his Ph.D. Rate My Professors comments for Martin date to November 2014; however, even if the non-reporting of references to Mitchell and Martin as "Dr." in MM Table 2 can be attributed to some comments being made before Martin received his Ph.D., any use of "Professor" for Martin must be discounted because students presumably had more titles available to refer to Mitchell (e.g., "Dr.", "Professor") than to refer to Martin (e.g., "Professor").

---

Other notes:

---

4.

PS: Political Science & Politics should require authors to upload data and code so that readers can more clearly assess what the authors did.

---

5.

MM Table 4 data appear to have large percentages of enrolled students who did not evaluate Mitchell or Martin. Texas Tech data for Spring 2015 courses here indicate that enrollment for Mitchell's four sections of the course used in the study was 247 (section D6), 247 (section D7), 243 (section D8), and 243 (section D9), and that enrollment for Martin's two sections of the course was 242 (section D10) and 199 (section D11). Mitchell's evaluations had ratings from 167 of the 980 students in her sections, for a 17.0 percent response rate, and Martin's evaluations had ratings from 51 of his 441 students, for an 11.6 percent response rate. It's possible that Mitchell's nearly 50 percent higher response rate did not affect differences in mean ratings between the instructors, but the difference in response rates would have been relevant information for the article to include.

---

6.

MM state (p. 652, emphasis in the original):

"To reiterate, of the 23 questions asked, there were none in which a female instructor received a higher rating."

My calculations indicate that Mitchell received a higher rating than Martin did on 3 of the 23 MM Table 4 items: items 17, 21, and 23. Moreover, MM Table 4 indicates that the mean for Mitchell was higher than the mean for Martin across the three Technology items. I think that the "there were none" statement is intended to indicate that Mitchell did not receive a higher rating than Martin did on any of the items for which the corresponding p-value was sufficiently low, but, if that's the case, then that should be stated clearly because the statement can otherwise be misleading.

But I'm curious how MM could have reported a difference in favor of Mitchell if MM were reporting results using one-tailed statistical tests to detect a difference in favor of Martin, as I read the MM Table 4 Technology line to indicate, with a t-score of 1.93 and a p-value of 0.027.

---

7.

MM reports that the study indicated that "a male instructor administering an identical online course as a female instructor receives higher ordinal scores in teaching evaluations, even when questions are not instructor-specific" (p. 648). But that was not always true: as indicated above, MM Table 4 even indicates that the mean for Mitchell was higher than the mean for Martin across the three not-instructor-specific Technology items.

---

8.

The MM appendix (p. 4) indicated that:

Students had a tendency to enroll in the sections with the lowest number initially (merely because those sections appeared first in the registration list). This means that section 1 tended to fill up earlier than section 3 or 4. It may also be likely that students who enroll in courses early are systematically different than those who enroll later in the registration period; for example, they may be seniors, athletes, or simply motivated students. For this reason, we examined sections in the mid- to high- numerical order: sections 6, 7, 8, 9, and 10.

The last line should indicate that data were from sections 6 to 11. See the sample sizes for Martin in the Texas Tech website data: item 1 for section D10 has student evaluation sample sizes of 6, 12, 10, 1, and 3, for a total of 32; adding the sample for item 1 from section D11 (7, 5, 6, 1, 0) raises that to 51; multiplying 51 times 7 produces 357, which is the sample size for Martin in the "Course" section of MM Table 4.
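
A quick R check of that arithmetic:

# Item 1 respondent counts quoted above for Martin's two sections
d10 <- sum(c(6, 12, 10, 1, 3))  # section D10
d11 <- sum(c(7, 5, 6, 1, 0))    # section D11
(d10 + d11) * 7                 # 51 students times 7 "Course" items = 357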

---

9.

I think that Blåsjö (2018) interpreted the statement that "For this reason, we examined sections in the mid- to high- numerical order: sections 6, 7, 8, 9, and 10" as if Mitchell and Martin collected data for other sections but did not analyze these data. Blåsjö: "Actually the researchers threw away at least half of the actual data". I think that that is a misreading of the (perhaps unclear) statement quoted above from the MM appendix. From what I can tell based on the data at the Texas Tech site, data were collected for only sections 6 to 11.

---

NOTE:

Thanks to representatives from the Texas Tech IRB and the Illinois State University IRB, respectively, for providing and forwarding the link to the Texas Tech student evaluations.

Tagged with:

According to the 20 Dec 2018 Samuel Perry and Andrew Whitehead Huffington Post article "What 'Make America Great Again' And 'Merry Christmas' Have In Common":

Christian theology, identity or faithfulness have nothing to do with an insistence on saying "Merry Christmas." To be more precise, when we analyzed public polling data, we found that there was no correlation between being an evangelical Christian, believing in the biblical Nativity story, attending church, or participating in charitable giving and rejecting "Season's Greetings" for "Merry Christmas." [emphasis added]

The referenced data are from a December 2013 Public Religion Research Institute survey. Item Q5 is the "Merry Christmas" item:

Do you think stores and businesses should greet their customers with 'Happy Holidays' or 'Seasons Greetings' instead of 'Merry Christmas' out of respect for people of different faiths, or not? (Q5)

Item Q6 is the biblical Nativity belief item:

Do you believe the story of Christmas -- that is, the Virgin birth, the angelic proclamation to the Shepherds, the Star of Bethlehem, and the Wise Men from the East -- is historically accurate, or is it a theological story to affirm faith in Jesus? (Q6)

Here is the crosstab for the "Merry Christmas" item and the Nativity item:

PRRI-1

Contra the article, these variables are correlated: ignoring the don't knows and refusals, 57 percent of participants who believe that the gospel Nativity story is historically accurate preferred the "Merry Christmas" response ("No, should not"), but only 41 percent of participants who believe that the gospel Nativity story is a theological story preferred the "Merry Christmas" response.

Here is a logit regression using the gospel Nativity responses (gospel) to predict the Merry Christmas responses (merry), removing from the analysis the participants who were coded as don't know or refusal for at least one of the items:

PRRI-2

The p-value for the logit regression is also p<0.001 in weighted analyses.
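
For reference, here is a minimal R sketch of the unweighted logit, assuming a data frame named prri in which merry equals 1 for the "No, should not" (pro-"Merry Christmas") response, gospel equals 1 for the "historically accurate" response, and don't know and refusal responses are set to missing; the post's actual code is linked in the notes below:

# Unweighted logit of the Merry Christmas item on the Nativity belief item;
# rows with missing values are dropped by default
m <- glm(merry ~ gospel, family = binomial, data = prri)
summary(m)$coefficients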

The gospel predictor still has a p-value under p=0.05 when including the demographic controls below in unweighted analyses and in weighted analyses:

PRRI-3

The gospel predictor still has a p-value under p=0.05 when including the demographic controls and controls for GOP partisanship and self-reported ideology in unweighted analyses:

PRRI-4

There are specifications in which the p-value for the gospel predictor is above p=0.05, such as a weighted analysis that includes the above controls for demographics, partisanship, and ideology. But the gospel predictor failing to be robust to every possible specification, especially specifications that control for factors such as GOP partisanship and charitable giving that are plausibly influenced by religious belief, is not the impression that I received from "...we found that there was no correlation between...believing in the biblical Nativity story...and rejecting 'Season's Greetings' for 'Merry Christmas'".

---

Here is another passage from the article:

What does this tell us? Ultimately, drawing lines in the sand over whether people say "Merry Christmas" over "Happy Holidays" has virtually nothing to do with Christian faithfulness or orthodoxy.  It has everything to do with the cultural and political insecurity white conservatives feel.

I didn't see anything in the reported analysis that permits the inference that "It has everything to do with the cultural and political insecurity white conservatives feel". Whites and conservatives being more likely than non-Whites and non-conservatives to prefer "Merry Christmas" doesn't require that this preference is due to "the cultural and political insecurity white conservatives feel" any more than a non-White or non-conservative preference for "Happy Holidays" and "Seasons Greetings" can be attributed without additional information to the cultural and political insecurity that non-White non-conservatives feel.

---

NOTES:

1. Code here. Data here. Data acknowledgment: PRRI Religion & Politics Tracking Poll, December 2013; Principal Investigators Robert P. Jones and Daniel Cox; Data were downloaded from the Association of Religion Data Archives, www.TheARDA.com [http://www.thearda.com/Archive/Files/Descriptions/PRRIRP1213.asp].

2. I had a Twitter discussion of the article and the data with co-author Samuel Perry, which can be accessed here.

Tagged with: ,

The Kearns et al. study "Why Do Some Terrorist Attacks Receive More Media Attention Than Others?" has been published in Justice Quarterly; the abstract indicates that "Controlling for target type, fatalities, and being arrested, attacks by Muslim perpetrators received, on average, 357% more coverage than other attacks". A prior Kearns et al. analysis was reported on in a 2017 Monkey Cage post and in a paper posted at SSRN with a "last edited" date of 3/5/17 that was limited to "media coverage for terrorist attacks in the United States between 2011 and 2015" (p. 7 of the paper).

The data for the Kearns et al. study published in Justice Quarterly were expanded to cover terrorist attacks from 2006 to 2015 (instead of 2011 to 2015), and the published article now reports a model with a predictor for "Perpetrator and group unknown", with a p-value under 0.05 for the Muslim perpetrator predictor. Footnote 9 of Kearns et al. 2019 discusses the selection of 2006 as the starting point:

Starting in 2006, an increasing percentage of Americans used the Internet as their main source of news [URL provided, but omitted in this quote]. Since the news sources used for this study include both print and online newspaper articles, we started our analysis in 2006. In years prior to 2006, we may see fewer articles overall since print was more common and is subject to space constraints (p. 8).

That reason to start the analysis in 2006 does not explain why the analysis in the Monkey Cage post and the 3/5/17 paper started in 2011, given that the news sources in these earlier reports of the study also included both print and online articles.

In this 3/28/17 post, I reported that the Muslim perpetrator predictor had a 0.622 p-value in my analysis predicting the number of articles of media coverage using the Kearns et al. 2011-2015 outcome variable coding, controlling for the number of persons killed in the attack and for whether the perpetrator was unknown.

Using the 2006-2015 dataset and code that Dr. Kearns sent me upon request, I ran my three-predictor model, limiting the analysis to events from 2011 to 2015:

Kearns1

The above p-value for the Muslim perpetrator predictor differs from my 0.622 p-value from the prior post, although inferences are the same. There might be multiple reasons for the difference, but the 3/5/17 Kearns et al. paper reports a different number of articles for some events; for example, the Robert Dear event was coded as 204 articles in the paper and as 178 articles in the 2019 article, and the number of articles for the Syed Rizwan Farook / Tashfeen Malik event dropped from 179 to 152.

---

The inference about the Muslim perpetrator predictor is more convincing using the 2006-2015 data from Kearns et al. 2019 than using the 2011-2015 data: the 2006-2015 data produce a 2.82 Muslim perpetrator predictor t-score using my three-predictor model above and a 4.20 t-score with a three-predictor model that replaces the number killed in the event with a predictor for whether someone was killed in the event.

For what it's worth, along with higher-than-residual news coverage for events with Muslim perpetrators, the Kearns et al. data indicate that, compared to other events with a known perpetrator, events with Muslim perpetrators also have higher-than-residual numbers of deaths, numbers of logged wounded, and (at least at p=0.0766) likelihood of a death:

Kearns2

Kearns3

Kearns4

---

NOTES

1. I could not find the 3/5/17 Kearns et al. paper online now, but I have a PDF copy from SSRN (SSRN-id2928138.pdf) that the above post references.

2. Stata code for my analyses:

* Flag the events whose perpetrator is listed as "Unknown" in the Kearns et al. data
gen PerpUnknown=0
replace PerpUnknown=1 if eventid==200601170007
replace PerpUnknown=1 if eventid==200606300004
replace PerpUnknown=1 if eventid==200607120007
replace PerpUnknown=1 if eventid==200705090002
replace PerpUnknown=1 if eventid==200706240004
replace PerpUnknown=1 if eventid==200710200003
replace PerpUnknown=1 if eventid==200710260003
replace PerpUnknown=1 if eventid==200802170007
replace PerpUnknown=1 if eventid==200803020012
replace PerpUnknown=1 if eventid==200803060004
replace PerpUnknown=1 if eventid==200804070005
replace PerpUnknown=1 if eventid==200804220011
replace PerpUnknown=1 if eventid==200806140008
replace PerpUnknown=1 if eventid==200807250030
replace PerpUnknown=1 if eventid==200903070010
replace PerpUnknown=1 if eventid==200909040003
replace PerpUnknown=1 if eventid==201007270013
replace PerpUnknown=1 if eventid==201011160004
replace PerpUnknown=1 if eventid==201101060018
replace PerpUnknown=1 if eventid==201102220009
replace PerpUnknown=1 if eventid==201104230010
replace PerpUnknown=1 if eventid==201105060004
replace PerpUnknown=1 if eventid==201109260012
replace PerpUnknown=1 if eventid==201110120003
replace PerpUnknown=1 if eventid==201205200024
replace PerpUnknown=1 if eventid==201205230034
replace PerpUnknown=1 if eventid==201208120012
replace PerpUnknown=1 if eventid==201301170006
replace PerpUnknown=1 if eventid==201302260036
replace PerpUnknown=1 if eventid==201304160051
replace PerpUnknown=1 if eventid==201304170041
replace PerpUnknown=1 if eventid==201304180010
replace PerpUnknown=1 if eventid==201307250065
replace PerpUnknown=1 if eventid==201308220053
replace PerpUnknown=1 if eventid==201403180089
replace PerpUnknown=1 if eventid==201403250090
replace PerpUnknown=1 if eventid==201406110089
replace PerpUnknown=1 if eventid==201410030065
replace PerpUnknown=1 if eventid==201410240071
replace PerpUnknown=1 if eventid==201411040087
replace PerpUnknown=1 if eventid==201502170127
replace PerpUnknown=1 if eventid==201502230104
replace PerpUnknown=1 if eventid==201503100045
replace PerpUnknown=1 if eventid==201506220069
replace PerpUnknown=1 if eventid==201506230056
replace PerpUnknown=1 if eventid==201506240051
replace PerpUnknown=1 if eventid==201506260046
replace PerpUnknown=1 if eventid==201507150077
replace PerpUnknown=1 if eventid==201507190097
replace PerpUnknown=1 if eventid==201508010105
replace PerpUnknown=1 if eventid==201508020114
replace PerpUnknown=1 if eventid==201508190040
replace PerpUnknown=1 if eventid==201509040048
replace PerpUnknown=1 if eventid==201509300082
replace PerpUnknown=1 if eventid==201512260016
* Check the PerpUnknown coding against the Kearns et al. perpetrator variables
tab PerpUnknown, mi
tab PerpUnknown PerpMuslim, mi
tab PerpUnknown PerpNonMuslim, mi
tab PerpUnknown PerpGroupUnknown, mi
* Three-predictor negative binomial models of article counts:
* first restricted to 2011-2015 events, then for all 2006-2015 events
nbreg TOTALARTICLES PerpMuslim numkilled PerpUnknown if eventid>=201101060018
nbreg TOTALARTICLES PerpMuslim numkilled PerpUnknown
* kill0 = 1 indicates that no one was killed in the event
gen kill0=0
replace kill0=1 if numkilled==0
tab numkilled kill0
nbreg TOTALARTICLES PerpMuslim kill0     PerpUnknown
* Compare deaths, logged wounded, and any-death rates by Muslim perpetrator
ttest numkilled if PerpUnknown==0, by(PerpMuslim)
ttest numkilled                  , by(PerpMuslim)
ttest logwound  if PerpUnknown==0, by(PerpMuslim)
ttest logwound                   , by(PerpMuslim)
prtest kill0    if PerpUnknown==0, by(PerpMuslim)
prtest kill0                     , by(PerpMuslim)

3. Kearns et al. 2019 used a different "unknown" perpetrator measure than I did. My PerpUnknown predictor (in the above analysis and the prior post) is a dichotomous variable coded 1 for any perpetrator listed as "Unknown" in the Kearns et al. list. Kearns et al. 2019 has a dichotomous PerpGroupUnknown variable that differentiated between perpetrators for which the group of the perpetrator was known (such as for this case with an ID of 200807250030 in the Global Terrorism Database, in which the perpetrators were identified as Neo-Nazis) and perpetrators for which the group of the perpetrator was unknown (such as for this case with an ID of 200806140008 in the Global Terrorism Database, in which the perpetrator group was not identified). Kearns et al. 2019 footnote 17 indicates that "Even when the individual perpetrator is unknown, we often know the group responsible so 'perpetrator unknown' is not a theoretically sound category on its own, though we account for these incidents in robustness checks"; however, I'm not sure why "perpetrator unknown" is not a theoretically sound category on its own for the purpose of a control when predicting media coverage: if a perpetrator's name is not known, then there might be fewer news articles because there will be no follow-up articles that delve into the background of the perpetrator in a way that could be done if the perpetrator's name were known.

Tagged with: ,

According to a 2018-06-18 "survey roundup" blog post by Karthick Ramakrishnan and Janelle Wong (with a link to the blog post tweeted by Jennifer Lee):

Regardless of the question wording, a majority of Asian American respondents express support for affirmative action, including when it is applied specifically to the context of higher education.

However, a majority of Asian American respondents did not express support for affirmative action in data from the National Asian American Survey 2016 Post-Election Survey [data here, dataset citation: Karthick Ramakrishnan, Jennifer Lee, Taeku Lee, and Janelle Wong. National Asian American Survey (NAAS) 2016 Post-Election Survey. Riverside, CA: National Asian American Survey. 2018-03-03.].

Tables below contain item text from the questionnaire. My analysis sample was limited to participants coded 1 for "Asian American" in the dataset's race variable. The three numeric columns in the tables for each item are respectively for: [1] data that are unweighted; [2] data with the nweightnativity weight applied, described in the dataset as "weighted by race/ethnicity and state, nativity, gender, education (raking method"; and [3] data with the pidadjweight weight applied, described in the dataset as "adjusted for partyID variation by ethnicity in re-interview cooperation rate for". See slides 4 and 14 here for more details on the study methodology.

The table below reports on results for items about opinions of particular racial preferences in hiring and promotion. A majority of Asian American respondents did not support these race-based affirmative action policies:

NAAS-Post3

The next table reports on results for items about opinions of particular uses of race in university admissions decisions. A majority of Asian American respondents did not support these race-based affirmative action policies:

NAAS-Post4

I'm not sure why these post-election data were not included in the 2018-06-18 blog post survey roundup or mentioned in this set of slides. I'm also not sure why the manipulations for the university admissions decisions items include only treatments in which the text suggests that Asian American applicants are advantaged by consideration of race, instead of (or in addition to) treatments in which the text suggests that Asian American applicants are disadvantaged by consideration of race, which would have been perhaps as plausible or more plausible.

---

Notes:

1. Code to reproduce my analyses is here. Including Pacific Islanders and restricting the Asian American sample to U.S. citizens did not produce majority support for any affirmative action item reported on above or for the sex-based affirmative action item (Q7.2).

2. The survey had a sex-based affirmative action item (Q7.2) and had items about whether the participant, a close relative of the participant, or a close personal friend of the participant was advantaged or was disadvantaged by affirmative action (Q7.8 to Q7.11). For the Asian American sample, support for preferential hiring and promotion of women in Q7.2 was at 46% unweighted and at 44% when either weighting variable was applied.

3. This NAAS webpage indicates a 2017-12-05 date for the pre-election survey dataset, and on 2017-12-06 the @naasurvey account tweeted a blurb about these data being available for download. However, that same NAAS webpage lists a 2018-03-03 date for the post-election survey dataset, but I did not see an @naasurvey tweet for that release, and that NAAS webpage did not have a link to the post-election data at least as late as 2018-08-16. I tweeted a question about the availability of the post-election data on 2018-08-31 and then sent in an email and later found the data available at the webpage. I think that this might be the NSF grant for the post-election survey, which indicated that the data were to be publicly released through ICPSR in June 2017.

Tagged with: ,

[Please see the March 13, 2019 update below]

Studies have indicated that there are more liberals than conservatives in the social sciences (e.g., Rothman et al. 2005, Gross and Simmons 2007). If social scientists on average are more likely to cite publications that support rather than undercut their assumptions about the world and/or are more likely to cite publications that support rather than undercut their policy preferences, then it is reasonable to expect that, all else equal, publications reporting findings that support liberal assumptions or policy preferences will receive a higher number of citations than publications reporting findings that undercut liberal assumptions or policy preferences.

---

Here is a sort-of natural experiment to assess this potential ideological citation bias. From an April 2015 Scott Alexander post at Slate Star Codex (paragraph breaks omitted):

Williams and Ceci just released National Hiring Experiments Reveal 2:1 Faculty Preference For Women On STEM Tenure Track, showing a strong bias in favor of women in STEM hiring...Two years ago Moss-Racusin et al released Science Faculty's Subtle Gender Biases Favor Male Students, showing a strong bias in favor of men in STEM hiring. The methodology was almost identical to this current study, but it returned the opposite result. Now everyone gets to cite whichever study accords with their pre-existing beliefs.

It has been more than three years since that Slate Star Codex post, so let's compare the number of citations received by the article with the finding that supports liberal assumptions or policy preferences (Moss-Racusin et al. 2012) to the number of citations received by the article with the finding that undercuts liberal assumptions or policy preferences (Williams and Ceci 2015). Both articles were published in the same journal, and both have a mixed-sex authorship team with a woman as the first author; these factors help eliminate a few alternate explanations for any difference in citation counts.

Based on Web of Science data collected August 24, 2018, Moss-Racusin et al. 2012 has been cited these numbers of times in the given year, with the number of years from the article's publication year in square brackets:

  • 5 in 2012 [0]
  • 39 in 2013 [1]
  • 74 in 2014 [2]
  • 109 in 2015 [3]
  • 111 in 2016 [4]
  • 131 in 2017 [5]
  • 105 in 2018 to date [6]

Based on Web of Science data collected August 24, 2018, Williams and Ceci 2015 has been cited these numbers of times in the given year, with the number of years from the article's publication year in square brackets:

  • 4 in 2015 [0]
  • 21 in 2016 [1]
  • 27 in 2017 [2]
  • 15 in 2018 to date [3]

So, in the second year from the article's publication year, Williams and Ceci 2015 was cited 27 times, and Moss-Racusin et al. 2012 was cited 74 times. Over the first three years, Williams and Ceci 2015 was cited 52 times, and Moss-Racusin et al. 2012 was cited 118 times.

---

The potential citation bias against research findings that undercut liberal assumptions or policy preferences might be something that tenure-and-promotion committees should be aware of. Such a citation bias would also be relevant for assessing the status of the journal that research is published in and whether research is even published. Suppose that a journal editor were given a choice of publishing either Moss-Racusin et al. 2012 or Williams and Ceci 2015. Based on the above data, an editor publishing Williams and Ceci 2015 instead of Moss-Racusin et al. 2012 would, three years in, be forfeiting roughly 66 citations to an article in their journal (118 minus 52). Editors who prefer higher impact factors for their journal might therefore prefer to publish a manuscript with research findings that support liberal assumptions or policy preferences, compared to an equivalent manuscript with research findings that undercut liberal assumptions or policy preferences.

---

NOTES

1. Williams and Ceci 2015 was first published earlier within its publication year (April 8, 2015) than Moss-Racusin et al. 2012 was within its publication year (Sept 17, 2012), so Williams and Ceci 2015 had more time to be cited during its publication year; this should bias the citation counts in the publication year, and in any given year counted from the publication year, upward for Williams and Ceci 2015 relative to Moss-Racusin et al. 2012.

2. There might be non-ideological reasons for Moss-Racusin et al. 2012 to be enjoying a 2:1 citation advantage over Williams and Ceci 2015, so comments are open for ideas about any such reasons and for other ideas on this topic. The articles have variation in the number of authors—2 for Williams and Ceci 2015, and 5 for Moss-Racusin et al. 2012—but that seems unlikely to me to be responsible for the entire citation difference.

3. Some of my publications might be considered to fall into the category of research findings that undercut liberal assumptions or policy preferences.

---

UPDATE (Nov 30, 2018)

Here is another potential article pair:

The 1996 study about items measuring sexism against women was published earlier and in a higher-ranked journal than the 1999 study about items measuring sexism against men, but there is to date an excess of 1,238 citations for the 1996 study, which I suspect cannot be completely assigned to the extra three years in circulation and the journal ranking.

---

UPDATE (Mar 13, 2019)

Lee Jussim noted, before I did, that Moss-Racusin et al. (2012) has been cited much more often than Williams and Ceci (2015) has been (and noted the differences in inferences between the articles). Lee's tweet below is from May 28, 2018:

https://twitter.com/PsychRabble/status/1001250104676929542

Tagged with: