Assessing the claim of disproportionate media coverage of Muslim terror attacks

By L.J Zigerell. Posted on March 28, 2017, in Race. Tagged with inequality, race, reproductions, you're doing it wrong.

I recently blogged about the Betus, Lemieux, and Kearns Monkey Cage post (based on this Kearns et al. working paper) that claimed that "U.S. media outlets disproportionately emphasize the smaller number of terrorist attacks by Muslims".

I asked Kearns and Lemieux to share their data (I could not find an email for Betus). My request was declined until the paper is published. I tweeted a few questions to the coauthors about their data, but these tweets have not yet received a reply. Later, I realized that it would be possible to recreate or at least approximate their dataset because Kearns et al. included their outcome variable coding in the appendix of their working paper. I built a dataset based on [A] their outcome variable, [B] the Global Terrorism Database that they used, and [C] my coding of whether a given perpetrator was Muslim.

My analysis indicated that these data do not appear to support the claim of disproportionate media coverage of terror attacks by Muslims. In models with no control variables, terror attacks by Muslim perpetrators were estimated to receive 5.0 times as much media coverage as other terror attacks (p=0.008). Controlling for the number of fatalities, the estimate drops to 1.53 times as much media coverage (p=0.480). Adding a control for attacks by unknown perpetrators, so that terror attacks by Muslim perpetrators are compared to terror attacks by known perpetrators who are not Muslim, drops the estimate further to 1.30 times as much media coverage (p=0.622). See the Stata output below, in which "noa" is the number of articles and coefficients represent incidence rate ratios:

My code contains descriptions of corrections and coding decisions that I made. Data from the Global Terrorism Database are not permitted to be posted online without permission, so the code is the only information about the dataset that I am posting for now. However, the code describes how you can build your own dataset with Stata.
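As a reading aid for the ratios reported above, here is a minimal Python sketch (not the Stata code referenced in this post) of how a rate ratio from a count model relates to the underlying coefficient: the ratio is the exponentiated coefficient, and a ratio of 1 indicates no difference. The function names are hypothetical, and the only inputs are the three point estimates quoted above.

```python
import math

def irr_to_coefficient(irr):
    """Return the log-scale coefficient whose exponent equals the rate ratio."""
    return math.log(irr)

def irr_to_percent_difference(irr):
    """Return the percent difference in expected count implied by a rate ratio."""
    return (irr - 1.0) * 100.0

# Point estimates quoted in the post for the Muslim-perpetrator predictor.
reported = {
    "no controls": 5.0,
    "controlling for fatalities": 1.53,
    "also controlling for unknown perpetrators": 1.30,
}

for label, irr in reported.items():
    print(f"{label}: ratio={irr:.2f}, "
          f"coefficient={irr_to_coefficient(irr):.3f}, "
          f"difference={irr_to_percent_difference(irr):+.0f}%")
```

A ratio below 1, such as the 0.89 in note 3, corresponds to a negative coefficient, meaning less estimated coverage rather than more.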

Below is the message that I sent to Kearns and Lemieux on March 17. Question 2 refers to the possibility that the Kearns et al. outcome variable includes news articles published before the identities of the Boston Marathon bombers were known; without knowledge of who the perpetrators were, it is difficult to attribute that early media coverage to the Muslim identity of the perpetrators. Question 3 refers to the fact that the coefficient on the Muslim perpetrator predictor is larger when the number of fatalities recorded for that attack is smaller; the Global Terrorism Database lists four rows of data for the Tsarnaev case, the first of which has only one fatality, so I wanted to check that there is no error about this in the Kearns et al. data.

Hi Erin,

I created a dataset from the Global Terrorism Database and the data in the appendix of your SSRN paper. I messaged the Monkey Cage about writing a response to your post, and I received the suggestion to communicate with you about the planned response post.

For now, I have three requests:

1. Can you report the number of articles in your dataset for Bobby Joe Rogers [id 201201010020] and Ray Lazier Lengend? The appendix of your paper has perpetrator Ray Lazier Lengend associated with the id for Bobby Joe Rogers.

2. Can you report the earliest published date and the latest published date among the 474 articles in your dataset for the Tsarnaev case?

3. Can you report the number killed in your dataset for the Tsarnaev case?

I have attached a do file that can be used to construct my dataset and run my analyses in Stata. Let me know if you have any questions, see any errors, or have any suggestions.

Thanks,

L.J

I have not yet received a reply to this message.

I pitched a response post to the Monkey Cage regarding my analysis, but the pitch was not accepted, at least while the Kearns et al. paper is unpublished.

---

NOTES:

[1] Data from the Global Terrorism Database have this citation: National Consortium for the Study of Terrorism and Responses to Terrorism (START). (2016). Global Terrorism Database [Data file]. Retrieved from https://www.start.umd.edu/gtd.

[2] The method for eliminating news articles in the Kearns et al. working paper included this choice:

"We removed the following types of articles most frequently: lists of every attack of a given type, political or policy-focused articles where the attack or perpetrators were an anecdote to a larger debate, such as abortion or gun control, and discussion of vigils held in other locations."

It is worth assessing the degree to which this choice disproportionately reduces the count of articles for the Dylann Roof terror attack, which served as a background for many news articles about the display of the Confederate flag. It's not entirely clear why these types of articles should not be considered when assessing whether terror attacks by Muslims receive disproportionate media coverage.

[3] Controlling for attacks by unknown perpetrators, controlling for fatalities, and removing the Tsarnaev case drops the point estimate for the incidence rate ratio to 0.89 (p=0.823).

Title IX and the Visions in Methodology Conference [UPDATED]

By L.J Zigerell. Posted on March 24, 2017, in Sex. Tagged with inequality, sex.

According to its website, Visions in Methodology "is designed to address the broad goal of supporting women who study political methodology" and "serves to connect women in a field where they are under-represented." The Call for Proposals for the 2017 VIM conference indicates that submissions were restricted to women:

We invite submissions from female graduate students and faculty that address questions of measurement, causal inference, the application of advanced statistical methods to substantive research questions, as well as the use of experimental approaches (including incentivized experiments)...Please consider applying, or send this along to women you believe may benefit from participating in VIM!

Here is the program for the 2016 VIM conference, which lists activities restricted to women, lists conference participants (which appear to be only women), and has a photo that appears to be from the conference (which appears to have only women in the photo).

The 2017 VIM conference webpage indicates that the conference is sponsored by several sources such as the National Science Foundation and the Stony Brook University Graduate School. But page 118 of the NSF's Proposal & Award Policies & Procedures Guide (PAPPG) of January 2017 states:

Subject to certain exceptions regarding admission policies at certain religious and military organizations, Title IX of the Education Amendments of 1972 (20 USC §§ 1681-1686) prohibits the exclusion of persons on the basis of sex from any education program or activity receiving Federal financial assistance. All NSF grantees must comply with Title IX.

The VIM conference appears to be an education program or activity receiving Federal financial assistance and, as such, submissions and conference participation should not be restricted by sex.

---

NOTES:

1. This Title IX Legal Manual discusses what constitutes an education program or activity:

While Title IX's antidiscrimination protections, unlike Title VI's, are limited in coverage to "education" programs or activities, the determination as to what constitutes an "education program" must be made as broadly as possible in order to effectuate the purposes of both Title IX and the CRRA. Both of these statutes were designed to eradicate sex-based discrimination in education programs operated by recipients of federal financial assistance, and all determinations as to the scope of coverage under these statutes must be made in a manner consistent with this important congressional mandate.

2. I think that the relevant NSF award is SES 1324159, which states that part of the project will "continue a series of small meetings for women methodologists that deliberately mix senior leaders in the subfield with young, emerging scholars who can benefit substantially from such close personal interaction." This page indicates that the 2014 VIM conference received support from NSF grant SES 1120976.

---

UPDATE [June 20, 2019]

I learned from a National Science Foundation representative of a statute (42 U.S. Code § 1885a) that permits the National Science Foundation to fund women-only activities listed in the statute. However, the Visions in Methodology conference has been funded by host organizations such as Stony Brook University, and I have not yet uncovered any reason why host institutions covered by Title IX would not be in violation of Title IX in funding single-sex educational opportunities.

Media bias against Muslims in terrorism coverage

By L.J Zigerell. Posted on March 16, 2017, in Race. Tagged with inequality, media bias, you're doing it wrong.

The Monkey Cage published a post that claimed that "U.S. media outlets disproportionately emphasize the smaller number of terrorist attacks by Muslims". Such an inference depends on the control variables making all else equal, but the working paper on which the inference was based had few controls and few alternate specifications. The models controlled for fatalities, but the Global Terrorism Database used for the key reference also lists the number of persons injured, and a measure of total casualties might be a better control than fatalities alone. For example, the Boston Marathon bombing is listed as having 1 fatality and 132 injured, but the models in the working paper would estimate the media coverage to be the same as if the bombing had had 1 fatality and 0 injured.

Moreover, as noted in the comments to the post, the Boston Marathon bombing is an outlier in terms of the outcome variable (20 percent of articles were devoted to that single event). But the working paper reported no model that omitted this outlier from the analysis, so it is not clear to what extent the estimates and inferences reflect a "Muslim perpetrator" effect or a "Boston Marathon bombing" effect. And, as also noted in the comments, proper controls would reflect the difference in expected media coverage for terrorist attacks in which the perpetrator was killed at the scene versus terrorist attacks in which there was a manhunt for the perpetrator.

Finally, from what I can tell based on the post and the working paper, the number of articles for the Boston Marathon bombing might include articles published before it was known or credibly suspected that the perpetrators were Muslim. If so, then the article count for the Boston Marathon bombing might be inflated because media coverage of the bombing before the religion of the perpetrators was known or credibly suspected cannot be attributed to the religion of the perpetrators.

My request for the data and code used for the post was declined, but hopefully I'll remember to check for the data and code after the working paper is published. In the meantime, I asked the authors on Twitter about inclusion of articles before the suspects were known and about results when the Boston Marathon bombing is excluded from the analysis.

From Twitter: Culture and the Chinese in Spain

By L.J Zigerell. Posted on February 20, 2017, in Race. Tagged with race, twitter response.

This post is a response to a question tweeted here.

---

I was responding only to the idea that poor educational outcomes for the Chinese in Spain would disprove culture as an influence on educational outcomes. Before concluding anything from the Chinese-in-Spain example about the influence of culture on educational outcomes, we'd need to estimate the level of educational outcomes that would be expected of the Chinese in Spain in the absence of cultural influence and then compare that estimate to observed educational outcomes.

So what level of educational outcomes should be expected of the Chinese in Spain? The 2014 Financial Times article "China's migrants thrive in Spain's financial crisis" reported an estimate that 70 or 80 percent of the Chinese in Spain are from Qingtian, "an impoverished rural county". Nonetheless, the FT article suggests that the Chinese in Spain are doing relatively well in employment and business, citing low unemployment and overrepresentation in business startups. Maybe culture has something to do with these things, and maybe culture and success in employment and business will translate into better future educational outcomes. Or maybe culture has no effect on these things.

Predicting sex differences in white supremacist beliefs

By L.J Zigerell. Posted on February 5, 2017, in Race. Tagged with race.

Nathaniel Bechhofer linked to a tweeted question from Elizabeth Plank about whether white supremacists are more likely to be men. Ideally, for measuring white supremacist beliefs, we would define "white supremacist", develop items to measure white supremacist beliefs or actions, and then conduct a new study, but, for now, let's examine some beliefs that might provide a sense of what we'd find from an ideal survey.

ANES

I was working with the ANES Time Series Cumulative Data file last night, so I'll start there, with a measure of white ethnocentrism, coded 1 for respondents who rated whites higher than blacks, Hispanics, and Asians on feeling thermometers, and 0 for respondents who provided substantive responses to the four racial group feeling thermometers and were not coded 1. Data were available for surveys in 1992, 2000, 2002, 2004, 2008, and 2012. This is not a good measure of white supremacist beliefs, either in terms of face validity or considering the fact that 27 percent of white respondents (N=2,345 of 8,586) were coded 1. Nonetheless, in weighted analyses, 27.5 percent of white men and 28.9 percent of white women were coded 1, with a p-value for the difference of p=0.198.

I then coded a new measure as 1 for respondents who rated whites above 50 and who rated blacks, Hispanics, and Asians below 50 on the four racial group feeling thermometers, and as 0 for respondents who provided substantive responses to the four racial group feeling thermometers and were not coded 1. Data were available for surveys in 1992, 2000, 2002, 2004, 2008, and 2012. Only 1.6 percent of white respondents (N=134 of 8,586) were coded 1, and, as before, weighted analyses did not detect a sex difference: 1.9 percent of white men and 1.6 percent of white women were coded 1, with a p-value for the difference of p=0.429.
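The two feeling thermometer codings described above can be sketched in Python as follows. This is an illustration only, not the code linked in the notes: the function names are hypothetical, `None` stands in for a non-substantive response, and "rated whites higher than blacks, Hispanics, and Asians" is assumed to mean higher than each of the three groups.

```python
def code_ethnocentrism(white, black, hispanic, asian):
    """First measure: 1 if whites are rated above each other group, else 0.

    Returns None if any of the four ratings is non-substantive (None).
    """
    ratings = (white, black, hispanic, asian)
    if any(r is None for r in ratings):
        return None
    return int(white > black and white > hispanic and white > asian)

def code_above_below_50(white, black, hispanic, asian):
    """Second measure: 1 if whites are rated above 50 and every other
    group is rated below 50, else 0.

    Returns None if any of the four ratings is non-substantive (None).
    """
    ratings = (white, black, hispanic, asian)
    if any(r is None for r in ratings):
        return None
    return int(white > 50 and black < 50 and hispanic < 50 and asian < 50)

# Hypothetical respondent: whites rated 70, all other groups rated 60.
print(code_ethnocentrism(70, 60, 60, 60), code_above_below_50(70, 60, 60, 60))
```

The hypothetical respondent is coded 1 on the first measure but 0 on the second, which is consistent with the second measure flagging a much smaller share of respondents.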

GSS

The General Social Survey 1972-2014 file contained an item measuring agreement that "On the average [Negroes/Blacks/African-Americans] have worse jobs, income, and housing than white people....Because most [Negroes/Blacks/African-Americans] have less in-born ability to learn". Data were available for surveys in 1977, 1985, 1986, 1988, 1989, 1990, 1991, 1993, 1994, 1996, 1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012, and 2014. There was a detected sex difference in weighted analyses, with 13.5 percent of white men and 12.0 percent of white women agreeing with the statement (p=0.002, N=21,911).

The next measure was coded 1 for respondents who favored a close relative marrying a white person and opposed a close relative marrying a black person, a Hispanic American person, and an Asian American person, and coded 0 for white respondents with other responses, including non-substantive responses. Data were available for surveys in 2000, 2004, 2006, 2008, 2010, 2012, and 2014. There was a detected sex difference in weighted analyses, with 13.1 percent of white men and 10.4 percent of white women coded 1 (p=0.001, N=7,604).

The next measure was coded 1 for respondents coded 1 for the aforementioned marriage item and who selected 9 on a 1-to-9 scale for how close they felt to whites. Data were available for surveys in 2000, 2004, 2006, 2008, 2010, 2012, and 2014. There was no detected sex difference in weighted analyses, with 5.5 percent of white men and 4.7 percent of white women coded 1 (p=0.344, N=3,952).

In the 1972 GSS, nonblack respondents were asked: "Do you think Negroes should have as good a chance as white people to get any kind of job, or do you think white people should have the first chance at any kind of job?". Of 1,330 white respondents, 20 of 670 (3.0 percent of) white men and 23 of 660 (3.5 percent of) white women reported that white people should have first chance at any kind of job (p=0.607 in an unweighted analysis).
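One way to check an unweighted comparison like this is a pooled two-proportion z-test. The test behind the reported p-value is not specified above, so the choice of test here is an assumption; the function name is hypothetical, and the only inputs are the counts from the paragraph. The sketch gives a two-sided p-value of about 0.61, close to the reported value.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    Returns (p1, p2, z, two-sided p-value) for x1 of n1 versus x2 of n2.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value

# Counts from the 1972 GSS item: 20 of 670 white men, 23 of 660 white women.
p_men, p_women, z, p_value = two_proportion_z_test(20, 670, 23, 660)
print(f"men={p_men:.3f}, women={p_women:.3f}, z={z:.2f}, p={p_value:.2f}")
```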

The next measure was based on the item asking: "If you and your friends belonged to a social club that would not let [Negroes/Blacks] join, would you try to change the rules so that [Negroes/Blacks/African-Americans] could join?" (sic for the lack of "African-Americans" in the first set of brackets). Respondents were coded 1 for reporting that they would not try to change the rules. Data were available for surveys in 1977, 1985, 1986, 1988, 1989, 1990, 1991, 1993, and 1994. There was a detected sex difference in weighted analyses, with 45.5 percent of white men and 37.8 percent of white women coded 1 (p<0.001, N=7,924).

In the 2000 GSS, respondents were given this task:

Now I'd like you to imagine a neighborhood that had an ethnic and racial mix you personally would feel most comfortable in. Here is a blank neighborhood card, which depicts some houses that surround your own. Using the letters A for Asian, B for Black, H for Hispanic or Latin American and W for White, please put a letter in each of these houses to represent your preferred neighborhood where you would most like to live. Please be sure to fill in all of the houses.

Respondents were coded 1 if the respondent marked "white" for all the houses and coded 0 otherwise, with 0 including responses of doesn't matter, no neighbors, mixed race, or non-substantive responses. There was a nontrivial sex difference in weighted point estimates, with 16.9 percent of white men and 13.5 percent of white women coded 1, but the p-value was p=0.110 (N=1,108).

---

The 1972 GSS "white people should have the first chance at any kind of job" item seems like the best measure of white supremacist beliefs among the measures above, but agreement with that belief was low enough that there was not much power to detect a sex difference.

Based on the other data above and absent other data, it appears reasonable to expect at least a slight over-representation of men among whites with white supremacist beliefs, to the extent that white supremacist beliefs positively correlate with the patterns above. Research (1, 2) has found men to score higher than women on social dominance orientation scales, so the magnitude of expected sex differences in white supremacist beliefs among whites should depend on the degree to which white supremacist beliefs are defined to include a preference for political or social dominance.

---

NOTES:

Datasets were anes_timeseries_cdf_stata12.dta and GSS7214_R1.DTA. Code here.

Funnel plot for "Racial Bias in Mock Juror Decision-Making"

By L.J Zigerell. Posted on September 29, 2016, in Race. Tagged with file drawer problem, race, selective reporting.

This post reports on publication bias analyses for the Tara L. Mitchell et al. 2005 meta-analysis: "Racial Bias in Mock Juror Decision-Making: A Meta-Analytic Review of Defendant Treatment" [gated, ungated]. The appendices for the article contained a list of sample sizes and effect sizes, but the list did not match the reported results in at least one case. Dr. Mitchell emailed me a file of the correct data (here).

VERDICTS

Here is the funnel plot for the Mitchell et al. 2005 meta-analysis of verdicts:

Egger's test did not indicate at the conventional level of statistical significance the presence of funnel plot asymmetry in any of the four funnel plots, with p-values of p=0.80 (white participants, published studies), p=0.82 (white participants, all studies), p=0.10 (black participants, published studies), and p=0.63 (black participants, all studies).
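For readers unfamiliar with Egger's test, the idea can be sketched in a few lines of Python: regress each study's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error), and treat an intercept far from zero as a sign of funnel plot asymmetry. This is an illustration only, not the R code linked in the notes; the function name and the example effect sizes and standard errors are hypothetical, and the p-value step (comparing the t statistic to a t distribution with n-2 degrees of freedom) is left out.

```python
import math

def egger_regression(effects, std_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope, t statistic for the intercept).
    """
    y = [e / s for e, s in zip(effects, std_errors)]
    x = [1 / s for s in std_errors]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_stat

# Hypothetical effect sizes and standard errors, for illustration only.
effects = [0.10, 0.25, 0.05, 0.40, 0.30]
std_errors = [0.05, 0.12, 0.08, 0.20, 0.15]
intercept, slope, t_stat = egger_regression(effects, std_errors)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, t={t_stat:.2f}")
```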

Trim-and-fill with the L0 estimator imputed missing studies for all four funnel plots to the side of the funnel plot indicating same-race favoritism:

Trim-and-fill with the R0 estimator imputed missing studies for only the funnel plots for published studies with black participants:

---

SENTENCES

Here is the funnel plot for the Mitchell et al. 2005 meta-analysis of sentences:

Egger's test did not indicate at the conventional level of statistical significance the presence of funnel plot asymmetry in any of the four funnel plots, with p-values of p=0.14 (white participants, published studies), p=0.41 (white participants, all studies), p=0.50 (black participants, published studies), and p=0.53 (black participants, all studies).

Trim-and-fill with the L0 estimator imputed missing studies for the funnel plots with white participants to the side of the funnel plot indicating same-race favoritism:

Trim-and-fill with the R0 estimator did not impute any missing studies:

---

I also attempted to retrieve and plot data for the Ojmarrh Mitchell 2005 meta-analysis ("A Meta-Analysis of Race and Sentencing Research: Explaining the Inconsistencies"), but the data were reportedly lost in a computer crash.

---

NOTES:

1. Data and code for the Mitchell et al. 2005 analyses are here: data file for verdicts, data file for sentences, R code for verdicts, and R code for sentences.

Improving journal articles via peer review requests

By L.J Zigerell. Posted on September 8, 2016, in Methods. Tagged with list experiment, methods, race, selective reporting, sex, you're doing it wrong.

Researchers often have the flexibility to report only the results they want to report, so an important role for peer reviewers is to request that researchers report results that a reasonable skeptical reader might suspect have been strategically unreported. I'll discuss two publications where obvious peer review requests do not appear to have been made and, presuming these requests were not made, how requests might have helped readers better assess evidence in the publication.

---

Example 1. Ahlquist et al. 2014 "Alien Abduction and Voter Impersonation in the 2012 U.S. General Election: Evidence from a Survey List Experiment"

Ahlquist et al. 2014 reports on two list experiments: one list experiment is from December 2012 and has 1,000 cases, and another list experiment is from September 2013 and has 3,000 cases.

Figure 1 of Ahlquist et al. 2014 reports results for the 1,000-person list experiment estimating the prevalence of voter impersonation in the 2012 U.S. general election; the 95% confidence intervals for the full sample and for each reported subgroup cross zero. Figure 2 reports results for the full sample of the 3,000-person list experiment estimating the prevalence of voter impersonation in the 2012 U.S. general election, but Figure 2 did not include subgroup results. Readers are thus left to wonder why subgroup results were not reported for the larger sample that had more power to detect an effect among subgroups.

Moreover, the main voting irregularity list experiment reported in Ahlquist et al. 2014 concerned voter impersonation, but, in footnote 15, Ahlquist et al. discuss another voting irregularity list experiment that was part of the study, about whether political candidates or activists offered the participant money or a gift for their vote:

The other list experiment focused on vote buying and closely mimicked that described in Gonzalez-Ocantos et al. (2012). Although we did not anticipate discovering much vote buying in the USA we included this question as a check, since a similar question successfully discovered voting irregularities in Nicaragua. As expected we found no evidence of vote buying in the USA. We omit details here for space considerations, though results are available from the authors and in the online replication materials...

The phrasing of the footnote does not make clear whether the inference of "no evidence of vote buying in the USA" is restricted to an analysis of the full sample or also covers analyses of subgroups.

So the article leaves at least two questions unanswered for a skeptical reader:

1. Why report subgroup analyses for only the smaller sample?
2. Why not report the overall estimate and subgroup analyses for the vote buying list experiment?

Sure, for question 2, Ahlquist et al. indicate that the details of the vote buying list experiment were omitted for "space considerations"; however, the 16-page Ahlquist et al. 2014 article is shorter than the other two articles in the journal issue, which are 17 pages and 24 pages.

Peer reviewers could have helped readers by requesting a detailed report on the vote buying list experiment and a report of subgroup analyses for the 3,000-person sample.

---

Example 2. Sen 2014 "How Judicial Qualification Ratings May Disadvantage Minority and Female Candidates"

Sen 2014 reports logit regression results in Table 3 for four models predicting the ABA rating given to U.S. District Court nominees from 1962 to 2002, with ratings dichotomized into (1) well qualified or exceptionally well qualified and (2) not qualified or qualified.

Model 1 includes a set of variables such as the nominee's sex, race, partisanship, and professional experience (e.g., law clerk, state judge). Compared to model 1, model 2 omits the partisanship variable and adds year dummies. Compared to model 2, model 3 adds district dummies and interaction terms for female*African American and female*Hispanic. And compared to model 3, model 4 removes the year dummies and adds a variable for years of practice and a variable for the nominee's estimated ideology.

The first question raised by the table is the omission of the partisanship variable for models 2, 3, and 4, with no indication of the reason for that omission. The partisanship variable is not statistically significant in model 1, and Sen 2014 notes that the partisanship variable "is never statistically significant under any model specification" (p. 44), but it is not clear why the partisanship variable is dropped in the other models because other variables appear in all four models and never reach statistical significance.

The second question raised by the table is why years of practice appears in only the fourth model, in which roughly one-third of cases are lost due to the inclusion of estimated nominee ideology. Sen 2014 Table 2 indicates that male and white nominees had substantially more years of practice than female and black nominees: men (16.87 years), women (11.02 years), whites (16.76 years), and blacks (10.08 years); therefore, any model assessing whether ABA ratings are biased should account for sex and race differences in years of practice, under the reasonable expectation that nominees should receive higher ratings for more experience.

Peer reviewers could have helped readers by requesting a discussion of the absence of the partisanship variable from models 2, 3, and 4, and by requesting that years of experience be included in more of the models.

---

Does it matter?

Data for Ahlquist et al. 2014 are posted here. I reported on my analysis of the data in a manuscript rejected after peer review by the journal that published Ahlquist et al. 2014.

My analysis indicated that the weighted list experiment estimate of vote buying for the 3,000-person sample was 5 percent (p=0.387), with a 95% confidence interval of [-7%, 18%]. I'll echo my earlier criticism and note that a 25-percentage-point-wide confidence interval is not informative about the prevalence of voting irregularities in the United States because all plausible estimates of U.S. voting irregularities fall within 12.5 percentage points of zero.
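For readers unfamiliar with list experiments, the point estimate is conventionally the difference between the mean number of items reported by the group shown the sensitive item and the mean number reported by the group not shown it. The sketch below computes a weighted version of that difference. The function names, item counts, and weights are hypothetical illustrations, not the Ahlquist et al. data or the code used for the analysis above.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def list_experiment_estimate(treat_counts, treat_weights,
                             control_counts, control_weights):
    """Difference in weighted mean item counts: treatment list minus control list.

    The treatment list contains the sensitive item; the control list does not.
    """
    return (weighted_mean(treat_counts, treat_weights)
            - weighted_mean(control_counts, control_weights))

# Hypothetical item counts and survey weights, for illustration only.
treat_counts = [2, 3, 1, 2]
treat_weights = [1.0, 1.0, 1.0, 1.0]
control_counts = [1, 2, 1, 2]
control_weights = [1.0, 1.0, 1.0, 1.0]

estimate = list_experiment_estimate(treat_counts, treat_weights,
                                    control_counts, control_weights)
print(f"estimated prevalence of the sensitive item: {estimate:.2f}")
```

Because the estimate is a difference between two noisy means, its confidence interval can be wide even in a sample of a few thousand respondents.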

Ahlquist et al. 2014 footnote 14 suggests that imputed data on participant voter registration were available, so a peer reviewer could have requested reporting of the vote buying list experiments restricted to registered voters, given that only registered voters have a vote to trade. I did not see a variable for registration in the dataset for the 1,000-person sample, but the list experiment for the 3,000-person sample produced the weighted point estimate that 12 percent of persons listed as registered to vote were contacted by political candidates or activists around the 2012 U.S. general election with an offer to exchange money or gifts for a vote (p=0.018).

I don't believe that this estimate is close to correct, and, given sufficient subgroup analyses, some subgroup analyses would be expected to produce implausible or impossible results, but peer reviewers requesting these data might have produced a more tentative interpretation of the list experiments.

---

For Sen 2014, my analysis indicated that the estimates and standard errors for the partisanship variable (coded 1 for nomination by a Republican president) become unusually large when that variable is included in models 2, 3, and 4: the coefficient and standard error for the partisanship variable are 0.02 and 0.11 in model 1, but inflate to 15.87 and 535.41 in model 2, 17.90 and 1,455.40 in model 3, and 18.21 and 2,399.54 in model 4.

The Sen 2014 dataset had variables named Bench.Years, Trial.Years, and Private.Practice.Years. The years of experience for these variables overlap (e.g., nominee James Gilstrap was born in 1957 and respectively has 13, 30, and 30 years for these variables); therefore, the variables cannot be summed to construct a variable for total years of legal experience that does not include double- or triple-counting for some cases. Bench.Years correlates with Trial.Years at -0.47 and with Private.Practice.Years at -0.39, but Trial.Years and Private.Practice.Years correlate at 0.93, so I'll include only Bench.Years and Trial.Years, given that Trial.Years appears more relevant for judicial ratings than Private.Practice.Years.

My analysis indicated that women and blacks had a higher Bench.Years average than men and whites: men (4.05 years), women (5.02 years), whites (4.02 years), and blacks (5.88 years). Restricting the analysis to nominees with nonmissing nonzero Bench.Years, men had slightly more experience than women (9.19 years to 8.36 years) and blacks had slightly more experience than whites (9.33 years to 9.13 years).

Adding Bench.Years and Trial.Years to the four Table 3 models did not produce any meaningful difference in results for the African American, Hispanic, and Female variables, but the p-value for the Hispanic main effect fell to 0.065 in model 4 with Bench.Years added.

---

I estimated a simplified model with the following variables predicting the dichotomous ABA rating variable for each nominee with available data: African American nominee, Hispanic nominee, female nominee, Republican nominee, nominee age, law clerk experience, law school tier (from 1 to 6), Bench0 and Trial0 (no bench or trial experience respectively), Bench.Years, and Trial.Years. These variables reflect demographics, nominee quality, and nominee experience, with a presumed penalty for nominees who lack bench and/or trial experience. Results are below:

The female coefficient was not statistically significant in the above model (p=0.789), but the coefficient was much closer to statistical significance when adding a control for the year of the nomination:

District.Court.Nomination.Year was positively related to the dichotomous ABA rating variable (r=0.16) and to the female variable (r=0.29), and the ABA rating increased faster over time for women than for men (but not at a statistically significant level: p=0.167), so I estimated a model that interacted District.Court.Nomination.Year with Female and with the race/ethnicity variables:

The model above provides some evidence for an over-time reduction of the sex gap (p=0.095) and the black/white gap (p=0.099).
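
To make the interpretation of such an interaction concrete: in a logit model that includes Female, nomination year, and their product, the estimated female/male gap on the log-odds scale in a given year is the Female coefficient plus the interaction coefficient times the year. The Python sketch below uses made-up coefficients and a hypothetical centering year purely for illustration; none of the numbers come from the models reported in this post.

```python
# Hypothetical coefficients (NOT estimates from the Sen 2014 data).
b_female = -0.60          # female/male gap in log-odds at the reference year
b_female_x_year = 0.02    # change in that gap per nomination year
reference_year = 1980     # hypothetical year at which "year" equals 0


def female_gap_log_odds(nomination_year: int) -> float:
    """Female/male gap in log-odds on the dichotomous rating outcome."""
    years_since_reference = nomination_year - reference_year
    return b_female + b_female_x_year * years_since_reference


for year in (1980, 1995, 2010):
    print(year, round(female_gap_log_odds(year), 2))
```

A negative main effect combined with a positive interaction coefficient corresponds to a gap that shrinks over time, which is the pattern that the interaction p-values above speak to.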

The next model is the second model reported above, but with estimated nominee ideology added, coded with higher values indicating higher levels of conservatism:

So there is at least one reasonable model specification that produces evidence of bias against conservative nominees, at least to the extent that the models provide evidence of bias. After all, ABA ratings are based on three criteria (integrity, professional competence, and judicial temperament), but the models include information for only professional competence. A sex, race, or ideological gap in the models could therefore indicate bias, could indicate a corresponding gap in nonbiased ABA evaluations of integrity, judicial temperament, or elements of professional competence that are not reflected in the model measures, or both. Sen addressed the possibility of gaps in these other criteria, starting on page 47 of the article.

For what it's worth, evidence of bias against conservative nominees is stronger when the partisanship control is excluded:

---

The above models for the Sen reanalysis should be interpreted to reflect the fact that there are many reasonable models that could be reported. My assessment from the models that I estimated is that the black/white gap is extremely if not completely robust, the Hispanic/white gap is less robust but still very robust, the female/male gap is less robust but still somewhat robust, and the ideology gap is the least robust of the group.

I'd have liked for the peer reviewers on Sen 2014 to have requested results for each reviewer's preferred model, with the requested models based only on available data and with results reported in at least an online supplement. This would provide reasonable robustness checks for an analysis for which there are many reasonable model specifications. Maybe that happened: the appendix table in the working paper version of Sen 2014 is somewhat different from the published logit regression table. In any event, indicating which models were suggested by peer reviewers might help reduce skepticism about the robustness of reported models, to the extent that models suggested by a peer reviewer have not been volunteered by the researchers.

---

NOTES FOR AHLQUIST ET AL. 2014:

1. Subgroup analyses might have been reported for only the smaller 1,000-person sample because the smaller sample was collected first. However, that does not mean that the earlier sample should be the only sample for which subgroup analyses are reported.

2. Non-disaggregated results for the 3,000-person vote buying list experiment and disaggregated results for the 1,000-person vote buying list experiment were reported in a prior version of Ahlquist et al. 2014, which Dr. Ahlquist sent me. However, a reader of Ahlquist et al. 2014 might not be aware of these results, so Ahlquist et al. 2014 might have been improved by including these results.

---

NOTES FOR SEN 2014:

1. Ideally, models would include a control for twelve years of experience, given that the ABA Standing Committee on the Federal Judiciary "...believes that a prospective nominee to the federal bench ordinarily should have at least twelve years' experience in the practice of law" (p. 3, here). Sen 2014 reports results for a matching analysis that reflects the twelve-year threshold, at least for the Trial.Years variable, but I'm less confident in matching results, given the loss of cases (e.g., from 304 women in Table 1 to 65 women in Table 4) and the loss of information (e.g., cases appear to be matched so that nominees with anywhere from 0 to 12 years on Trial.Years are matched on Trial.Years).

2. I contacted the ABA and sent at least one email to the ABA liaison for the ABA committee that handles ratings for federal judicial nominations, asking whether data could be made available for nominee integrity and judicial temperament, such as a dichotomous indication whether an interviewee had raised concerns about the nominee's integrity or judicial temperament. The ABA Standing Committee on the Federal Judiciary prepares a written statement (e.g., here) that describes such concerns for nominees rated as not qualified, if the ABA committee is asked to testify at a Senate Judiciary Committee hearing for the nominee (see p. 8 here). I have not yet received a reply to my inquiries.
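
The threshold control suggested in note 1, and the information loss that comes with coarse matching, can both be illustrated with a small Python sketch. The function name and the example values are hypothetical, and the sketch applies the threshold to Trial.Years only because total years of legal experience cannot be computed from the available variables.

```python
TWELVE_YEAR_THRESHOLD = 12  # from the ABA committee statement quoted in note 1


def meets_twelve_years(trial_years: float) -> int:
    """Return 1 if trial experience is at least twelve years, else 0."""
    return int(trial_years >= TWELVE_YEAR_THRESHOLD)


# Hypothetical Trial.Years values for five nominees.
example_trial_years = [0, 5, 11, 12, 30]
indicators = [meets_twelve_years(y) for y in example_trial_years]
print(indicators)  # [0, 0, 0, 1, 1]
```

Nominees with 0, 5, and 11 years receive the same value, which is the kind of information loss that note 1 describes for the matching analysis. One option would be for a model to include both the indicator and the continuous years variable.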

---

GENERAL NOTES

1. Data for Ahlquist et al. 2014 are here. Code for my additional analyses is here.

2. Dr. Sen sent me data and R code, but the Sen 2014 data and code do not appear to be online now. Maya Sen's Dataverse is available here. R code for the supplemental Sen models described above is here.