Methods – Page 2

Improving journal articles via peer review requests

By L.J Zigerell Posted on September 8, 2016 Posted in Methods No Comments Tagged with list experiment, methods, race, selective reporting, sex, you're doing it wrong

Researchers often have the flexibility to report only the results they want to report, so an important role for peer reviewers is to request that researchers report results that a reasonable skeptical reader might suspect have been strategically unreported. I'll discuss two publications where obvious peer review requests do not appear to have been made and, presuming these requests were not made, how requests might have helped readers better assess evidence in the publication.

---

Example 1. Ahlquist et al. 2014 "Alien Abduction and Voter Impersonation in the 2012 U.S. General Election: Evidence from a Survey List Experiment"

Ahlquist et al. 2014 reports on two list experiments: one list experiment is from December 2012 and has 1,000 cases, and another list experiment is from September 2013 and has 3,000 cases.

Figure 1 of Ahlquist et al. 2014 reports results for the 1,000-person list experiment estimating the prevalence of voter impersonation in the 2012 U.S. general election; the 95% confidence intervals for the full sample and for each reported subgroup cross zero. Figure 2 reports results for the full sample of the 3,000-person list experiment estimating the prevalence of voter impersonation in the 2012 U.S. general election, but Figure 2 did not include subgroup results. Readers are thus left to wonder why subgroup results were not reported for the larger sample that had more power to detect an effect among subgroups.

Moreover, the main voting irregularity list experiment reported in Ahlquist et al. 2014 concerned voter impersonation, but, in footnote 15, Ahlquist et al. discuss another voting irregularity list experiment that was part of the study, about whether political candidates or activists offered the participant money or a gift for their vote:

The other list experiment focused on vote buying and closely mimicked that described in Gonzalez-Ocantos et al. (2012). Although we did not anticipate discovering much vote buying in the USA we included this question as a check, since a similar question successfully discovered voting irregularities in Nicaragua. As expected we found no evidence of vote buying in the USA. We omit details here for space considerations, though results are available from the authors and in the online replication materials...

The footnote does not make clear whether the inference of "no evidence of vote buying in the USA" is restricted to an analysis of the full sample or also covers analyses of subgroups.

So the article leaves at least two questions unanswered for a skeptical reader:

1. Why report subgroup analyses for only the smaller sample?
2. Why not report the overall estimate and subgroup analyses for the vote buying list experiment?

Sure, for question 2, Ahlquist et al. indicate that the details of the vote buying list experiment were omitted for "space considerations"; however, the 16-page Ahlquist et al. 2014 article is shorter than the other two articles in the journal issue, which are 17 pages and 24 pages.

Peer reviewers could have helped readers by requesting a detailed report on the vote buying list experiment and a report of subgroup analyses for the 3,000-person sample.

---

Example 2. Sen 2014 "How Judicial Qualification Ratings May Disadvantage Minority and Female Candidates"

Sen 2014 reports logit regression results in Table 3 for four models predicting the ABA rating given to U.S. District Court nominees from 1962 to 2002, with ratings dichotomized into (1) well qualified or exceptionally well qualified and (2) not qualified or qualified.

Model 1 includes a set of variables such as the nominee's sex, race, partisanship, and professional experience (e.g., law clerk, state judge). Compared to model 1, model 2 omits the partisanship variable and adds year dummies. Compared to model 2, model 3 adds district dummies and interaction terms for female*African American and female*Hispanic. And compared to model 3, model 4 removes the year dummies and adds a variable for years of practice and a variable for the nominee's estimated ideology.

The first question raised by the table is the omission of the partisanship variable for models 2, 3, and 4, with no indication of the reason for that omission. The partisanship variable is not statistically significant in model 1, and Sen 2014 notes that the partisanship variable "is never statistically significant under any model specification" (p. 44), but it is not clear why the partisanship variable is dropped in the other models because other variables appear in all four models and never reach statistical significance.

The second question raised by the table is why years of practice appears in only the fourth model, in which roughly one-third of cases are lost due to the inclusion of estimated nominee ideology. Sen 2014 Table 2 indicates that male and white nominees had substantially more years of practice than female and black nominees: men (16.87 years), women (11.02 years), whites (16.76 years), and blacks (10.08 years); therefore, any model assessing whether ABA ratings are biased should account for sex and race differences in years of practice, under the reasonable expectation that nominees should receive higher ratings for more experience.

Peer reviewers could have helped readers by requesting a discussion of the absence of the partisanship variable from models 2, 3, and 4, and by requesting that years of experience be included in more of the models.

---

Does it matter?

Data for Ahlquist et al. 2014 are posted here. I reported on my analysis of the data in a manuscript rejected after peer review by the journal that published Ahlquist et al. 2014.

My analysis indicated that the weighted list experiment estimate of vote buying for the 3,000-person sample was 5 percent (p=0.387), with a 95% confidence interval of [-7%, 18%]. I'll echo my earlier criticism and note that a 25-percentage-point-wide confidence interval is not informative about the prevalence of voting irregularities in the United States because all plausible estimates of U.S. voting irregularities fall within 12.5 percentage points of zero.
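The arithmetic behind that interval can be checked with a few lines of R. This is only a back-of-the-envelope sketch: it assumes the reported interval is a symmetric normal-approximation interval and works from the rounded endpoints given above, so the implied standard error and p-value are approximate. The object names are mine, not from the replication code.

```r
# Rounded 95% confidence interval endpoints from the text, in percentage points
ci_lower <- -7
ci_upper <- 18

# Half-width and midpoint of the interval
half_width <- (ci_upper - ci_lower) / 2   # 12.5
midpoint   <- (ci_lower + ci_upper) / 2   # 5.5

# Assumption: the interval is midpoint +/- 1.96 standard errors
implied_se <- half_width / qnorm(0.975)

# Two-sided p-value for the test that the true prevalence is zero
implied_p <- 2 * pnorm(-abs(midpoint / implied_se))

print(c(half_width = half_width, implied_se = implied_se, implied_p = implied_p))
```

Under these assumptions the implied p-value lands near the reported p=0.387, and the half-width of 12.5 percentage points is the figure referred to in the paragraph above.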

Ahlquist et al. 2014 footnote 14 suggests that imputed data on participant voter registration were available, so a peer reviewer could have requested reporting of the vote buying list experiments restricted to registered voters, given that only registered voters have a vote to trade. I did not see a variable for registration in the dataset for the 1,000-person sample, but the list experiment for the 3,000-person sample produced the weighted point estimate that 12 percent of persons listed as registered to vote were contacted by political candidates or activists around the 2012 U.S. general election with an offer to exchange money or gifts for a vote (p=0.018).

I don't believe that this estimate is close to correct, and, given sufficient subgroup analyses, some subgroup analyses would be expected to produce implausible or impossible results, but peer reviewers requesting these data might have produced a more tentative interpretation of the list experiments.

---

For Sen 2014, my analysis indicated that the estimates and standard errors for the partisanship variable (coded 1 for nomination by a Republican president) become unusually large when that variable is included in models 2, 3, and 4: the coefficient and standard error for the partisanship variable are 0.02 and 0.11 in model 1, but balloon to 15.87 and 535.41 in model 2, 17.90 and 1,455.40 in model 3, and 18.21 and 2,399.54 in model 4.

The Sen 2014 dataset had variables named Bench.Years, Trial.Years, and Private.Practice.Years. The years of experience for these variables overlap (e.g., nominee James Gilstrap was born in 1957 and respectively has 13, 30, and 30 years for these variables); therefore, the variables cannot be summed to construct a variable for total years of legal experience that does not include double- or triple-counting for some cases. Bench.Years correlates with Trial.Years at -0.47 and with Private.Practice.Years at -0.39, but Trial.Years and Private.Practice.Years correlate at 0.93, so I'll include only Bench.Years and Trial.Years, given that Trial.Years appears more relevant for judicial ratings than Private.Practice.Years.

My analysis indicated that women and blacks had a higher Bench.Years average than men and whites: men (4.05 years), women (5.02 years), whites (4.02 years), and blacks (5.88 years). Restricting the analysis to nominees with nonmissing nonzero Bench.Years, men had slightly more experience than women (9.19 years to 8.36 years) and blacks had slightly more experience than whites (9.33 years to 9.13 years).

Adding Bench.Years and Trial.Years to the four Table 3 models did not produce any meaningful difference in results for the African American, Hispanic, and Female variables, but the p-value for the Hispanic main effect fell to 0.065 in model 4 with Bench.Years added.

---

I estimated a simplified model with the following variables predicting the dichotomous ABA rating variable for each nominee with available data: African American nominee, Hispanic nominee, female nominee, Republican nominee, nominee age, law clerk experience, law school tier (from 1 to 6), Bench0 and Trial0 (no bench or trial experience respectively), Bench.Years, and Trial.Years. These variables reflect demographics, nominee quality, and nominee experience, with a presumed penalty for nominees who lack bench and/or trial experience. Results are below:

The female coefficient was not statistically significant in the above model (p=0.789), but the coefficient was much closer to statistical significance when adding a control for the year of the nomination:

District.Court.Nomination.Year was positively related to the dichotomous ABA rating variable (r=0.16) and to the female variable (r=0.29), and the ABA rating increased faster over time for women than for men (but not at a statistically-significant level: p=0.167), so I estimated a model that interacted District.Court.Nomination.Year with Female and with the race/ethnicity variables:

The model above provides some evidence for an over-time reduction of the sex gap (p=0.095) and the black/white gap (p=0.099).
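For readers who want to see the shape of such a specification, here is a minimal sketch in R with simulated data. It is not the Sen 2014 data and not the model reported in this post: the variable names Female, African.American, Year.c, and High.Rating and all of the simulated effect sizes are hypothetical, chosen only to show how a nomination-year interaction enters a logit model.

```r
set.seed(2016)
n <- 2000

# Simulated nominees (all values hypothetical)
d <- data.frame(
  Female           = rbinom(n, 1, 0.2),
  African.American = rbinom(n, 1, 0.1),
  Year             = sample(1962:2002, n, replace = TRUE)
)

# Center year at the first year so main effects refer to 1962
d$Year.c <- d$Year - 1962

# Simulated rating: a gap for each group that shrinks over time
eta <- -0.5 - 0.8 * d$Female - 1.0 * d$African.American +
  0.02 * d$Year.c + 0.02 * d$Female * d$Year.c +
  0.02 * d$African.American * d$Year.c
d$High.Rating <- rbinom(n, 1, plogis(eta))

# Logit with year interacted with the sex and race indicators
fit <- glm(High.Rating ~ (Female + African.American) * Year.c,
           family = binomial, data = d)
print(round(summary(fit)$coefficients, 3))
```

The rows labeled Female:Year.c and African.American:Year.c are the interaction terms; a positive coefficient on either would indicate a gap that narrows over time.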

The next model is the second model reported above, but with estimated nominee ideology added, coded with higher values indicating higher levels of conservatism:

So there is at least one reasonable model specification that produces evidence of bias against conservative nominees, at least to the extent that the models provide evidence of bias. After all, ABA ratings are based on three criteria—integrity, professional competence, and judicial temperament—but the models include information for only professional competence, so a sex, race, and ideological gap in the models could indicate bias and/or could indicate a sex, race, and ideological gap in nonbiased ABA evaluations of integrity and/or judicial temperament and/or elements of professional competence that are not reflected in the model measures. Sen addressed the possibility of gaps in these other criteria, starting on page 47 of the article.

For what it's worth, evidence of the bias against conservatives is stronger when excluding the partisanship control:

---

The above models for the Sen reanalysis should be interpreted to reflect the fact that there are many reasonable models that could be reported. My assessment from the models that I estimated is that the black/white gap is extremely if not completely robust, the Hispanic/white gap is less robust but still very robust, the female/male gap is less robust but still somewhat robust, and the ideology gap is the least robust of the group.

I'd have liked for the peer reviewers on Sen 2014 to have requested results for the peer reviewers' preferred model, with requested models based only on available data and results reported in at least an online supplement. This would provide reasonable robustness checks for an analysis for which there are many reasonable model specifications. Maybe that happened: the appendix table in the working paper version of Sen 2014 is somewhat different from the published logit regression table. In any event, indicating which models were suggested by peer reviewers might help reduce skepticism about the robustness of reported models, to the extent that models suggested by a peer reviewer have not been volunteered by the researchers.

---

NOTES FOR AHLQUIST ET AL. 2014:

1. Subgroup analyses might have been reported for only the smaller 1,000-person sample because the smaller sample was collected first. However, that does not mean that the earlier sample should be the only sample for which subgroup analyses are reported.

2. Non-disaggregated results for the 3,000-person vote buying list experiment and disaggregated results for the 1,000-person vote buying list experiment were reported in a prior version of Ahlquist et al. 2014, which Dr. Ahlquist sent me. However, a reader of Ahlquist et al. 2014 might not be aware of these results, so Ahlquist et al. 2014 might have been improved by including these results.

---

NOTES FOR SEN 2014:

1. Ideally, models would include a control for twelve years of experience, given that the ABA Standing Committee on the Federal Judiciary "...believes that a prospective nominee to the federal bench ordinarily should have at least twelve years' experience in the practice of law" (p. 3, here). Sen 2014 reports results for a matching analysis that reflects the 12 years threshold, at least for the Trial.Years variable, but I'm less confident in matching results, given the loss of cases (e.g., from 304 women in Table 1 to 65 women in Table 4) and the loss of information (e.g., cases appear to be matched so that nominees with anywhere from 0 to 12 years on Trial.Years are matched on Trial.Years).

2. I contacted the ABA and sent at least one email to the ABA liaison for the ABA committee that handles ratings for federal judicial nominations, asking whether data could be made available for nominee integrity and judicial temperament, such as a dichotomous indication whether an interviewee had raised concerns about the nominee's integrity or judicial temperament. The ABA Standing Committee on the Federal Judiciary prepares a written statement (e.g., here) that describes such concerns for nominees rated as not qualified, if the ABA committee is asked to testify at a Senate Judiciary Committee hearing for the nominee (see p. 8 here). I have not yet received a reply to my inquiries.

---

GENERAL NOTES

1. Data for Ahlquist et al. 2014 are here. Code for my additional analyses is here.

2. Dr. Sen sent me data and R code, but the Sen 2014 data and code do not appear to be online now. Maya Sen's Dataverse is available here. R code for the supplemental Sen models described above is here.

This female researcher should be cited more often

By L.J Zigerell Posted on June 6, 2016 Posted in Methods No Comments Tagged with inequality, list experiment, methods, sex, you're doing it wrong

My article reanalyzing data on a gender gap in citations to international relations articles indicated that the gender gap is largely confined to elite articles, defined as articles in the right tail of citation counts or articles in the top three political science journals. That article concerned an aggregate gender gap in citations, but this post is about a particular woman who has been under-cited in the social science literature.

It is not uncommon to read a list experiment study that suggests or states that the list experiment originated in the research described in the Kuklinski, Cobb, and Gilens 1997 article, "Racial Attitudes and the New South." For example, from Heerwig and McCabe 2009 (p. 678):

Pioneered by Kuklinski, Cobb, and Gilens (1997) to measure social desirability bias in reporting racial attitudes in the "New South," the list experiment is an increasingly popular methodological tool for measuring social desirability bias in self-reported attitudes and behaviors.

Kuklinski et al. described a list experiment that was placed on the 1991 National Race and Politics Survey. Kuklinski and colleagues appeared to propose the list experiment as a new measure (p. 327):

We offer as our version of an unobtrusive measure the list experiment. Imagine a representative sample of a general population divided randomly in two. One half are presented with a list of three items and asked to say how many of these items make them angry — not which specific items make them angry, just how many. The other half receive the same list plus an additional item about race and are also asked to indicate the number of items that make them angry. [screen shot]

The initial draft of my list experiment article reflected the belief that the list experiment originated with Kuklinski et al., but I then learned [*] of Judith Droitcour Miller's 1984 dissertation, which contained this passage:

The new item-count/paired lists technique is designed to avoid the pitfalls encountered by previous indirect estimation methods. Briefly, respondents are shown a list of four or five behavior categories (the specific number is arbitrary) and are then asked to report how many of these behaviors they have engaged in — not which categories apply to them. Nothing else is required of respondents or interviewers. Unbiased estimation is possible because two slightly different list forms (paired lists) are administered to two separate subsamples of respondents, which have been randomly selected in advance by the investigator. The two list forms differ only in that the deviant behavior item is included on one list, but omitted from the other. Once the alternate forms have been administered to the two randomly equivalent subsamples, an estimate of deviant behavior prevalence can be derived from the difference between the average list scores. [screen shot]

The above passage was drawn from pages 3 and 4 of Judith Droitcour Miller's 1984 dissertation at the George Washington University, "A New Survey Technique for Studying Deviant Behavior." [Here is another description of the method, in a passage from the 2004 edition of the 1991 book, Measurement Errors in Surveys (p. 88)]
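The estimator described in that passage, the difference between the average list scores of the two randomly assigned subsamples, can be sketched in R with simulated data. Everything numeric here is an assumption for illustration: the control-item probabilities, the true prevalence of 10 percent, and the sample size are made up, and the simulation ignores survey weights and any ceiling or floor effects.

```r
set.seed(1984)

n_per_group     <- 20000
true_prevalence <- 0.10                   # assumed prevalence of the sensitive item
control_probs   <- c(0.2, 0.5, 0.6, 0.8)  # hypothetical control-item probabilities

# Simulate the count that each respondent reports for a list of yes/no items
simulate_counts <- function(n, probs) {
  items <- sapply(probs, function(p) rbinom(n, 1, p))  # one column per item
  rowSums(items)                                       # respondent reports only the total
}

# Short list: control items only; long list: control items plus the sensitive item
short_list <- simulate_counts(n_per_group, control_probs)
long_list  <- simulate_counts(n_per_group, c(control_probs, true_prevalence))

# Prevalence estimate: difference between the average list scores
estimate <- mean(long_list) - mean(short_list)

# Standard error and 95% confidence interval for a difference in means
std_error <- sqrt(var(long_list) / length(long_list) +
                  var(short_list) / length(short_list))
ci <- estimate + c(-1, 1) * qnorm(0.975) * std_error

print(c(estimate = estimate, lower = ci[1], upper = ci[2]))
```

Because each respondent reports only a total, the variance of the control items feeds directly into the standard error, which is why list experiments need large samples to produce narrow confidence intervals.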

It's possible that James Kuklinski independently invented the list experiment, but descriptions of the list experiment's origin should nonetheless cite Judith Droitcour Miller's 1984 dissertation as a prior — if not the first [**] — example of the procedure known as the list experiment.

---

[*] I think it was the Adam Glynn manuscript described below through which I learned of Miller's dissertation.

[**] An Adam Glynn manuscript discussed the list experiment and item count method as special cases of aggregated response techniques. Glynn referenced a 1979 Raghavarao and Federer article, and that article referenced a 1974 Smith et al. manuscript that used a similar block total response procedure. The non-randomized version of the procedure split seven questions into groups of three, as illustrated in one of the questionnaires below. The procedure's unobtrusiveness derived from a researcher's inability in most cases to determine which responses a respondent had selected: for example, Yes-No-Yes produces the same total as No-No-No (5 in each case).

The questionnaire for the randomized version of the block total response procedure listed all seven questions; the respondent then drew a number and gave a total response for only those three questions that were associated with the number that was drawn: for example, if the respondent drew a 4, then the respondent gave a total for their responses to questions 4, 5, and 7. This procedure is similar to the list experiment, but the list experiment is simpler and more efficient.

The omitted "White" or "European American" experiment

By L.J Zigerell Posted on June 3, 2016 Posted in Methods No Comments Tagged with file drawer problem, methods, pottery barn rule, reproductions, selective reporting, TESS, you're doing it wrong

Here's part of the abstract from Rios Morrison and Chung 2011, published in the Journal of Experimental Social Psychology:

In both studies, nonminority participants were randomly assigned to mark their race/ethnicity as either "White" or "European American" on a demographic survey, before answering questions about their interethnic attitudes. Results demonstrated that nonminorities primed to think of themselves as White (versus European American) were subsequently less supportive of multiculturalism and more racially prejudiced, due to decreases in identification with ethnic minorities.

So asking white respondents to select their race/ethnicity as "European American" instead of "White" influenced whites' attitudes toward and about ethnic minorities. The final sample for study 1 was a convenience sample of 77 self-identified whites and 52 non-whites, and the final sample for study 2 was 111 white undergraduates.

As I wrote before, if you're thinking that it would be interesting to see whether results hold in a nationally representative sample with a large sample size, well, that was tried, with a survey experiment as part of the Time-sharing Experiments for the Social Sciences. Here are the results:

I'm mentioning these results again because in October 2014 the journal that published Rios Morrison and Chung 2011 desk rejected the manuscript that I submitted describing these results. So you can read in the Journal of Experimental Social Psychology about results for the low-powered test on convenience samples for the "European American" versus "White" self-identification hypothesis, but you won't be able to read in the JESP about results when that hypothesis was tested with a higher-powered test on a nationally-representative sample with data collected by a disinterested third party.

I submitted a revision of the manuscript to Social Psychological and Personality Science, which extended a revise-and-resubmit offer conditional on inclusion of a replication of the TESS experiment. I planned to conduct an experiment with an MTurk sample, but I eventually declined the revise-and-resubmit opportunity for various reasons.

The most recent version of the manuscript is here. Links to data and code.

Selective reporting of outcome variables in "The Public's Anger"

By L.J Zigerell Posted on June 2, 2016 Posted in Methods No Comments Tagged with methods, pottery barn rule, race, reproductions, selective reporting, symbolic racism, TESS, you're doing it wrong

In the Political Behavior article, "The Public's Anger: White Racial Attitudes and Opinions Toward Health Care Reform", Antoine J. Banks presented evidence that "anger uniquely pushes racial conservatives to be more opposing of health care reform while it triggers more support among racial liberals" (p. 493). Here is how the outcome variable was measured in the article's reported analysis (p. 511):

Health Care Reform is a dummy variable recoded 0-1 with 1 equals opposition to reform. The specific item is "As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill". The response options were yes = I favor the health care bill or no = I oppose the health care bill.

However, the questionnaire for the study indicates that there were multiple items used to measure opinions of health care reform:

W2_1. Do you approve or disapprove of the way Barack Obama is handling Health Care? Please indicate whether you approve strongly, approve somewhat, neither approve nor disapprove, disapprove somewhat, or disapprove strongly.

W2_2. As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill?

[if "favor" on W2_2] W2_2a. Do you favor Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

[if "oppose" on W2_2] W2_2b. Do you oppose Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

Item W2_2 above (shown in bold in the original post) is the only item reported on as an outcome variable in the article. The reported analysis omitted results for one outcome variable (W2_1) and reported dichotomous results for the other outcome variable (W2_2), for which the apparent intention was to have a four-category outcome variable from oppose strongly to favor strongly.
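To illustrate what the presumed four-category outcome would look like, here is a minimal R sketch that combines the branching items W2_2, W2_2a, and W2_2b. The response strings, the function name build_four_point, and the toy data are hypothetical; the actual dataset may code these items differently.

```r
# Combine the favor/oppose item with its strength follow-ups into one scale:
# 1 = favor very strongly ... 4 = oppose very strongly (higher = more opposition)
build_four_point <- function(w2_2, w2_2a, w2_2b) {
  outcome <- rep(NA_integer_, length(w2_2))
  outcome[which(w2_2 == "favor"  & w2_2a == "very strongly")]   <- 1L
  outcome[which(w2_2 == "favor"  & w2_2a == "not so strongly")] <- 2L
  outcome[which(w2_2 == "oppose" & w2_2b == "not so strongly")] <- 3L
  outcome[which(w2_2 == "oppose" & w2_2b == "very strongly")]   <- 4L
  outcome
}

# Toy responses (hypothetical); follow-ups are NA when the branch was not asked
w2_2  <- c("favor", "favor", "oppose", "oppose", NA)
w2_2a <- c("very strongly", "not so strongly", NA, NA, NA)
w2_2b <- c(NA, NA, "not so strongly", "very strongly", NA)

four_point  <- build_four_point(w2_2, w2_2a, w2_2b)

# Collapsing the scale recovers the 0/1 coding with 1 = opposition
dichotomous <- as.integer(four_point >= 3)

print(data.frame(w2_2, four_point, dichotomous))
```

The dichotomous version discards the strength information from W2_2a and W2_2b, which is the information that the four-category version retains.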

---

Here is the manuscript that I submitted to Political Behavior in March 2015 describing the results using the presumed intended outcome variables and a straightforward research design (e.g., no political discussion control, no exclusion of cases, cases from all conditions analyzed at the same time). Here's the main part of the main figure:

The takeaway is that, with regard to opposition to health care reform, the effect of the fear condition on symbolic racism differed at a statistically significant level from the effect of the baseline relaxed condition on symbolic racism; however, contra Banks 2014, the effect of anger on symbolic racism did not differ at a statistically significant level from the effect of the relaxed condition on symbolic racism. The anger condition had a positive effect on symbolic racism, but it was not a unique influence.

The submission to Political Behavior was rejected after peer review. Comments suggested analyzing the presumed intended outcome variables while using the research design choices in Banks 2014. Using the model in Table 2 column 1 of Banks 2014, the fear interaction term and the fear condition term are statistically significant at p<0.05 for predicting the two previously-unreported non-dichotomous outcome variables and for predicting the scale of these two variables; the anger interaction term and the anger condition term are statistically significant at p<0.05 for predicting two of these three outcome variables, with p-values for the residual "Obama handling" outcome variable at roughly 0.10. The revised manuscript describing these results is here.

---

Data are here, and code for the initial submission is here.

---

Antoine Banks has published several studies on anger and racial politics (here, for example) that should be considered when making inferences about the substance of the effect of anger on racial attitudes. Banks had a similar article published in the AJPS, with Nicholas Valentino. Data for that article are here. I did not see any problems with that analysis, but I didn't look very hard, because the posted data were not the raw data: the posted data that I checked omitted, for example, the variables used to construct the outcome variable.

Questions that I ask when evaluating research

By L.J Zigerell Posted on August 6, 2015 Posted in Methods No Comments Tagged with methods

I left this as a comment here.

For what it's worth, here are questions that I ask when evaluating research:

1. Did the researchers preregister their research design choices so that we can be sure that the research design choices were not made based on the data? If not, are the research design choices consistent with the choices that the researcher has previously made in other research?

2. Have the researchers publicly posted documentation and all the data that were collected, so that other researchers can check the analysis for errors and assess the robustness of the reported results?

3. Did the researchers declare that there are no unreported file drawer studies, unreported manipulations, and unreported variables that were measured?

4. Were the data collected by an independent third party?

5. Is the sample representative of the population of interest?

R graph: barplot percentages

By L.J Zigerell Posted on July 8, 2015 Posted in Methods 4 Comments Tagged with methods, R

Here are posts on R graphs, some of which are for barplots. I recently graphed a barplot with labels on the bars, so I'll post the code here. I cropped the y-axis numbers for the printed figure; from what I can tell, it takes a little more code and a lot more trouble (for me, at least) to code a plot without the left set of numbers.

---

This loads the Hmisc package, which has the mgp.axis command that will be used later:

require(Hmisc)

---

This command tells R to make a set of graphs with 1 row and 1 column (mfrow), to have margins of 6, 6, 2, and 1 lines for the bottom, left, top, and right (mar), and to place the axis title, tick-mark labels, and axis line at margin lines 0, 4, and 0 (mgp):

par(mfrow=c(1,1), mar=c(6, 6, 2, 1), mgp=c(0,4,0))

---

This command enters the data for the barplot:

heritage <- c(66, 0, 71, 10, 49, 36)

---

This command enters the colors for the barplot bars:

colors <- c("royalblue4", "royalblue4", "cornflowerblue", "cornflowerblue", "navyblue", "navyblue")

---

This command enters the labels for the barplot bars, with \n indicating a new line:

names <- c("Flag reminds of\nSouthern heritage\nmore than\nwhite supremacy", "Flag reminds of\nwhite supremacy\nmore than\nSouthern heritage", "Proud of what\nthe Confederacy\nstood for", "Not proud of what\nthe Confederacy\nstood for", "What happens to\nSoutherners\naffects my life\na lot", "What happens to\nSoutherners\naffects my life\nnot very much")

---

This command plots a barplot of heritage with the indicated main title, no y-axis labels, from 0 to 90 on the y-axis, with horizontal labels, with colors from the "colors" set and names from the "names" set.

bp <- barplot (heritage, main="Percentage who Preferred the Georgia Flag with the Confederate Battle Emblem", ylab=NA, ylim=c(0,90), las=1, col=colors, names=names)

---

This command plots a y-axis (2) with the indicated labels being horizontal (las=2):

mgp.axis(2, at=c(0, 20, 40, 60, 80), las=2)

The above code is for the rightmost set of y-axis labels in the figure.

---

This command enters the text for the barplot bars:

labels <-c("66%\n(n=301)", "0%\n(n=131)", "71%\n(n=160)", "10%\n(n=104)", "49%\n(n=142)", "36%\n(n=122)")

---

This command plots the labels at the coordinates (barplot value, heritage value + 2):

text(bp, heritage+2, labels, cex=1, pos=3)
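For convenience, the steps above can be collected into one self-contained sketch that uses only base R. This is a variation, not the code linked below: it swaps Hmisc's mgp.axis for base axis() with its own mgp setting, passes axes=FALSE to barplot, which should suppress the default left set of y-axis numbers, and renames the objects (bar_colors, bar_names, bar_labels) so that they do not mask the base functions colors, names, and labels. The pdf(NULL) call is only there so the sketch runs without writing a file.

```r
heritage <- c(66, 0, 71, 10, 49, 36)

bar_colors <- c("royalblue4", "royalblue4", "cornflowerblue",
                "cornflowerblue", "navyblue", "navyblue")

bar_names <- c("Flag reminds of\nSouthern heritage\nmore than\nwhite supremacy",
               "Flag reminds of\nwhite supremacy\nmore than\nSouthern heritage",
               "Proud of what\nthe Confederacy\nstood for",
               "Not proud of what\nthe Confederacy\nstood for",
               "What happens to\nSoutherners\naffects my life\na lot",
               "What happens to\nSoutherners\naffects my life\nnot very much")

bar_labels <- c("66%\n(n=301)", "0%\n(n=131)", "71%\n(n=160)",
                "10%\n(n=104)", "49%\n(n=142)", "36%\n(n=122)")

pdf(NULL)  # null device: nothing is written to disk
par(mfrow = c(1, 1), mar = c(6, 6, 2, 1), mgp = c(0, 4, 0))

# axes = FALSE leaves out the default y-axis; the bar names are still drawn
bp <- barplot(heritage,
              main = "Percentage who Preferred the Georgia Flag with the Confederate Battle Emblem",
              ylab = NA, ylim = c(0, 90), col = bar_colors,
              names.arg = bar_names, axes = FALSE)

# Draw the y-axis with conventional label placement and horizontal numbers
axis(2, at = c(0, 20, 40, 60, 80), las = 2, mgp = c(3, 1, 0))

# Place the percentage and sample-size text just above each bar
text(bp, heritage + 2, bar_labels, cex = 1, pos = 3)

invisible(dev.off())
```

To save the figure, replace pdf(NULL) with a call such as pdf("barplot.pdf", width = 12, height = 6), where the file name and dimensions are placeholders.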

---

Full code is here.

Thank you for sharing data and code

By L.J Zigerell Posted on June 19, 2015 Posted in Methods No Comments Tagged with methods, replications, reproductions

This periodically-updated page is to acknowledge researchers who have shared data and/or code and/or have answered questions about their research. I tried to acknowledge everyone who provided data, code, or information, but let me know if I missed anyone who should be on the list. The list is chronological based on the date that I first received data and/or code and/or information.

Aneeta Rattan for answering questions about and providing data used in "Race and the Fragility of the Legal Distinction between Juveniles and Adults" by Aneeta Rattan, Cynthia S. Levine, Carol S. Dweck, and Jennifer L. Eberhardt.

Maureen Craig for code for "More Diverse Yet Less Tolerant? How the Increasingly Diverse Racial Landscape Affects White Americans' Racial Attitudes" and for "On the Precipice of a 'Majority-Minority' America", both by Maureen A. Craig and Jennifer A. Richeson.

Michael Bailey for answering questions about his ideal point estimates.

Jeremy Freese for answering questions and conducting research about past studies of the Time-sharing Experiments for the Social Sciences program.

Antoine Banks and AJPS editor William Jacoby for posting data for "Emotional Substrates of White Racial Attitudes" by Antoine J. Banks and Nicholas A. Valentino.

Gábor Simonovits for data for "Publication Bias in the Social Sciences: Unlocking the File Drawer" by Annie Franco, Neil Malhotra, and Gábor Simonovits.

Ryan Powers for posting and sending data and code for "The Gender Citation Gap in International Relations" by Daniel Maliniak, Ryan Powers, and Barbara F. Walter. Thanks also to Daniel Maliniak for answering questions about the analysis.

Maya Sen for data and code for "How Judicial Qualification Ratings May Disadvantage Minority and Female Candidates" by Maya Sen.

Antoine Banks for data and code for "The Public's Anger: White Racial Attitudes and Opinions Toward Health Care Reform" by Antoine J. Banks.

Travis L. Dixon for the codebook for and for answering questions about "The Changing Misrepresentation of Race and Crime on Network and Cable News" by Travis L. Dixon and Charlotte L. Williams.

Adam Driscoll for providing summary statistics for "What's in a Name: Exposing Gender Bias in Student Ratings of Teaching" by Lillian MacNell, Adam Driscoll, and Andrea N. Hunt.

Andrei Cimpian for answering questions and providing more detailed data than available online for "Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines" by Sarah-Jane Leslie, Andrei Cimpian, Meredith Meyer, and Edward Freeland.

Vicki L. Claypool Hesli for providing data and the questionnaire for "Predicting Rank Attainment in Political Science" by Vicki L. Hesli, Jae Mook Lee, and Sara McLaughlin Mitchell.

Jo Phelan for directing me to data for "The Genomic Revolution and Beliefs about Essential Racial Differences: A Backdoor to Eugenics?" by Jo C. Phelan, Bruce G. Link, and Naumi M. Feldman.

Spencer Piston for answering questions about "Accentuating the Negative: Candidate Race and Campaign Strategy" by Yanna Krupnikov and Spencer Piston.

Amanda Koch for answering questions and providing information about "A Meta-Analysis of Gender Stereotypes and Bias in Experimental Simulations of Employment Decision Making" by Amanda J. Koch, Susan D. D'Mello, and Paul R. Sackett.

Kevin Wallsten and Tatishe M. Nteta for answering questions about "Racial Prejudice Is Driving Opposition to Paying College Athletes. Here's the Evidence" by Kevin Wallsten, Tatishe M. Nteta, and Lauren A. McCarthy.

Hannah-Hanh D. Nguyen for answering questions and providing data for "Does Stereotype Threat Affect Test Performance of Minorities and Women? A Meta-Analysis of Experimental Evidence" by Hannah-Hanh D. Nguyen and Ann Marie Ryan.

Solomon Messing for posting data and code for "Bias in the Flesh: Skin Complexion and Stereotype Consistency in Political Campaigns" by Solomon Messing, Maria Jabon, and Ethan Plaut.

Sean J. Westwood for data and code for "Fear and Loathing across Party Lines: New Evidence on Group Polarization" by Sean J. Westwood and Shanto Iyengar.

Charlotte Cavaillé for code and for answering questions for the Monkey Cage post "No, Trump won't win votes from disaffected Democrats in the fall" by Charlotte Cavaillé.

Kris Byron for data for "Women on Boards and Firm Financial Performance: A Meta-Analysis" by Corrine Post and Kris Byron.

Hans van Dijk for data for "Defying Conventional Wisdom: A Meta-Analytical Examination of the Differences between Demographic and Job-Related Diversity Relationships with Performance" by Hans van Dijk, Marloes L. van Engen, and Daan van Knippenberg.

Alexandra Filindra for answering questions about "Racial Resentment and Whites' Gun Policy Preferences in Contemporary America" by Alexandra Filindra and Noah J. Kaplan.
