Here is the manuscript that I plan to present at the 2015 American Political Science Association conference in September: revised version here. The manuscript contains links to the locations of the data; a file of reproduction code for the revised manuscript is here.

Comments are welcome!

Abstract and the key figure are below:

Racial bias is a persistent concern in the United States, but polls have indicated that whites and blacks on average report very different perceptions of the extent and aggregate direction of this bias. Meta-analyses of results from a population of sixteen federally funded survey experiments, many of which have never been reported on in a journal or academic book, indicate the presence of a moderate aggregate black bias against whites but no aggregate white bias against blacks.

[Figure: Metan w mc]

NOTE:

I have made a few changes since submitting the manuscript: [1] removing all cases in which the target was not black or white (e.g., Hispanics, Asians, control conditions in which the target did not have a race); [2] estimating meta-analyses without removing cases based on a racial manipulation check; and [3] estimating meta-analyses without the Cottrell and Neuberg 2004 survey experiment, because that survey experiment was more about perceptions of racial groups than a test for racial bias against particular targets.
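For readers unfamiliar with how study-level estimates are combined, here is a minimal Python sketch of inverse-variance meta-analysis with the DerSimonian-Laird random-effects adjustment. This is one common choice, not necessarily the estimator used in the manuscript, which this post does not specify. The study labels, effect sizes, and standard errors are hypothetical placeholders, and the `exclude` argument illustrates how a sensitivity analysis like [3] drops one study and re-pools the rest.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal critical value


def pool(studies, exclude=(), random_effects=True):
    """Inverse-variance pooled effect from (label, effect, se) tuples.

    With random_effects=True, the DerSimonian-Laird between-study variance
    tau^2 is added to each study's sampling variance before weighting.
    """
    kept = [(y, se) for label, y, se in studies if label not in exclude]
    w = [1.0 / se ** 2 for _, se in kept]  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, (y, _) in zip(w, kept)) / sw
    tau2 = 0.0
    if random_effects and len(kept) > 1:
        q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, kept))  # Cochran's Q
        c = sw - sum(wi ** 2 for wi in w) / sw
        tau2 = max(0.0, (q - (len(kept) - 1)) / c)
    w_star = [1.0 / (se ** 2 + tau2) for _, se in kept]
    est = sum(wi * y for wi, (y, _) in zip(w_star, kept)) / sum(w_star)
    se_est = math.sqrt(1.0 / sum(w_star))
    return est, se_est, (est - Z95 * se_est, est + Z95 * se_est)


# Hypothetical inputs for illustration only, not the manuscript's data.
studies = [("study_A", 0.10, 0.08), ("study_B", 0.30, 0.12), ("study_C", -0.05, 0.10)]
print(pool(studies))
print(pool(studies, exclude={"study_C"}))  # sensitivity check dropping one study
```

Setting `random_effects=False` gives the plain fixed-effect inverse-variance estimate; when the studies are homogeneous (tau^2 estimated as zero), the two versions coincide.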

Numeric values in the figure are for a meta-analysis that reflects [1] above:

* For white respondents: the effect size point estimate was 0.039 (p=0.375), with a 95% confidence interval of [-0.047, 0.124].
* For black respondents: the effect size point estimate was 0.281 (p=0.016), with a 95% confidence interval of [0.053, 0.509].

---

The meta-analysis graph includes five studies for which a racial manipulation check was used to restrict the sample: Pager 2006, Rattan 2010, Stephens 2011, Pedulla 2011, and Powroznik 2014. Inferences from the meta-analysis were the same when these five studies included respondents who failed the racial manipulation checks:

* For white respondents: the effect size point estimate was 0.027 (p=0.499), with a 95% confidence interval of [-0.051, 0.105].
* For black respondents: the effect size point estimate was 0.268 (p=0.017), with a 95% confidence interval of [0.047, 0.488].

---

Inferences from the meta-analysis were the same when the Cottrell and Neuberg 2004 survey experiment was excluded. For the remaining 15 studies, using the racial manipulation check restriction:

* For white respondents: the effect size point estimate was 0.063 (p=0.114), with a 95% confidence interval of [-0.015, 0.142].
* For black respondents: the effect size point estimate was 0.210 (p=0.010), with a 95% confidence interval of [0.050, 0.369].

---

For the remaining 15 studies, not using the racial manipulation check restriction:

* For white respondents: the effect size point estimate was 0.049 (p=0.174), with a 95% confidence interval of [-0.022, 0.121].
* For black respondents: the effect size point estimate was 0.194 (p=0.012), with a 95% confidence interval of [0.044, 0.345].
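As a quick internal-consistency check on the numbers above, the standard error and two-sided p-value can be backed out of each reported 95% confidence interval. This sketch assumes each interval is a symmetric normal-approximation (Wald) interval, estimate ± 1.96 × SE, which is typical for pooled meta-analytic estimates; it is not part of the reproduction code, and small discrepancies are expected because the reported values are rounded to three decimals.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal critical value


def p_from_ci(estimate, lower, upper):
    """Recover SE and two-sided p-value from a symmetric 95% Wald interval."""
    se = (upper - lower) / (2 * Z95)
    z = estimate / se
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF at |z|
    return se, 2 * (1 - phi)


# (label, estimate, lower, upper, reported p), copied from the results above
reported = [
    ("white, [1]", 0.039, -0.047, 0.124, 0.375),
    ("black, [1]", 0.281, 0.053, 0.509, 0.016),
    ("white, no check restriction", 0.027, -0.051, 0.105, 0.499),
    ("black, no check restriction", 0.268, 0.047, 0.488, 0.017),
    ("white, no Cottrell-Neuberg, restriction", 0.063, -0.015, 0.142, 0.114),
    ("black, no Cottrell-Neuberg, restriction", 0.210, 0.050, 0.369, 0.010),
    ("white, no Cottrell-Neuberg, no restriction", 0.049, -0.022, 0.121, 0.174),
    ("black, no Cottrell-Neuberg, no restriction", 0.194, 0.044, 0.345, 0.012),
]

for label, est, lo, hi, p_rep in reported:
    se, p = p_from_ci(est, lo, hi)
    print(f"{label}: SE={se:.3f}, recovered p={p:.3f}, reported p={p_rep:.3f}")
```

By my arithmetic, each recovered p-value falls within 0.01 of the reported value, and every white-respondent estimate stays above p=0.05 while every black-respondent estimate stays below it.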

---

Here is a passage from Pigliucci 2013.

Steele and Aronson (1995), among others, looked at IQ tests and at ETS tests (e.g. SATs, GREs, etc.) to see whether human intellectual performance can be manipulated with simple psychological tricks priming negative stereotypes about a group that the subjects self-identify with. Notoriously, the trick worked, and as a result we can explain almost all of the gap between whites and blacks on intelligence tests as an artifact of stereotype threat, a previously unknown testing situation bias.

Racial gaps are a common and perennial concern in public education, but this passage suggests that such gaps are largely an artifact. However, when I looked up Steele and Aronson (1995) to find the evidence for this result, I discovered that the black and white participants in the study were all Stanford undergraduates and that the students' test performances were adjusted for the students' SAT scores. Given that the analysis involved both sample selection and statistical control, it does not seem reasonable to use that analysis to make an inference about racial gaps in broader populations. This error in reporting the results of Steele and Aronson (1995) is apparently common enough to deserve its own article.
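The logic can be illustrated with a toy Python simulation; all numbers are hypothetical, and this is not a reanalysis of Steele and Aronson's data. In the simulated population, one group has a lower mean SAT score and test performance depends only on SAT, so there is no stereotype threat by construction. Restricting the sample to people above a hypothetical admission-style SAT cutoff and then adjusting for SAT still yields an adjusted gap near zero, even though the population gap is large. So an adjusted gap near zero among selected students says that the groups perform similarly conditional on SAT, not that the population gap is an artifact.

```python
import random
import statistics


def residualize(y, x):
    """Residuals of y after a simple OLS regression of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def mean_gap(test, group):
    """Mean test score of group 0 minus mean test score of group 1."""
    g0 = [t for t, g in zip(test, group) if g == 0]
    g1 = [t for t, g in zip(test, group) if g == 1]
    return statistics.fmean(g0) - statistics.fmean(g1)


def simulate(n=100_000, sat_gap=1.0, cutoff=0.5, seed=1):
    """Return (population gap, selected-sample gap, SAT-adjusted selected gap)."""
    rng = random.Random(seed)
    group, sat, test = [], [], []
    for i in range(n):
        g = i % 2  # hypothetical group label; half the population in each
        s = rng.gauss(-sat_gap * g, 1.0)  # assumed: group 1 has a lower mean SAT
        t = s + rng.gauss(0.0, 1.0)  # test depends only on SAT: no stereotype threat
        group.append(g)
        sat.append(s)
        test.append(t)
    population_gap = mean_gap(test, group)

    # Sample selection: keep only people above a hypothetical admission cutoff.
    keep = [k for k in range(n) if sat[k] > cutoff]
    g_sel = [float(group[k]) for k in keep]
    s_sel = [sat[k] for k in keep]
    t_sel = [test[k] for k in keep]
    selected_gap = mean_gap(t_sel, g_sel)

    # Statistical control: coefficient on group in an OLS of test on group and SAT,
    # computed via Frisch-Waugh-Lovell (residualize both on SAT, then regress).
    rg = residualize(g_sel, s_sel)
    rt = residualize(t_sel, s_sel)
    coef = sum(a * b for a, b in zip(rg, rt)) / sum(a * a for a in rg)
    return population_gap, selected_gap, -coef  # sign flipped: group 0 minus group 1


pop_gap, sel_gap, adj_gap = simulate()
print(f"population gap: {pop_gap:.2f}")
print(f"gap among selected: {sel_gap:.2f}")
print(f"SAT-adjusted gap among selected: {adj_gap:.2f}")
```

Changing `cutoff` or `sat_gap` changes the gap among selected people, but the SAT-adjusted gap stays near zero because, in this toy setup, SAT fully accounts for test performance.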

---

Here's a related passage from Brian at Dynamic Ecology:

A neat example on the importance of nomination criteria for gender equity is buried in this post about winning Jeopardy (an American television quiz show). For a long time only 1/3 of the winners were women. This might lead Larry Summers to conclude men are just better at recalling facts (or clicking the button to answer faster). But a natural experiment (scroll down to the middle of the post to The Challenger Pool Has Gotten Bigger) shows that nomination criteria were the real problem. In 2006 Jeopardy changed how they selected the contestants. Before 2006 you had to self-fund a trip to Los Angeles to participate in try-outs to get on the show. This required a certain chutzpah/cockiness to lay out several hundred dollars with no guarantee of even being selected. And 2/3 of the winners were male because more males were making the choice to take this risk. Then they switched to an online test. And suddenly more participants were female and suddenly half the winners were female. [emphasis added]

I looked up the 538 post linked to in the passage, which reported: "Almost half of returning champions this season have been women. In the year before Jennings's streak, fewer than 1 in 3 winners were female." That passage provides two data points: this season appears to be 2015 (the year of the 538 post), and the year before Jennings's streak appears to be 2003 (the 538 post noted that Jennings's streak occurred in 2004). The 538 post reported that the rule change for the online test occurred in 2006.

So here's the relevant information from the 538 post:

  • In 2003, fewer than 1 in 3 Jeopardy winners were women.
  • In 2006, the selection process was changed to an online test.
  • Presumably in 2015, through early May, almost half of Jeopardy winners have been women.

A comparison of a data point from 2003 with a partial data point from 2015 does not seem to justify the descriptive term "suddenly."
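To make the point concrete, here is a small Python sketch of two hypothetical trajectories for the share of female winners: one jumps at the 2006 rule change, and the other rises gradually. Both are set to 1/3 in 2003 and 1/2 in 2015, which is my rounding of "fewer than 1 in 3" and "almost half", so both agree with the only two data points in the 538 passage while telling different stories for 2004 through 2014.

```python
def step_series(years, before, after, change_year):
    """Share jumps from `before` to `after` in change_year (a 'sudden' change)."""
    return {y: before if y < change_year else after for y in years}


def linear_series(years, start_year, start, end_year, end):
    """Share rises at a constant rate between two anchor years (a gradual change)."""
    slope = (end - start) / (end_year - start_year)
    return {y: start + slope * (y - start_year) for y in years}


years = range(2003, 2016)
sudden = step_series(years, 1 / 3, 1 / 2, 2006)  # hypothetical step at the rule change
gradual = linear_series(years, 2003, 1 / 3, 2015, 1 / 2)  # hypothetical steady trend

for y in years:
    print(y, round(sudden[y], 3), round(gradual[y], 3))
```

Telling the two apart would require per-season shares between 2003 and 2015, which the quoted 538 passage does not provide.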

It's entirely possible -- and perhaps probable -- that the switch to an online test for qualification reduced gender inequality in Jeopardy winners. But that inference needs more support than the minimal data reported in the 538 post.

---

I left this as a comment here.

For what it's worth, here are questions that I ask when evaluating research:

1. Did the researchers preregister their research design choices so that we can be sure that the research design choices were not made based on the data? If not, are the research design choices consistent with the choices that the researchers have previously made in other research?

2. Have the researchers publicly posted documentation and all the data that were collected, so that other researchers can check the analysis for errors and assess the robustness of the reported results?

3. Did the researchers declare that there are no unreported file-drawer studies, no unreported manipulations, and no unreported measured variables?

4. Were the data collected by an independent third party?

5. Is the sample representative of the population of interest?
