Timofey Pnin linked to an Alice Eagly article that mentioned these two meta-analyses:

  • van Dijk et al. 2012 "Defying Conventional Wisdom: A Meta-Analytical Examination of the Differences between Demographic and Job-Related Diversity Relationships with Performance"
  • Post and Byron 2015 "Women on Boards and Firm Financial Performance: A Meta-Analysis"

I wanted to check for funnel plot asymmetry in the set of studies in these meta-analyses, so I emailed coauthors of the articles. Hans van Dijk and Kris Byron were kind enough to send data.

The funnel plot for the 612 effect sizes in the van Dijk et al. 2012 meta-analysis is below. The second funnel plot below is a close-up of the bottom of the full funnel plot, limited to studies with fewer than 600 teams. The funnel plot is remarkably symmetric.

[Funnel plot: van Dijk et al. 2012, all 612 effect sizes]

[Funnel plot close-up: studies with fewer than 600 teams]

The funnel plots below are for the Post and Byron 2015 meta-analysis, with the full set of studies in the top funnel plot and, below the full funnel plot, a close-up of the studies with a standard error less than 0.4. The funnel plot is reasonably symmetric.

[Funnel plot: Post and Byron 2015, full set of studies]

[Funnel plot close-up: studies with a standard error less than 0.4]
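Eyeballing a funnel plot can be supplemented with a formal check. Egger's regression test regresses each standardized effect (effect divided by its standard error) on its precision (one over the standard error); an intercept far from zero signals asymmetry. The sketch below uses simulated data rather than the meta-analytic datasets, which I am not reproducing here:

```python
import numpy as np

def egger_test(effects, ses):
    """Egger's regression test for funnel plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE).
    An intercept far from zero suggests asymmetry (small-study effects).
    Returns the estimated intercept.
    """
    y = np.asarray(effects) / np.asarray(ses)  # standardized effects
    x = 1.0 / np.asarray(ses)                  # precision
    X = np.column_stack([np.ones_like(x), x])  # design matrix: [intercept, slope]
    (intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)
    return intercept

# Simulate a symmetric "funnel": true effect 0.1, no publication bias.
rng = np.random.default_rng(0)
ses = rng.uniform(0.05, 0.4, size=600)
effects = rng.normal(0.1, ses)
print(egger_test(effects, ses))
```

On a symmetric set of effects the intercept should sit near zero; the simulation above builds in a common true effect and no publication bias, so the estimated intercept lands close to zero.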

UPDATE (Apr 13, 2016):

More funnel plots from van Dijk et al. 2012.

Sample restricted to age diversity (DIV TYPE=1):

vDe - Age Diversity (1)

Sample restricted to race and ethnic diversity (DIV TYPE=2):

vDe - Race Ethnic Diversity (2)

Sample restricted to sex diversity (DIV TYPE=5):

vDe - Sex Diversity (5)

Sample restricted to education diversity (DIV TYPE=6):

vDe - Education Diversity (6)

Here is a passage from Pigliucci 2013.

Steele and Aronson (1995), among others, looked at IQ tests and at ETS tests (e.g. SATs, GREs, etc.) to see whether human intellectual performance can be manipulated with simple psychological tricks priming negative stereotypes about a group that the subjects self-identify with. Notoriously, the trick worked, and as a result we can explain almost all of the gap between whites and blacks on intelligence tests as an artifact of stereotype threat, a previously unknown testing situation bias.

Racial gaps are a common and perennial concern in public education, but this passage suggests that such gaps are an artifact. However, when I looked up Steele and Aronson (1995) to examine the evidence for this result, I discovered that the black participants and the white participants in the study were all Stanford undergraduates and that the students' test performances were adjusted by the students' SAT scores. Given that the analysis contained both sample selection bias and statistical control, it does not seem reasonable to make an inference about populations based on that analysis. This error in reporting results for Steele and Aronson (1995) is apparently common enough to deserve its own article.

---

Here's a related passage from Brian at Dynamic Ecology:

A neat example on the importance of nomination criteria for gender equity is buried in this post about winning Jeopardy (an American television quiz show). For a long time only 1/3 of the winners were women. This might lead Larry Summers to conclude men are just better at recalling facts (or clicking the button to answer faster). But a natural experiment (scroll down to the middle of the post to The Challenger Pool Has Gotten Bigger) shows that nomination criteria were the real problem. In 2006 Jeopardy changed how they selected the contestants. Before 2006 you had to self-fund a trip to Los Angeles to participate in try-outs to get on the show. This required a certain chutzpah/cockiness to lay out several hundred dollars with no guarantee of even being selected. And 2/3 of the winners were male because more males were making the choice to take this risk. Then they switched to an online test. And suddenly more participants were female and suddenly half the winners were female. [emphasis added]

I looked up the 538 post linked to in the passage, which reported: "Almost half of returning champions this season have been women. In the year before Jennings's streak, fewer than 1 in 3 winners were female." That passage provides two data points: this season appears to be 2015 (the year of the 538 post), and the year before Jennings's streak appears to be 2003 (the 538 post noted that Jennings's streak occurred in 2004). The 538 post reported that the rule change for the online test occurred in 2006.

So here's the relevant information from the 538 post:

  • In 2003, fewer than 1 in 3 Jeopardy winners were women.
  • In 2006, the selection process was changed to an online test.
  • In 2015 (presumably), through early May, almost half of Jeopardy winners were women.

Comparing a data point from 2003 to a partial data point from 2015 does not seem to justify the descriptive term "suddenly."

It's entirely possible -- and perhaps probable -- that the switch to an online test for qualification reduced gender inequality in Jeopardy winners. But that inference needs more support than the minimal data reported in the 538 post.

Here's a tweet that I happened upon:

The graph is available here. The idea of the graph appears to be that the average 2012 science scores on the PISA test were similar for boys and girls, so the percentage of women should be similar to the percentage of men among university science graduates in 2010.

The graph would be more compelling if STEM workers were drawn equally from the left half and the right half of the bell curve of science and math ability. But that's probably not what happens. It's more likely that college graduates who work in STEM fields have on average more science and math ability than the average person. If that's true, then it is not a good idea to compare average PISA scores for boys and girls in this case; it would be a better idea to compare PISA scores for boys and girls in the right tail of science and math ability because that is where the bulk of STEM workers likely come from.

Stoet and Geary 2013 reported on sex distributions in the right tail of math ability on the PISA:

For the 33 countries that participated in all four of the PISA assessments (i.e., 2000, 2003, 2006, and 2009), a ratio of 1.7–1.9:1 [in mathematics performance] was found for students achieving above the 95th percentile, and a 2.3–2.7:1 ratio for students scoring above the 99th percentile.

So there is a substantial sex difference in mathematics scores to the advantage of boys in the PISA data. There is also a substantial sex difference in reading scores to the advantage of girls in the PISA data, but reading ability is less useful than math ability for success in most or all STEM fields.
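To see how tail ratios like these can arise from modest average differences, consider a toy normal model: standardize girls' scores to mean 0 and SD 1, then give boys a small mean advantage and a slightly wider spread. The parameters below (0.15 SD mean advantage, SD ratio of 1.12) are illustrative, not fitted to PISA data:

```python
from math import erf, sqrt

def upper_tail(mu, sigma, cutoff):
    """P(X > cutoff) for X ~ Normal(mu, sigma), via the error function."""
    z = (cutoff - mu) / (sigma * sqrt(2))
    return 0.5 * (1 - erf(z))

# Illustrative (not PISA-fitted) parameters: girls ~ N(0, 1),
# boys with a small mean advantage and a slightly wider spread.
boy_mean, boy_sd = 0.15, 1.12

# Cutoffs at the 95th and 99th percentiles of the girls' distribution.
for cutoff, label in [(1.645, "95th"), (2.326, "99th")]:
    ratio = upper_tail(boy_mean, boy_sd, cutoff) / upper_tail(0.0, 1.0, cutoff)
    print(f"{label} percentile boy:girl ratio = {ratio:.1f}:1")
```

With these made-up parameters, the model yields roughly 1.8:1 above the 95th percentile and 2.6:1 above the 99th, close to the reported ranges, even though the mean difference is small. This is the general point: tail ratios are far more sensitive than mean comparisons to small differences in means and variances.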

There is a smaller advantage for boys over girls in the right tail of science scores on the 2012 PISA, according to this report:

Across OECD countries, 9.3% of boys are top performers in science (performing at Level 5 or 6), but only 7.4% of girls are.

I'm not sure what percentile a Level 5 or 6 score is equivalent to. I'm also not sure whether math scores or science scores are more predictive for future science careers. But I am sure that it's better to examine right tail distributions than mean distributions for understanding representation in STEM.

Looks like #addmaleauthorgate is winding down. I tried throughout the episode to better understand when, if ever, gender diversity is a good idea. I posted and tweeted and commented because I perceived a tension between (1) the belief that gender diversity produces benefits, and (2) the belief that it was sexist for a peer reviewer to suggest that gender diversity might produce benefits for a particular manuscript on gender bias.

---

I posted a few comments at Dynamic Ecology as I was starting to think about #addmaleauthorgate. The commenters there were nice, but I did not get much insight about how to resolve the conflict that I perceived.

I posted my first blog post on the topic, which WT excerpted here in a comment. JJ, Ph.D posted a reply comment here that made me think, but on reflection I thought that the JJ, Ph.D comment was based on an unnecessary assumption. One of the comments at that blog post did lead to my second #addmaleauthorgate blog post.

---

I received a comment on my first blog post, from Marta, which specified Marta's view of the sexism in the review:

Suggesting getting male input to fix the bias is sexist - the reviewer implies that the authors would not have come to the same conclusions if a male had read the paper.

That's a perfectly defensible idea, but generalizing it has implications: for instance, it would then be sexist to suggest that a woman be placed on a team investigating gender bias; after all, the implication in suggesting gender diversity in that case would be that an all-male team is unable to draft a report on gender bias without help from a woman.

---

The most dramatic interaction occurred on Twitter. After that, I figured that it was a good time to stop asking questions. However, I subsequently received two additional substantive responses. First, Zuleyka Zevallos posted a comment at Michael Eisen's blog that began:

Gender diversity is a term that has a specific meaning in gender studies – it comes out of intersectional feminist writing that demonstrates how cis-gender men, especially White men, are given special privileges by society and that the views, experiences and interests of women and minorities should be better represented.

Later that day, Karen James tweeted:

...diversity & inclusion are about including traditionally oppressed or marginalized groups. Men are not one of those groups.

Both comments refer to the asymmetry-in-treatment explanation that I referred to in note 4 of my first #addmaleauthorgate post. That is certainly a way to reconcile the two beliefs that I mentioned at the top of this post.

---

Some more housekeeping. My comments here and here and here did not get very far in terms of attracting responses that disagreed with me. I followed up on a tweet characterizing the "whole review" by asking for the whole review to be made public, but that went nowhere; it seems suboptimal that there is so much commentary about a peer review that has been selectively excerpted.

A writer for Science Insider wrote an article indicating that Science Insider had access to the whole review. I asked the writer to post the whole review, but the writer tweeted that I should contact the authors for this particular newsworthy item. I don't think that is how journalism is supposed to work.

I replied to a post on the topic in Facebook and might have posted comments elsewhere online. I make no claim about the exhaustiveness of the above links. The links aren't chronological, either.

---

One more larger point. It seems that much of the negative commentary on this peer review mischaracterizes the peer review. Such mischaracterization is another way to make it easier to dismiss ideas that one does not want to consider thoughtfully.

Here is a description of the peer review:

...that someone would think it was OK to submit a formal review of a paper that said "get a male co-author"

Very strange use of quotes in that case, given that the quoted passage did not appear in the public part of the review. Notice also the generalization to "paper" instead of "paper on gender bias" and the more forceful "get" as opposed to the review's actual "It would probably also be beneficial."

Here is more coverage of the peer review:

A scientific journal sparked a Twitter firestorm when it rejected two female scientists' work partly because the paper they submitted did not have male co-authors.

If there is any evidence that the same manuscript would not have been rejected or would have had a lesser chance of being rejected if the manuscript had male co-authors, please let me know.

One more example, from a radio station:

This week the dishonour was given to academic journal PLos One for rejecting a paper written by two female researchers on the basis that they needed to add a male co-author to legitimize their work.

I would be interested in understanding which part of the review could be characterized with the words "needed" and "legitimize." Yes, it would be terribly sexist if the reviewer wrote that the female researchers "needed to add a male co-author to legitimize their work"; however, that did not happen.


My previous post on #AddMaleAuthorGate did not focus on the part of the peer review that discussed possible sex differences. However, that part of the peer review has since been characterized as harassment, so I thought that a closer look would be of value. I have placed the relevant part of the public part of the peer review below.

"...perhaps it is not so surprising that on average male doctoral students co-author one more paper than female doctoral students, just as, on average, male doctoral students can probably run a mile race a bit faster than female doctoral students.
... ...
As unappealing as this may be to consider, another possible explanation would be that on average the first-authored papers of men are published in better journals than those of women, either because of bias at the journal or because the papers are indeed of a better quality, on average ... And it might well be that on average men publish in better journals ... perhaps simply because men, perhaps, on average work more hours per week than women, due to marginally better health and stamina."

Below, I'll gloss the passage, with notes that characterize as charitably as possible what the reviewer might have been thinking when writing the passage. Here goes:

"...perhaps it is not so surprising that on average male doctoral students co-author one more paper than female doctoral students,..." = This finding from the manuscript might not be surprising.

"...just as, on average, male doctoral students can probably run a mile race a bit faster than female doctoral students." = There might be an explanation for the finding that reflects something other than bias against women. Let me use an obvious example to illustrate this: men and women are typically segregated by sex in track races, and this might not be due to bias against women. Of course, I believe that there is overlap in the distribution of running speed, so I will toss in an "on average" and a "probably" to signal that I am not one of those sexists who think that men are better than women in running a mile race on average. I'll even use the caveat "a bit faster" to soften the proposed suggestion.

"... ..." = I wrote something here, but this passage was redacted before my review was posted on Twitter. That double ellipsis is unusual.

"As unappealing as this may be to consider..." = I know that this next part of the review might come across as politically incorrect. I'm just trying to signal that this is only something to consider.

"...another possible explanation would be that..." = I'm just proposing this as a possibility.

"...on average..." = I understand the overlap in the distribution.

"...the first-authored papers of men are published in better journals than those of women..." = I understand this finding from the manuscript.

"...either because of bias at the journal..." = That finding might actually be due to journals being biased against women. I realize this possibility, and I am not excluding it as an explanation. I even mentioned this hypothesis first, so that no one will think that I am discounting the manuscript's preferred explanation.

"...or because the papers are indeed of a better quality, on average..." = This is the most reasonable alternate explanation that I can think of. I am NOT saying that every paper by a man is necessarily of a better quality, so I'll mention the "on average" part again because I understand that there is overlap in the distribution. However, if we measure the quality of papers by men and the quality of papers by women and then compare the two measures, it might be possible that the difference in means between the two measures is not 0.00. I hope that no one forgot that this sentence began with a set of caveats about how this is a possible explanation that might be unappealing.

"..." = I wrote something else here, but this passage was also redacted before my review was posted on Twitter.

"And it might well be that on average men publish in better journals..." = Just restating a finding from the manuscript. I remembered the "on average" caveat. That's my fifth  "on average" so far in this short passage, by the way. I hope that my I'm-not-a-sexist signals are working.

"..." = I wrote something else here, too, but this passage was also redacted before my review was posted on Twitter; this ellipsis is mid-sentence, which is a bit suspicious.

"..perhaps simply because men, perhaps.." = This is just a possibility. I used the word "perhaps" twice, so that no one misses the "perhaps"s that I used to signal that this is just a possibility.

"...on average work more hours per week than women..." = This is what it means when the male-female wage gap is smaller when we switch from weekly pay to hourly pay, right?

"...due to marginally better health and stamina." = I remember reading a meta-analysis that found that men score higher than women on tests of cardiovascular endurance; I'm pretty sure that's a plausible proxy for stamina. I hope that no one interprets "health" as life expectancy or risk of a heart attack because the fact that men die on average sooner than women or might be more likely to have a heart attack is probably not much of a factor in the publishing of academic articles by early-career researchers.

---

In my voice again. Some caveats of my own:

I am not making the claim that the review or the reviewer is not sexist or that the reviewer would have made the equivalent review if the researchers were all men. The purpose of this exercise was to try to gloss as charitably as possible the part of the review that discussed sex differences. If you do not think that we should interpret the review as charitably as possible, I would be interested in an explanation why.

The purpose of this exercise was not to diminish the bias that women face in academia and elsewhere. This post makes no claim that it is inappropriate for the female researchers in this episode -- or anyone else -- to interpret the review as reflecting the type of sexism that has occurred and has continued to occur.

Rather, the purpose of this exercise was to propose the possibility that our interpretation of the review reflects some assumptions about the reviewer and that our interpretation is informed by our experiences, which might color the review one way for some people and a different way for others. These assumptions are not necessarily invalid and might accurately reflect reality; but I wanted to call attention to their status as assumptions.

There has recently been much commentary on the peer review received by female researchers regarding their manuscript about gender bias in academic biology (see here, here, and here). The resulting Twitter hashtag #addmaleauthorgate indicates the basis for the charge of sexism. Here is the relevant part of the peer review:

It would probably also be beneficial to find one or two male biologists to work with (or at least obtain internal peer review from, but better yet as active co-authors), in order to serve as a possible check against interpretations that may sometimes be drifting too far away from empirical evidence into ideologically based assumptions.

I am interested in an explanation of what was sexist about this suggestion. At a certain level of abstraction, the peer reviewer suggested that a manuscript on gender bias written solely by authors of one sex might be improved by having authors of another sex read or contribute to the manuscript in order to provide a different perspective.

The part of the peer review that is public did not suggest that the female authors consult male authors to improve the manuscript's writing or to improve the manuscript's statistics; the part of the peer review that is public did not suggest consultation with male authors on a manuscript that had nothing to do with sex. It would be sexist to suggest that persons of one sex consult persons of another sex to help with statistics or to help interpret results from a chemical reaction. But that did not happen here: the suggestion was only that members of one sex consult members of the other sex in the particular context of helping to improve the *interpretation of data* in a manuscript *about gender bias.*

Consider this hypothetical. The main professional organization in biology decides to conduct research and draft a statement on gender bias in biology. The team selected to perform this task includes only men. The peer reviewer from this episode suggests that including women on the team would help "serve as a possible check against interpretations that may sometimes be drifting too far away from empirical evidence into ideologically based assumptions." Is that sexism, too? If not, why not? If so, then when ‒ if ever ‒ is it not sexist to suggest that gender diversity might be beneficial?

---

Six notes:

1. I am not endorsing the peer review. I think that the peer review should have instead suggested having someone read the manuscript who would be expected to provide help thinking of and addressing alternate explanations; there is no reason to expect a man to necessarily provide such assistance.

2. The peer review mentioned particular sex differences as possible alternate explanations for the data. Maybe suggesting those alternate explanations reflects sexism, but I think that hypotheses should be characterized in terms such as substantiated or unsubstantiated instead of in terms such as sexist or inappropriate.

3. It is possible that the peer reviewer would not have suggested in an equivalent case that male authors consult female authors; that would be fairly characterized as sexism, but there is, as far as I know, no evidence of the result of this counterfactual; moreover, what the peer reviewer would have done in an equivalent case concerns only the sexism of the peer reviewer and not the sexism of the peer review.

4. I have no doubt that women in academia face bias in certain situations, and I can appreciate why this episode might be interpreted as additional evidence of gender bias. If the argument is that there is an asymmetry that makes it inappropriate to think about this episode in general terms, I can understand that position. But I would appreciate guidance about the nature and extent of this asymmetry.

5. Maybe writing a manuscript is an intimate endeavor, such that suggesting new coauthors is offensive in a way that suggesting new coauthors for a study by a professional organization is not. But that's an awfully nuanced position that would have been better articulated in an #addauthorgate hashtag.

6. Maybe the problem is that gender diversity works only or best in a large group. But that seems backwards, given that the expectation would be that a lone female student would have more of a positive influence in a class of 50 male students than in a class of 2 male students.

---

UPDATE (May 4, 2015)

Good response here by JJ, Ph.D to my hypothetical.

You might have seen a Tweet or Facebook post on a recent study about sex bias in teacher grading:

Here is the relevant section from Claire Cain Miller's Upshot article in the New York Times describing the study's research design:

Beginning in 2002, the researchers studied three groups of Israeli students from sixth grade through the end of high school. The students were given two exams, one graded by outsiders who did not know their identities and another by teachers who knew their names.

In math, the girls outscored the boys in the exam graded anonymously, but the boys outscored the girls when graded by teachers who knew their names. The effect was not the same for tests on other subjects, like English and Hebrew. The researchers concluded that in math and science, the teachers overestimated the boys' abilities and underestimated the girls', and that this had long-term effects on students' attitudes toward the subjects.
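The design described above amounts to a difference-in-differences comparison: the sex gap in teacher (non-blind) scores minus the sex gap in external (blind) scores. A minimal sketch with made-up class averages (not the study's data):

```python
def grading_bias(blind_boys, nonblind_boys, blind_girls, nonblind_girls):
    """Difference-in-differences estimate of teacher grading bias:
    how much more teachers inflate boys' scores relative to girls',
    using blind external grading as the baseline.
    Positive = pro-boy bias; negative = pro-girl bias."""
    boy_gap = nonblind_boys - blind_boys    # teacher inflation for boys
    girl_gap = nonblind_girls - blind_girls  # teacher inflation for girls
    return boy_gap - girl_gap

# Hypothetical class averages for illustration only:
print(grading_bias(blind_boys=70, nonblind_boys=74,
                   blind_girls=72, nonblind_girls=73))  # positive: pro-boy bias
```

The blind score anchors each group's performance, so the estimate isolates how teachers' knowledge of student identity moves grades for one sex relative to the other.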

The Upshot article does not mention that the study's first author had previously published another study that used the same methodology but found a teacher grading bias against boys:

The evidence presented in this study confirms that the previous belief that schoolteachers have a grading bias against female students may indeed be incorrect. On the contrary: on the basis of a natural experiment that compared two evaluations of student performance–a blind score and a non-blind score–the difference estimated strongly suggests a bias against boys. The direction of the bias was replicated in all nine subjects of study, in humanities and science subjects alike, at various level of curriculum of study, among underperforming and best-performing students, in schools where girls outperform boys on average, and in schools where boys outperform girls on average (p. 2103).

This earlier study was not mentioned in the Upshot article and does not appear ever to have been mentioned in the New York Times. The Upshot article appeared in the print version of the New York Times, so it appears that Dr. Lavy has also conducted a natural experiment in media bias: report two studies with the same methodology but opposite conclusions, to test whether the New York Times will report only the study that agrees with liberal sensibilities. That hypothesis has been confirmed.