I drafted a manuscript entitled "Six Things Peer Reviewers Can Do To Improve Political Science". It was rejected once in peer review, so I'll post at least some of the ideas to my blog. This first blog post presents comments on the Valentino et al. 2018 "Mobilizing Sexism" article in Public Opinion Quarterly. I sent a draft of the manuscript to Valentino et al. on June 11, 2018, limited to the introduction and the parts that focus on Valentino et al. 2018; the authors emailed me comments on June 12, 2018, which Dr. Valentino asked me to post and which I have posted after my discussion below.

1. Unreported tests for claims about group differences

Valentino et al. (2018) report four hypotheses, the second of which is:

Second, compared to recent elections, the impact of sexism should be larger in 2016 because an outwardly feminist, female candidate was running against a male who had espoused disdain for women and the feminist project (pp. 219-220).

Here is the discussion of their Study 2 results in relation to that expectation:

The pattern of results is consistent with expectations, as displayed in table 2. Controlling for the same set of predispositions and demographic variables as in the June 2016 online study, sexism was significantly associated with voting for the Republican candidate only in 2016 (b = 1.69, p < .05) (p.225).

However, as Gelman and Stern 2006 observed, "comparisons of the sort, 'X is statistically significant but Y is not,' can be misleading" (p. 331). In Table 2 of Valentino et al. 2018, the sexism predictor in the 2016 model had a logit coefficient of 1.69 and a standard error of 0.81. The p-value under .05 for this predictor indicates only that the 2016 sexism coefficient differs from zero; it does not indicate whether, at p<.05, the 2016 sexism coefficient differs from the imprecisely estimated sexism coefficients of 0.23, 0.94, and 0.34 for 2012, 2008, and 2004. A test of the difference between the 2016 sexism coefficient and the sexism coefficients for the other years is what would be needed to assess the second hypothesis that the impact of sexism was larger in 2016.
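To illustrate the kind of test that Gelman and Stern describe: for coefficients estimated from independent samples, the difference can be assessed with a z-statistic equal to the difference between the coefficients divided by the square root of the sum of the squared standard errors. Here is a minimal R sketch using the reported 2016 estimate and standard error; the standard error shown for the 2012 coefficient is a placeholder for illustration only, not a value reported in the article:

# b and se for 2016 are from Valentino et al. 2018 Table 2; the standard error
# for 2012 below is a PLACEHOLDER for illustration, not a reported value.
b.2016 <- 1.69
se.2016 <- 0.81
b.2012 <- 0.23
se.2012 <- 0.80   # placeholder
z <- (b.2016 - b.2012) / sqrt(se.2016^2 + se.2012^2)
2 * pnorm(-abs(z))   # two-tailed p-value for the difference between the coefficients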

2. No summary statistics reported for a regression-based inference about groups

Valentino et al. 2018 Table 2 indicates that higher levels of participant modern sexism are associated with a greater probability of a reported vote for Donald Trump in 2016. But the article does not report the absolute mean levels of modern sexism among Trump voters or Clinton voters. These absolute mean levels are in the figure below, limited to participants in face-to-face interviews (per Valentino et al. 2018 footnote 8):

Results in the above image indicate that the mean response across Trump voters represented beliefs:

  • that the news media should pay the same amount of attention to discrimination against women that they have been paying lately;
  • that, when women complain about discrimination, they cause more problems than they solve less than half the time;
  • and that, when women demand equality these days, less than half of the time they are actually seeking special favors.

These don't appear to be obviously sexist beliefs, in the sense that I am not aware of evidence that the beliefs incorrectly or unfairly disadvantage or disparage women or men; comments are open below if you know of evidence or have an argument that the mean Trump voter response is sexist for any of these three items. Moreover, it's not clear to me that sexism can be inferred from measures about only one sex: if, for instance, a participant believes that, when women complain about discrimination, they cause more problems than they solve, and the participant also believes that, when men complain about discrimination, they cause more problems than they solve, then it does not seem reasonable to code that person as sexist without more information.

---

Response from Valentino et al.

Here is the response that I received from Valentino et al.

1) Your first concern was that we did not discuss one of the conditions in our MTurk study, focusing on disgust. The TESS reference is indeed the same study. However, we did not report results from the disgust condition because we did not theorize about disgust in this paper. Our theory focuses on the differential effects of fear vs. anger. We are in fact quite transparent throughout, indicating where predicted effects are non-significant. We also include a lengthy appendix with several robustness checks, etc. 

2) We never claim all Trump voters are sexist. We do claim that in 2016 gender attitudes are a powerful force, and more conservative scores on these measures significantly increase the likelihood of voting for Trump. The evidence from our work and several other studies supports this simple claim handsomely. Here is a sample of other work that replicates the basic finding in regarding the power of sexism in the 2016 election. Many of these studies use ANES data, as we do, but there are also several independent replications using different datasets. You might want to reference them in your paper. 

Blair, K. L. (2017). Did Secretary Clinton lose to a ‘basket of deplorables’? An examination of Islamophobia, homophobia, sexism and conservative ideology in the 2016 US presidential election. Psychology & Sexuality, 8(4), 334-355.

Bock, J., Byrd-Craven, J., & Burkley, M. (2017). The role of sexism in voting in the 2016 presidential election. Personality and Individual Differences, 119, 189-193.

Bracic, A., Israel-Trummel, M., & Shortle, A. F. (2018). Is sexism for white people? Gender stereotypes, race, and the 2016 presidential election. Political Behavior, 1-27.

Cassese, E. C., & Barnes, T. D. (2018). Reconciling Sexism and Women's Support for Republican Candidates: A Look at Gender, Class, and Whiteness in the 2012 and 2016 Presidential Races. Political Behavior, 1-24. 

Cassese, E., & Holman, M. R. Playing the woman card: Ambivalent sexism in the 2016 US presidential race. Political Psychology

Frasure-Yokley, L. (2018). Choosing the Velvet Glove: Women Voters, Ambivalent Sexism, and Vote Choice in 2016. Journal of Race, Ethnicity and Politics, 3(1), 3-25.

Ratliff, K. A., Redford, L., Conway, J., & Smith, C. T. (2017). Engendering support: Hostile sexism predicts voting for Donald Trump over Hillary Clinton in the 2016 US presidential election. Group Processes & Intergroup Relations, 1368430217741203. 

Schaffner, B. F., MacWilliams, M., & Nteta, T. (2018). Understanding white polarization in the 2016 vote for president: The sobering role of racism and sexism. Political Science Quarterly, 133(1), 9-34.

3) We do not statistically compare the coefficients across years, but neither do we claim to do so. We claim the following:

"Controlling for the same set of predispositions and demographic variables as in the June 2016 online study, sexism was significantly associated with voting for the Republican candidate only in 2016 (b = 1.69, p < .05). ...In conclusion, evidence from two nationally representative surveys demonstrates sexism to be powerfully associated with the vote in the 2016 election, for the first time in at least several elections, above and beyond the impact of other typically influential political predispositions and demographic characteristics."

Therefore, we predict (and show) sexism was a strong predictor in 2016 but not in other years. Our test is also quite conservative, since we include in these models all manner of predispositions that are known to be correlated with sexism. In Table 2, the confidence interval around our 2016 estimate for sexism in these most conservative models contains the estimate for 2008 in that analysis, and is borderline for 2004 and 2012, where the impact of sexism was very close to zero. However, the bivariate logit relationships between sexism and Trump voting are much more distinct, with 2016 demonstrating a significantly larger effect than the other years. These results are easy to produce with ANES data.

---

Regarding the response from Valentino et al.:

1. My concern is that the decision about what to focus on in a paper is influenced by the results of the study. If a study has a disgust condition, then a description of the results of that disgust condition should be reported when results of that study are reported; otherwise, selective reporting of conditions could bias the literature.

2. I'm not sure that anything in their point 2 addresses anything in my manuscript.

3. I realize that Valentino et al. 2018 did not report or claim to report results for a statistical test comparing the sexism coefficient in 2016 to sexism coefficients in prior years. But that is my criticism: for the hypothesis that "compared to recent elections, the impact of sexism should be larger in 2016…" (Valentino et al. 2018: 219-220), the article should have reported a statistical test to assess the evidence that the sexism coefficient in 2016 was different from the sexism coefficients in prior recent elections.

---

NOTE

Code for the figure.

---

Gronke et al. (2018) reported in Table 6 that "Gender Bias in Student Evaluations" (Mitchell and Martin 2018, hereafter MM) was, as of 25 July 2018, the PS: Political Science & Politics article with the highest Altmetric score, described as "a measure of attention an article receives" (p. 906, emphasis removed).

The MM research design compared student evaluations of and comments on Mitchell (a woman) to student evaluations of and comments on Martin (a man) in official university course evaluations and on the Rate My Professors website. MM reported evidence that "the language students use in evaluations regarding male professors is significantly different than language used in evaluating female professors" and that "a male instructor administering an identical online course as a female instructor receives higher ordinal scores in teaching evaluations, even when questions are not instructor-specific" (p. 648).

I think that there are errors in the MM article that warrant a correction. I mention or at least allude to some or all of these things in a forthcoming symposium piece in PS: Political Science & Politics, but I elaborate below. Comments are open if you see an error in my analyses or inferences.

---

1.

MM Table 1 reports on comparisons of official university course evaluations for Mitchell and for Martin. The table indicates that the sample size was 68, and the file that Dr. Mitchell sent me upon my request has 23 of these comments for Martin and 45 of these comments for Mitchell. Table 1's "Personality" row indicates 4.3% for Martin and 15.6% for Mitchell, which correspond to 1 personality-related comment of 23 comments for Martin and 7 personality-related comments of 45 comments for Mitchell. The table has three asterisks to indicate a p-value less than 0.01 for the comparison of the 4.3% and the 15.6%, but it is not clear how such a low p-value was derived.

I conducted a simulation in R to estimate how often a random distribution of these 8 personality-related comments across the 68 comments would result in Martin's 23 comments containing 1 or fewer personality-related comments. For the simulation, for 10 million trials, I started with eight 1s and sixty 0s, drew 23 of these 68 numbers to represent comments on Martin, and calculated the difference between the proportion of 1s for Martin and the proportion of 1s in the remaining numbers (representing comments on Mitchell):

diffs <- rep_len(NA, 10000000)   # storage for 10 million simulated differences in proportions
for (i in 1:10000000){
   comments <- c(rep_len(1, 8), rep_len(0, 60))        # 8 personality-related comments, 60 others
   martin <- sample(comments, 23, replace = FALSE)     # randomly allocate 23 comments to Martin
   diff.prop <- sum(martin)/23 - (8 - sum(martin))/45  # Martin's proportion minus Mitchell's proportion
   diffs[i] <- diff.prop
}
stack(table(diffs))   # tabulate how often each difference occurred

Here are results from the simulation:

   values                 ind
1  290952  -0.177777777777778
2 1412204   -0.11207729468599
3 2788608 -0.0463768115942029
4 2927564  0.0193236714975845
5 1782937   0.085024154589372
6  646247   0.150724637681159
7  135850   0.216425120772947
8   14975   0.282125603864734
9     663   0.347826086956522

The -0.1778 in line 1 represents 0 personality-related comments of 23 comments for Martin and 8 personality-related comments of 45 comments for Mitchell (0% to 17.78%), which occurred 290,952 times in the 10 million simulations (2.9 percent of the time). The -0.1121 in line 2 represents 1 personality-related comment of 23 comments for Martin and 7 personality-related comments of 45 comments for Mitchell (4.3% to 15.6%), which occurred 1,412,204 times in the 10 million simulations (14.1 percent of the time). So the simulation indicated that Martin receiving only 1 or fewer of the 8 personality-related comments would be expected to occur about 17 percent of the time if the 8 personality-related comments were distributed randomly. But recall that the MM Table 1 asterisks for this comparison indicate a p-value less than 0.01.
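As a cross-check on the simulation (and not a reproduction of whatever test MM used), the same probability can be computed directly from the hypergeometric distribution, and a Fisher's exact test provides a p-value for the 2x2 table:

# Probability that Martin's 23 of the 68 comments include 1 or fewer of the
# 8 personality-related comments, under random allocation of comments:
phyper(1, m = 8, n = 60, k = 23)   # approximately 0.17, matching the simulation

# Fisher's exact test for the 2x2 table (1 of 23 for Martin, 7 of 45 for Mitchell):
fisher.test(matrix(c(1, 22, 7, 38), nrow = 2))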

MM Table 2 reports on comparisons of Rate My Professors comments for Mitchell and for Martin, with a reported sample size of N=54, which is split into sample sizes of 9 for Martin and 45 for Mitchell in the file that Dr. Mitchell sent me upon my request; the nine comments for Martin are still available at the Rate My Professors website. I conducted another simulation in R for the incompetency-related comments, in which corresponding proportions were 0 of 9 for Martin and 3 of 45 for Mitchell (0% to 6.67%).

diffs <- rep_len(NA, 10000000)   # storage for 10 million simulated differences in proportions
for (i in 1:10000000){
   comments <- c(rep_len(1, 3), rep_len(0, 51))        # 3 incompetency-related comments, 51 others
   martin <- sample(comments, 9, replace = FALSE)      # randomly allocate 9 comments to Martin
   diff.prop <- sum(martin)/9 - (3 - sum(martin))/45   # Martin's proportion minus Mitchell's proportion
   diffs[i] <- diff.prop
}
stack(table(diffs))   # tabulate how often each difference occurred

Here are results from the simulation:

   values                 ind
1 5716882 -0.0666666666666667
2 3595302  0.0666666666666667
3  653505                 0.2
4   34311   0.333333333333333

The -0.0667 in line 1 represents 0 incompetency-related comments of 9 comments for Martin and 3 incompetency-related comments of 45 comments for Mitchell (0% to 6.67%), which occurred 5,716,882 times in 10 million simulations (57 percent of the time). So the simulation indicated that Martin's 9 comments having zero of the 3 incompetency-related comments would be expected to occur about 57 percent of the time if the 3 incompetency-related comments were distributed randomly. The MM Table 2 asterisk for this comparison indicates a p-value less than 0.1.
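The corresponding direct calculation for this comparison, again only as a check on the simulation:

# Probability that Martin's 9 of the 54 comments include none of the
# 3 incompetency-related comments, under random allocation of comments:
phyper(0, m = 3, n = 51, k = 9)   # approximately 0.57, matching the simulation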

I have concerns about other p-value asterisks in MM Table 1 and MM Table 2, but I will not report simulations for those comparisons here.

---

2.

MM Table 4 inferential statistics appear to be unadjusted for the lack of independence of some observations. Click here, then Search by Course > Spring 2015 > College of Arts and Sciences > Political Science > POLS 2302 (or click here). Each "Total Summary" row at the bottom has 218 evaluations; for example, the first item of "Overall the instructor(s) was (were) effective" has 43 strongly agrees, 55 agrees, 75 neutrals, 24 disagrees, and 21 strongly disagrees, which suggests that 218 students completed these evaluations. But the total Ns reported in MM Table 4 are greater than 218. For example, the "Course" line in MM Table 4 has an N of 357 for Martin and an N of 1,169 for Mitchell, which is a total N of 1,526. That 1,526 is exactly seven times 218, and the MM appendix indicates that the student evaluations had 7 "Course" items.

Using this code, I reproduced MM Table 4 t-scores closely or exactly by treating each observation as independent and conducting a t-test assuming equal variances, which suggests that MM Table 4 inferential statistics were not adjusted for the lack of independence of some observations. However, for the purpose of calculating inferential statistics, multiple ratings from the same student cannot be treated as if they were independent ratings.
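To illustrate the unadjusted procedure, here is a minimal sketch of a pooled-variance two-sample t-statistic computed from summary statistics, treating every rating as an independent observation; the means and standard deviations below are hypothetical placeholders, and the Ns are the Table 4 "Course" line sample sizes, so this is my illustration of the calculation and not MM's code:

# Pooled-variance (equal-variance) two-sample t-statistic from summary statistics,
# treating each of the n1 + n2 ratings as an independent observation.
t.from.summary <- function(m1, s1, n1, m2, s2, n2) {
   pooled.var <- ((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2)
   (m1 - m2) / sqrt(pooled.var * (1/n1 + 1/n2))
}
# Hypothetical means and standard deviations; Ns from the MM Table 4 "Course" line:
t.from.summary(m1 = 4.0, s1 = 1.1, n1 = 357, m2 = 3.8, s2 = 1.2, n2 = 1169)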

The aforementioned code reports p-values for individual-item comparisons of evaluations for Mitchell and for Martin, which avoids the problem of a lack of independence for some student responses. But I'm not sure that much should be made of any differences detected or not detected between evaluations for Mitchell and evaluations for Martin, given the lack of randomization of students to instructors, the absence of evidence that the students in Mitchell's sections were sufficiently similar before the course to the students in Martin's sections, and the possibility that students in these sections might have already had courses or interactions with Mitchell and/or Martin and that the evaluations reflected these prior experiences.

---

3.

Corrected inferential statistics for MM Table 1 and MM Table 2 would ideally reflect consideration of whether non-integer counts of comments should be used, as MM appears to have done. Multiplying proportions in MM Table 1 and MM Table 2 by sample sizes from the MM data produces some non-integer counts of comments. For example, the 15.2% for Martin in the MM Table 1 "Referred to as 'Teacher'" row corresponds to 3.5 of 23 comments, and the 20.9% for Mitchell in the MM Table 2 "Personality" row corresponds to 9.4 of 45 comments. Based on the data that Dr. Mitchell sent me, it seems that a comment might have been discounted by the number of sentences in the comment. For example, four of the official university course evaluation comments for Martin contain the word "Teacher", but the percentage for Martin is not 4 of 23 comments (17.4%) but instead 3.5 of 23 comments (15.2%), presumably because one of the "teacher" comments had two sentences, only one of which referred to Martin as a teacher; the other three comments that referred to Martin as a teacher did not have multiple sentences.
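For reference, the non-integer counts can be recovered by multiplying the reported percentages by the sample sizes:

0.152 * 23   # about 3.5 "Teacher" comments for Martin in MM Table 1
0.209 * 45   # about 9.4 personality-related comments for Mitchell in MM Table 2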

Corrected inferential statistics for MM Table 1 and MM Table 2 for the frequency of references to the instructors as a professor should reflect consideration of the instructors' titles and job titles. For instance, for MM Table 1, the course numbers in the MM data match course listings for the five courses that Mitchell or Martin taught face-to-face at Texas Tech University in Fall 2015 or Spring 2015 (see here):

Mitchell
POLS 3312 Game Theory [Fall 2015]
POLS 3361 International Politics: Honors [Spring 2015]
POLS 3366 International Political Economy [Spring 2015]

Martin
POLS 3371 Comparative Politics [Fall 2015]
POLS 3373 Governments of Western Europe [Spring 2015]

Online CVs indicated that Mitchell's CV listed her Texas Tech title in 2015 as Instructor and that Martin's CV listed his Texas Tech title in 2015 as Visiting Professor.

A correction could also discuss the fact that, while Mitchell is referred to as "Dr." 19 times across all MM Table 1 and MM Table 2 comments, none of these comments refer to Martin as "Dr.". Martin's CV indicated that he earned his Ph.D. in 2014, so I do not see how the non-reporting of references to Mitchell and Martin as "Dr." in the official student evaluations in MM Table 1 can be attributed to some comments being made before Martin received his Ph.D. Rate My Professors comments for Martin date to November 2014; however, even if the non-reporting of references to Mitchell and Martin as "Dr." in MM Table 2 can be attributed to some comments being made before Martin received his Ph.D., any use of "Professor" for Martin must be discounted because students presumably had more titles with which to refer to Mitchell (e.g., "Dr.", "Professor") than to refer to Martin (e.g., "Professor").

---

Other notes:

---

4.

PS: Political Science & Politics should require authors to upload data and code so that readers can more clearly assess what the authors did.

---

5.

MM Table 4 data appear to have large percentages of enrolled students who did not evaluate Mitchell or Martin. Texas Tech data for Spring 2015 courses here indicate that enrollment for Mitchell's four sections of the course used in the study was 247 (section D6), 247 (section D7), 243 (section D8), and 243 (section D9), and that enrollment for Martin's two sections of the course was 242 (section D10) and 199 (section D11). Mitchell's evaluations had ratings from 167 of the 980 students in her sections, for a 17.0 percent response rate, and Martin's evaluations had ratings from 51 of his 441 students, for an 11.6 percent response rate. It's possible that Mitchell's nearly 50 percent higher response rate did not affect differences in mean ratings between the instructors, but the difference in response rates would have been relevant information for the article to include.
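For reference, the response rate arithmetic:

167 / (247 + 247 + 243 + 243)   # Mitchell: 167 of 980 enrolled students, about 17.0 percent
51 / (242 + 199)                # Martin: 51 of 441 enrolled students, about 11.6 percent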

---

6.

MM state (p. 652, emphasis in the original):

"To reiterate, of the 23 questions asked, there were none in which a female instructor received a higher rating."

My calculations indicate that Mitchell received a higher rating than Martin did on 3 of the 23 MM Table 4 items: items 17, 21, and 23. Moreover, MM Table 4 indicates that the mean for Mitchell was higher than the mean for Martin across the three Technology items. I think that the "there were none" statement is intended to indicate that Mitchell did not receive a higher rating than Martin did on any of the items for which the corresponding p-value was sufficiently low, but, if that's the case, then that should be stated clearly because the statement can otherwise be misleading.

But I'm curious how MM could have reported a difference in favor of Mitchell if MM were reporting results using one-tailed statistical tests to detect a difference in favor of Martin, as I read the MM Table 4 Technology line to indicate, with a t-score of 1.93 and a p-value of 0.027.
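For what it's worth, a t-score of 1.93 with a large number of degrees of freedom produces a one-tailed p-value of about 0.027 and a two-tailed p-value of about 0.054, which is why I read the reported 0.027 as a one-tailed p-value; the degrees of freedom below are only a stand-in for a large sample:

pt(1.93, df = 1000, lower.tail = FALSE)       # one-tailed p, about 0.027
2 * pt(1.93, df = 1000, lower.tail = FALSE)   # two-tailed p, about 0.054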

---

7.

MM reports that the study indicated that "a male instructor administering an identical online course as a female instructor receives higher ordinal scores in teaching evaluations, even when questions are not instructor-specific" (p. 648). But that was not always true: as indicated above, MM Table 4 even indicates that the mean for Mitchell was higher than the mean for Martin across the three not-instructor-specific Technology items.

---

8.

The MM appendix (p. 4) indicated that:

Students had a tendency to enroll in the sections with the lowest number initially (merely because those sections appeared first in the registration list). This means that section 1 tended to fill up earlier than section 3 or 4. It may also be likely that students who enroll in courses early are systematically different than those who enroll later in the registration period; for example, they may be seniors, athletes, or simply motivated students. For this reason, we examined sections in the mid- to high- numerical order: sections 6, 7, 8, 9, and 10.

The last line should indicate that data were from sections 6 to 11. See the sample sizes for Martin in the Texas Tech website data: item 1 for section D10 has student evaluation sample sizes of 6, 12, 10, 1, and 3, for a total of 32; adding the item 1 sample sizes for section D11 (7, 5, 6, 1, and 0) raises the total to 51; and multiplying 51 by 7 produces 357, which is the sample size for Martin in the "Course" line of MM Table 4.
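The arithmetic:

sum(c(6, 12, 10, 1, 3))                                  # section D10 item 1 responses: 32
sum(c(7, 5, 6, 1, 0))                                    # section D11 item 1 responses: 19
(sum(c(6, 12, 10, 1, 3)) + sum(c(7, 5, 6, 1, 0))) * 7    # 51 students x 7 "Course" items = 357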

---

9.

I think that Blåsjö (2018) interpreted the statement that "For this reason, we examined sections in the mid- to high- numerical order: sections 6, 7, 8, 9, and 10" as if Mitchell and Martin collected data for other sections but did not analyze these data. Blåsjö: "Actually the researchers threw away at least half of the actual data". I think that that is a misreading of the (perhaps unclear) statement quoted above from the MM appendix. From what I can tell based on the data at the Texas Tech site, data were collected for only sections 6 to 11.

---

NOTE:

Thanks to representatives from the Texas Tech IRB and the Illinois State University IRB, respectively, for providing and forwarding the link to the Texas Tech student evaluations.

---

The Monkey Cage published a post by Dawn Langan Teele and Kathleen Thelen: "Some of the top political science journals are biased against women. Here's the evidence." The evidence presented for the claim of bias appears to be that women represent a larger percentage of the political science discipline than of authors in top political science journals. But that doesn't mean that the journals are biased against women, and the available data that I am aware of do not indicate that the journals are biased against women:

1. Discussing data from World Politics (1999-2004), International Organization (2002), and Comparative Political Studies and International Studies Quarterly (three undisclosed years), Breuning and Sanders 2007 reported that "women fare comparatively well and appear in each journal at somewhat higher rates than their proportion among submitting authors" (p. 350).

2. Data for the American Journal of Political Science reported by Rick Wilson here indicated that 32% of submissions from 2010 to 2013 had at least one female author and 35% of accepted articles had at least one female author.

3. Based on data from 1983 to 2008 in the Journal of Peace Research, Østby et al. 2013 reported that: "If anything, female authors are more likely to be selected for publication [in JPR]".

4. Data below from Ishiyama 2017 for the American Political Science Review from 2012 to 2016 indicate that women served as first author for 27% of submitted manuscripts and 25% of accepted manuscripts.

APSR Data

---

In this naive analysis, the data across the four points above do not indicate that these journals or the corresponding peer reviewers are biased against women. Of course, causal identification of bias would require a more representative sample than the largely volunteered data above. It would also require, for claims of bias among peer reviewers, statistical control for the quality of submissions and, for claims of bias at the editor level, statistical control for peer reviewer recommendations. Analyses would get even more complicated when accounting for the possibility that editor bias can influence the selection of peer reviewers, which can make the process easier or more difficult for a manuscript than would occur with unbiased assignment to peer reviewers.

Please let me know if you are aware of any other relevant data for political science journals.

---

NOTE

1. The authors of the Monkey Cage post have an article that cites Breuning and Sanders 2007 and Østby et al. 2013, but these data were not mentioned in the Monkey Cage post.

---

Based on a sample of undergraduate students at a university in Texas, Anderson et al. 2009 reported (p. 216) that:

Contrary to popular beliefs, feminists reported lower levels of hostility toward men than did nonfeminists.

But this stereotype-inconsistent pattern was based on a coding of "feminist" that reflected whether a participant had defined "feminist" "in a way consistent with our operational definition of feminism" (p. 220) and not on whether the participant self-identified as a feminist, a self-identification for which the researchers had data.

---

I assessed claims about self-identified feminists' views of men using data from the ANES 2016 Time Series Study national sample. My first predictor was a dichotomous measure of sex, coded 1 for female and 0 for male. My second predictor was self-identified feminist status, coded 1 for a participant who identified as a feminist or strong feminist on variable V161345.

The best available measures in the dataset for constructing a measure of negative attitudes toward men were the measures of perceived levels of discrimination against men and women in the United States (V162363 and V162362, respectively). I coded participants as 1 on a dichotomous variable if the participant indicated "none at all" for the amount of discrimination against men in the United States but indicated a nonzero level of discrimination against women in the United States. Denial of discrimination is a plausible measure of negative attitudes toward a group that faces discrimination, and there is statistical evidence that men in the United States face discrimination in areas such as criminal sentencing (e.g., Doerner 2012 and Starr 2015); moreover, men are formally excluded from certain opportunities, such as opportunities at the NSF-funded Visions in Methodology conference.
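Below is a minimal R sketch of this coding and of the interaction model discussed in the next section; my actual analysis used Stata (linked in the notes below), so the data frame name, the weight variable, and the specific category codes here are placeholders to be checked against the ANES codebook rather than exact code:

# Sketch only: "anes" (the data frame), "weight" (the survey weight), and the
# category codes below are placeholders; see the Stata code linked in the notes.
feminist.codes <- c(1, 2)   # placeholder codes for "feminist" / "strong feminist" on V161345
none.at.all <- 5            # placeholder code for "none at all" on V162362 / V162363

anes$feminist <- as.numeric(anes$V161345 %in% feminist.codes)
anes$denialDM <- as.numeric(anes$V162363 == none.at.all &   # no discrimination against men
                            anes$V162362 != none.at.all)    # nonzero discrimination against women
# female: coded 1 for female and 0 for male, as described above (construction omitted)

# Weighted linear probability model with a feminist x female interaction:
summary(lm(denialDM ~ feminist * female, data = anes, weights = weight))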

---

In weighted regressions, 37% of nonfeminist women reported no discrimination against men and a nonzero level of discrimination against women, compared to 46% of feminist women, with p=0.002 for the 9-percentage-point difference. The gap between feminist men and nonfeminist men was larger, at 20 percentage points: 28% of nonfeminist men reported no discrimination against men and a nonzero level of discrimination against women, compared to 48% of feminist men, with p<0.001 for the difference. Feminist identification was thus associated with an 11-percentage-point larger difference in anti-male attitudes for men than for women, with p=0.012 for this difference.

Output for the interaction model is below:

denialDM

---

NOTES

1. My Stata code is here. ANES 2016 Time Series Study data is available here.

2. The denialDM output variable is dichotomous, but estimates and inferences do not change if logit is used instead of linear regression.

3. The dataset has another question (V161346) that asked participants how well "feminist" described them, on a 5-point scale (extremely well, very well, somewhat well, not very well, and not at all); inferences are the same using that measure. Inferences are also the same using V161345 to make a 3-part feminist measure coded from non-feminist to strong feminist. See the Stata code.

4. Hat tip to Nathaniel Bechhofer, who retweeted this tweet, which led to this post.

---

According to its website, Visions in Methodology "is designed to address the broad goal of supporting women who study political methodology" and "serves to connect women in a field where they are under-represented." The Call for Proposals for the 2017 VIM conference indicates that submissions were restricted to women:

We invite submissions from female graduate students and faculty that address questions of measurement, causal inference, the application of advanced statistical methods to substantive research questions, as well as the use of experimental approaches (including incentivized experiments)...Please consider applying, or send this along to women you believe may benefit from participating in VIM!

Here is the program for the 2016 VIM conference, which lists activities restricted to women, lists conference participants (which appear to be only women), and has a photo that appears to be from the conference (which appears to have only women in the photo).

The 2017 VIM conference webpage indicates that the conference is sponsored by several sources such as the National Science Foundation and the Stony Brook University Graduate School. But page 118 of the NSF's Proposal & Award Policies & Procedures Guide (PAPPG) of January 2017 states:

Subject to certain exceptions regarding admission policies at certain religious and military organizations, Title IX of the Education Amendments of 1972 (20 USC §§ 1681-1686) prohibits the exclusion of persons on the basis of sex from any education program or activity receiving Federal financial assistance.  All NSF grantees must comply with Title IX.

The VIM conference appears to be an education program or activity receiving Federal financial assistance and, as such, submissions and conference participation should not be restricted by sex.

---

NOTES:

1. This Title IX Legal Manual discusses what constitutes an education program or activity:

While Title IX's antidiscrimination protections, unlike Title VI's, are limited in coverage to "education" programs or activities, the determination as to what constitutes an "education program" must be made as broadly as possible in order to effectuate the purposes of both Title IX and the CRRA. Both of these statutes were designed to eradicate sex-based discrimination in education programs operated by recipients of federal financial assistance, and all determinations as to the scope of coverage under these statutes must be made in a manner consistent with this important congressional mandate.

2. I think that the relevant NSF award is SES 1324159, which states that part of the project will "continue a series of small meetings for women methodologists that deliberately mix senior leaders in the subfield with young, emerging scholars who can benefit substantially from such close personal interaction." This page indicates that the 2014 VIM conference received support from NSF grant SES 1120976.

---

UPDATE [June 20, 2019]

I learned from a National Science Foundation representative of a statute (42 U.S. Code § 1885a) that permits the National Science Foundation to fund women-only activities listed in the statute. However, the Visions in Methodology conference has been funded by host organizations such as Stony Brook University, and I have not yet uncovered any reason why host institutions covered by Title IX would not be in violation of Title IX in funding single-sex educational opportunities.

---

The above tweet links to this article discussing a study of hiring outcomes for 598 job finalists in finalist groups of 3 to 11 members.

The finalist groups in the sample ranged from 3 to 11 members, but the data in the figure are restricted to an unreported number of groups with exactly 4 members. The likelihoods in the figure of 0%, 50%, and 67% did not suggest large samples, so I emailed the faculty authors at Stefanie.Johnson [at] colorado.edu (on April 26) and david.hekman [at] colorado.edu (on May 2) asking for the data or for information on the sample sizes for the figure likelihoods. I also asked whether a woman was hired from a pool of any size in which only one finalist was a woman. I later tweeted a question to the faculty author who I found on Twitter.

I have not yet received a reply from either of these faculty authors.

I acknowledge researchers who provide data, code, and/or information upon request, so I thought it would be a good idea to note the researchers who don't.

---

Pursuant to a request from Nathaniel Bechhofer, in this post I discuss the research reported in "The Effect of Gender Norms in Sitcoms on Support for Access to Abortion and Contraception", by Nathaniel Swigger. See here for a post about the study and here for the publication.

---

Disclosure: For what it's worth, I met Nathaniel Swigger when I was on the job market.

---

1. I agree with Nathaniel Bechhofer that the Limitations section of Swigger 2016 is good.

2. The article does a good job with disclosures, at least implied disclosures:

I don't think that there are omitted outcome variables because the bottom paragraph of page 9 and Table 1 report on multiple outcome variables that do not reach statistical significance (the first Results paragraph reports the lack of statistical significance for the items about federal insurance paying for abortion and spending on women's shelters). After reading the blog post, I thought it was odd to devote seven items to abortion and one item to contraception insurance, but in a prior publication Swigger used seven items for abortion, one item for contraception insurance, and items for government insurance for abortion.

I don't think that there are omitted conditions. The logic of the experiment does not suggest a missing condition (like here). Moreover, the article notes that results are "not quite in the way anticipated by the hypotheses" (p. 11), so I'm generally not skeptical about underreporting for this experiment, especially given the disclosure of items for which a difference was not detected.

3. I'm less certain that this was the only experiment ever conducted testing these hypotheses, but I'm basing this on underreporting in social science generally and not on any evidence regarding this experiment. I'd like for political science journals to adopt the requirement for—or for researchers to offer—disclosure regarding the completeness of the reporting of experimental conditions, potential outcome and explanatory variables, and stopping rules for data collection.

4. The estimated effect size for the abortion index is very large. Based on Table 1, the standard deviation for the abortion index was 4.82 (a simple mean across conditions, because I did not see an indication of the number of cases per condition). For the full sample, the difference between the How I Met Your Mother and Parks and Recreation conditions was 5.57 on the abortion index, which corresponds to an estimated d of 1.16 and which, based on this source, falls between the effect size for men being heavier than women (d=1.04) and the effect size for liberals liking Michelle Obama more than conservatives do (d=1.26). For another comparison, the 5.57 difference on the abortion index between the How I Met Your Mother and Parks and Recreation conditions is more than the 4.47 difference between Catholics and persons who are not Christian or Muslim.

The experiment had 87 participants after exclusions, across three conditions. A power calculation indicated that 29 participants per condition would permit detection of a relatively large d=0.74 effect size 80 percent of the time. Another way to think of the observed d=1.16 effect size is that, if the experiment were conducted over and over again with 29 participants per condition, 99 times out of 100 the experiment would be expected to detect a difference on the abortion index between the How I Met Your Mother and Parks and Recreation conditions.
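For reference, the effect size and power calculations can be approximated with base R; the numbers below are rounded:

5.57 / 4.82                                            # d of about 1.16 for the abortion index
power.t.test(n = 29, power = 0.80, sig.level = 0.05)   # detectable d of roughly 0.74 to 0.75 with 29 per condition
power.t.test(n = 29, delta = 1.16, sd = 1)             # power of roughly 0.99 for d = 1.16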

Table 3 output for the dichotomous contraception insurance item is in logit coefficients, but Table 1 indicates the effect sizes more intuitively, with means for the How I Met Your Mother and Parks and Recreation conditions of 0.19 and 0.50, a difference of about a factor of 2.6. The control condition mean is 0.69, which corresponds to a difference of about a factor of 3.6 compared to the How I Met Your Mother condition.

---

In conclusion, I don't see anything out of the ordinary in the reported analyses, but the effect sizes are larger than I would expect. In terms of theory, the article notes on page 7 that the How I Met Your Mother and Parks and Recreation stimuli differ in many ways, so it is impossible to isolate the reason for any detected effect; it is therefore probably best to describe the results in more general terms about the effect of sitcoms, as Sean McElwee did.
