Tour of research on student evaluations of teaching [16-18]: Huston 2006, Miles and House 2015, and Martin 2016
Let's continue our discussion of the studies listed as "finding bias" in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching". See here for the first entry in the series and here for other entries.
---
16.
Huston 2006 "Race and Gender Bias in Higher Education: Could Faculty Course Evaluations Impede Further Progress toward Parity?" is a review that, as far as I can tell, does not report novel data on unfair sex or race bias in student evaluations of teaching.
Sandler 1991 "Women Faculty at Work in the Classroom: Or, Why It Still Hurts To Be a Woman in Labor" is likewise a review/essay-type publication.
---
17.
Miles and House 2015 "The Tail Wagging the Dog; An Overdue Examination of Student Teaching Evaluations" [sic for the semicolon] reported on an analysis of student evaluations from a southwestern university College of Business: 30,571 cases from 2011 through 2013, covering 255 professors across 1,057 courses with class sizes from 10 to 190. The mean rating for the 774 male-instructed courses did not differ statistically from the mean rating for the 279 female-instructed courses (p=0.33), but Table 7 indicates that the 136 male-instructed large required courses had a higher mean rating than the 30 female-instructed large required courses (p=0.01). I don't see results reported for a gender difference in small courses.
For what it's worth, page 121 incorrectly notes that scores from male-instructed courses range from 4.96 to 4.26; the 4.96 should be 4.20, based on the lower bound of 4.196 in Table 4. Moreover, Hypothesis 6 is described as regarding a gender difference for "medium and large sections of required classes" (p. 119), but the results are for "large sections of required classes" (pp. 122, 123), and the discussion of Hypothesis 6 included elective courses (p. 119); so it's not clear why medium classes and elective courses weren't included in the Table 7 analysis.
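To make the comparison concrete, here is a minimal sketch of the type of mean-difference test involved, using simulated course-level ratings; the rating values and the choice of Welch's t-test are my assumptions for illustration, not details reported by Miles and House 2015.

```python
# Sketch of the type of comparison in Miles and House 2015: a difference
# in mean ratings between male- and female-instructed courses. All rating
# values here are simulated illustrations, not the paper's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated course-level mean ratings on a 1-5 scale, with the paper's
# reported group sizes (774 male-instructed, 279 female-instructed).
male_ratings = rng.normal(loc=4.23, scale=0.30, size=774)
female_ratings = rng.normal(loc=4.21, scale=0.30, size=279)

# Welch's t-test (does not assume equal variances); the paper does not
# specify which t-test variant was used, so this is an assumption.
t_stat, p_value = stats.ttest_ind(male_ratings, female_ratings, equal_var=False)
print(f"overall comparison: t = {t_stat:.2f}, p = {p_value:.2f}")

# The Table 7 comparison would be the same test restricted to the subset
# of large required courses (136 male- vs. 30 female-instructed).
```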
---
18.
Martin 2016 "Gender, Teaching Evaluations, and Professional Success in Political Science" reports on publicly available student evaluations for undergraduate political science courses from a southern R1 university from 2011 through 2014 and from a western R1 university from 2007 through 2013. Results for the items, on a five-point scale, indicated little gender difference in small classes of 10 students; mean male instructor ratings 0.1 and 0.2 points higher than mean female instructor ratings for classes of 100; and a mean male instructor rating 0.5 points higher than the mean female instructor rating for classes of 200 or 400.
The statistical models included predictors only for instructor gender, class size, and the interaction of instructor gender and class size. No analysis was reported that assessed whether ratings could be accounted for by plausible alternate explanations, such as course or faculty performance.
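To make that specification concrete, here is a minimal sketch of a model with only those three predictors, fit to simulated data; the variable names, coefficient values, and data are hypothetical illustrations of the described specification, not a reproduction of Martin 2016.

```python
# Sketch of a rating model with only instructor gender, class size, and
# their interaction as predictors, as described above. The data are
# simulated and the column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000

df = pd.DataFrame({
    "female": rng.integers(0, 2, size=n),           # 1 = female instructor
    "class_size": rng.choice([10, 100, 200, 400], size=n),
})
# Simulated ratings with a small negative gender-by-size interaction,
# purely for illustration.
df["rating"] = (4.3
                - 0.0015 * df["class_size"] * df["female"]
                + rng.normal(scale=0.4, size=n)).clip(1, 5)

# 'female * class_size' expands to female + class_size + female:class_size,
# matching a specification with only those three predictors.
model = smf.ols("rating ~ female * class_size", data=df).fit()
print(model.summary())

# Implied male-minus-female gap at a class size of 200 (illustrative):
b = model.params
gap = -(b["female"] + b["female:class_size"] * 200)
print(f"predicted male-minus-female gap at size 200: {gap:.2f}")
```

A model like this can detect a gender-by-size interaction, but, as noted above, it cannot rule out alternate explanations for that pattern without additional predictors.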
---
Comments are open if you disagree, but I don't think that any of these three studies reports a novel test for unfair sex or race bias in student evaluations of teaching using a research design with internal validity, where internal validity refers to an analysis that adequately addresses plausible alternate explanations. The interaction of instructor gender and class size that appeared in Miles and House 2015 and Martin 2016 seems worth further consideration in a research design that adequately addresses such alternate explanations.