Tour of research on student evaluations of teaching [13-15]: Smith and Hawkins 2011, Reid 2010, and Subtirelu 2015
Let's continue our discussion of the studies that Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" lists as "finding bias". See here for the first entry in the series and here for other entries.
---
13.
Smith and Hawkins 2011 "Examining Student Evaluations of Black College Faculty: Does Race Matter?" analyzed "undergraduate student ratings data for tenure-track faculty who used the 36-item student evaluation form adapted by the college" (p. 152), over a three-year period for the College of Education at a southeastern research university. Ordered from lowest to highest, mean ratings were for Black faculty, White faculty, and nonwhite nonblack faculty.
No analysis was reported that assessed whether the differences in ratings on the items could be accounted for by plausible alternative explanations such as the course taught or faculty performance.
---
14.
Reid 2010 "The Role of Perceived Race and Gender in the Evaluation of College Teaching on RateMyProfessors.com" reported on RateMyProfessors data for faculty at the 25 highest ranked liberal arts colleges. Table 3 indicated that the mean overall quality ratings by race were: White (3.89), Other (3.88), Latino (3.87), Asian (3.75), and Black (3.48). Table 4 indicated that the mean overall quality ratings by gender were: male (3.87) and female (3.86).
No analysis was reported that assessed whether the differences in ratings on the overall quality item or the more specific items could be accounted for by plausible alternative explanations such as faculty department, course, or faculty performance.
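To illustrate why an alternative explanation such as faculty department matters, here is a minimal sketch with made-up numbers. Nothing in it comes from Reid 2010 or any other study: the departments, the group labels "A" and "B", and the ratings are all hypothetical. In this toy example, ratings depend only on department, but the two groups are spread unevenly across departments, so a raw comparison of group means shows a gap that disappears when the comparison is made within each department.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up records: (department, group, rating).
# These are NOT data from any study; ratings depend only on department.
records = (
    [("math", "A", 3.2)] * 2 + [("math", "B", 3.2)] * 8
    + [("english", "A", 4.0)] * 8 + [("english", "B", 4.0)] * 2
)

def raw_gap(rows):
    """Mean rating of group A minus mean rating of group B, ignoring department."""
    by_group = defaultdict(list)
    for _, group, rating in rows:
        by_group[group].append(rating)
    return mean(by_group["A"]) - mean(by_group["B"])

def within_department_gaps(rows):
    """Group A minus group B mean rating, computed separately for each department."""
    by_dept = defaultdict(list)
    for row in rows:
        by_dept[row[0]].append(row)
    return {dept: raw_gap(dept_rows) for dept, dept_rows in by_dept.items()}

# Raw gap across all records (nonzero in this toy data).
print(round(raw_gap(records), 2))
# Gap within each department (zero in this toy data).
print({d: round(g, 2) for d, g in within_department_gaps(records).items()})
```

The raw gap here is about 0.48 rating points, and the within-department gap is zero in both departments. Real data could of course go the other way, with a gap that survives or even grows after such a comparison, which is why the analysis needs to be reported.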
---
15.
Subtirelu 2015 "'She Does Have an Accent but…': Race and Language Ideology in Students' Evaluations of Mathematics Instructors on RateMyProfessors.com" reported that an analysis of data on RateMyProfessors indicated that "instructors with Chinese or Korean last names were rated significantly lower in Clarity and Helpfulness" than instructors with "US last names", that "RMP users commented on the language of their 'Asian' instructors frequently but were nearly entirely silent about the language of instructors with common US last names", and that "RMP users tended to withhold extreme positive evaluation from instructors who have Chinese or Korean last names, although this was frequently lavished on instructors with US last names" (pp. 55-56).
Discussing the question of whether this is unfair bias, Subtirelu 2015 indicated that "...a consensus about whether an instructor has 'legitimate' problems with his or her speech...would have to draw on some ideological framework of expectations for what or whose language will be legitimized [that] would almost certainly serve the interests of some by constructing their language as 'without problems' or 'normal'...while marginalizing others by constructing their language as 'containing problems' or 'being abnormal'" (p. 56).
In that spirit, I'll refrain from classifying as "containing problems" the difference in ratings that Subtirelu 2015 detected.
---
Comments are open if you disagree, but I don't think that any of these three studies reports a novel test for unfair sex or race bias in student evaluations of teaching using a research design with internal validity. By internal validity, I mean an analysis that adequately addresses plausible alternative explanations.