Tour of research on student evaluations of teaching [1-3]: Peterson et al. 2019, Storage et al. 2016, and Piatak and Mohr 2019
My prior post on Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" indicated that:
I think there would be value in a version of "Evidence of Bias in Standard Evaluations of Teaching" that accurately summarizes each study that has tested for unfair bias in student evaluations of teaching using a research design with internal validity and plausibly sufficient statistical power, especially if each summary were coupled with a justification of why the study provides credible evidence about unfair bias in student evaluations of teaching.
Pursuant to a discussion with Holman et al. 2019 co-author Dr. Rebecca Kreitzer, I thought that it might be a good idea for me to occasionally read and discuss a study that Holman et al. 2019 has categorized as finding bias.
---
1.
I have already posted about Peterson et al. 2019 "Mitigating gender bias in student evaluations of teaching". Holman et al. 2019 includes that article in the list of academic articles, book chapters, and working papers finding bias, so let's start there...
I do not see how the results in Peterson et al. 2019 can be read as finding bias. Feel free to read the article yourself or to read the Holman et al. 2019 summary of the article. Peterson et al. 2019 indicates that their results "indicate that a relatively simple intervention in language can potentially mitigate gender bias in student evaluation of teaching", but their research design does not permit an inference that bias was present among students in the control group.
---
2.
Given that I am familiar with the brilliance research discussed in this Slate Star Codex post, let's move on to Storage et al. 2016 "The frequency of 'brilliant' and 'genius' in teaching evaluations predicts the representation of women and African Americans across fields", which reported gender differences in RateMyProfessors data:
Across the 18 fields in our analysis, "brilliant" was used in a 1.81:1 male:female ratio and "genius" in a 3.10:1 ratio... In contrast, we found little evidence of gender bias in use of "excellent" and "amazing" in online evaluations, with male:female ratios of 1.08:1 and 0.91:1, respectively.
But is the male/female imbalance in the frequency of "brilliant" and "genius" an unfair bias? One alternate explanation is that male instructors are more likely than female instructors to be in fields in which students use "brilliant" and "genius" in RateMyProfessors comments; that pattern appears in Storage et al. 2016 Figure 2. Another alternate explanation is that a higher percentage of male instructors than female instructors are "brilliant" or "genius"; for what it's worth, my analysis here indicates that male test-takers are disproportionately represented at the highest scores on the SAT-Math test, even accounting for the larger number of female SAT test-takers.
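To make the first alternate explanation concrete, here is a minimal sketch in Python of the composition effect, using made-up field shares and per-instructor rates rather than Storage et al.'s data: even if students in every field apply "brilliant" to male and female instructors at identical per-instructor rates, an aggregate male:female ratio above 1 falls out whenever men are overrepresented in the fields where the word is common.

```python
# A minimal sketch, assuming hypothetical field shares and per-instructor
# "brilliant" rates (not Storage et al.'s actual data): the within-field
# rate is identical for male and female instructors, so any aggregate
# imbalance comes purely from which fields men and women teach in.

fields = {
    # field: (share of instructors who are male,
    #         "brilliant" mentions per instructor, total instructors)
    "physics":    (0.80, 0.30, 1000),
    "philosophy": (0.70, 0.25, 1000),
    "education":  (0.30, 0.05, 1000),
}

male_mentions = female_mentions = 0.0
male_n = female_n = 0.0
for male_share, rate, n in fields.values():
    males, females = n * male_share, n * (1.0 - male_share)
    male_mentions += males * rate      # same rate for both genders,
    female_mentions += females * rate  # i.e., no within-field bias
    male_n += males
    female_n += females

# Compare per-instructor rates so unequal head counts don't drive the result.
ratio = (male_mentions / male_n) / (female_mentions / female_n)
print(f"aggregate male:female 'brilliant' rate ratio = {ratio:.2f}")  # ~1.69
```

The aggregate ratio comes out well above 1 even though, by construction, no field's students treat male and female instructors differently.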
It's certainly possible that, accounting for these and other plausible alternate explanations, student comments are unfairly more likely to refer to male instructors than female instructors as "brilliant" and "genius". But it's not clear that the Storage et al. 2016 analysis permits such an inference of unfair bias.
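As for the second alternate explanation, the SAT-Math point is ultimately a claim about tail ratios: if two score distributions have equal means but one has slightly higher variance, the higher-variance group becomes increasingly overrepresented as the score cutoff rises. Below is a minimal sketch with illustrative normal-distribution parameters, not actual SAT figures.

```python
from math import erfc, sqrt

def normal_sf(x: float, mean: float, sd: float) -> float:
    """P(X >= x) for a normal distribution with the given mean and sd."""
    return 0.5 * erfc((x - mean) / (sd * sqrt(2.0)))

mean = 500.0
male_sd, female_sd = 110.0, 100.0      # assumed modest variance gap, equal means
n_male, n_female = 900_000, 1_000_000  # assume more female test-takers

for cutoff in (650, 700, 750, 800):
    males_above = n_male * normal_sf(cutoff, mean, male_sd)
    females_above = n_female * normal_sf(cutoff, mean, female_sd)
    print(f"score >= {cutoff}: male:female ratio = "
          f"{males_above / females_above:.2f}")
# The ratio climbs with the cutoff despite identical means and a larger
# female test-taking population.
```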
From what I can tell, the main implication of research on bias in student evaluations of teaching concerns whether student evaluations of teaching should be used in employment decisions. Data for Storage et al. 2016 are from RateMyProfessors, so another hurdle for anyone using Storage et al. 2016 to undercut the use of student evaluations of teaching in employment decisions is producing a plausible argument that the "brilliant" and "genius" pattern in RateMyProfessors comments is representative of comments on student evaluations conducted by a college or university and used in employment decisions.
Another hurdle is establishing that a less-frequent-than-deserved use of "brilliant" and "genius" in comments, whether on student evaluations conducted by a college or university or on the RateMyProfessors site, would nontrivially affect any instructor's employment.
---
3.
Let's move on to another publication that Holman et al. 2019 has listed as finding bias: Piatak and Mohr 2019 "More gender bias in academia? Examining the influence of gender and formalization on student worker rule following".
It's not clear to me why an article reporting on a study of "student worker rule following" should be included in a list of "Evidence of Bias in Standard Evaluations of Teaching".
---
Comments are open if you disagree, but I don't see anything in Peterson et al. 2019, Storage et al. 2016, or Piatak and Mohr 2019 that indicates a test for unfair bias in student evaluations of teaching using a research design with internal validity: from what I can tell, Peterson et al. 2019 included no test for unfair bias, Storage et al. 2016 did not address plausible alternate explanations, and Piatak and Mohr 2019 isn't even about student evaluations of teaching.