Tour of research on student evaluations of teaching [25-27]: Elmore and LaPointe 1974, Elmore and LaPointe 1975, and Ferber and Huber 1975

Let's continue our discussion of studies in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" listed as "finding bias". See here for the first entry in the series and here for other entries.

---

25.

Elmore and LaPointe 1974 "Effects of Teacher Sex and Student Sex on the Evaluation of College Instructors" analyzed student evaluation data from courses in various departments at Southern Illinois University at Carbondale in 1971. Complete data were available from 1,259 students in 38 pairs of courses matched on course number and instructor sex. For the 20 instructor evaluation items analyzed, only two items had a mean difference between female instructors and male instructors using a p=0.01 threshold: men instructors were rated higher for "spoke understandably", and women instructors were rated higher for "promptly returned homework and tests".

I'm not sure why Elmore and LaPointe 1974 is included in a list of studies finding bias in standard evaluations of teaching. No statistically significant difference was reported for 18 of the 20 instructor evaluation items, and, of the two items with a reported difference, one favored male instructors and the other favored female instructors. But, more importantly, the Elmore and LaPointe 1974 research design does not permit the inference that student ratings departed from reality; for example, no evidence is reported that the female instructors didn't return homework and tests more promptly on average than the male instructors did.
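For a rough sense of scale on the "2 of 20 items" result, here is a minimal back-of-the-envelope sketch. It is not a calculation from Elmore and LaPointe 1974: it assumes, purely for illustration, that all 20 item-level tests are independent and that there is no true difference on any item. Evaluation items are likely correlated, so the figures are only a loose guide. The variable names are my own.

```python
# Illustrative only: assumes 20 independent tests and no true differences,
# which the study does not claim and which correlated items would violate.
n_items = 20   # instructor evaluation items analyzed
alpha = 0.01   # p-value threshold used in the study

# Expected number of items crossing the threshold by chance alone.
expected_false_positives = n_items * alpha

# Probability that at least one item crosses the threshold by chance alone.
p_at_least_one = 1 - (1 - alpha) ** n_items

print(f"Expected chance differences: {expected_false_positives:.1f}")
print(f"P(at least one chance difference): {p_at_least_one:.3f}")
```

Under those assumptions, about 0.2 items would be expected to cross the threshold by chance, with roughly an 18 percent probability of at least one doing so.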

---

26.

Elmore and LaPointe 1975 "Effect of Teacher Sex, Student Sex, and Teacher Warmth on the Evaluation of College Instructors" analyzed student evaluation data from courses in various departments at Southern Illinois University at Carbondale in 1974. Data were available from 838 students in 22 pairs of courses matched on course and instructor sex. Twenty standard instructor evaluation items were used, plus instructor responses and student responses to an item about whether the instructor's primary interest lay in the course content or the students and a five-point measure of how warm a person the instructor is. The p-value threshold was 0.0025.

Results indicated that "When students rate their instructor's interest and warmth, teachers perceived as warmer or primarily interested in students receive higher ratings in effectiveness regardless of their sex", that "In general, female faculty receive significantly higher effectiveness ratings than do male faculty when they rate themselves low in warmth or interested in course content", and that "Male teachers who rate themselves high in warmth or primarily interested in students receive significantly higher ratings than male teachers who rate themselves low in warmth or primarily interested in course content, respectively" (p. 374).

I'm not sure how these data establish an unfair bias in student evaluations of teaching.

---

27.

Ferber and Huber 1975 "Sex of Student and Instructor: A Study of Student Bias" reported on responses to three items from students in the first class meeting of four large introductory economics or sociology courses at the University of Illinois Urbana from 1972.

The first item asked students to rate the men college teachers and the women college teachers that they had had in seven academic areas. Results in Table 1 indicate that, across the seven academic areas, the mean rating for men college teachers was identical to the mean rating for women college teachers (2.24).

The second item asked about student preferences for men instructors or women instructors in various types of classroom situations. Results in Table 2 indicate that most students did not express a preference, but, of the students who did express a preference, the majority preferred a man instructor. For example, of 1,241 students, 39 percent expressed a preference for a man instructor in a large lecture and 2 percent expressed a preference for a woman instructor in a large lecture.
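To make the large-lecture percentages concrete, here is a small sketch that converts them into approximate counts. It uses only the three figures quoted above. It assumes that the percentages are of all 1,241 students and that the remainder expressed no preference, which I have not checked against Table 2. The counts are approximate because the reported percentages are rounded, and the variable names are my own.

```python
# Figures quoted above for the large-lecture setting (Ferber and Huber 1975, Table 2).
n_students = 1241
pct_prefer_man = 39
pct_prefer_woman = 2

# Assumption: everyone else expressed no preference.
pct_no_preference = 100 - pct_prefer_man - pct_prefer_woman

# Approximate head counts implied by the rounded percentages.
approx_prefer_man = round(n_students * pct_prefer_man / 100)
approx_prefer_woman = round(n_students * pct_prefer_woman / 100)

# Share preferring a man instructor among those who expressed any preference.
share_man_among_expressed = pct_prefer_man / (pct_prefer_man + pct_prefer_woman)

print(f"No preference: about {pct_no_preference} percent")
print(f"Preferred a man: about {approx_prefer_man} students")
print(f"Preferred a woman: about {approx_prefer_woman} students")
print(f"Share preferring a man among those with a preference: {share_man_among_expressed:.0%}")
```

On these assumptions, roughly 59 percent expressed no preference for a large lecture, and about 95 percent of those who did express a preference preferred a man instructor.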

The third item asked students to rate their level of agreement with a statement, attributed to a man or to a woman. For one statement, the prompt was: "A well-known American economist [Mary Killingsworth/Charles Knight] proposes that compulsory military service be replaced by the requirement that all young people give one year of service for their country". Results in Table 6 indicate that the mean level of agreement did not differ between Mary and Charles at p<0.05 among male students, among female students, or among the full sample.

For the other statement, the prompt was: "According to the contemporary social theorist [Frank Merton/Alice Parsons], in order to achieve equal educational opportunity in the United States, no parents should be allowed to pay for their children's education; every college student should borrow from the federal government to pay for tuition and living expenses". Results in Table 6 indicate that, on a rating scale from 1 for strongly agree to 5 for strongly disagree, the mean level of agreement differed at p<0.05 among male students, among female students, and among the full sample, with Alice favored over Frank (respective overall means of 3.38 and 3.66).

I'm not sure why Ferber and Huber 1975 is included in a list of studies finding bias in standard evaluations of teaching. The first item is the only item directly on point for assessing bias in student evaluations of teaching, and on that item there was no overall difference between male instructors and female instructors and no evidence that the lack of a difference was unfair.

---

Comments are open if you disagree, but I don't think that any of these three studies provide sufficient evidence to undercut the use of student evaluations in employment decisions.

And it's worth considering whether these data from the Nixon administration should be included in the main Holman et al. 2019 list, given that the sum of "76" studies "finding bias" in the Holman et al. 2019 list is being used to suggest inferences about the handling of student evaluations of teaching in contemporary times.
