Let's continue our discussion of studies in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" listed as "finding bias". See here for the first entry in the series and here for other entries.

---

7.

Spooren et al. 2013 "On the validity of student evaluation of teaching: The state of the art" is a review that, as far as I can tell, does not report novel data on unfair sex or race bias in student evaluations of teaching.

---

8.

Laube et al. 2007 "The impact of gender on the evaluation of teaching: What we know and what we can do" is a review that, as far as I can tell, does not report novel data on unfair sex or race bias in student evaluations of teaching.

---

9.

Stark and Freishtat 2014 "An evaluation of course evaluations" is a discussion that, as far as I can tell, does not report novel data on unfair sex or race bias in student evaluations of teaching.

---

Comments are open if you disagree, but I don't think that any of these three studies report a novel test for unfair sex or race bias in student evaluations of teaching using a research design with internal validity. I think that these publications would be more appropriate in a separate section of Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" instead of in their list of academic articles, book chapters, and working papers finding bias.

---

Let's continue our discussion of studies in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" listed as "finding bias". See here for the first entry in the series and here for other entries.

---

4.

El-Alayli et al. 2018 "Dancing backwards in high heels: Female professors experience more work demands and special favor requests, particularly from academically entitled students" does not present novel evidence about bias in student evaluations of teaching. Instead: "The current research examined the extra burdens experienced by female professors in academia in the form of receiving more work demands from their students" (p. 145).

---

5.

Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" lists as "finding bias" Hessler et al. 2018 "Availability of cookies during an academic course session affects evaluation of teaching". I'm not sure why this study is included in a list that one of the Holman et al. 2019 coauthors described as a "list of 76 articles demonstrating gender and/or racial bias in student evaluations". The Hessler et al. 2018 experimental design focused on the provision or non-provision of cookies; the study also had variation in which Teacher A handled 10 groups of students and Teacher B handled the other 10 groups of students, but the p-value was 0.514 for this variation in teacher in the Table 3 regression predicting the summation score.

---

6.

The Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" list doesn't provide a summary for Uttl et al. 2017 "Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related", so I'm not sure why this study is included in a list that one of the Holman et al. 2019 coauthors described as a "list of 76 articles demonstrating gender and/or racial bias in student evaluations".

For what it's worth, I don't know that student evaluations of teaching being uncorrelated with learning is much of a problem, unless student evaluations of teaching are used as a measure of student learning. For example, if an instructor received a low score on an item asking about the instructor's availability outside of class because the instructor is not available outside of class, then I don't see why responses to that instructor availability item would need to be correlated with student learning in order to be a valid measure of the instructor's availability outside of class.

---

Comments are open if you disagree, but I don't think that any of these three studies report a novel test for unfair sex or race bias in student evaluations of teaching using a research design with internal validity.

---

My prior post on Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" indicated that:

I think there would be value in a version of "Evidence of Bias in Standard Evaluations of Teaching" that accurately summarizes each study that has tested for unfair bias in student evaluations of teaching using a research design with internal validity and plausibly sufficient statistical power, especially if each summary were coupled with a justification of why the study provides credible evidence about unfair bias in student evaluations of teaching.

Pursuant to a discussion with Holman et al. 2019 co-author Dr. Rebecca Kreitzer, I thought that it might be a good idea for me to occasionally read and discuss a study that Holman et al. has categorized as finding bias.

---

1.

I have already posted about Peterson et al. 2019 "Mitigating gender bias in student evaluations of teaching". Holman et al. 2019 includes that article in the list of academic articles, book chapters, and working papers finding bias, so let's start there...

I do not perceive how the results in Peterson et al. 2019 can be read as finding bias. Feel free to read the article yourself or to read the Holman et al. 2019 summary of the article. Peterson et al. 2019 indicates that their results "indicate that a relatively simple intervention in language can potentially mitigate gender bias in student evaluation of teaching", but their research design does not permit an inference that bias was present among students in the control group.

---

2.

Given that I am familiar with the brilliance research discussed in this Slate Star Codex post, let's move on to Storage et al. 2016 "The frequency of 'brilliant' and 'genius' in teaching evaluations predicts the representation of women and African Americans across fields", which reported evidence of a difference found in RateMyProfessors data:

Across the 18 fields in our analysis, "brilliant" was used in a 1.81:1 male:female ratio and "genius" in a 3.10:1 ratio...In contrast, we found little evidence of gender bias in use of "excellent" and "amazing" in online evaluations, with male:female ratios of 1.08:1 and 0.91:1, respectively.

But is the male/female imbalance in the frequency of "brilliant" and "genius" an unfair bias? One alternate explanation is that male instructors are more likely than female instructors to be in fields in which students use "brilliant" and "genius" in RateMyProfessors comments; that pattern appears in Storage et al. 2016 Figure 2. Another alternate explanation is that a higher percentage of male instructors than female instructors are "brilliant" and "genius"; for what it's worth, my analysis here indicates that male test-takers are disproportionately at the highest scores on the SAT-Math test, even accounting for the higher number of female SAT test-takers.
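To illustrate the first alternate explanation, here is a toy example with invented numbers, showing how an aggregate male:female imbalance in "brilliant" can arise from field composition alone, even with exact within-field parity:

```python
# Toy illustration of the field-composition explanation: identical
# "brilliant" rates for men and women within each field, but an aggregate
# male:female imbalance because men are overrepresented in the field where
# students use the word most. All numbers are invented.
import pandas as pd

df = pd.DataFrame({
    "field":            ["physics", "physics", "education", "education"],
    "sex":              ["male", "female", "male", "female"],
    "instructors":      [800, 200, 200, 800],
    "brilliant_per_1k": [50, 50, 5, 5],  # parity within each field
})
df["mentions"] = df["instructors"] * df["brilliant_per_1k"] / 1000

rate = (df.groupby("sex")["mentions"].sum()
        / df.groupby("sex")["instructors"].sum() * 1000)
print(rate)  # male ~41 per 1k, female ~14 per 1k: an aggregate ~2.9:1 ratio
```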

It's certainly possible that, accounting for these and other plausible alternate explanations, student comments are unfairly more likely to refer to male instructors than female instructors as "brilliant" and "genius". But it's not clear that the Storage et al. 2016 analysis permits such an inference of unfair bias.

From what I can tell, the main implication of research on bias in student evaluations of teaching concerns whether student evaluations of teaching should be used in employment decisions. Data from Storage et al. 2016 are from RateMyProfessors, so another hurdle for anyone using Storage et al. 2016 to undercut the use of student evaluations of teaching in employment decisions is producing a plausible argument that the "brilliant" and "genius" pattern in RateMyProfessors comments is representative of comments on student evaluations conducted by a college or university and used in employment decisions.

Another hurdle is establishing that any instructor's employment would be nontrivially affected by a less-frequent-than-deserved use of "brilliant" and "genius" in student evaluation comments conducted by a college or university or on the RateMyProfessors site.

---

3.

Let's move on to another publication that Holman et al. 2019 has listed as finding bias: Piatak and Mohr 2019 "More gender bias in academia? Examining the influence of gender and formalization on student worker rule following".

It's not clear to me why an article reporting on a study of "student worker rule following" should be included in a list of "Evidence of Bias in Standard Evaluations of Teaching".

---

Comments are open if you disagree, but I don't see anything in Peterson et al. 2019 or Storage et al. 2016 or Piatak and Mohr 2019 that indicates a test for unfair bias in student evaluations of teaching using a research design with internal validity: from what I can tell, Peterson et al. 2019 had no test for unfair bias, Storage et al. 2016 did not address plausible alternate explanations, and Piatak and Mohr 2019 isn't even about student evaluations of teaching.

---

"Evidence of Bias in Standard Evaluations of Teaching" (Mirya Holman, Ellen Key, and Rebecca Kreitzer, 2019) has been cited as evidence of bias in student evaluations of teaching.

I am familiar with Mitchell and Martin 2018, so let's check how that study is summarized in the list, as archived on 20 November 2019. I count three substantive errors and one spelling error in the summary, described below, and that's not counting the "fgender" in the list's header or the singular "RateMyProfessor":

1. The summary referred to the online courses as being from different universities, but all of the online courses in the Mitchell and Martin 2018 analysis were at the same university.

2. The summary referred to "female instructors" and "male professors", but the Mitchell and Martin 2018 analysis compared comments and evaluations for only one female instructor to comments and evaluations for only one male instructor.

3. The summary indicated that female instructors were evaluated differently in intelligence, but no Mitchell and Martin 2018 table reported a statistical significance asterisk for the Intelligence/Competency category.

---

The aforementioned errors in the summary of Mitchell and Martin 2018 can be easily fixed, but that would not address a flaw in a particular use of the list: from what I can tell, Mitchell and Martin 2018 has errors that undercut the inference that students use different language when evaluating female instructors than when evaluating male instructors. Listing that study and other studies based on an uncritical reading of their results shouldn't be convincing evidence of bias in student evaluations of teaching, especially if the categorizing of studies does not indicate whether "bias" is operationalized as an unfair difference or as a mere difference.

I think there would be value in a version of "Evidence of Bias in Standard Evaluations of Teaching" that accurately summarizes each study that has tested for unfair bias in student evaluations of teaching using a research design with internal validity and plausibly sufficient statistical power, especially if each summary were coupled with a justification of why the study provides credible evidence about unfair bias in student evaluations of teaching. But I don't see why anyone should be convinced by "Evidence of Bias in Standard Evaluations of Teaching" in its current form.

---

The Enders 2019 Political Behavior article "A Matter of Principle? On the Relationship Between Racial Resentment and Ideology" interprets its results as "providing disconfirmatory evidence for the principled conservatism thesis" (p. 3 of the pdf). This principled conservatism thesis "asserts that adherence to conservative ideological principles causes what are interpret[ed] as more resentful responses to the individual racial resentment items, especially those that deal with subjects like hard work and struggle" (p. 5 of the pdf).

So how could we test whether adherence to conservative principles causes what are interpreted as resentful responses to racial resentment items? I think that a conservative principle informing a "strongly agree" response to the racial resentment item that "Irish, Italians, Jewish, and many other minorities overcame prejudice and worked their way up. Blacks should do the same without any special favors" might be an individualism that opposes special favors to reduce inequalities of outcome, so that, if a White participant strongly agreed that Blacks should work their way up without special favors, then—to be principled—that White participant should also strongly agree that poor Whites should work their way up without special favors.

Thus, testing the principled conservatism thesis could involve asking participants the same racial resentment items with a variation in targets or a variation to a domain in which Blacks tend to outperform Whites. If there is a concern about social desirability affecting responses when participants are asked the same item with a variation in target or domain, the items could be experimentally manipulated and responses compared at an aggregate level. This type of analysis, in which the target of racial resentment items is manipulated to be Blacks or another group, has recently been conducted and reported on in a paper by Carney and Enos, but that paper is not cited in Enders 2019, and I would have hoped that the peer reviewers had requested or required a discussion of the information in that paper that relates to the principled nature of conservatives' responses to racial resentment items.
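For what it's worth, a minimal sketch of the aggregate-level comparison described above might look like the following, assuming a between-subjects design in which each participant is randomly assigned an item whose target is "Blacks" or "poor Whites" (the file and column names are hypothetical):

```python
# Sketch of an aggregate-level test of target-independence: if responses to
# the "work their way up without special favors" item reflect a principled,
# target-independent individualism, mean agreement should be similar across
# randomly assigned targets. Data and column names are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("target_experiment.csv")  # hypothetical
blacks = df.loc[df["target"] == "Blacks", "agreement"]
poor_whites = df.loc[df["target"] == "poor Whites", "agreement"]
print(stats.ttest_ind(blacks, poor_whites, equal_var=False))
```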

---

Instead of manipulating the target of racial resentment items, Enders 2019 tested the principled conservatism thesis with an analysis that assessed how responses to racial resentment items associated with attitudes about limited government and with preferences about federal spending on, among other things, public schools, child care, and the environment. From what I can tell, Enders 2019 assessed the extent to which participants are principled using a test in which the only responses that count as principled conservatism are those in which responses expected from a conservative to the racial resentment items match responses expected from a conservative to items measuring federal spending preferences or to items measuring attitudes about limited government. As I think Enders 2019 suggests, this is a consistency across domains at the level of "conservatism" and is not a consistency across targets within the domain of the racial resentment items: "If I find that principled conservatism does not account for a majority of the variance in the racial resentment scale under these conditions, then I will have reasonably robust evidence against the principled conservatism thesis" (p. 7 of the pdf).

But I don't think that the level of "conservatism" is the correct level for assessing whether perceived racially prejudiced responses to racial resentment items reflect "adherence to (conservative) ideological principles" (p. 2 of the pdf). Enders 2019 indicates that "Critics argue that racially prejudiced responses to the items that compose the racial resentment scale are observationally equivalent to the responses that conservatives would provide" (abstract). However, at least for me, my criticism of the racial resentment items as producing unjustified inferences of racial bias is not limited to inferences about responses from self-identified conservatives: "This statement [about whether, if blacks would only try harder, they could be just as well off as whites] cannot be used to identify racial bias because a person who agreed with the statement might also agree that poor whites who try harder could be just as well off as middle-class whites" (p. 522 of this article). I don't perceive any reason why a person who supports increased federal spending on the public schools, child care, and the environment cannot also have a principled objection to special favors to reduce inequalities of outcome.

And even if "conservatism" were the correct level of analysis, I don't think that the Enders 2019 operationalizations of principled conservatism—as a preference for limited government and as a preference for decreased federal spending—are valid because, as far as I can tell, these operationalizations of principled conservatism are identical to principled libertarianism.

---

Enders 2019 asks "Why else would attitudes about racial issues be distinct from attitudes about other policy areas, if not for the looming presence and substantive impact of racial prejudice?" (p. 21 of the pdf). I think the correct response is that the principles that inform attitudes about these other policy areas are distinct from the principles that inform attitudes about issues in the racial resentment items, to the extent that these attitudes even involve principles.

I don't think that the principle that "the less government, the better" produces conservative policy preferences about federal spending on national defense or domestic law enforcement, and I don't see a reason to assign to racial prejudice an inconsistency between support for increased federal spending in these domains and agreement that "the less government, the better". And I don't perceive a reason for racial prejudice to be assigned responsibility for a supposed inconsistency between responses to the claim that "the less government, the better" and responses to the racial resentment statements that "Generations of slavery and discrimination have created conditions that make it difficult for blacks to work their way out of the lower class" or that "...if blacks would only try harder they could be just as well off as whites", because, as far as I can tell, there is no inconsistency in which a preference for limited government compels particular responses to these racial resentment items.

---

NOTES

1. Enders 2019 noted that: "More recently, DeSante (2013), utilizing an experimental research design, found that the most racially resentful whites, as opposed to less racially resentful whites, were more likely to allocate funds to offset the state budget deficit than allocated such funds to a black welfare applicant. This demonstrates a racial component of racial resentment, even accounting for principled conservatism" (p. 6). But I don't think that this indicates a demonstration of a racial component of racial resentment, because there is no indication whether the preference for allocating funds to offset the state budget deficit instead of allocating funds to welfare recipients occurred regardless of the race of the welfare recipients. My re-analysis of data for DeSante 2013 indicated that "...when comparing conditions with two White applicants and conditions with two Black applicants, there is insufficient evidence to support the inference of a difference in the effect of racial resentment on allocations to offset the state budget deficit" (pp. 5-6).

2. I sent the above comments to Adam Enders in case he wanted to comment.

3. After I sent the above comments, I saw this Robert VerBruggen article on the racial resentment measure. I don't remember seeing that article before, but it has a lot of good points and ideas.

---

Ethnic and Racial Studies recently published "Revisiting the Asian Second-Generation Advantage", by Van C. Tran, Jennifer Lee, and Tiffany J. Huang, which I will refer to below as Tran et al. 2019. Ethnic and Racial Studies has also published my comment, and a Tran et al. response. I'll reply to their response below...

---

Here are three findings from Tran et al. 2019 important for the discussion below:

1. Table 2 indicates that U.S. second-generation Chinese, Indians, Filipinos, Vietnamese, and Koreans are more likely than native Whites to hold a college degree.

2. Table 2 indicates that U.S. second-generation Chinese, Indians, Filipinos, Vietnamese, and Koreans are more likely than native Whites to report being in a managerial or professional position.

3. Table 4 Model 1 does not provide evidence at p<.05 that U.S. second-generation Chinese, Indians, Filipinos, Vietnamese, or Koreans are less likely than native Whites to report being in a managerial or professional position, controlling for age, age squared, gender, region, survey year, and educational attainment.
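To make finding 3 concrete, here is a rough sketch of a Table 4 Model 1-style specification; the data file and variable names are hypothetical, and I am not asserting that Tran et al. used a logit rather than another estimator:

```python
# Sketch of a Table 4 Model 1-style specification: professional/managerial
# attainment predicted from second-generation group indicators plus the
# listed controls. Data file and variable names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("tran_extract.csv")  # hypothetical

model = smf.logit(
    "prof_managerial ~ C(group, Treatment(reference='native_white'))"
    " + age + I(age**2) + C(gender) + C(region) + C(survey_year) + C(education)",
    data=df,
).fit()
print(model.summary())  # finding 3: no group coefficient significant at p<.05
```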

---

Below, I'll respond to what I think are the two key errors in the Tran et al. reply.

1.

From the first paragraph of the Tran et al. reply:

Given this Asian educational advantage, we hypothesized that second-generation Asians would also report an occupational advantage over whites, measured by their likelihood to be in a professional or managerial occupation.

It makes sense to expect the second-generation Asian educational advantage to translate to a second-generation Asian occupational advantage. And that is what Tran et al. 2019 Table 2 reported: 45% of native Whites reported being in a professional or managerial position, compared to 73% of second-generation Chinese, 79% of second-generation Indians, 52% of second-generation Filipinos, 53% of second-generation Vietnamese, and 60% of second-generation Koreans. Tran et al. 2019 even commented on this occupational advantage: "Yet despite variation among the second-generation Asian groups, each exhibits higher rates of professional attainment than native-born whites and blacks" (p. 2260). But here is the Tran et al. reply following immediately from the prior block quote:

Contrary to our expectation, however, we found that, with the exception of second-generation Chinese, the other four Asian ethnic groups in our study – Indians, Filipinos, Vietnamese and Koreans – report no such advantage in professional or managerial attainment over whites (Tran, Lee, and Huang 2019: Table 4, Model 1). More precisely, the four Asian ethnic groups are only as likely as whites to be in a managerial or professional occupation, controlling for age, the quadratic term of age, gender, education, and region of the country.

The finding contrary to the Tran et al. expectation (from Tran et al. 2019 Table 4 Model 1) was not from what the other four Asian ethnic groups reported but was from a model predicting what was reported controlling for educational attainment and other factors. Tran et al. therefore expected an educational advantage to cause an occupational advantage that remained after controlling for the educational advantage. The Tran et al. reply states this expressly (p. 2274, emphasis in the original):

Because second-generation Asians hold such a significant educational advantage over whites, we had expected that second-generation Asians would also report an occupational advantage over whites, even after controlling for respondents' education.

Properly controlling for a factor means eliminating that factor as an explanation. For instance, men having a higher average annual salary than women might be due to men working more hours per year on average. Comparing the average hourly salary for men to the average hourly salary for women controls for hours worked and eliminates the explanation that any residual gender difference in average annual salary is due to a gender difference in hours worked per year. The logic of the Tran et al. expectation, applied to the gender salary gap, would produce expectations such as: because men work more hours on average than women, we expected that men would have a higher average annual salary than women, even after controlling for the fact that men work more hours on average than women.
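Here is a toy simulation of that logic, with invented numbers: if men's annual salary advantage is entirely due to more hours worked, the gender coefficient shrinks to about zero once hours worked are controlled.

```python
# Toy simulation: the annual salary gap disappears when hours are controlled,
# because hours worked is the entire mechanism. All numbers are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
male = rng.integers(0, 2, n)
hours = 1800 + 200 * male + rng.normal(0, 100, n)  # men average more hours
salary = 30 * hours + rng.normal(0, 2000, n)       # identical hourly rate
df = pd.DataFrame({"male": male, "hours": hours, "salary": salary})

print(smf.ols("salary ~ male", data=df).fit().params["male"])          # ~6000
print(smf.ols("salary ~ male + hours", data=df).fit().params["male"])  # ~0
```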

---

2.

From the Tran et al. reply (p. 2274, emphasis added):

Given that second-generation Asians are more likely to have graduated from college than whites, we hypothesized that they would evince a greater likelihood of attaining a professional or managerial position than whites, as is the case for the Chinese. Instead, we found that second-generation Chinese are the exception, rather than the norm, among second-generation Asians. Hence, we concluded that second-generation Asians are over-credentialed in education in order to achieve parity with whites in the labor market.

I think that there are two ways that labor market parity can be properly conceptualized in the context of this analysis. The first is for labor market outcomes for second-generation Asians to equal labor market outcomes for native Whites, without controlling for any factors; the second is for labor market outcomes for second-generation Asians to equal labor market outcomes for native Whites, controlling for particular factors. Tran et al. appear to be using the "controlling for" conceptualization of parity. Now to the bolded statement...

Ignoring the advantage for second-generation Chinese, and interpreting insufficient evidence of a difference under statistical control as parity, Tran et al. 2019 provided evidence that second-generation Asians are over-credentialed in education relative to native Whites *and* that second-generation Asians have achieved labor market parity with native Whites. But I do not see anything in the Tran et al. 2019 analysis or reply indicating that second-generation Asians need to be over-credentialed in education "in order to achieve" this labor market parity with native Whites.

Returning to the gender salary gap example, imagine that men have a higher average annual salary than women have, but that this salary advantage disappears when controlling for hours worked, so that men have salary parity with women; nothing in that analysis indicates that men need to overwork in order to achieve salary parity with women.

---

So I think that the two key errors in the Tran et al. reply are:

1. The expectation that the effect of education will remain after controlling for education.

2. The inference from their reported results that second-generation Asians need to be over-credentialed in order to achieve labor market parity with native Whites.

---

Racial resentment and symbolic racism are terms used to describe a set of measures used in racial attitudes research, including statements such as "Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without any special favors". This item and at least some of the other racial resentment items confound racism and nonracial ideology; in this "special favors" item, an individualist who believes that everyone should work their way up without special favors would select a response on the same side of the scale as an antiBlack racist who believes that only Blacks should work their way up without special favors.

Feldman and Huddy (2005) concluded that "racial resentment is an inadequate measure of prejudice because it confounds prejudice and political ideology" (p. 181), which is consistent with factor analysis of racial resentment items (Sears and Henry 2003: 271). Some research has addressed this confounding with what Feldman and Huddy (2005: 171) call the multivariate approach, in which the analysis includes statistical control for related ideological values. The logic of this multivariate approach is that racial resentment confounds ideology and antiBlack animus so that controlling for ideology should permit the residual association of racial resentment to be interpreted as the association due to antiBlack animus.

The analysis below approaches from the opposite direction: racial resentment confounds ideology and antiBlack animus so that controlling for antiBlack animus should permit the residual association of racial resentment to be interpreted as the association due to ideology. Moreover, if controls for ideology and for antiBlack animus are both included, then the association of racial resentment with an outcome variable should be zero. But this is not even close to being true, as illustrated below in a figure that reports the association of racial resentment with racial or possibly racialized outcome variables, using different sets of statistical control.
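Before turning to the figure, here is one way to make that expectation explicit, as a sketch under the simplifying assumption that racial resentment is a linear blend of the two components plus measurement noise:

$$\mathit{RR}_i = \gamma_1\,\mathit{Ideology}_i + \gamma_2\,\mathit{Animus}_i + \varepsilon_i$$

Under this assumption, once ideology and animus are both controlled, the only variation left in racial resentment is the noise term, so the coefficient on racial resentment in a regression of a racial outcome variable on racial resentment, ideology, and animus should be near zero.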

[Figure: associations of racial resentment with racial or possibly racialized outcome variables, under different sets of statistical controls.]

In each panel of the figure, the five estimates indicate the association of racial resentment with the outcome variable, controlling for:

1. demographics only;

2. demographics and racial attitudes;

3. demographics and ideology;

4. demographics, ideology, and racial attitudes; and

5. demographics, ideology, and racial animus.

The key comparison is between the third estimate and the fourth and fifth estimates: the measures of racial attitudes and racial animus had relatively little impact on the racial resentment estimate once the controls for ideology were included in the analysis. For example, in the top left panel, the coefficient for racial resentment was 0.51 controlling for demographics and ideology, 0.48 controlling for demographics, ideology, and racial attitudes, and 0.52 controlling for demographics, ideology, and racial animus. In a common racial resentment association analysis, the 0.51 coefficient controlling for demographics and ideology would be assigned to antiBlack animus, but the addition of seven racial attitudes controls accounted for only 0.03 of the 0.51 coefficient, and the inclusion of six antiBlack animus controls did not reduce the 0.51 coefficient at all. (See the Notes below for more description of the measurements.)
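For readers who want the structure of these estimates in code, below is a condensed sketch of the design: the same outcome regressed on racial resentment under successively larger control sets, with the racial resentment coefficient compared across models. The file and variable names are hypothetical stand-ins for the measures described in the Notes; the actual analysis code is linked in the Notes.

```python
# Sketch of the nested-control-set comparison; file and variable names are
# hypothetical stand-ins for the ANES 2012 measures described in the Notes.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("anes2012_whites.csv")  # hypothetical 0-1-coded extract

demo = "sex + married + age + education + income"
ideo = ("party_id + ideology + guaranteed_jobs + role_of_govt"
        " + moral_trad + authoritarianism + egalitarianism")
rac_att = ("ft_diff + lazy_diff + intel_diff + admire + sympathy"
           " + influence + discrim_diff")
animus = ("animus_ft + animus_lazy + animus_intel + animus_no_sympathy"
          " + animus_influence + animus_no_discrim")

control_sets = {
    "demographics": demo,
    "demographics + racial attitudes": f"{demo} + {rac_att}",
    "demographics + ideology": f"{demo} + {ideo}",
    "demographics + ideology + racial attitudes": f"{demo} + {ideo} + {rac_att}",
    "demographics + ideology + animus": f"{demo} + {ideo} + {animus}",
}

for label, rhs in control_sets.items():
    fit = smf.ols(f"outcome ~ racial_resentment + {rhs}", data=df).fit()
    print(f"{label}: b = {fit.params['racial_resentment']:.2f}")
```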

A reasonable critique of the above analysis is that racial resentment taps a form of antiBlack racism that is not captured or is not well captured in the included measures of racial attitudes and racial animus. But, from what I can tell, that is an equally valid criticism of analyses that control for ideology: the nonracial ideology captured in racial resentment measures is not captured or not well captured in the included measures of ideology.

NOTES

1. The sample for the analysis was the 3,261 non-Hispanic Whites who completed the pre- and post-election surveys (face-to-face or online), conducted between 8 September 2012 and 24 January 2013, and who were not listwise deleted from a model due to missing data for a variable. Each variable in the analysis was coded to range from 0 to 1. Linear regressions without weights were used to predict values of the outcome variables.

The racial resentment measure summed responses to the four ANES 2012 racial resentment items. Models included demographic controls for participant sex, marital status, age, education level, and household family income. Ideological controls were self-reported partisanship, self-reported ideology, an item about guaranteed jobs, an index of attitudes about the role of government, a moral traditionalism index, an authoritarianism index, and an egalitarianism index.

One set of models included seven controls for racial attitudes: a feeling thermometer difference of ratings of Whites and ratings of Blacks, a rating difference for Blacks and for Whites in general on a laziness stereotype scale, a rating difference for Whites and for Blacks in general on an intelligence stereotype scale, an item rating admiration of Blacks, an item rating sympathy for Blacks, an item measuring the perceived political influence of Blacks relative to Whites, and a difference in ratings of the level of discrimination in the United States today against Whites and against Blacks. Another set of models included six dichotomous controls that attempted to isolate antiBlack animus: a more than 20-point feeling thermometer rating difference in which Whites were rated higher than Blacks and with Whites rated at or above 50 and Blacks rated below 50, a rating of Blacks as lazier in general than Whites, a rating of Whites as more intelligent in general than Blacks, an indication of never feeling sympathy for Blacks, an indication that Blacks have too much influence in American politics but Whites don't, and an indication that there is no discrimination against Blacks in the United States today but that there is discrimination against Whites in the United States today.
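As one concrete example of the dichotomous coding, here is how the feeling thermometer indicator described above might be constructed; the column names are assumptions, and the thresholds follow the text:

```python
import pandas as pd

# Toy values; in the actual analysis these would be ANES 2012 thermometer items.
df = pd.DataFrame({"ft_whites": [80, 60, 50], "ft_blacks": [40, 55, 45]})

# 1 if the pro-White thermometer gap exceeds 20 points, with Whites rated at
# or above 50 and Blacks rated below 50; 0 otherwise.
df["animus_ft"] = (
    ((df["ft_whites"] - df["ft_blacks"]) > 20)
    & (df["ft_whites"] >= 50)
    & (df["ft_blacks"] < 50)
).astype(int)
print(df)
```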

2. Code for the analysis is here.

3. Results for the 2016 ANES are below:

[Figure: corresponding estimates from the 2016 ANES.]

4. Code for the 2016 ANES analysis is here.

5. Citations:

American National Election Studies (ANES). 2016. ANES 2012 Time Series Study. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2016-05-17. https://doi.org/10.3886/ICPSR35157.v1.

American National Election Studies, University of Michigan, and Stanford University. 2017. ANES 2016 Time Series Study. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2017-09-19. https://doi.org/10.3886/ICPSR36824.v2.
