In a survey experiment reported in LaFleur Stephens-Dougan's 2016 Journal of Politics article, "Priming Racial Resentment without Stereotypic Cues", respondents were shown a campaign mailer for a white candidate named Greg Davis, with experimental manipulations of the candidate's party (Democrat or Republican) and the photos on the candidate's mailers (five photos of whites, five photos of blacks, or a mixture of photos of whites and blacks).

One key finding described in the abstract is that "white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner".

The JOP article describes the analysis in Chapter 5 of Stephens' dissertation, which reported on a July 2011 YouGov/Polimetrix survey. Dissertation page 173 indicated that the survey had 13 experimental conditions, but the JOP article reports only six conditions, omitting the control condition and the six conditions in which candidate Greg Davis was black. Stephens-Dougan might plan to report on these omitted conditions in a subsequent publication, so I'll concentrate on the outcome variables.

---

Many potential outcome variables were not reported on in the JOP article, based on a comparison of the survey questionnaire in Appendix D of the dissertation to the outcome variables mentioned in the main text or the appendix of the article. The list below describes each post-treatment item on the survey questionnaire except for the Q7 manipulation check: a [*] indicates items reported on in the article, and items without a [*] were not reported on in the article. See pages 224 to 230 of the dissertation for the exact item wording.

Q8. Feeling thermometer for the candidate.

Q9 [*]. Likelihood of voting for Greg Davis.

Q10. How well a series of terms describe Greg Davis:

  • Intelligent
  • Inexperienced
  • Trustworthy
  • Hardworking
  • Fair [*]
  • Competent

Q11. Perception of Greg Davis' political ideology.

Q12. Whether Democrats, Republicans, or neither party would be better at:

  • Helping Senior Citizens
  • Improving Health Care
  • Improving the Economy
  • Reducing Crime
  • Reforming Public Education

Q13. Like, dislike, or neither for the Democratic Party.

Q14. Like, dislike, or neither for the Republican Party.

Q15. Perception of how well Greg Davis would handle:

  • Helping Senior Citizens
  • Improving Health Care
  • Improving the Economy
  • Reducing Crime [*]
  • Reforming Public Education

Q16 [*]. Whether Greg Davis' policies will favor Whites over Blacks, Blacks over Whites, or neither.

Q17. Perception of which groups Greg Davis will help if elected:

  • Teachers
  • Latinos
  • Corporate Executives
  • Farmers
  • Senior Citizens
  • African Americans
  • Homeowners
  • Students
  • Small Business Owners
  • Whites

Q18 [*]. Perception of Greg Davis' position on affirmative action for Blacks in the workplace.

Q19. Perception of Greg Davis' position on the level of federal spending on Social Security.

Q20. Job approval, disapproval, or neither for Barack Obama as president.

The unmarked items above indicate that many potential outcome variables were not reported on in the JOP article. For example, Q10 asked respondents how well the terms "intelligent", "inexperienced", "trustworthy", "hardworking", "fair", and "competent" describe the candidate, but readers are told about results only for "fair". Readers are told results for Q16 about the candidate's perceived preference for policies that help whites over blacks or vice versa, but readers are not told results for the "Whites" and "African Americans" items on Q17 about the groups that the candidate is expected to help.

Perhaps the estimates and inferences are identical for the omitted and included items, but prior analyses [e.g., here and here] suggest that omitted items often produce different estimates and sometimes produce different inferences than included items.

Data for most of the omitted potential outcome variables are not in the article's dataset at the JOP Dataverse, but the dataset does contain a "thermgregdavis" variable that ranges from 1 to 100, which is presumably the Q8 feeling thermometer. I used the model in line 14 of Stephens-Dougan's code, but -- instead of the reported Q9 outcome variable for likelihood of voting for Greg Davis -- I used "thermgregdavis" as the outcome variable and changed the estimation technique from ordered logit to linear regression: the p-value for the difference between the all-white photo condition and the mixed photo condition was p=0.693, and the p-value for the difference between the all-white photo condition and the all-black photo condition was p=0.264.
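For reference, the substituted command is the thermometer command listed with the other new-analysis code in note 6 below:

  * Q8 feeling thermometer as the outcome, linear regression in place of ordered logit
  reg thermgregdavis i.whitedem_treatments [pweight = weight]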

---

This sort of selective reporting is not uncommon in social science [see here, here, and here], but I'm skeptical that researchers with the flexibility to report the results they want based on post hoc research design choices will produce replicable estimates and unbiased inferences, especially in the politically charged racial discrimination subfield. I am also skeptical that selective reporting across publications will balance out in a field in which a supermajority of researchers fall on one side of the political spectrum.

So how can such selective reporting be prevented? Researchers can preregister their research designs. Journals can preaccept articles based on a preregistered research design. For non-preregistered studies, journals can require as a condition of publication the declaration of omitted studies, experiments, experimental conditions, and outcome variables. Peer reviewers can ask for these declarations, too.

---

It's also worth comparing the hypotheses as expressed in the dissertation to the hypotheses as expressed in the JOP article. First, the hypotheses from dissertation Chapter 5, on page 153:

H1: Democratic candidates are penalized for an association with African Americans.

H2: Republican candidates are rewarded for an association with African Americans.

H3: The racial composition of an advertisement influences voters' perceptions of the candidates' policy preferences.

Now, the JOP hypotheses:

H1. White Democratic candidates associated with blacks will lose vote support and will be perceived as more likely to favor blacks over whites and more likely to support affirmative action relative to white Democratic candidates associated with images of only whites.

H2. Counterstereotypic images of African Americans paired with a white Democratic candidate will prime racial attitudes on candidate evaluations that are implicitly racial relative to a comparable white Democratic candidate associated with all whites.

H3. Counterstereotypic images of African Americans paired with a white Republican candidate will be inconsequential such that they will not be associated with a main effect or a racial priming effect.

So the hypotheses became more specific for Democratic candidates and switched from Republican candidates being rewarded to Republican candidates experiencing no consequential effect. My sense is that hypothesis modification is not uncommon in social science, but the survey items asking about personal characteristics of the candidate (e.g., trustworthy, competent) make more sense in light of the dissertation's hypotheses about candidates being penalized or rewarded for an association with African Americans: after all, the feeling thermometer and the Q10 characteristic items can be used to assess a penalty or reward for a candidate.

---

In terms of the substance of the penalty, the abstract of the JOP article notes: "I empirically demonstrate that white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner."

My analysis of the data indicated that, based on a model with no controls and no cases dropped, and comparing the all-white photo condition to the all-black photo condition, there is evidence of this penalty in the Q9 vote item at p=0.074. However, evidence for this penalty is weak in the feeling thermometer (p=0.248) and in the "fair" item (p=0.483), and I saw no evidence in the article or dissertation that the penalty can be detected in the items omitted from the dataset.

Moreover, much of the estimated penalty might reflect nothing more than the race of the persons in the photos signaling candidate Greg Davis' ideology. Compared to respondents in the all-white photo condition, respondents in the mixed photo condition and the all-black photo condition rated Greg Davis as more liberal (p-values of 0.014 and 0.004), and the p=0.074 penalty in the Q9 vote item inflates to p=0.710 when the measure of Greg Davis' perceived ideology is added to the model, with corresponding p-values ranging from p=0.600 to p=0.964 for the models predicting a penalty in the thermometer and the "fair" item.

---

NOTES:

1. H/T to Brendan Nyhan for the pointer to the JOP article.

2. The JOP article emphasizes the counterstereotypical nature of the mailer photos of blacks, but the experiment did not vary the photos of blacks, so the experiment provides no evidence about the influence of the counterstereotypical nature of the photos.

3. The JOP article reports four manipulation checks (footnote 6), but the dissertation reports five manipulation checks (footnote 65, p. 156). The omitted manipulation check concerned whether the candidate tried to appeal to racial feelings. The dataset for the article at the JOP Dataverse has a "manipchk_racialfeelings" variable that is presumably this omitted manipulation check.

4. The abstract reports that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support." However, Table 2 of the article and my analysis indicate that no photo condition comparison produced a statistically significant main effect for the "fair" item, and that only the all-white vs. mixed photo comparison produced a statistically significant main effect for perceptions of the likelihood of reducing crime. Even that one main effect reached statistical significance only under the article's generous convention of placing a significance asterisk on one-tailed p-values less than 0.10: the two-tailed p-value was 0.142, which corresponds to a one-tailed p-value of roughly 0.07.

Table 4 of the article indicated a statistically significant interaction between the photo conditions and racial resentment when predicting the "fair" item and perceptions of the likelihood of reducing crime, so I think this interaction is what the abstract refers to in stating that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support."

5. The 0.142 p-value referred to in the previous item inflates to p=0.340 when the controls are removed from the model. There are valid reasons for including demographic controls in a regression analyzing a survey experiment, but the particular set of controls should be preregistered, to prevent researchers from estimating models without controls and with different combinations of controls and then selecting which model or models to report based on the corresponding p-values or effect sizes.

6. Code for the new analyses:

  • reg votegregdavis i.whitedem_treatments [pweight = weight]
  • reg thermgregdavis i.whitedem_treatments [pweight = weight]
  • reg fair_gregdavis i.whitedem_treatments [pweight = weight]
  • reg ideo_gregdavis i.whitedem_treatments [pweight = weight]
  • reg votegregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
  • reg thermgregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
  • reg fair_gregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
  • ologit fair_gregdavis i.whitedem_treatments gender educ income pid7 south [pweight = weight]
  • ologit gregdavis_redcrim i.whitedem_treatments gender educ income pid7 south [pweight = weight]
  • ologit gregdavis_redcrim i.whitedem_treatments [pweight = weight]

7. I emailed Dr. Stephens-Dougan, asking whether there was a reason for the exclusion of items and about access to a full dataset. I received a response and invited her to comment on this post.

---

My article reanalyzing data on a gender gap in citations to international relations articles indicated that the gender gap is largely confined to elite articles, defined as articles in the right tail of citation counts or articles in the top three political science journals. That article concerned an aggregate gender gap in citations, but this post is about a particular woman who has been under-cited in the social science literature.

It is not uncommon to read a list experiment study that suggests or states that the list experiment originated in the research described in the Kuklinski, Cobb, and Gilens 1997 article, "Racial Attitudes and the New South." For example, from Heerwig and McCabe 2009 (p. 678):

Pioneered by Kuklinski, Cobb, and Gilens (1997) to measure social desirability bias in reporting racial attitudes in the "New South," the list experiment is an increasingly popular methodological tool for measuring social desirability bias in self-reported attitudes and behaviors.

Kuklinski et al. described a list experiment that was placed on the 1991 National Race and Politics Survey. Kuklinski and colleagues appeared to propose the list experiment as a new measure (p. 327):

We offer as our version of an unobtrusive measure the list experiment. Imagine a representative sample of a general population divided randomly in two. One half are presented with a list of three items and asked to say how many of these items make them angry — not which specific items make them angry, just how many. The other half receive the same list plus an additional item about race and are also asked to indicate the number of items that make them angry. [screen shot]

The initial draft of my list experiment article reflected the belief that the list experiment originated with Kuklinski et al., but I then learned [*] of Judith Droitcour Miller's 1984 dissertation, which contained this passage:

The new item-count/paired lists technique is designed to avoid the pitfalls encountered by previous indirect estimation methods. Briefly, respondents are shown a list of four or five behavior categories (the specific number is arbitrary) and are then asked to report how many of these behaviors they have engaged in — not which categories apply to them. Nothing else is required of respondents or interviewers. Unbiased estimation is possible because two slightly different list forms (paired lists) are administered to two separate subsamples of respondents, which have been randomly selected in advance by the investigator. The two list forms differ only in that the deviant behavior item is included on one list, but omitted from the other. Once the alternate forms have been administered to the two randomly equivalent subsamples, an estimate of deviant behavior prevalence can be derived from the difference between the average list scores. [screen shot]

The above passage was drawn from pages 3 and 4 of Judith Droitcour Miller's 1984 dissertation at the George Washington University, "A New Survey Technique for Studying Deviant Behavior." [Here is another description of the method, in a passage from the 2004 edition of the 1991 book, Measurement Errors in Surveys (p. 88)]
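To make the arithmetic of the technique concrete, here is a minimal Stata sketch, assuming a hypothetical dataset in which listcount records how many list items a respondent endorsed and longlist equals 1 for respondents randomly assigned the list that includes the sensitive item and 0 otherwise (the variable names are placeholders of mine, not anything from Miller or Kuklinski et al.):

  * The difference in mean list counts between the two randomly assigned groups
  * estimates the prevalence of the sensitive item.
  ttest listcount, by(longlist)
  display "Estimated prevalence of the sensitive item: " r(mu_2) - r(mu_1)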

It's possible that James Kuklinski independently invented the list experiment, but descriptions of the list experiment's origin should nonetheless cite Judith Droitcour Miller's 1984 dissertation as a prior — if not the first [**] — example of the procedure known as the list experiment.

---

[*] I think I learned of Miller's dissertation through the Adam Glynn manuscript described below.

[**] An Adam Glynn manuscript discussed the list experiment and item count method as special cases of aggregated response techniques. Glynn referenced a 1979 Raghavarao and Federer article, and that article referenced a 1974 Smith et al. manuscript that used a similar block total response procedure. The non-randomized version of the procedure split seven questions into groups of three, as illustrated in one of the questionnaires below. The procedure's unobtrusiveness derived from a researcher's inability in most cases to determine which responses a respondent had selected: for example, Yes-No-Yes produces the same total as No-No-No (5 in each case).

[image: block total response questionnaires]

The questionnaire for the randomized version of the block total response procedure listed all seven questions; the respondent then drew a number and gave a total response for only those three questions that were associated with the number that was drawn: for example, if the respondent drew a 4, then the respondent gave a total for their responses to questions 4, 5, and 7. This procedure is similar to the list experiment, but the list experiment is simpler and more efficient.

---

Here's part of the abstract from Rios Morrison and Chung 2011, published in the Journal of Experimental Social Psychology:

In both studies, nonminority participants were randomly assigned to mark their race/ethnicity as either "White" or "European American" on a demographic survey, before answering questions about their interethnic attitudes. Results demonstrated that nonminorities primed to think of themselves as White (versus European American) were subsequently less supportive of multiculturalism and more racially prejudiced, due to decreases in identification with ethnic minorities.

So asking white respondents to select their race/ethnicity as "European American" instead of "White" influenced whites' attitudes toward and about ethnic minorities. The final sample for study 1 was a convenience sample of 77 self-identified whites and 52 non-whites, and the final sample for study 2 was 111 white undergraduates.

As I wrote before, if you're thinking that it would be interesting to see whether the results hold in a nationally representative sample with a large sample size, well, that was tried, with a survey experiment conducted as part of Time-sharing Experiments for the Social Sciences (TESS). Here are the results:

[image: reanalysis results from the TESS experiment]

I'm mentioning these results again because in October 2014 the journal that published Rios Morrison and Chung 2011 desk-rejected the manuscript that I submitted describing these results. So you can read in the Journal of Experimental Social Psychology about results from the low-powered test of the "European American" versus "White" self-identification hypothesis on convenience samples, but you won't be able to read in the JESP about results from a higher-powered test of that hypothesis on a nationally representative sample with data collected by a disinterested third party.

I submitted a revision of the manuscript to Social Psychological and Personality Science, which extended a revise-and-resubmit offer conditional on inclusion of a replication of the TESS experiment. I planned to conduct an experiment with an MTurk sample, but I eventually declined the revise-and-resubmit opportunity for various reasons.

The most recent version of the manuscript is here. Links to data and code.

---

In the Political Behavior article, "The Public's Anger: White Racial Attitudes and Opinions Toward Health Care Reform", Antoine J. Banks presented evidence that "anger uniquely pushes racial conservatives to be more opposing of health care reform while it triggers more support among racial liberals" (p. 493). Here is how the outcome variable was measured in the article's reported analysis (p. 511):

Health Care Reform is a dummy variable recoded 0-1 with 1 equals opposition to reform. The specific item is "As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill". The response options were yes = I favor the health care bill or no = I oppose the health care bill.

However, the questionnaire for the study indicates that there were multiple items used to measure opinions of health care reform:

W2_1. Do you approve or disapprove of the way Barack Obama is handling Health Care? Please indicate whether you approve strongly, approve somewhat, neither approve nor disapprove, disapprove somewhat, or disapprove strongly.

W2_2. As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill?

[if "favor" on W2_2] W2_2a. Do you favor Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

[if "oppose" on W2_2] W2_2b. Do you oppose Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

The W2_2 item above is the only item reported on as an outcome variable in the article. The reported analysis omitted results for one outcome variable (W2_1) and reported dichotomous results for the other outcome variable (W2_2), for which the apparent intention was a four-point outcome variable ranging from oppose strongly to favor strongly.

---

Here is the manuscript that I submitted to Political Behavior in March 2015 describing the results using the presumed intended outcome variables and a straightforward research design (e.g., no political discussion control, no exclusion of cases, cases from all conditions analyzed at the same time). Here's the main part of the main figure:

[image: main figure from the reproduction of Banks 2014]

The takeaway is that, with regard to opposition to health care reform, the effect of symbolic racism in the fear condition differed at a statistically significant level from the effect of symbolic racism in the baseline relaxed condition; however, contra Banks 2014, the effect of symbolic racism in the anger condition did not differ at a statistically significant level from its effect in the relaxed condition. Symbolic racism had a positive effect in the anger condition, but that influence was not unique to anger.

The submission to Political Behavior was rejected after peer review. Comments suggested analyzing the presumed intended outcome variables while using the research design choices in Banks 2014. Using the model in Table 2 column 1 of Banks 2014, the fear interaction term and the fear condition term are statistically significant at p<0.05 for predicting the two previously unreported non-dichotomous outcome variables and for predicting the scale of these two variables; the anger interaction term and the anger condition term are statistically significant at p<0.05 for predicting two of these three outcome variables, with p-values for the remaining "Obama handling" outcome variable at roughly 0.10. The revised manuscript describing these results is here.

---

Data are here, and code for the initial submission is here.

---

Antoine Banks has published several studies on anger and racial politics (here, for example) that should be considered when making inferences about the substance of the effect of anger on racial attitudes. Banks had a similar article published in the AJPS, with Nicholas Valentino. Data for that article are here. I did not see any problems with that analysis, but I didn't look very hard, because the posted data were not the raw data: the posted data that I checked omitted, for example, the variables used to construct the outcome variable.

---

The Public Opinion Quarterly article "Bias in the Flesh" provided evidence of "an evaluative penalty for darker skin" (quote from the abstract). Study 2 of the article was an MTurk survey. Some respondents were shown an image of Barack Obama with darkened skin, and some respondents were shown an image of Barack Obama with lightened skin. Both sets of respondents received the text: "We are interested in how people evaluate images of political figures. Consider the following image:"

Immediately following the image and text, respondents received 14 items that could be used to assess this evaluative penalty for darker skin; these items are listed in the boxes below. The first 11 items could be used to measure whether, compared to respondents in one of the conditions, respondents in the other condition completed more word fragments with words associated with negative stereotypes, such as LAZY or CRIME.

Please complete the following word fragments. Make sure to type out the entire word.
1. L A _ _
2. C R _ _ _
3. _ _ O R
4. R _ _
5. W E L _ _ _ _
6. _ _ C E
7. D _ _ _ Y
8. B R _ _ _ _ _
9. _ _ A C K
10. M I _ _ _ _ _ _
11. D R _ _

How competent is Barrack Obama?
1. Very competent
2. Somewhat competent
3. Neither competent nor incompetent
4. Somewhat incompetent
5. Very incompetent

How trustworthy is Barrack Obama?
1. Very trustworthy
2. Somewhat trustworthy
3. Neither trustworthy nor untrustworthy
4. Somewhat untrustworthy
5. Very untrustworthy

On a scale from 0 (coldest) to 100 (warmest) how do you feel about Barack Obama?

Word-fragment items 1, 3, and 7 above (completed as "lazy", "poor", and "dirty") are the only three items for which results were reported on in the article and in the corresponding Monkey Cage post. In other words, the researchers selected 3 of 14 items to assess the evaluative penalty for darker skin. [Update: Footnote 16 in the article reported results for the combination of lazy, black, poor, welfare, crime, and dirty (p=0.078).]

If I'm using the correct formula, there are 16,369 different combinations of 14 items that could have been reported, not counting the null set and not counting reporting on only one item. Hopefully, I don't need a formula or calculation to convince you that there is a pretty good chance that random assignment variation alone would produce an associated two-tailed p-value less than 0.05 in at least one of those 16,369 combinations. The fact that the study reported one of these combinations doesn't provide much information about the evaluative penalty for darker skin.
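For what it's worth, the count can be checked directly: the 14 items have 2^14 = 16,384 subsets, and dropping the empty set and the 14 single-item subsets leaves 16,369. A one-line check in Stata:

  * All subsets of 14 items, minus the empty set and the 14 single-item subsets
  display 2^14 - 1 - 14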

The really discomforting part of this selective reporting is how transparently it was done: the main text of the article noted that only 3 of 14 puzzle-type items were selected, and the supplemental file included the items about Obama's competency, Obama's trustworthiness, and the Obama feeling thermometer. There was nothing hidden about this selective reporting, from what I can tell.

---

Notes:

1. For what it's worth, the survey had an item asking whether Obama's race is white, black, or mixed. But that doesn't seem to be useful for measuring an evaluative penalty for darker skin, so I didn't count it.

2. It's possible that the number of combinations that the peer reviewers would have permitted is less than 16,369. But that's an open question, given that the peer reviewers permitted 3 of 14 potential outcome variables to be reported [Update: ...in the main text of the article].

3. The data are not publicly available to analyze, so maybe the selective reporting in this instance didn't matter. I put in a request last week for the data, so hopefully we'll find out.

---

UPDATE (Jan 12, 2016)

1. I changed the title of the post from "Researchers select 1 of 16,369 combinations to report" to "Researchers select 2 of 16,369 combinations to report", because I overlooked footnote 16 in the article. Thanks to Solomon Messing for the pointer.

2. Omar Wasow noted that two of the items had a misspelling of Barack Obama's first name. Those misspellings appear in the questionnaire in the supplemental file for the article.

---

UPDATE (Jan 13, 2016)

1. Solomon Messing noted that data for the article are now available at the Dataverse. I followed the posted R code as best I could to reproduce the analysis in Stata, and I came close to the results reported in the article. I got the same percentages for the three word puzzles as the percentages that appear in the article: 33% for the lightened photo and 45% for the darkened photo, with a small difference in t-scores (t=2.74 versus t=2.64). Estimates and t-scores were also close for the reported result in footnote 16: estimates of 0.98 and 1.11 for me, and estimates of 0.97 and 1.11 in the article, with respective t-scores of 1.79 and 1.77. Compared to the 630 unexcluded respondents for the article, I had 5 extra respondents after exclusions (635 total).
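To give a flavor of these comparisons, here is a minimal sketch using placeholder variable names of mine rather than the names in the posted dataset; each of the t-tests reported in the table below takes roughly this form.

  * Example: stereotype-congruent completions for the lazy/dirty/poor combination.
  * lazy, dirty, and poor are assumed to be 0/1 indicators for completing each
  * fragment in the stereotype-congruent way; darkened = 1 for the darkened photo.
  gen congruent_ldp = lazy + dirty + poor
  ttest congruent_ldp, by(darkened)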

The table below reports results from t-tests that I conducted. The Stata code is available here.

[table: t-test results by photo condition]

Let me note a few things from the table:

First, I reproduced the finding that, when the word puzzles were limited to the combination of lazy, dirty, and poor, unexcluded respondents in the darkened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the lightened photo condition.

However, if I combine the word puzzles for race, minority, and rap, the finding is that unexcluded respondents in the lightened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the darkened photo condition: the opposite inference. Same thing when I combine race, minority, rap, and welfare. And same thing when I combine race, minority, rap, welfare, and crime.

Sure, as a group, these five stereotypes -- race, minority, rap, welfare, and crime -- don't have the highest face validity of the 11 stereotypes for being the most negative stereotypes, but there doesn't appear to be anyone in political science enforcing a rule that researchers must report all potential or intended outcome variables.

2. Estimates for 5 of the 11 stereotype items fell to the negative side of zero, indicating that unexcluded respondents in the lightened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the darkened photo condition. And estimates for 6 of the 11 stereotype items fell to the positive side of zero, indicating that unexcluded respondents in the darkened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the lightened photo condition.

A 5-to-6 split like that is what we'd expect if there were truly no effect, so -- in that sense -- this experiment doesn't provide much evidence for the relative effect of the darkened photo. That isn't a statement that the true relative effect of the darkened photo is exactly zero, but it is a statement about the evidence that this experiment has provided.

For what it's worth, the effect size is 0.118 and the p-value is 0.060 for the combination of word puzzles that I think has the most face validity for being the most negative stereotypes (lazy, poor, welfare, crime, drug, and dirty); the effect size is -0.032 and the p-value is 0.560 for the combination of word puzzles that I think have the least face validity for being the most negative stereotypes (race, black, brother, minority, and rap). So I'm not going to make any bets that the true effect is zero or that the lightened photo fosters relatively more activation of negative stereotypes.

3. Results for the competence, trustworthiness, and feeling thermometer items are pretty much what would be expected if the photo manipulation had no true effect on these items, with respective p-values of 0.904, 0.962, and 0.737. Solomon Messing noted that there is no expectation from the literature of an effect for these items, but now that I think of it, I'm not sure why a darkened photo of Obama should be expected to [1] make people more likely to call to mind negative racial stereotypes such as lazy and dirty but [2] have no effect on perceptions of Obama. In any event, I think that readers should have been told about the results for the competence, trustworthiness, and feeling thermometer items.

4. The report on these data suggested that the true effect is that the darkened photo increased stereotype activation. But I could have used the same data to argue for the inference that the darkened photo had no effect at all or at best only a negligible effect on stereotype activation and on attitudes toward Obama, had I reported the combination of all 11 word puzzles, plus the competence, trustworthiness, and feeling thermometer items. Moreover, had I selectively reported results and failed to inform peer reviewers of all the items, it might even have been possible to have published an argument that the true effect was that the lightened photo caused an increase in stereotype activation. I don't know why I should trust non-preregistered research if researchers have that much influence over inferences.

5. Feel free to check my code for errors or to report better ways to analyze the data.

---

The Monkey Cage published a post, "Racial prejudice is driving opposition to paying college athletes. Here's the evidence." I tweeted about this post in several threads, but I'm posting the information here for possible future reference and for anyone who reads the blog.

Here's the key figure from the post. The left side of the figure indicates that white respondents expressed more opposition to paying college athletes after exposure to a picture of black athletes than in a control condition with no picture.

After reading the post, I noted two oddities about the figure. First, based on the logic of an experiment -- change one thing only to assess the effect of that thing -- the proper comparison for assessing racial bias among white respondents would have been comparing the effect of a photo of black athletes to the effect of a photo of white athletes; that comparison would have removed the alternative explanations that respondents expressed more opposition because a photo was shown or because a photo of athletes was shown, and not necessarily because a photo of *black* athletes was shown. Second, the data were from the CCES, which typically has team samples of 1,000 respondents; these samples are presumably intended to be representative of the national population, so there should be more than 411 whites in a 1,000-respondent sample.

Putting two and two together suggested that there was an unreported condition in which respondents were shown a photo of white athletes. I emailed the three authors of the blog post, and to their credit I received substantive replies to my questions about the experiment. Based on the team's responses, the experiment did have a condition in which respondents were shown a photo of white athletes, and opposition to paying college athletes in this "white athletes" photo condition did not differ at p<0.05 (two-tailed test) from opposition to paying college athletes in the "black athletes" photo condition.
