Journal requirements that data and code for published articles be posted are a major improvement in the conduct of social science, for two reasons: such requirements increase the ability of researchers to assess the correctness and robustness of reported results, and they presumably produce more careful analyses by researchers who are aware that their data and code will be made public.

But the DA-RT agreement to "[r]equire authors to ensure that cited data are available at the time of publication through a trusted digital repository" does not address selective reporting. For example, the current replication policy for the journal Political Behavior requires only that "[a]uthors of accepted manuscripts will be required to deposit all of the data and script files needed to replicate the published results in a trusted data repository such as ICPSR or Dataverse" (emphasis added).

This permits researchers to selectively report experiments, experimental conditions, and potential outcome variables, and then to delete the corresponding data from the dataset that is made public. Readers thus often cannot be sure whether reported results reflect selective reporting.

---

Consider uncertainty about the survey experiment reported in Filindra and Kaplan 2016, described in the article's abstract as follows (p. 255):

To determine whether racial prejudice depresses white support for gun control, we designed a priming experiment which exposed respondents to pictures of blacks and whites drawn from the IAT. Results show that exposure to the prime suppressed support for gun control compared to the control, conditional upon a respondent's level of racial resentment.

But here is a description of the experimental treatment (p. 261):

Under the guise of a cognitive test, we exposed 600 survey participants who self-identified as white to three pictures of the faces of black individuals and another three of white individuals.

I wasn't sure why a survey experiment intended "[t]o determine whether racial prejudice depresses white support for gun control" would have as its only treatment a prime that consisted of photos of both blacks and whites. It seems more logical for a "racial prejudice" experiment to have one condition in which participants were shown photos of blacks and another condition in which participants were shown photos of whites; then responses to gun control items that followed the photo primes could be compared for the black photo and white photo conditions.

Readers of Filindra and Kaplan 2016 might suspect that there were unreported experimental conditions in which participants were shown photos of blacks or were shown photos of whites. But readers cannot know from the article whether there were unreported conditions.

---

I knew of no easier way to eliminate the uncertainty about whether there were unreported conditions in Filindra and Kaplan 2016 than asking the researchers, so I sent the corresponding author an email asking about the presence of unreported experimental conditions involving items about guns and photos of blacks and/or whites. Dr. Filindra indicated that there were no unreported conditions involving photos of blacks and/or whites, but that there were unreported non-photo conditions planned for use in forthcoming work.

---

My correspondence with Dr. Filindra made me more confident in their reported results, but such correspondence is a suboptimal way to increase confidence in reported results: it took time from Drs. Filindra and Kaplan and from me, and the information from our correspondence is, as far as I am aware, available only to persons reading this blog post.

There are multiple ways for journals and researchers to remove uncertainty about selective reporting and thus increase research transparency, such as journals requiring the posting of all collected data, journals requiring researchers to make disclosures about the lack of selective reporting, and researchers preregistering plans to collect and analyze data.


In a survey experiment reported in LaFleur Stephens-Dougan's 2016 Journal of Politics article, "Priming Racial Resentment without Stereotypic Cues", respondents were shown a campaign mailer for a white candidate named Greg Davis, with experimental manipulations of the candidate's party (Democrat or Republican) and the photos on the candidate's mailers (five photos of whites, five photos of blacks, or a mixture of photos of whites and blacks).

One key finding described in the abstract is that "white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner".

The JOP article describes the analysis in Chapter 5 of Stephens' dissertation, which reported on a July 2011 YouGov/Polimetrix survey. Dissertation page 173 indicated that the survey had 13 experimental conditions, but the JOP article reports only six conditions, omitting the control condition and the six conditions in which candidate Greg Davis was black. Stephens-Dougan might plan to report on these omitted conditions in a subsequent publication, so I'll concentrate on the outcome variables.

---

Many potential outcome variables were not reported on in the JOP article, based on a comparison of the survey questionnaire in Appendix D of the dissertation to the outcome variables mentioned in the main text or the appendix of the article. The box below describes each post-treatment item on the survey questionnaire except for the Q7 manipulation check: items marked with [*] were reported on in the article, and unmarked items were not reported on in the article. See pages 224 to 230 of the dissertation for the exact item wording.

Q8. Feeling thermometer for the candidate.

Q9 [*]. Likelihood of voting for Greg Davis.

Q10. How well a series of terms describe Greg Davis:

  • Intelligent
  • Inexperienced
  • Trustworthy
  • Hardworking
  • Fair [*]
  • Competent

Q11. Perception of Greg Davis' political ideology.

Q12. Whether Democrats, Republicans, or neither party would be better at:

  • Helping Senior Citizens
  • Improving Health Care
  • Improving the Economy
  • Reducing Crime
  • Reforming Public Education

Q13. Like, dislike, or neither for the Democratic Party.

Q14. Like, dislike, or neither for the Republican Party.

Q15. Perception of how well Greg Davis would handle:

  • Helping Senior Citizens
  • Improving Health Care
  • Improving the Economy
  • Reducing Crime [*]
  • Reforming Public Education

Q16 [*]. Whether Greg Davis' policies will favor Whites over Blacks, Blacks over Whites, or neither.

Q17. Perception of which groups Greg Davis will help if elected:

  • Teachers
  • Latinos
  • Corporate Executives
  • Farmers
  • Senior Citizens
  • African Americans
  • Homeowners
  • Students
  • Small Business Owners
  • Whites

Q18 [*]. Perception of Greg Davis' position on affirmative action for Blacks in the workplace.

Q19. Perception of Greg Davis' position on the level of federal spending on Social Security.

Q20. Job approval, disapproval, or neither for Barack Obama as president.

The list above indicates that many potential outcome variables were not reported on in the JOP article. For example, Q10 asked respondents how well the terms "intelligent", "inexperienced", "trustworthy", "hardworking", "fair", and "competent" describe the candidate, but readers are told results only for "fair". Readers are told results for Q16 about the candidate's perceived preference for policies that help whites over blacks or vice versa, but readers are not told results for "Whites" and "Blacks" on Q17 about the groups that the candidate is expected to help.

Perhaps the estimates and inferences are identical for the omitted and included items, but prior analyses [e.g., here and here] suggest that omitted items often produce different estimates and sometimes produce different inferences than included items.

Data for most omitted potential outcome variables are not in the article's dataset at the JOP Dataverse, but the dataset did contain a "thermgregdavis" variable that ranged from 1 to 100, which is presumably the feeling thermometer in Q8. I used the model in line 14 of Stephens-Dougan's code, but, instead of using the reported Q9 outcome variable for likelihood of voting for Greg Davis, I used "thermgregdavis" as the outcome variable and changed the estimation technique from ordered logit to linear regression. The p-value for the difference between the all-white photo condition and the mixed photo condition was p=0.693, and the p-value for the difference between the all-white photo condition and the all-black photo condition was p=0.264.

---

This sort of selective reporting is not uncommon in social science [see here, here, and here], but I'm skeptical that researchers with the flexibility to report the results they want, based on post hoc research design choices, will produce replicable estimates and unbiased inferences, especially in the politically charged racial discrimination subfield. I am also skeptical that selective reporting across publications will balance out in a field in which a supermajority of researchers fall on one side of the political spectrum.

So how can such selective reporting be prevented? Researchers can preregister their research designs. Journals can preaccept articles based on a preregistered research design. For non-preregistered studies, journals can require as a condition of publication the declaration of omitted studies, experiments, experimental conditions, and outcome variables. Peer reviewers can ask for these declarations, too.

---

It's also worth comparing the hypotheses as expressed in the dissertation to the hypotheses as expressed in the JOP article. First, the hypotheses from dissertation Chapter 5, on page 153:

H1: Democratic candidates are penalized for an association with African Americans.

H2: Republican candidates are rewarded for an association with African Americans.

H3: The racial composition of an advertisement influences voters' perceptions of the candidates' policy preferences.

Now, the JOP hypotheses:

H1. White Democratic candidates associated with blacks will lose vote support and will be perceived as more likely to favor blacks over whites and more likely to support affirmative action relative to white Democratic candidates associated with images of only whites.

H2. Counterstereotypic images of African Americans paired with a white Democratic candidate will prime racial attitudes on candidate evaluations that are implicitly racial relative to a comparable white Democratic candidate associated with all whites.

H3. Counterstereotypic images of African Americans paired with a white Republican candidate will be inconsequential such that they will not be associated with a main effect or a racial priming effect.

So the hypotheses became more specific for Democratic candidates, and the expectation for Republican candidates switched from being rewarded to experiencing no consequential effect. My sense is that hypothesis modification is not uncommon in social science, but the reason for the survey items asking about personal characteristics of the candidate (e.g., trustworthy, competent) is clearer in light of the dissertation's hypotheses about candidates being penalized or rewarded for an association with African Americans. After all, the feeling thermometer and the other Q10 characteristic items can be used to assess a penalty or reward for candidates.

---

In terms of the substance of the penalty, the abstract of the JOP article notes: "I empirically demonstrate that white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner."

My analysis of the data indicated that, based on a model with no controls and no cases dropped, and comparing the all-white photo condition to the all-black photo condition, there is evidence of this penalty in the Q9 vote item at p=0.074. However, evidence for this penalty is weak in the feeling thermometer (p=0.248) and in the "fair" item (p=0.483), and I saw no evidence in the article or dissertation that the penalty can be detected in the items omitted from the dataset.

Moreover, much of the estimated penalty might reflect only the race of the persons in the photos providing a signal about candidate Greg Davis' ideology. Compared to respondents in the all-white photo condition, respondents in the mixed photo condition and the all-black photo condition rated Greg Davis as more liberal (p-values of 0.014 and 0.004). The p=0.074 penalty in the Q9 vote item inflates to p=0.710 when the measure of Greg Davis' perceived ideology is included in the model, with corresponding p-values ranging from p=0.600 to p=0.964 for models predicting a penalty in the thermometer and the "fair" item.
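The logic of that ideology-as-signal check can be illustrated with a small simulation. The sketch below is purely illustrative: the variable names, effect sizes, and sample size are invented, not taken from the article's data or model. It generates a treatment that shifts perceived ideology, which in turn shifts the outcome, and shows that the treatment coefficient shrinks toward zero once perceived ideology is added to the model.

```python
import random

random.seed(0)
n = 2000

# Hypothetical data: the photo treatment moves perceived ideology,
# and perceived ideology moves vote support (no direct photo effect).
treat = [i % 2 for i in range(n)]
ideology = [0.8 * t + random.gauss(0, 1) for t in treat]
vote = [0.5 * m + random.gauss(0, 1) for m in ideology]

def center(v):
    mu = sum(v) / len(v)
    return [x - mu for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

tc, mc, yc = center(treat), center(ideology), center(vote)

# Simple regression: vote on treatment only (true value approx 0.8 * 0.5 = 0.4).
b_simple = dot(tc, yc) / dot(tc, tc)

# Two-predictor OLS via normal equations: vote on treatment and ideology.
den = dot(tc, tc) * dot(mc, mc) - dot(tc, mc) ** 2
b_treat = (dot(mc, mc) * dot(tc, yc) - dot(tc, mc) * dot(mc, yc)) / den  # near 0
b_ideo = (dot(tc, tc) * dot(mc, yc) - dot(tc, mc) * dot(tc, yc)) / den   # near 0.5

print(round(b_simple, 2), round(b_treat, 2), round(b_ideo, 2))
```

In this simulated setup the entire "treatment effect" is carried by perceived ideology, which is the pattern consistent with the inflated p-values reported above.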

---

NOTES:

1. H/T to Brendan Nyhan for the pointer to the JOP article.

2. The JOP article emphasizes the counterstereotypical nature of the mailer photos of blacks, but the experiment did not vary the photos of blacks, so the experiment provides no evidence about the influence of the counterstereotypical nature of the photos.

3. The JOP article reports four manipulation checks (footnote 6), but the dissertation reports five manipulation checks (footnote 65, p. 156). The omitted manipulation check concerned whether the candidate tried to appeal to racial feelings. The dataset for the article at the JOP Dataverse has a "manipchk_racialfeelings" variable that is presumably this omitted manipulation check.

4. The abstract reports that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support." However, Table 2 of the article and my analysis indicate that no photo condition comparison produced a statistically significant main effect for the "fair" item, and that only the all-white vs. mixed photo comparison produced a statistically significant main effect for perceptions of the likelihood of reducing crime. That one main effect reached statistical significance only under the article's generous convention of assigning a statistical significance asterisk to a one-tailed p-value less than 0.10 (the two-tailed p-value was p=0.142).
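For readers keeping track of the thresholds: a result in the hypothesized direction with a two-tailed p-value of 0.142 has a one-tailed p-value of 0.071, which passes a one-tailed 0.10 cutoff even though it misses the conventional two-tailed 0.05 cutoff. A quick arithmetic check (Python, illustrative):

```python
# A two-tailed p-value splits the rejection region across both tails,
# so for an effect in the hypothesized direction the one-tailed
# p-value is half the two-tailed p-value.
p_two_tailed = 0.142
p_one_tailed = p_two_tailed / 2  # 0.071

print(p_one_tailed < 0.10)  # passes the one-tailed 0.10 convention
print(p_two_tailed < 0.05)  # misses the conventional two-tailed cutoff
```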

Table 4 of the article indicated a statistically significant interaction between the photo conditions and racial resentment when predicting the "fair" item and perceptions of the likelihood of reducing crime, so I think that this interaction is what the abstract refers to in the statement that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support."

5. The 0.142 p-value referred to in the previous item inflates to p=0.340 when the controls are removed from the model. There are valid reasons for including demographic controls in a regression predicting results from a survey experiment, but the particular set of controls should be preregistered to prevent researchers from estimating models without controls and with different combinations of controls and then selecting a model or models to report based on the corresponding p-value or effect size.

6. Code for the new analyses:

    reg votegregdavis i.whitedem_treatments [pweight = weight]
    reg thermgregdavis i.whitedem_treatments [pweight = weight]
    reg fair_gregdavis i.whitedem_treatments [pweight = weight]
    reg ideo_gregdavis i.whitedem_treatments [pweight = weight]
    reg votegregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
    reg thermgregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
    reg fair_gregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
    ologit fair_gregdavis i.whitedem_treatments gender educ income pid7 south [pweight = weight]
    ologit gregdavis_redcrim i.whitedem_treatments gender educ income pid7 south [pweight = weight]
    ologit gregdavis_redcrim i.whitedem_treatments [pweight = weight]

7. I emailed Dr. Stephens-Dougan, asking whether there was a reason for the exclusion of items and about access to a full dataset. I received a response and invited her to comment on this post.


In the Political Behavior article, "The Public's Anger: White Racial Attitudes and Opinions Toward Health Care Reform", Antoine J. Banks presented evidence that "anger uniquely pushes racial conservatives to be more opposing of health care reform while it triggers more support among racial liberals" (p. 493). Here is how the outcome variable was measured in the article's reported analysis (p. 511):

Health Care Reform is a dummy variable recoded 0-1 with 1 equals opposition to reform. The specific item is "As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill". The response options were yes = I favor the health care bill or no = I oppose the health care bill.

However, the questionnaire for the study indicates that there were multiple items used to measure opinions of health care reform:

W2_1. Do you approve or disapprove of the way Barack Obama is handling Health Care? Please indicate whether you approve strongly, approve somewhat, neither approve nor disapprove, disapprove somewhat, or disapprove strongly.

W2_2. As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill?

[if "favor" on W2_2] W2_2a. Do you favor Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

[if "oppose" on W2_2] W2_2b. Do you oppose Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

Item W2_2 above is the only item reported on as an outcome variable in the article. The reported analysis omitted results for one outcome variable (W2_1) and reported only dichotomous results for the other outcome variable (W2_2), for which the apparent intention, given follow-ups W2_2a and W2_2b, was a four-category measure running from oppose strongly to favor strongly.

---

Here is the manuscript that I submitted to Political Behavior in March 2015 describing the results using the presumed intended outcome variables and a straightforward research design (e.g., no political discussion control, no exclusion of cases, cases from all conditions analyzed at the same time). Here's the main part of the main figure:

[Figure: Banks2014Reproduction]

The takeaway is that, with regard to opposition to health care reform, the effect of symbolic racism differed at a statistically significant level between the fear condition and the baseline relaxed condition; however, contra Banks 2014, the effect of symbolic racism did not differ at a statistically significant level between the anger condition and the relaxed condition. Symbolic racism had a positive effect in the anger condition, but anger was not a unique influence.

The submission to Political Behavior was rejected after peer review. Comments suggested analyzing the presumed intended outcome variables while using the research design choices in Banks 2014. Using the model in Table 2 column 1 of Banks 2014: the fear interaction term and the fear condition term are statistically significant at p<0.05 when predicting the two previously unreported non-dichotomous outcome variables and when predicting the scale of these two variables; the anger interaction term and the anger condition term are statistically significant at p<0.05 when predicting two of these three outcome variables, with p-values for the remaining "Obama handling" outcome variable at roughly 0.10. The revised manuscript describing these results is here.

---

Data are here, and code for the initial submission is here.

---

Antoine Banks has published several studies on anger and racial politics (here, for example) that should be considered when making inferences about the substance of the effect of anger on racial attitudes. Banks had a similar article published in the AJPS, with Nicholas Valentino. Data for that article are here. I did not see any problems with that analysis, but I didn't look very hard, because the posted data were not the raw data: the posted data that I checked omitted, for example, the variables used to construct the outcome variable.


The post is here.

Data are here for the 2016 ANES pilot study and here for the 2012 ANES time series study.

Stata code for the 2016 ANES pilot study analysis is here.

Stata code for the 2012 ANES time series study analysis is here.

Note that the use of blacks, Hispanics, gays and lesbians, feminists, transgender persons, and Muslims as the reference groups in the pilot study follows the usage in the previous Monkey Cage post on white ethnocentrism.

---

To try to untangle the influence of attitudes about whites from attitudes about blacks, Hispanics, and Asians in predicting policy preferences, models were estimated for white respondents with the feeling thermometers for whites, blacks, Hispanics, and Asians entered separately, along with controls for sex, marital status, age, education, and household income. The feeling thermometers and the outcome variables were kept on an interval scale and were standardized. The data were weighted and were from the 2012 ANES time series study.

For support for more immigration, the point estimate for the standardized coefficient for the Hispanic feeling thermometer was 0.22, and the point estimate for the standardized coefficient for the white feeling thermometer was -0.24, indicating that attitudes about whites had roughly the same magnitude of association as attitudes about Hispanics, in the opposite direction: the warmer a white person feels toward Hispanics, the more immigration that person supports on average; and the warmer a white person feels toward whites, the less immigration that person supports on average.

For support for affirmative action for black students in university admissions, the point estimate for the standardized coefficient for the black feeling thermometer was 0.28, and the point estimate for the standardized coefficient for the white feeling thermometer was -0.12, indicating that attitudes about whites had roughly half the magnitude of association as attitudes about blacks: the warmer a white person feels toward blacks, the more that person supports affirmative action in university admissions on average; and the warmer a white person feels toward whites, the less that person supports affirmative action in university admissions on average.
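As a concrete illustration of what standardizing buys here: with a single predictor, the slope from a regression of a standardized outcome on a standardized predictor equals the Pearson correlation, which is what makes magnitudes like 0.22 and -0.24 directly comparable across thermometers. A minimal sketch with toy data (Python; the numbers are invented, not the ANES data):

```python
def standardize(v):
    # Convert to z-scores using the sample standard deviation.
    n = len(v)
    mu = sum(v) / n
    sd = (sum((x - mu) ** 2 for x in v) / (n - 1)) ** 0.5
    return [(x - mu) / sd for x in v]

# Toy data standing in for a feeling thermometer (x) and a policy item (y).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

zx, zy = standardize(x), standardize(y)

# OLS slope of standardized y on standardized x.
slope = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

# Pearson correlation computed directly.
mx, my = sum(x) / len(x), sum(y) / len(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = sxy / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

print(round(slope, 3), round(r, 3))  # the two values match
```

With controls in the model, as in the analysis above, the standardized coefficients are partial associations rather than simple correlations, but the logic of comparing magnitudes on a common scale is the same.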

Stata code for the above analysis is here.

---

UPDATE (March 8, 2016)

Nathan Kalmoe asked about the results without the atypical groups for the white ethnocentrism measure. Results are below. Note that the phrase "oikophobic" refers to scores less than zero on an ethnocentrism scale, after the usage of Roger Scruton.

WHITES COMPARED TO ONLY BLACKS, HISPANICS, AND MUSLIMS

The percentages of white and nonwhite respondents on the negative side of the ethnocentrism scale are 21% and 48%, respectively (compared to 22% and 39% using the full set of six reference groups).

The mean candidate feeling thermometer scores for oikophobic whites are: Trump 22, Cruz 27, Carson 34, Fiorina 32, J Bush 33, Rubio 36, H Clinton 49, Obama 61, and Sanders 67.

The mean candidate feeling thermometer scores for nonwhites scoring less than zero are: Trump 24, Cruz 32, Carson 31, Fiorina 31, J Bush 32, Rubio 37, H Clinton 62, Obama 75, and Sanders 54.

The confidence intervals for oikophobic and ethnocentric whites on the black affirmative action item do not overlap and have a standardized difference of 0.55. The confidence intervals for oikophobic and ethnocentric whites on the immigration item do not overlap and have a standardized difference of 0.67.

WHITES COMPARED TO ONLY BLACKS AND HISPANICS

The percentages of white and nonwhite respondents on the negative end of the ethnocentrism scale are 29% and 60%, respectively (compared to 22% and 39% using the full set of six reference groups).

The spread of mean candidate feeling thermometer scores for oikophobic whites is more muted than when using the full six reference groups or the black-Hispanic-Muslim set of reference groups: 32 for Trump, 36 for Cruz, 42 for Carson, 38 for Fiorina, 35 for J Bush, 42 for Rubio, 40 for H Clinton, 48 for Obama, and 54 for Sanders. The 95% confidence interval for Trump is [26, 37], which doesn't overlap with the intervals for Obama or Sanders but overlaps with the interval for H Clinton [34, 45].

The spread of mean candidate feeling thermometer scores is still fairly large for nonwhites scoring less than zero: 24 for Trump, 35 for Cruz, 34 for Fiorina, 33 for J Bush, 39 for Rubio, 62 for H Clinton, 73 for Obama, and 52 for Sanders. The 95% confidence interval for Trump is [18, 31], which doesn't overlap with the intervals for Rubio, H Clinton, Obama, or Sanders, and is near the left edge of the confidence intervals for the remaining candidates (whose left edges are at 29 or 30).

The difference between oikophobic and ethnocentric whites on the immigration item is still substantive: 0.65 standard deviations, with no confidence interval overlap.

The difference between oikophobic and ethnocentric whites on the item about affirmative action for blacks in university admissions is 0.56 on the 6-point scale (about 0.29 standard deviations), and the difference has a p-value of 0.04 even though the confidence intervals overlap.
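The last point, a p-value of 0.04 despite overlapping confidence intervals, is not an anomaly: checking whether two 95% intervals overlap is a more conservative criterion than testing the difference directly, because the standard error of a difference pools the two standard errors in quadrature, which is smaller than their sum. A sketch with made-up numbers (Python):

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical group estimates (means and standard errors).
m1, se1 = 0.0, 1.0
m2, se2 = 3.0, 1.0

# 95% confidence intervals for each group.
ci1 = (m1 - 1.96 * se1, m1 + 1.96 * se1)   # (-1.96, 1.96)
ci2 = (m2 - 1.96 * se2, m2 + 1.96 * se2)   # ( 1.04, 4.96)
intervals_overlap = ci1[1] > ci2[0]        # True: the intervals overlap

# Direct test of the difference: SE of the difference pools in quadrature,
# sqrt(se1^2 + se2^2) = 1.41, which is smaller than se1 + se2 = 2.
z = (m2 - m1) / math.sqrt(se1 ** 2 + se2 ** 2)
p = 2 * (1 - norm_cdf(z))  # about 0.034, significant at 0.05

print(intervals_overlap, round(p, 3))
```

So overlapping intervals alone do not rule out a statistically significant difference, which is why the direct test of the difference is the better criterion.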


I tweeted about the possibility that whites are the proper target group for assessing racial bias among white liberals, so I thought I'd check available data to assess whether there is evidence for this. The recent Iyengar and Westwood 2015 AJPS article measuring different types of discrimination seemed a good place to look.

In the Iyengar and Westwood 2015 racial discrimination experiment, respondents were given a choice between two high school seniors competing for a scholarship, with names and clubs intended to signal race:

  • Arthur Wolfe, President of the Future Investment Banker Club
  • Jamal Washington, President of the African American Student Association

For some respondents, the two applicants had the same GPA (3.5 or 4.0), and, for other respondents, one of the applicants had a 3.5 GPA and the other had a 4.0 GPA.

Here are the results for white liberals and white conservatives:

EQUALLY QUALIFIED
Liberals: 73% selected the black target [n=34] CI: [58, 89]
Conservatives: 40% selected the black target [n=55] CI: [27, 53]
Difference between the 73% and the 40%: two-tailed p=0.001

BLACK TARGET MORE QUALIFIED
Liberals: 92% selected the black target [n=12] CI: [73, 110]
Conservatives: 56% selected the black target [n=23] CI: [35, 78]
Difference between the 92% and the 56%: two-tailed p=0.013

WHITE TARGET MORE QUALIFIED
Liberals: 44% selected the black target [n=18] CI: [19, 70]
Conservatives: 16% selected the black target [n=19] CI: [-2, 34]
Difference between the 44% and the 16%: two-tailed p=0.061

There were substantial differences between the estimates for white liberals and white conservatives, with white liberals on average favoring the target with the black name when the targets were equally qualified.
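Two features of the numbers above are worth checking by hand. Intervals that escape the 0-100 range (e.g., [73, 110] and [-2, 34]) are what simple Wald intervals produce at small n and extreme proportions, since Wald intervals are not bounded to [0, 1]; and the reported p-values can be approximated with a pooled two-proportion z-test. The exact procedures behind the reported numbers may differ, so the sketch below (Python) only approximates them:

```python
import math

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def wald_ci(p, n):
    # Simple Wald interval for a proportion; note it is NOT clipped to [0, 1].
    se = math.sqrt(p * (1 - p) / n)
    return (p - 1.96 * se, p + 1.96 * se)

# Liberals, black target more qualified: 92% of n=12.
lo, hi = wald_ci(0.92, 12)
print(round(100 * lo), round(100 * hi))  # upper bound exceeds 100

# Pooled two-proportion z-test for the equally-qualified comparison:
# 73% of n=34 (liberals) vs. 40% of n=55 (conservatives).
p1, n1, p2, n2 = 0.73, 34, 0.40, 55
pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * (1 - norm_cdf(z))  # small, in the neighborhood of the reported p=0.001
print(round(z, 2), round(p_value, 4))
```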

Here are the results for white Democrats and white Republicans:

EQUALLY QUALIFIED
Democrats: 62% selected the black target [n=53] CI: [49, 76]
Republicans: 47% selected the black target [n=45] CI: [32, 62]
Difference between the 62% and the 47%: two-tailed p=0.125

BLACK TARGET MORE QUALIFIED
Democrats: 75% selected the black target [n=32] CI: [59, 91]
Republicans: 50% selected the black target [n=16] CI: [22, 78]
Difference between the 75% and the 50%: two-tailed p=0.104

WHITE TARGET MORE QUALIFIED
Democrats: 59% selected the black target [n=18] CI: [37, 81]
Republicans: 21% selected the black target [n=19] CI: [1, 41]
Difference between the 59% and the 21%: two-tailed p=0.012

Results indicated that white Democrats on average favored the target with the black name in all three scenarios, even when the white target was more qualified. The point estimate for white Republicans never crossed 50% in any scenario.

Data are here, and reproduction code is here.
