
How big are your samples, if the likelihoods are 0%, 50%, and 67%?

By L.J Zigerell Posted on July 22, 2016 Posted in Sex 2 Comments Tagged with secret data, sex

With Only 1 Woman in Your Candidate Pool, There’s Statistically No Chance She’ll Be Hired https://t.co/i5ZpUGKaLh pic.twitter.com/spZsAXvlwx

— Jennifer Lee (@JLeeSoc) April 26, 2016

The above tweet links to this article discussing a study of hiring outcomes for 598 job finalists in finalist groups of 3 to 11 members.

The finalist groups in the sample ranged from 3 to 11 members, but the data in the figure are restricted to an unreported number of groups with exactly 4 members. The likelihoods in the figure of 0%, 50%, and 67% did not suggest large samples, so I emailed the faculty authors at Stefanie.Johnson [at] colorado.edu (on April 26) and david.hekman [at] colorado.edu (on May 2) asking for the data or for information on the sample sizes for the figure likelihoods. I also asked whether a woman was hired from a pool of any size in which only one finalist was a woman. I later tweeted a question to the faculty author whom I found on Twitter.
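To illustrate why those percentages do not suggest large samples, here is a minimal sketch that finds the smallest sample size consistent with each reported percentage after rounding to a whole percent. The function name `smallest_denominator` is hypothetical, and the output is only a lower bound on the sample sizes; it says nothing about the actual, unreported sample sizes in the study.

```python
def smallest_denominator(pct, max_n=1000):
    """Return (n, k) for the smallest n where k/n rounds to pct percent.

    This is only a lower bound: larger samples can produce the same
    rounded percentage.
    """
    for n in range(1, max_n + 1):
        for k in range(n + 1):
            if round(100 * k / n) == pct:
                return n, k
    return None


# Percentages reported in the figure discussed above.
for pct in (0, 50, 67):
    n, k = smallest_denominator(pct)
    print(f"{pct}% is consistent with {k} of {n}")
```

Each of the three percentages can be produced by a sample of three or fewer, which is why the sample sizes matter for interpreting the figure.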

I have not yet received a reply from either of these faculty authors.

I acknowledge researchers who provide data, code, and/or information upon request, so I thought it would be a good idea to note the researchers who don't.

Funnel plot for "Differences in Helping Whites and Blacks"

By L.J Zigerell Posted on July 22, 2016 Posted in Race No Comments Tagged with race, selective reporting

I happened across the Saucier et al. 2005 meta-analysis "Differences in Helping Whites and Blacks: A Meta-Analysis" (ungated), and I decided to plot the effect size against the standard error in a funnel plot to assess the possibility of publication bias. The funnel plot is below.

Funnel plot asymmetry was not detected in Begg's test (p=0.486) but was detected in the higher-powered Egger's test (p=0.009).

---

NOTE:

1. Saucier et al. 2005 reported sample sizes but not ~~effect sizes~~ standard errors for each study, so I estimated the standard errors with formula 7.30 of Hunter and Schmidt (2004: 286).

2. Code here.
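For readers unfamiliar with Egger's test, below is a minimal sketch of the regression behind it: each study's effect size divided by its standard error is regressed on the study's precision (1 divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is not the code linked in note 2; the function name and the five data points are hypothetical, and the sketch returns a t statistic and degrees of freedom rather than a p-value.

```python
import math


def egger_intercept(effects, std_errors):
    """OLS of effect/SE on 1/SE; return (intercept, t statistic, df)."""
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1 / s for s in std_errors]                   # precision
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1 / n + mean_x ** 2 / sxx))
    return intercept, intercept / se_intercept, n - 2


# Hypothetical studies in which less precise studies report larger effects.
effects = [0.10, 0.15, 0.32, 0.45, 0.70]
std_errors = [0.05, 0.10, 0.20, 0.30, 0.40]
intercept, t_stat, df = egger_intercept(effects, std_errors)
print(intercept, t_stat, df)
```

The t statistic would be compared against a t distribution with the returned degrees of freedom to get a p-value like the one reported above.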

Symbolic racism and gun control attitudes

By L.J Zigerell Posted on July 20, 2016 Posted in Race No Comments Tagged with race, selective reporting, symbolic racism, you're doing it wrong

I previously discussed Filindra and Kaplan 2016 in terms of the current state of political science research transparency, but this post will discuss the article more substantively.

Let's start with a re-quote regarding the purpose and research design of the Filindra and Kaplan 2016 experiment:

To determine whether racial prejudice depresses white support for gun control, we designed a priming experiment which exposed respondents to pictures of blacks and whites drawn from the IAT. Results show that exposure to the prime suppressed support for gun control compared to the control, conditional upon a respondent's level of racial resentment (p. 255).

Under the guise of a cognitive test, we exposed 600 survey participants who self-identified as white to three pictures of the faces of black individuals and another three of white individuals (p. 261).

For predicting the two gun-related outcome variable scales for the experiment, Table 1 indicates in separate models that the treatment alone, the treatment and a measure of symbolic racism alone, and the interaction of the treatment and symbolic racism all reach statistical significance at p<0.10 or lower in a two-tailed test.

But the outcome variable scales are built from a subset of measured gun-related items. Filindra and Kaplan 2016 reported an exploratory factor analysis used to select items for outcome variable scales: 7 of 13 policy items about guns and 8 of 9 belief items about guns were selected for inclusion in the scales. The dataset for the article uploaded to the Dataverse did not contain data for the omitted policy and belief items, so I requested these data from Dr. Filindra. I did not receive access to these data.

It's reasonable to use factor analysis to decide which items to include in a scale, but this permits researcher flexibility about whether to perform the factor analysis in the first place and, if so, about whether to place all items in a single factor analysis or to, as in Filindra and Kaplan 2016, separate the items into groups and conduct a factor analysis for each group.

---

But the main problem with the experiment is not the flexibility in building the outcome variable scales. The main problem is that the research design does not permit an inference of racial prejudice.

The Filindra and Kaplan 2016 experimental design of a control and a single treatment of the black/white photo combination permits at most only the inference of a "causal relationship between racial considerations and gun policy preferences among whites" (p. 263, emphasis added). However, Filindra and Kaplan 2016 also discussed the experiment as if the treatment had been only photos of blacks (p. 263):

Our priming experiment shows that mere short exposure to pictures of blacks can drive opposition to gun control.

The Filindra and Kaplan experimental design does not permit assigning the measured effect to the photos of blacks isolated from the photos of whites, so I'm not sure why peer reviewers would have permitted that claim, which appeared in exactly the same form on page 9 of Filindra and Kaplan's 2015 MPSA paper.

---

Filindra and Kaplan 2016 supplement the experiment with a correlational study using symbolic racism to predict the ANES gun control item. But, as other researchers and I have noted, there is an inferential problem using symbolic racism in correlational studies, because symbolic racism conflates racial prejudice and nonracial attitudes; for example, knowing only that a person believes that blacks should not receive special favors cannot tell us whether that person's belief is motivated by antiblack bias, nonracial opposition to special favors, or some combination of the two.

My article here provides a sense of how strong a residual post-statistical-control correlation between symbolic racism and an outcome variable must be before one can confidently claim that the correlation is tapping antiblack bias. To illustrate this, I used linear regression on the 2012 ANES Time Series Study data, weighted and limited to white respondents, to predict responses to the gun control item, which was coded on a standardized scale so that the lowest value is the response that the federal government should make it more difficult to buy a gun, the middle response is that the rules should be kept the same, and the highest value is that the federal government should make it easier to buy a gun.

The standardized symbolic racism scale produced a 0.068 (p=0.012) residual correlation with the standardized gun control item, with the model including the full set of statistical controls as described in the note below. That was about the same residual correlation as for predicting a standardized scale measuring conservative attitudes toward women (0.108, p<0.001), about the same residual correlation as for predicting a standardized abortion scale (-0.087, p<0.001), and about the same residual correlation as for predicting a standardized item about whether people should be permitted to place Social Security payroll taxes into personal accounts (0.070, p=0.007).

So, based on these data alone, racial prejudice as measured with symbolic racism has about as much "effect" on attitudes about gun control as it does on attitudes about women, abortion, and private accounts for Social Security. I think it's unlikely that bias against blacks causes conservative attitudes toward women, so I don't think that the 2012 ANES data can resolve whether or the extent to which bias against blacks reduces support for gun control.

I would bet that there is some connection between antiblack prejudice and gun control, but I wouldn't argue that Filindra and Kaplan 2016 provide convincing evidence of this. Of course, it looks like a version of the Filindra and Kaplan 2016 paper won a national award, so what do I know?

---

NOTES:

1. Code for my analysis reported above is here.

2. The full set of statistical controls includes controls for: respondent sex, marital status, age group, education level, household income, employment status, Republican Party membership, Democratic Party membership, self-reported political ideology, and items measuring attitudes about whether jobs should be guaranteed, limited government, moral traditionalism, authoritarianism, and egalitarianism.

3. Filindra and Kaplan 2016 Table 2 reports a larger effect size for symbolic racism in the 2004 and 2008 ANES data than in the 2012 ANES data, with respective values for the maximum change in probability of support of -0.23, -0.25, and -0.16. The mean of the 2004 and 2008 estimates is 50% larger than the 2012 estimate, so increasing the 2012 residual correlation of 0.068 by 50% produces 0.102, which is still about the same residual correlation as for conservative attitudes about women. Based on Table 6 of my article, I would not be comfortable alleging an effect for racial bias with anything under a 0.15 residual correlation with a full set of statistical controls.
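The arithmetic in note 3 can be checked with a short sketch. The numbers are the ones quoted in the note; the variable names are only illustrative.

```python
# Maximum change in probability of support, from Table 2 as quoted in note 3.
estimates = {2004: -0.23, 2008: -0.25, 2012: -0.16}

# Mean of the 2004 and 2008 estimates relative to the 2012 estimate.
mean_early = (estimates[2004] + estimates[2008]) / 2
ratio = mean_early / estimates[2012]

# Scale the 2012 residual correlation by that ratio.
residual_2012 = 0.068
scaled = residual_2012 * ratio

# Threshold below which note 3 would not allege an effect for racial bias.
threshold = 0.15

print(f"ratio = {ratio:.2f}")
print(f"scaled residual correlation = {scaled:.3f}")
print(f"below threshold: {scaled < threshold}")
```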

Improving research transparency in political science

By L.J Zigerell Posted on July 20, 2016 Posted in Race No Comments Tagged with methods, race, symbolic racism

Journals requiring the posting of data and code for published articles is a major improvement in the conduct of social science because it increases the ability of researchers to assess the correctness and robustness of reported results and because it presumably produces more careful analyses by researchers aware that their data and code will be made public.

But the DA-RT agreement to "[r]equire authors to ensure that cited data are available at the time of publication through a trusted digital repository" does not address selective reporting. For example, the current replication policy for the journal Political Behavior requires only that "[a]uthors of accepted manuscripts will be required to deposit all of the data and script files needed to replicate the published results in a trusted data repository such as ICPSR or Dataverse" (emphasis added).

This permits researchers to selectively report experiments, experimental conditions, and potential outcome variables, and to then delete the corresponding data from the dataset that is made public. Readers thus often cannot be sure whether the reported research has been selectively reported.

---

Consider uncertainty about the survey experiment reported in Filindra and Kaplan 2016, described in the article's abstract as follows (p. 255):

To determine whether racial prejudice depresses white support for gun control, we designed a priming experiment which exposed respondents to pictures of blacks and whites drawn from the IAT. Results show that exposure to the prime suppressed support for gun control compared to the control, conditional upon a respondent's level of racial resentment.

But here is a description of the experimental treatment (p. 261):

Under the guise of a cognitive test, we exposed 600 survey participants who self-identified as white to three pictures of the faces of black individuals and another three of white individuals.

I wasn't sure why a survey experiment intended "[t]o determine whether racial prejudice depresses white support for gun control" would have as its only treatment a prime that consisted of photos of both blacks and whites. It seems more logical for a "racial prejudice" experiment to have one condition in which participants were shown photos of blacks and another condition in which participants were shown photos of whites; then responses to gun control items that followed the photo primes could be compared for the black photo and white photo conditions.

Readers of Filindra and Kaplan 2016 might suspect that there were unreported experimental conditions in which participants were shown photos of blacks or were shown photos of whites. But readers cannot know from the article whether there were unreported conditions.

---

I didn't know of an easier way to eliminate the uncertainty about whether there were unreported conditions in Filindra and Kaplan 2016 other than asking the researchers, so I sent the corresponding author an email asking about the presence of unreported experimental conditions involving items about guns and photos of blacks and/or whites. Dr. Filindra indicated that there were no unreported conditions involving photos of blacks and/or whites, but there were unreported conditions for non-photo conditions that are planned for forthcoming work.

---

My correspondence with Dr. Filindra made me more confident in their reported results, but such correspondence is a suboptimal way to increase confidence in reported results: it took time from Drs. Filindra and Kaplan and from me, and the information from our correspondence is, as far as I am aware, available only to persons reading this blog post.

There are multiple ways for journals and researchers to remove uncertainty about selective reporting and thus increase research transparency, such as journals requiring the posting of all collected data, journals requiring researchers to make disclosures about the lack of selective reporting, and researchers preregistering plans to collect and analyze data.

Discussion of Swigger 2016 "The Effect of Gender Norms in Sitcoms on Support for Access to Abortion and Contraception"

By L.J Zigerell Posted on July 19, 2016 Posted in Sex No Comments Tagged with sex

Pursuant to a request from Nathaniel Bechhofer, in this post I discuss the research reported in "The Effect of Gender Norms in Sitcoms on Support for Access to Abortion and Contraception", by Nathaniel Swigger. See here for a post about the study and here for the publication.

---

Disclosure: For what it's worth, I met Nathaniel Swigger when I was on the job market.

---

1. I agree with Nathaniel Bechhofer that the Limitations section of Swigger 2016 is good.

2. The article does a good job with disclosures, at least implied disclosures:

I don't think that there are omitted outcome variables because the bottom paragraph of page 9 and Table 1 report on multiple outcome variables that do not reach statistical significance (the first Results paragraph reports the lack of statistical significance for the items about federal insurance paying for abortion and spending on women's shelters). After reading the blog post, I thought it was odd to devote seven items to abortion and one item to contraception insurance, but in a prior publication Swigger used seven items for abortion, one item for contraception insurance, and items for government insurance for abortion.

I don't think that there are omitted conditions. The logic of the experiment does not suggest a missing condition (like here). Moreover, the article notes that results are "not quite in the way anticipated by the hypotheses" (p. 11), so I'm generally not skeptical about underreporting for this experiment, especially given the disclosure of items for which a difference was not detected.

3. I'm less certain that this was the only experiment ever conducted testing these hypotheses, but I'm basing this on underreporting in social science generally and not on any evidence regarding this experiment. I'd like for political science journals to adopt the requirement for—or for researchers to offer—disclosure regarding the completeness of the reporting of experimental conditions, potential outcome and explanatory variables, and stopping rules for data collection.

4. The estimated effect size for the abortion index is very large. Based on Table 1, the standard deviation for the abortion index was 4.82 (from a simple mean of the conditions because I did not see an indication of the number of cases per condition). For the full sample, the difference between the How I Met Your Mother and Parks and Recreation conditions was 5.57 for the abortion index, which corresponds to an estimate of d of 1.16, which, based on this source, falls between the effect size for men being heavier than women (d=1.04) and liberals liking Michelle Obama more than conservatives do (d=1.26). For another comparison, the How I Met Your Mother versus Parks and Recreation difference caused a 5.57 difference on the abortion index, which is larger than the 4.47 difference between Catholics and persons who are not Christian or Muslim.

The experiment had 87 participants after exclusions, across three conditions. A power calculation indicated that 29 participants per condition would permit detection of a relatively large d=0.74 effect size 80 percent of the time. Another way to think of the observed d=1.16 effect size is that, if the experiment were conducted over and over again with 29 participants per condition, 99 times out of 100 the experiment would be expected to detect a difference on the abortion index between the How I Met Your Mother and Parks and Recreation conditions.

Table 3 output for the dichotomous contraception insurance item is in logit coefficients, but Table 1 indicates the effect sizes more intuitively, with means for the How I Met Your Mother and Parks and Recreation conditions of 0.19 and 0.50, which is about a difference of a factor of 2.6. The control condition mean is 0.69, which corresponds to a factor of 3.6 difference compared to the How I Met Your Mother condition.
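The effect size, power, and ratio figures in item 4 can be approximated with the sketch below. It uses a normal approximation to the two-sample t-test, which is an assumption of this sketch and not necessarily the method behind the power calculation described above, so the results are close to but not identical to an exact calculation. The function names are hypothetical.

```python
from statistics import NormalDist


def cohens_d(mean_difference, pooled_sd):
    """Standardized mean difference."""
    return mean_difference / pooled_sd


def approx_power(d, n_per_group, alpha=0.05):
    """Two-tailed power for two equal groups, using a normal approximation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5  # expected z statistic under effect d
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


d = cohens_d(5.57, 4.82)            # abortion index difference and SD
power_minimum = approx_power(0.74, 29)
power_observed = approx_power(d, 29)

# Ratios of condition means for the contraception insurance item.
ratio_parks = 0.50 / 0.19
ratio_control = 0.69 / 0.19

print(f"d = {d:.2f}")
print(f"power at d=0.74: {power_minimum:.2f}")
print(f"power at observed d: {power_observed:.3f}")
print(f"ratios: {ratio_parks:.1f}, {ratio_control:.1f}")
```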

---

In conclusion, I don't see anything out of the ordinary in the reported analyses, but the effect sizes are larger than I would expect. Theoretically, the article notes on page 7 that the How I Met Your Mother and Parks and Recreation stimuli differ in many ways, which makes it impossible to isolate the reason for any detected effect; it's therefore probably best to describe the results in more general terms about the effect of sitcoms, as Sean McElwee did.

Selective reporting in "Priming Racial Resentment without Stereotypic Cues"

By L.J Zigerell Posted on June 29, 2016 Posted in Race No Comments Tagged with methods, race, selective reporting, you're doing it wrong

In a survey experiment reported in LaFleur Stephens-Dougan's 2016 Journal of Politics article, "Priming Racial Resentment without Stereotypic Cues", respondents were shown a campaign mailer for a white candidate named Greg Davis, with experimental manipulations of the candidate's party (Democrat or Republican) and the photos on the candidate's mailers (five photos of whites, five photos of blacks, or a mixture of photos of whites and blacks).

One key finding described in the abstract is that "white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner".

The JOP article describes the analysis in Chapter 5 of Stephens' dissertation, which reported on a July 2011 YouGov/Polimetrix survey. Dissertation page 173 indicated that the survey had 13 experimental conditions, but the JOP article reports only six conditions, omitting the control condition and the six conditions in which candidate Greg Davis was black. Stephens-Dougan might plan to report on these omitted conditions in a subsequent publication, so I'll concentrate on the outcome variables.

---

Many potential outcome variables were not reported on in the JOP article, based on a comparison of the survey questionnaire in Appendix D of the dissertation to the outcome variables mentioned in the main text or the appendix of the article. The box below describes each post-treatment item on the survey questionnaire except for the Q7 manipulation check: regular font with a [*] indicates items reported on in the article, and boldface indicates items not reported on in the article. See pages 224 to 230 of the dissertation for the exact item wording.

**Q8. Feeling thermometer for the candidate.**

Q9 [*]. Likelihood of voting for Greg Davis.

Q10. How well a series of terms describe Greg Davis:

**Intelligent**

**Inexperienced**

**Trustworthy**

**Hardworking**

Fair [*]

**Competent**

**Q11. Perception of Greg Davis' political ideology.**

**Q12. Whether Democrats, Republicans, or neither party would be better at:**

**Helping Senior Citizens**

**Improving Health Care**

**Improving the Economy**

**Reducing Crime**

**Reforming Public Education**

**Q13. Like, dislike, or neither for the Democratic Party.**

**Q14. Like, dislike, or neither for the Republican Party.**

Q15. Perception of how well Greg Davis would handle:

**Helping Senior Citizens**

**Improving Health Care**

**Improving the Economy**

Reducing Crime [*]

**Reforming Public Education**

Q16 [*]. Whether Greg Davis' policies will favor Whites over Blacks, Blacks over Whites, or neither.

**Q17. Perception of which groups Greg Davis will help if elected:**

**Teachers**

**Latinos**

**Corporate Executives**

**Farmers**

**Senior Citizens**

**African Americans**

**Homeowners**

**Students**

**Small Business Owners**

**Whites**

Q18 [*]. Perception of Greg Davis' position on affirmative action for Blacks in the workplace.

**Q19. Perception of Greg Davis' position on the level of federal spending on Social Security.**

**Q20. Job approval, disapproval, or neither for Barack Obama as president.**

The boldface above indicates that many potential outcome variables were not reported on in the JOP article. For example, Q10 asked respondents how well the terms "intelligent", "inexperienced", "trustworthy", "hardworking", "fair", and "competent" describe the candidate, but readers are told about results only for "fair"; readers are told results for Q16 about the candidate's perceived preference for policies that help whites over blacks or vice versa, but readers are not told results for "Whites" and "Blacks" on Q17 about the groups that the candidate is expected to help.

Perhaps the estimates and inferences are identical for the omitted and included items, but prior analyses [e.g., here and here] suggest that omitted items often produce different estimates and sometimes produce different inferences than included items.

Data for most omitted potential outcome variables are not in the article's dataset at the JOP Dataverse, but the dataset did contain a "thermgregdavis" variable that ranged from 1 to 100, which is presumably the feeling thermometer in Q8. I used the model in line 14 of Stephens-Dougan's code, but -- instead of using the reported Q9 outcome variable for likelihood of voting for Greg Davis -- I used "thermgregdavis" as the outcome variable and changed the estimation technique from ordered logit to linear regression: the p-value for the difference between the all-white photo condition and the mixed photo condition was p=0.693, and the p-value for the difference between the all-white photo condition and the all-black photo condition was p=0.264.

---

This sort of selective reporting is not uncommon in social science [see here, here, and here], but I'm skeptical that researchers with the flexibility to report the results they want based on post-hoc research design choices will produce replicable estimates and unbiased inferences, especially in the politically-charged racial discrimination subfield. I am also skeptical that selective reporting across publications will balance out in a field in which a supermajority of researchers fall on one side of the political spectrum.

So how can such selective reporting be prevented? Researchers can preregister their research designs. Journals can preaccept articles based on a preregistered research design. For non-preregistered studies, journals can require as a condition of publication the declaration of omitted studies, experiments, experimental conditions, and outcome variables. Peer reviewers can ask for these declarations, too.

---

It's also worth comparing the hypotheses as expressed in the dissertation to the hypotheses as expressed in the JOP article. First, the hypotheses from dissertation Chapter 5, on page 153:

H1: Democratic candidates are penalized for an association with African Americans.

H2: Republican candidates are rewarded for an association with African Americans.

H3: The racial composition of an advertisement influences voters' perceptions of the candidates' policy preferences.

Now, the JOP hypotheses:

H1. White Democratic candidates associated with blacks will lose vote support and will be perceived as more likely to favor blacks over whites and more likely to support affirmative action relative to white Democratic candidates associated with images of only whites.

H2. Counterstereotypic images of African Americans paired with a white Democratic candidate will prime racial attitudes on candidate evaluations that are implicitly racial relative to a comparable white Democratic candidate associated with all whites.

H3. Counterstereotypic images of African Americans paired with a white Republican candidate will be inconsequential such that they will not be associated with a main effect or a racial priming effect.

So hypotheses became more specific for Democratic candidates and switched from Republicans being rewarded to Republicans not experiencing a consequential effect. My sense is that hypothesis modification is not uncommon in social science, but the reason for the survey items asking about personal characteristics of the candidate (e.g., trustworthy, competent) is clearer in light of the dissertation's hypotheses about candidates being penalized or rewarded for an association with African Americans. After all, the feeling thermometer and the other Q10 characteristic items can be used to assess a penalty or reward for candidates.

---

In terms of the substance of the penalty, the abstract of the JOP article notes: "I empirically demonstrate that white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner."

My analysis of the data indicated that, based on a model with no controls and no cases dropped, and comparing the all-white photo condition to the all-black photo condition, there is evidence of this penalty in the Q9 vote item at p=0.074. However, evidence for this penalty is weak in the feeling thermometer (p=0.248) and in the "fair" item (p=0.483), and I saw no evidence in the article or dissertation that the penalty can be detected in the items omitted from the dataset.

Moreover, much of the estimated penalty might reflect only the race of persons in the photos providing a signal about candidate Greg Davis' ideology. Compared to respondents in the all-white photo condition, respondents in the mixed photo condition and the all-black photo condition rated Greg Davis as more liberal (p-values of 0.014 and 0.004), and the p=0.074 penalty in the Q9 vote item inflates to p=0.710 when including the measure of Greg Davis' perceived ideology, with corresponding p-values ranging from p=0.600 to p=0.964 for models predicting a penalty in the thermometer and the "fair" item.

---

NOTES:

1. H/T to Brendan Nyhan for the pointer to the JOP article.

2. The JOP article emphasizes the counterstereotypical nature of the mailer photos of blacks, but the experiment did not vary the photos of blacks, so the experiment provides no evidence about the influence of the counterstereotypical nature of the photos.

3. The JOP article reports four manipulation checks (footnote 6), but the dissertation reports five manipulation checks (footnote 65, p. 156). The omitted manipulation check concerned whether the candidate tried to appeal to racial feelings. The dataset for the article at the JOP Dataverse has a "manipchk_racialfeelings" variable that is presumably this omitted manipulation check.

4. The abstract reports that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support." However, Table 2 of the article and my analysis indicate that no photo condition comparison produced a statistically-significant main effect for the "fair" item and only the all-white vs. mixed photo comparison produced a statistically-significant main effect for perceptions of the likelihood of reducing crime, with this one main effect reaching statistical significance only under the article's generous convention of using a statistical significance asterisk for a one-tailed p-value less than 0.10 (the two-tailed p-value was p=0.142).

Table 4 of the article indicated a statistically-significant interaction between photo conditions and racial resentment when predicting the "fair" item and perceptions of the likelihood of reducing crime, so I think that this interaction is what is referred to in the abstract statement that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support."

5. The 0.142 p-value referred to in the previous item inflates to p=0.340 when the controls are removed from the model. There are valid reasons for including demographic controls in a regression predicting results from a survey experiment, but the particular set of controls should be preregistered to prevent researchers from estimating models without controls and with different combinations of controls and then selecting a model or models to report based on the corresponding p-value or effect size.

6. Code for the new analyses:

// Main effects of photo condition with no controls (linear regression)
reg votegregdavis i.whitedem_treatments [pweight = weight]
reg thermgregdavis i.whitedem_treatments [pweight = weight]
reg fair_gregdavis i.whitedem_treatments [pweight = weight]
// Perceived ideology of Greg Davis by photo condition
reg ideo_gregdavis i.whitedem_treatments [pweight = weight]
// Main effects again, adding perceived ideology as a predictor
reg votegregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
reg thermgregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
reg fair_gregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
// "Fair" and reducing-crime items with controls (ordered logit)
ologit fair_gregdavis i.whitedem_treatments gender educ income pid7 south [pweight = weight]
ologit gregdavis_redcrim i.whitedem_treatments gender educ income pid7 south [pweight = weight]
// Reducing-crime item with the controls removed
ologit gregdavis_redcrim i.whitedem_treatments [pweight = weight]

7. I emailed Dr. Stephens-Dougan, asking whether there was a reason for the exclusion of items and about access to a full dataset. I received a response and invited her to comment on this post.
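Notes 4 and 5 turn on the difference between one-tailed and two-tailed p-values, so here is a minimal sketch of the conversion. It assumes a symmetric test statistic and an estimate in the predicted direction; the function name is hypothetical, and the two p-values are the ones reported in notes 4 and 5.

```python
def one_tailed_p(two_tailed_p, in_predicted_direction=True):
    """Convert a two-tailed p-value to one-tailed for a symmetric statistic."""
    half = two_tailed_p / 2
    return half if in_predicted_direction else 1 - half


# Two-tailed p-values for the reducing-crime main effect, from notes 4 and 5.
p_with_controls = one_tailed_p(0.142)
p_without_controls = one_tailed_p(0.340)

for label, p in (("with controls", p_with_controls),
                 ("without controls", p_without_controls)):
    print(f"{label}: one-tailed p = {p:.3f}, below 0.10: {p < 0.10}")
```

Under these assumptions, the estimate with controls clears the one-tailed 0.10 convention and the estimate without controls does not.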

This female researcher should be cited more often

By L.J Zigerell Posted on June 6, 2016 Posted in Methods No Comments Tagged with inequality, list experiment, methods, sex, you're doing it wrong

My article reanalyzing data on a gender gap in citations to international relations articles indicated that the gender gap is largely confined to elite articles, defined as articles in the right tail of citation counts or articles in the top three political science journals. That article concerned an aggregate gender gap in citations, but this post is about a particular woman who has been under-cited in the social science literature.

It is not uncommon to read a list experiment study that suggests or states that the list experiment originated in the research described in the Kuklinski, Cobb, and Gilens 1997 article, "Racial Attitudes and the New South." For example, from Heerwig and McCabe 2009 (p. 678):

Pioneered by Kuklinski, Cobb, and Gilens (1997) to measure social desirability bias in reporting racial attitudes in the "New South," the list experiment is an increasingly popular methodological tool for measuring social desirability bias in self-reported attitudes and behaviors.

Kuklinski et al. described a list experiment that was placed on the 1991 National Race and Politics Survey. Kuklinski and colleagues appeared to propose the list experiment as a new measure (p. 327):

We offer as our version of an unobtrusive measure the list experiment. Imagine a representative sample of a general population divided randomly in two. One half are presented with a list of three items and asked to say how many of these items make them angry — not which specific items make them angry, just how many. The other half receive the same list plus an additional item about race and are also asked to indicate the number of items that make them angry. [screen shot]

The initial draft of my list experiment article reflected the belief that the list experiment originated with Kuklinski et al., but I then learned [*] of Judith Droitcour Miller's 1984 dissertation, which contained this passage:

The new item-count/paired lists technique is designed to avoid the pitfalls encountered by previous indirect estimation methods. Briefly, respondents are shown a list of four or five behavior categories (the specific number is arbitrary) and are then asked to report how many of these behaviors they have engaged in — not which categories apply to them. Nothing else is required of respondents or interviewers. Unbiased estimation is possible because two slightly different list forms (paired lists) are administered to two separate subsamples of respondents, which have been randomly selected in advance by the investigator. The two list forms differ only in that the deviant behavior item is included on one list, but omitted from the other. Once the alternate forms have been administered to the two randomly equivalent subsamples, an estimate of deviant behavior prevalence can be derived from the difference between the average list scores. [screen shot]

The above passage was drawn from pages 3 and 4 of Judith Droitcour Miller's 1984 dissertation at the George Washington University, "A New Survey Technique for Studying Deviant Behavior." [Here is another description of the method, in a passage from the 2004 edition of the 1991 book, Measurement Errors in Surveys (p. 88)]
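The estimator described in the passage above can be sketched as a difference in mean list scores between the two randomly assigned subsamples. The function name and the list scores below are hypothetical and serve only to show the calculation, with a conventional standard error for a difference between two independent means.

```python
from statistics import mean, variance


def item_count_estimate(short_list_scores, long_list_scores):
    """Estimate prevalence of the extra item and its standard error.

    short_list_scores: counts from respondents shown the list without
    the sensitive item; long_list_scores: counts from respondents shown
    the list with it.
    """
    estimate = mean(long_list_scores) - mean(short_list_scores)
    std_error = (variance(long_list_scores) / len(long_list_scores)
                 + variance(short_list_scores) / len(short_list_scores)) ** 0.5
    return estimate, std_error


# Hypothetical list scores for two small subsamples.
short_list = [1, 2, 0, 3, 2, 1, 2, 1]
long_list = [2, 2, 1, 3, 3, 1, 2, 2]

estimate, std_error = item_count_estimate(short_list, long_list)
print(f"estimated prevalence = {estimate:.2f} (SE = {std_error:.2f})")
```

No individual respondent's answer to the sensitive item is observed; only the difference between the subsample means carries that information.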

It's possible that James Kuklinski independently invented the list experiment, but descriptions of the list experiment's origin should nonetheless cite Judith Droitcour Miller's 1984 dissertation as a prior — if not the first [**] — example of the procedure known as the list experiment.

---

[*] I think it was the Adam Glynn manuscript described below through which I learned of Miller's dissertation.

[**] An Adam Glynn manuscript discussed the list experiment and item count method as special cases of aggregated response techniques. Glynn referenced a 1979 Raghavarao and Federer article, and that article referenced a 1974 Smith et al. manuscript that used a similar block total response procedure. The non-randomized version of the procedure split seven questions into groups of three, as illustrated in one of the questionnaires below. The procedure's unobtrusiveness derived from a researcher's inability in most cases to determine which responses a respondent had selected: for example, Yes-No-Yes produces the same total as No-No-No (5 in each case).

The questionnaire for the randomized version of the block total response procedure listed all seven questions; the respondent then drew a number and gave a total response for only those three questions that were associated with the number that was drawn: for example, if the respondent drew a 4, then the respondent gave a total for their responses to questions 4, 5, and 7. This procedure is similar to the list experiment, but the list experiment is simpler and more efficient.