I had a recent Twitter exchange about a Monkey Cage post:

Below, I use statistical power calculations to explain why the Ahlquist et al. paper, or at least the list experiment analysis cited in the Monkey Cage post, is not compelling.

---

Discussing the paper (published version here), Henry Farrell wrote:

So in short, this research provides exactly as much evidence supporting the claim that millions of people are being kidnapped by space aliens to conduct personally invasive experiments on, as it does to support Trump's claim that millions of people are engaging in voter fraud.

However, a survey with a sample size of three would also be unable to differentiate the percentage of U.S. residents who commit vote fraud from the percentage of U.S. residents abducted by aliens. For studies that produce a null result, it is necessary to assess the study's ability to detect an effect of a particular size, to get a sense of how informative that null result is.

The Ahlquist et al. paper has a footnote [31] that can be used to estimate the statistical power for their list experiments: more than 260,000 total participants would be needed for a list experiment to have 80% power to detect a 1 percentage point difference between treatment and control groups, using an alpha of 0.05. The power calculator here indicates that the corresponding estimated standard deviation is at least 0.91 [see note 1 below].

So let's assume that list experiment participants are truthful and that we combine the 1,000 participants from the first Ahlquist et al. list experiment with the 3,000 participants from the second Ahlquist et al. list experiment, so that we'd have 2,000 participants in the control sample and 2,000 participants in the treatment sample. Statistical power calculations using an alpha of 0.05 and a standard deviation of 0.91 indicate that there is:

  • a 5 percent chance of detecting a 1% rate of vote fraud.
  • an 18 percent chance of detecting a 3% rate of vote fraud.
  • a 41 percent chance of detecting a 5% rate of vote fraud.
  • a 79 percent chance of detecting an 8% rate of vote fraud.
  • a 94 percent chance of detecting a 10% rate of vote fraud.
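These power figures can be reproduced with a normal-approximation sketch, under the assumptions stated above (standard deviation of 0.91, alpha of 0.05, two-sided two-sample test). This is my own illustration, not code from the paper:

```python
from math import sqrt, erf

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_sample(delta, sd, n_per_group):
    """Approximate power of a two-sided two-sample test at alpha = 0.05,
    using the normal approximation to the difference in means."""
    se = sd * sqrt(2 / n_per_group)
    return normal_cdf(delta / se - 1.96)

# Footnote check: ~130,000 per group gives 80% power to detect a 1-point difference
print(round(power_two_sample(0.01, 0.91, 130_000), 2))  # 0.8

# Pooled Ahlquist et al. design: 2,000 participants per group
for rate in (0.01, 0.03, 0.05, 0.08, 0.10):
    print(f"{rate:.0%}: {power_two_sample(rate, 0.91, 2_000):.0%}")
```

The loop reproduces the five bullet-point power estimates above.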

---

Let's return to the claim that millions of U.S. residents committed vote fraud and use 5 million for the number of adult U.S. residents who committed vote fraud in the 2016 election, eliding the difference between illegal votes and illegal voters. There are roughly 234 million adult U.S. residents (reference), so 5 million vote fraudsters would be 2.1% of the adult population, and a 4,000-participant list experiment would have about an 11 percent chance of detecting that 2.1% rate of vote fraud.

Therefore, if 5 million adult U.S. residents really did commit vote fraud, a list experiment with the sample size of the pooled Ahlquist et al. 2014 list experiments would produce a statistically significant detection of vote fraud about 1 of every 9 times the list experiment was conducted. The fact that Ahlquist et al. 2014 didn't detect voter impersonation at a statistically significant level doesn't appear to compel any particular belief about whether the rate of voter impersonation in the United States is large enough to influence the outcome of presidential elections.
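These figures follow from the same normal-approximation arithmetic (my own back-of-the-envelope sketch, not code from the paper):

```python
from math import sqrt, erf

rate = 5_000_000 / 234_000_000  # ~2.1% of adult U.S. residents
se = 0.91 * sqrt(2 / 2_000)     # standard error: 2,000 per group, sd = 0.91
# Power of a two-sided test at alpha = 0.05 (normal approximation)
power = 0.5 * (1 + erf((rate / se - 1.96) / sqrt(2)))
print(round(rate, 3), round(power, 2))  # 0.021 0.11
print(round(1 / power))                 # a detection about 1 in every 9 runs
```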

---

NOTES

1. Enter 0.00 for mu1, 0.01 for mu2, 0.91 for sigma, and 0.05 for alpha, and enter 130,000 for each sample size; then hit Calculate. The power will be 0.80.

2. I previously discussed the Ahlquist et al. list experiments here and here. The second link indicates that an Ahlquist et al. 2014 list experiment did detect evidence of attempted vote buying.


Here are four items typically used to measure symbolic racism; respondents are asked to indicate their level of agreement with each statement:

1. Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without any special favors.

2. Generations of slavery and discrimination have created conditions that make it difficult for blacks to work their way out of the lower class.

3. Over the past few years, blacks have gotten less than they deserve.

4. It's really a matter of some people not trying hard enough; if blacks would only try harder they could be just as well off as whites.

These four items are designed such that an antiblack racist would tend to respond the same way as a non-racist principled conservative. Many researchers recognize this conflation problem and make an effort to account for it. For example, here is an excerpt from Rabinowitz, Sears, Sidanius, and Krosnick 2010, explaining how responses to symbolic racism items might be influenced in part by non-racial values:

Adherence to traditional values—without concomitant racial prejudice—could drive Whites' responses to SR [symbolic racism] measures and their opinions on racial policy issues. For example, Whites' devotion to true equality may lead them to oppose what they might view as inherently inequitable policies, such as affirmative action, because it provides advantages for some social groups and not others. Similarly affirmative action may be perceived to violate the traditional principle of judging people on their merits, not their skin color. Consequently, opposition to such policies may result from their perceived violation of widely and closely held principles rather than racism.

However, this nuance is sometimes lost. Here is an excerpt from the Pasek, Krosnick, and Tompson 2012 manuscript that was discussed by the Associated Press shortly before the 2012 presidential election:

Explicit racial attitudes were gauged using questions designed to measure "Symbolic Racism" (Henry & Sears, 2002).

...

The proportion of Americans expressing explicit anti-Black attitudes held steady between 47.6% in 2008 and 47.3% in 2010, and increased slightly and significantly to 50.9% in 2012.

---

See here and here for a discussion of the Pasek et al. 2012 manuscript.


Ahlquist, Mayer, and Jackman (2013, p. 3) wrote:

List experiments are a commonly used social scientific tool for measuring the prevalence of illegal or undesirable attributes in a population. In the context of electoral fraud, list experiments have been successfully used in locations as diverse as Lebanon, Russia and Nicaragua. They present our best tool for detecting fraudulent voting in the United States.*

I'm not sure that list experiments are the best tool for detecting fraudulent voting in the United States. But, first, let's introduce the list experiment.

The list experiment goes back at least to Judith Droitcour Miller's 1984 dissertation, but she called the procedure the item count method (see page 188 of this 1991 book). Ahlquist, Mayer, and Jackman (2013) reported results from list experiments that split a sample into two groups: members of the first group received a list of 4 items and were instructed to indicate how many of the 4 items applied to themselves; members of the second group received a list of 5 items -- the same 4 items that the first group received, plus an additional item -- and were instructed to indicate how many of the 5 items applied to themselves. The difference in the mean number of items selected by the groups was then used to estimate the percent of the sample and -- for weighted data -- the percent of the population to which the fifth item applied.
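As an illustration of the design (not the authors' code: the 50% baseline-item rates and 5% sensitive-item rate below are made up for the example), here is a quick simulation of the difference-in-means estimator:

```python
import random

random.seed(42)

def respondent(treated, sensitive_rate=0.05, n_baseline=4, p_baseline=0.5):
    """Return the count of list items that apply to one simulated respondent."""
    count = sum(random.random() < p_baseline for _ in range(n_baseline))
    if treated and random.random() < sensitive_rate:
        count += 1  # treatment group also sees the sensitive fifth item
    return count

n = 2_000  # respondents per group
control = [respondent(False) for _ in range(n)]
treatment = [respondent(True) for _ in range(n)]

# The difference in mean item counts estimates the sensitive-item prevalence
# (the truth in this simulation is 5%); note how noisy one run can be.
estimate = sum(treatment) / n - sum(control) / n
print(round(estimate, 3))
```

Rerunning with different seeds gives a feel for the sampling variability discussed below.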

Ahlquist, Mayer, and Jackman (2013) reported four list experiments from September 2013, with these statements as the fifth item:

  • "I cast a ballot under a name that was not my own."
  • "Political candidates or activists offered you money or a gift for your vote."
  • "I read or wrote a text (SMS) message while driving."
  • "I was abducted by extraterrestrials (aliens from another planet)."

Figure 4 of Ahlquist, Mayer, and Jackman (2013) displayed results from three of these list experiments:

[Figure 4 of Ahlquist, Mayer, and Jackman (2013)]

My presumption is that vote buying and voter impersonation are low-frequency events in the United States: I'd probably guess somewhere between 0 and 1 percent, and closer to 0 percent than to 1 percent. If that's the case, then a list experiment with 3,000 respondents is not going to detect such low-frequency events. Ninety-five percent confidence intervals for weighted estimates in Figure 4 appear to span 20 percentage points or more: the weighted 95 percent confidence interval for vote buying appears to range from -7 percent to 17 percent. Moreover, notice how much estimates varied between the December 2012 and September 2013 waves of the list experiment: the point estimate for voter impersonation was 0 percent in December 2012 and -10 percent in September 2013, a ten-point swing.
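A rough check on those interval widths, using the 0.91 standard deviation implied by the paper's power footnote (my own unweighted back-of-the-envelope, not the authors' calculation):

```python
from math import sqrt

n_per_group = 1_500                # a 3,000-respondent experiment split in two
se = 0.91 * sqrt(2 / n_per_group)  # standard error of the difference in means
width = 2 * 1.96 * se              # full width of a 95% confidence interval
print(round(width * 100, 1))       # about 13 percentage points, before weighting
```

Survey weighting inflates the variance further, which is consistent with the 20-point spans visible in Figure 4.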

So, back to the original point, list experiments are not the best tool for detecting vote fraud in the United States because vote fraud in the United States is a low frequency event that list experiments cannot detect without an improbably large sample size: the article indicates that at least 260,000 observations would be necessary to detect a 1% difference.

If that's the case, then what's the purpose of a list experiment to detect vote fraud with only 3,000 observations? Ahlquist, Mayer, and Jackman (2013, p. 31) wrote:

From a policy perspective, our findings are broadly consistent with the claims made by opponents of stricter voter ID laws: voter impersonation was not a serious problem in the 2012 election.

The implication appears to be that vote fraud is a serious problem only if the fraud is common. But many problems are serious without being common.

So, if list experiments are not the best tool for detecting vote fraud in the United States, then what is a better way? I think that -- if the goal is detecting the presence of vote fraud and not estimating its prevalence -- then this is one of those instances in which journalism is better than social science.

---

* This post was based on the October 30, 2013, version of the Ahlquist, Mayer, and Jackman manuscript, which was located here. A more recent version is located here and has replaced the "best tool" claim about list experiments:

List experiments are a commonly used social scientific tool for measuring the prevalence of illegal or undesirable attributes in a population. In the context of electoral fraud, list experiments have been successfully used in locations as diverse as Lebanon, Russia, and Nicaragua. They present a powerful but unused tool for detecting fraudulent voting in the United States.

It seems that "unused" is applicable, but I'm not sure that a "powerful" tool for detecting vote fraud in the United States would produce 95 percent confidence intervals that span 20 percentage points.

P.S. The figure posted above has also been modified in the revised manuscript. I have a pdf of the October 30, 2013, version, in case you are interested in verifying the quotes and figure.


I came across an interesting site, Dynamic Ecology, and saw a post on self-archiving of journal articles. The post mentioned SHERPA/RoMEO, which lists archiving policies for many journals. Among the journals covered by SHERPA/RoMEO, the only one I have published in that permits self-archiving is PS: Political Science & Politics, so I am linking below to pdfs of the PS articles that I have published.

---

This first article attempts to help graduate students who need seminar paper ideas. The article grew out of a graduate seminar in US voting behavior with David C. Barker. I noticed that several articles on the seminar reading list had placed in top-tier journals despite making an incremental theoretical contribution with publicly-available data, which was something that I, as a graduate student, felt I could realistically aspire to.

For instance, John R. Petrocik in 1996 provided evidence that candidates and parties "owned" certain issues, such as Democrats owning care for the poor and Republicans owning national defense. Danny Hayes extended that idea by using publicly-available ANES data to provide evidence that candidates and parties owned certain traits, such as Democrats being more compassionate and Republicans being more moral.

The original manuscript identified the Hayes article as a travel-type article in which the traveling is done by analogy. The final version of the manuscript lost the Hayes citation but had 19 other ideas for seminar papers. Ideas on the cutting room floor included replication and picking a fight with another researcher.

Of Publishable Quality: Ideas for Political Science Seminar Papers. 2011. PS: Political Science & Politics 44(3): 629-633.

  1. pdf version, copyright held by American Political Science Association

---

This next article grew out of reviews that I conducted for friends, colleagues, and journals. I noticed that I kept making the same or similar comments, so I produced a central repository for generalized forms of these comments in the hope that -- for example -- I do not review any more manuscripts that formally list hypotheses about the control variables.

Rookie Mistakes: Preemptive Comments on Graduate Student Empirical Research Manuscripts. 2013. PS: Political Science & Politics 46(1): 142-146.

  1. pdf version, copyright held by American Political Science Association

---

The next article grew out of friend and colleague Jonathan Reilly's dissertation. Jonathan noticed that studies of support for democracy had treated "don't know" responses as if the respondents had never been asked the question. So even though 73 percent of respondents in China expressed support for democracy, that figure was reported as 96 percent because "don't know" responses were removed from the analysis.

The manuscript initially did not include imputation of preferences for non-substantive responders, but a referee encouraged us to estimate missing preferences. My prior was that multiple imputation was "making stuff up," but research into missing data methods taught me that the alternative -- deletion of cases -- assumed that cases were missing completely at random, which did not appear to be true in our study: the percent of missing cases in a country correlated at -0.30 and -0.43 with the country's Polity IV democracy rating, which meant that respondents were more likely to issue a non-substantive response in countries where political and social liberties are more restricted.

Don’t Know Much about Democracy: Reporting Survey Data with Non-Substantive Responses. 2012. PS: Political Science & Politics 45(3): 462-467. Second author, with Jonathan Reilly.

  1. pdf version, copyright held by American Political Science Association

The American National Elections Studies (ANES) has measured abortion attitudes since 1980 with an item that dramatically inflates the percentage of pro-choice absolutists:

There has been some discussion about abortion during recent years. Which one of the opinions on this page best agrees with your view? You can just tell me the number of the opinion you choose.
1. By law, abortion should never be permitted.
2. The law should permit abortion only in case of rape, incest, or when the woman's life is in danger.
3. The law should permit abortion for reasons other than rape, incest, or danger to the woman's life, but only after the need for the abortion has been clearly established.
4. By law, a woman should always be able to obtain an abortion as a matter of personal choice.
5. Other {SPECIFY}

In a book chapter of Improving Public Opinion Surveys: Interdisciplinary Innovation and the American National Election Studies, Heather Marie Rice and I discussed this measure and results from a new abortion attitudes measure piloted in 2006 and included on the 2008 ANES Time Series Study. The 2006 and 2008 studies did not ask any respondents both abortion attitudes measures, but the 2012 study did. This post presents data from the 2012 study describing how persons selecting an absolute abortion policy option responded when asked about policies for specific abortion conditions.

---

Based on the five-part item above, and removing from the analysis the five persons who provided an Other response, 44 percent of the population agreed that "[b]y law, a woman should always be able to obtain an abortion as a matter of personal choice." The figure below indicates how these pro-choice absolutists later responded to items about specific abortion conditions.

Red bars indicate the percentage of persons who agreed on the 2012 pre-election survey that "[b]y law, a woman should always be able to obtain an abortion as a matter of personal choice" but reported opposition to abortion for the corresponding condition in the 2012 post-election survey.

[Figure: responses of pro-choice absolutists to items about specific abortion conditions, 2012 ANES]

Sixty-six percent of these pro-choice absolutists on the 2012 pre-election survey later reported opposition to abortion if the reason for the abortion is that the child will not be the sex that the pregnant woman wanted. Eighteen percent of these pro-choice absolutists later reported neither favoring nor opposing abortion for that reason, and 16 percent later reported favoring abortion for that reason. Remember that this 16 percent favoring abortion for reasons of fetal sex selection is 16 percent of the pro-choice absolutist subsample.

In the overall US population, only 8 percent favor abortion for fetal sex selection; this 8 percent is a more accurate estimate of the percent of pro-choice absolutists in the population than the 44 percent estimate from the five-part item.

---

Based on the five-part item above, and removing from the analysis the five persons who provided an Other response, 12 percent of the population thinks that "[b]y law, abortion should never be permitted." The figure below indicates how these pro-life absolutists later responded to items about specific abortion conditions.

Green bars indicate the percentage of persons who agreed on the 2012 pre-election survey that "[b]y law, abortion should never be permitted" but reported support for abortion for the corresponding condition in the 2012 post-election survey.

[Figure: responses of pro-life absolutists to items about specific abortion conditions, 2012 ANES]

Twenty-nine percent of these pro-life absolutists on the 2012 pre-election survey later reported support for abortion if the reason for the abortion is that the woman might die from the pregnancy. Twenty-nine percent of these pro-life absolutists later reported neither favoring nor opposing abortion for that reason, and 42 percent later reported opposing abortion for that reason. Remember that this 42 percent opposing abortion for reasons of protecting the pregnant woman's life is 42 percent of the pro-life absolutist subsample.

In the overall US population, only 11 percent oppose abortion if the woman might die from the pregnancy; this 11 percent is a more accurate estimate of the percent of pro-life absolutists in the US population than the 12 percent estimate from the five-part item.

---

There is a negligible difference in measured pro-life absolutism between the two methods, but the five-part item inflated pro-choice absolutism by a factor of about five. Our book chapter suggested that this inflated pro-choice absolutism might result because the typical person considers abortion in terms of the hard cases, especially since the five-part item mentions only the hard cases of rape, incest, and danger to the pregnant woman's life.

---

Notes

1. The percent of absolutists is slightly smaller if absolutism is measured as supporting or opposing abortion in each listed condition.

2. The percent of pro-life absolutists is likely overestimated in the "fatal" abortion condition item because the item asks about abortion if "staying pregnant could cause the woman to die"; presumably, there would be less opposition to abortion if the item stated with certainty that staying pregnant would cause the woman to die.

3. Data presented above are for persons who answered the five-part abortion item on the 2012 ANES pre-election survey and answered at least one abortion condition item on the 2012 ANES post-election survey. Don't know and refusal responses were listwise deleted for each cross-tabulation. Data were weighted with the Stata command svyset [pweight=weight_full], strata(strata_full); weighted cross-tabulations were calculated with the command svy: tabulate X Y if Y==Z, where X is the abortion condition item, Y is the five-part abortion item, and Z is one of the absolute policy options on the five-part item.

4. Here is the text for each abortion condition item that appeared on the 2012 ANES Time Series post-election survey:

>[First,/Next,] do you favor, oppose, or neither favor nor oppose abortion being legal if:
* staying pregnant could cause the woman to die
* the pregnancy was caused by the woman being raped
* the fetus will be born with a serious birth defect
* the pregnancy was caused by the woman having sex with a blood relative
* staying pregnant would hurt the woman's health but is very unlikely to cause her to die
* having the child would be extremely difficult for the woman financially
* the child will not be the sex the woman wants it to be

There was also a general item on the post-election survey:

Next, do you favor, oppose, or neither favor nor oppose abortion being legal if the woman chooses to have one?

5. Follow-up items to the post-election survey abortion items asked respondents to indicate intensity of preference, such as favor a great deal, favor moderately, or favor a little. These follow-up items were not included in the above analysis.

6. There were more than 5000 respondents for the pre-election and post-election surveys.


From a New York Times article by Harvey Araton:

On a scale of 1 to 10, Andy Pettitte’s level of certitude seemed to be a 5. Halfway convinced he couldn’t grind out another year with the Yankees in New York, he opted for an unforced retirement in Houston to watch his children play sports and begin to figure out what to do with the rest of his life.

Perhaps the use of 1-to-10 scales should be retired, as well, because of the common misconception that 5 is halfway between 1 and 10. If you don't believe me, take a look:

This misconception is not restricted to sportswriters, as I reported in this article describing a review of thousands of interviews that the World Values Survey conducted around the world.

Among the items reported, respondents were asked whether they think that divorce can never be justified (1), can always be justified (10), or something in between. Seventeen percent of the 61,070 respondents for whom a response was available selected 5 on the scale, but only eight percent selected 6. The figure below shows that 5 was more popular than 6 even in countries whose populations leaned toward the 10 end of the scale.

It seems, then, that 5 serves as the ''psychological mid-point'' (see Rose, Munro, & Mishler 2004) of the 1-to-10 scale, which means that some respondents signal their neutrality by selecting a value closer to the left end of the scale. This is not good.
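For the record, the arithmetic midpoint of a 1-to-10 scale:

```python
scale = range(1, 11)  # a 1-to-10 response scale
midpoint = (min(scale) + max(scale)) / 2
print(midpoint)  # 5.5 -- so 5 sits below the true halfway point
```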

Source: Harvey Araton. 2011. Saying It's Time, but Sounding Less Certain. NY Times.
