Did Stephen Fienberg and the National Academy of Sciences Get It Wrong?
Polygraphs, causality, and overconfidence
What if mass screenings for low-prevalence problems often fail because of human stupidity — and yet, the same programs also sometimes succeed for the same reason?
Statisticians and cognitive scientists tend to criticize programs with this structure as illustrations of base rate bias. When inferential uncertainty persists, and when secondary-screening harms such as resource misallocation make them backfire, such programs can threaten the very populations and goals they are meant to protect and serve.
Yet, even the best scientists are human. And all of us human beings—even brilliant, highly educated ones with no known, relevant priors — are vulnerable to the common distortions of our faulty, shared cognitive-emotional software, including a tendency toward overconfidence when uncertainty reigns. This isn’t a condemnation; it’s an observation about how all of us, no matter how careful, see the world through the lens of our own perspective. There is no alternative. Thus, bias pervades bias research, science, and research integrity research itself.
Take the late Stephen Fienberg — eminent statistician, evidence-based forensics reformer, and, eventually, one of “lie detection” programs’ worst enemies. The mathematical basis for his studied opposition was irreproachable. Its sociopolitical implications remain profound. But what if he also missed something important?
What Fienberg got right
I have written many times (e.g., 1, 2, 3, 4, 5, 6) about Fienberg’s work co-chairing the National Academy of Sciences’ polygraph report, in which he applied Bayes’ rule to produce Table S-1A and B.
Bayes’ rule is a theorem of probability theory: it tells us how the probability of an event changes once we condition on the relevant subgroup, which is why base rates matter. It implies that when a condition is rare — like spying among National Lab scientists — even a highly accurate test will produce overwhelmingly false positives.
This table illustrates why many mass screenings for low-prevalence problems are doomed to backfire, harming the very populations and ends they are intended to protect and serve. The problem isn’t just false positives. It’s that these programs, which share the same mathematical structure, force an unacceptable choice between:
too many false positives in one universe, or
too many false negatives in another.
This irreducible tension stems from the accuracy-error trade-off, which in turn comes from the probabilistic nature of most cues in our universe. Across civilizations, humanity has grappled with the desire to overcome our limits — failing, with good intent, to recognize that we can’t. This is hubris (pride), followed always in the structure of Greek tragedies by nemesis (the fall). There is a warning here for those who will heed it: we cannot now, and never will, escape math to beat evil.
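To make the arithmetic behind that choice concrete, here is a minimal back-of-the-envelope sketch in Python. The inputs (10 spies among 10,000 employees screened, 80% sensitivity, 90% specificity) are illustrative assumptions of mine, not the exact figures from Table S-1.

```python
# Back-of-the-envelope Bayes calculation for screening a rare condition.
# Illustrative numbers only -- not the NAS report's exact figures.

population = 10_000   # employees screened
spies = 10            # true security violators (prevalence = 0.1%)
sensitivity = 0.80    # share of spies the test flags
specificity = 0.90    # share of loyal employees the test clears

true_positives = spies * sensitivity                        # 8 spies flagged
false_positives = (population - spies) * (1 - specificity)  # 999 loyal employees flagged
ppv = true_positives / (true_positives + false_positives)   # P(spy | flagged)

print(f"Flagged overall: {true_positives + false_positives:.0f}")
print(f"Actual spies among the flagged: {true_positives:.0f}")
print(f"Probability a flagged employee is a spy: {ppv:.1%}")  # about 0.8%
```

Under these assumptions, fewer than one in a hundred flagged employees is actually a spy; raise the decision threshold to cut those false alarms and more of the ten spies slip through undetected. That is the unacceptable choice in mathematical form.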
When I met Fienberg in 2009 to interview him for the polygraph research that was to become my dissertation, he didn’t get that dramatic. But he did spend a lot of his time for years after co-chairing the polygraph report publicly contesting next-generation “lie detection” programs that he feared would hurt civil liberties without working to catch terrorists, because (like polygraphs) they couldn’t be scientifically validated. My research focuses on extending his work to other programs that share the same mathematical structure.
What Fienberg got wrong
The NAS polygraph report he co-chaired concluded:
Polygraph testing yields an unacceptable choice for DOE employee security screening between too many loyal employees falsely judged deceptive and too many major security threats left undetected. Its accuracy in distinguishing actual or potential security violators from innocent test takers is insufficient to justify reliance on its use in employee security screening in federal agencies (NAS 2003).
Like many prominent scientists, Fienberg fell into the determinism trap. Specifically, he may have underestimated general human stupidity on one hand — while overestimating his own powers of reasoning on the other. But how? And why?
Overconfidence bias
Some researchers study human stupidity. The best-known work in this tradition comes from Daniel Kahneman and Amos Tversky, who demonstrated that humans systematically make biased, illogical decisions — even when they think they’re being rational. Kahneman, Tversky, Thaler, and a whole host of others, especially in social psychology and behavioral economics, oppose the homo economicus model of human decision-making on the basis of our demonstrated irrationality/stupidity.
Other researchers, including Gerd Gigerenzer and Ralph Hertwig, have pushed back against this view. They argue that scientists themselves often overstate human stupidity — because we scientists are stupid humans, too. In this view, people can make surprisingly good decisions—but only if they’re given the right information or decide under the right conditions. (Perhaps ironically, Fienberg’s seminal table straddled both camps: his application of Bayes’ rule was meant as a corrective to base rate bias, squarely team Kahneman, while its frequency-format presentation was a nod to Gigerenzer and Hoffrage’s “How to Improve Bayesian Reasoning Without Instruction: Frequency Formats,” Psychological Review, 102(4), 1995, 684–704.)
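The frequency-format idea is about presentation rather than mathematics: the same screening problem that trips people up as conditional probabilities becomes far more tractable when restated as counts of people. A minimal sketch of such a restatement, reusing the illustrative numbers from the earlier snippet (again my assumptions, not the report’s figures):

```python
def natural_frequencies(n, prevalence, sensitivity, specificity):
    """Restate a screening problem as natural frequencies (counts out of n people)."""
    cases = round(n * prevalence)
    hits = round(cases * sensitivity)
    false_alarms = round((n - cases) * (1 - specificity))
    return (f"Out of {n:,} people, {cases} have the condition. "
            f"The test flags {hits} of them, plus {false_alarms} of the "
            f"{n - cases:,} who don't. So of the {hits + false_alarms} people "
            f"flagged, only {hits} actually have it.")

# Same illustrative numbers as above: 0.1% prevalence, 80% sensitivity, 90% specificity.
print(natural_frequencies(10_000, 0.001, 0.80, 0.90))
```

Stated this way, the base rate problem is hard to miss, which is presumably why Fienberg’s table used counts rather than probabilities.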
Fienberg, like many scientists, relied on mathematical reasoning to draw his conclusions. But, in so doing, he failed to account for irrational human behavior — and what it may mean for polygraph program efficacy…
What if some people are stupid enough to act like polygraphs work, and that’s why polygraph programs actually work? For instance, if recruits believe in polygraphs, they may be deterred from applying and/or confess under polygraph interrogation if they have a history of serious criminal conduct. Weeding out such candidates may increase workforce integrity and thus performance quality.
This type of bogus pipeline effect could explain why my dissertation found that police departments that adopted polygraph programs to screen recruits saw subsequent decreases in police brutality (more on this in a future post). That analysis had flaws and limitations, though. The bottom line is that we do not know, based on the available evidence, whether these programs net benefit or net harm the security they are intended to advance. But my results highlight the possibility that mass polygraph screening programs might work to advance security not just as an intelligence whitewashing mechanism (a “noble lie” for institutional purposes), but by causally improving the aggregate integrity of the workforce.
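For concreteness, here is a hedged sketch of the kind of design such a claim invites: a two-way fixed-effects (difference-in-differences style) regression of a misconduct outcome on polygraph-screening adoption. Everything in it is hypothetical; the file name, column names, and use of statsmodels are my illustrative assumptions, not the dissertation’s actual analysis.

```python
# A hedged sketch, not the dissertation's actual analysis.
# Assumes a hypothetical panel with one row per department-year and columns:
#   dept, year, adopted_polygraph (0/1), brutality_complaints_per_officer
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("department_panel.csv")  # hypothetical data file

# Department and year fixed effects absorb stable differences between
# departments and shocks common to all departments in a given year.
model = smf.ols(
    "brutality_complaints_per_officer ~ adopted_polygraph + C(dept) + C(year)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["dept"]})  # cluster SEs by department

print("Estimated effect of adoption:", model.params["adopted_polygraph"])
print("Clustered standard error:", model.bse["adopted_polygraph"])
```

Even a clean estimate of that coefficient would speak only to one downstream outcome; it would say nothing about whether the polygraph detects lies.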
In retrospect, the bogus pipeline story is exactly what some interviewees — both on and off the record — were trying to tell me. (Sorry, Mark. Sorry, Fred.)
It’s funny to imagine intelligence community experts trying to tell Fienberg the same thing, while he kept trying to explain the implications of Bayes’ rule instead of listening more closely. Was he enacting the trope of the out-of-touch intellectual going on about his niche interest without understanding or caring how the real world works, a Bayesian statistician locked in his overly orderly mathematical world?
Maybe. We don’t know who’s right in this classic conflict between practitioners and scientists. Polygraphs as mass screenings may net harm security by wasting investigative resources (in the universe with too many false positives) and/or by relying on junk science (in the universe with too many false negatives) — Fienberg’s argument. Or they might net benefit security by weeding out stupid criminals. The evidence we have cannot settle the question.
Fienberg’s mistake was not arguing the wrong side of the case. It was not seeing that, given the currently available evidence, we can’t know who’s right. Ironically, even the most brilliant scientists can be too sure of things that remain uncertain.
Why did he miss the overriding epistemic uncertainty?
Progress in progress
The big answer is that bias pervades human undertakings, no matter who the human or what the endeavor. Science itself is subject to bias. It’s imperfect, incomplete, and (when we’re lucky) getting better in the process of recognizing and correcting its mistakes.
The nitty-grittier explanation is that the causal revolution was just underway at the polygraph report’s writing. Judea Pearl’s groundbreaking textbook Causality: Models, Reasoning, and Inference saw its first edition published in 2000, three years before the NAS polygraph committee published its report. Scientific progress was (and remains) in progress.
So maybe Fienberg didn’t know to diagram the causal mechanisms people were talking about in relation to polygraphs. But we talked to some of the same people, read some of the same sources. He must have known it wasn’t just lie detection that needed validation — a potentially impossible feat, as he kept telling anyone who would listen.
It was also the interrogation component of the “test.” If we wanted to start scientific polygraph research over from scratch, it would need to begin with a causal diagram (a DAG) including these two distinct mechanisms. One could easily imagine a scientific experiment validating the bogus pipeline mechanism, in contrast with the lie detection one. (Indeed, many such experiments exist.)
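For illustration, here is a minimal sketch of such a diagram in Python, assuming the networkx library; the node labels are my own shorthand for the two mechanisms, not terms from the NAS report.

```python
# A minimal causal-diagram (DAG) sketch of two distinct mechanisms by which
# a polygraph screening program could affect security outcomes.
import networkx as nx

dag = nx.DiGraph()

# Mechanism 1: "lie detection" -- the pathway the NAS report evaluated.
dag.add_edges_from([
    ("Deception", "Physiological arousal"),
    ("Physiological arousal", "Polygraph chart"),
    ("Polygraph chart", "Examiner judgment"),
    ("Examiner judgment", "Hiring/clearance decision"),
])

# Mechanism 2: "bogus pipeline" -- belief in the test, not the test itself,
# does the causal work through deterrence and admissions.
dag.add_edges_from([
    ("Polygraph screening program", "Belief that the test works"),
    ("Belief that the test works", "Self-selection out of the applicant pool"),
    ("Belief that the test works", "Admissions during interrogation"),
    ("Self-selection out of the applicant pool", "Workforce integrity"),
    ("Admissions during interrogation", "Workforce integrity"),
    ("Workforce integrity", "Misconduct (e.g., police brutality)"),
])

assert nx.is_directed_acyclic_graph(dag)
print(f"{dag.number_of_nodes()} nodes, {dag.number_of_edges()} edges")
```

An experiment could then target one path at a time; for instance, manipulating whether recruits believe the test works, without changing the test itself, isolates the bogus pipeline path.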
Missed opportunities in my own research
My biggest regret about my PhD dissertation as defended is that it entirely omitted the transcripts and recordings of a collection of videorecorded interviews that formed its chronological basis, the best of which was with Fienberg.
In retrospect, I should have:
Based my dissertation on his work.
Archived all transcripts and videos properly. (I still should; the videos are on YouTube, which I’m told is a bad place to archive videos. Suggestions welcome.)
Pushed back against current scientific bias against qualitative research. At the time, qualitative work was discouraged in my department/discipline. But now I understand even better how good science starts with field knowledge.
Then, perhaps I could have been quicker to both extend and critique Fienberg’s analysis.
Conclusion
Polygraphs are pseudoscience. But mass polygraph screening programs could still work for their intended purpose. Once we articulate a plausible causal mechanism (the bogus pipeline) and follow it out to an effect of interest (better workforce conduct), there are scientific ways to try to assess whether these programs net help or harm. Because people are stupid. But we can sometimes measure the effects of our own stupidity by following mathematical rules.
Dear Vera, thank you for writing this insightful piece. I am still confused about why you think Fienberg was wrong, and why you think that the polygraph can still be tested using mass screening programs. Was Fienberg wrong because he tried to provide a data analysis of something that shouldn't have been analyzed? And aren't screening tests just going to show that, since we don't know when someone is lying, we cannot tell whether the polygraph's signals indicate whether they're lying? And furthermore, that there seems to be contextual bias, such that polygraph examiners replicate the task-irrelevant information given to them during their analysis? Thanks!