Normally I post longer pieces after thinking about them for a while, but today I just want to make three quick corrections and say more about them later.
Forget the AI Act; mass screenings for low-prevalence problems are bigger than that.
Lately I’ve been thinking more about how some of my PhD research on bias and error in decision-making technologies generalizes. The implications of probability theory are the same across security screenings that share the same structure: low-prevalence problems, mass rather than targeted screenings, and inferential uncertainty. Polygraphs and AI screening of chats both fit that structure. That is, these programs are doomed to fail.
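To make the structural point concrete, here’s a minimal sketch of the arithmetic with made-up numbers: a hypothetical tool that’s 99% accurate in both directions, screening for something that affects 1 in 10,000 people. Neither figure comes from any real program; they’re only there to show why prevalence, not the domain, drives the result.

```python
# Illustrative only: hypothetical accuracy and prevalence, not figures from
# any real screening program.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes's Rule:
    P(true threat | flagged) = P(flagged | threat) * P(threat) / P(flagged)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A tool that is "99% accurate" in both directions...
sens, spec = 0.99, 0.99
# ...screening a population where 1 in 10,000 people is a true positive.
prev = 1 / 10_000

print(f"P(true threat | flagged) = {ppv(sens, spec, prev):.2%}")
# ~0.98%: roughly 99 of every 100 people flagged are innocent. That follows
# from the low prevalence, not from the domain, which is why the same math
# applies to polygraphs, chat scanning, or any other mass screening.
```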
Let’s moor this discussion about what the math means for policy to the AI Act, I thought. That seemed like a good way to regulate the structure. I thought I was being clever; that should always ring a little warning bell.
Regulating the structure to stop the constant Whack-A-Mole with these dangerous programs is a good idea. But mooring the move to create that legal and ethical regime to a single piece of legislation is a bad idea. Proposals come and go, but math is forever.
This legislative piggybacking would also be empirically misguided. Mass screenings for low-prevalence problems can certainly include AI. For example, they include real-time biometrics and predictive policing in public spaces, programs the EU Parliament draft of the AI Act proposes banning.
But society also has an interest in thinking more broadly about mass screenings that don’t involve AI, like mammography for breast cancer, PSA testing for prostate cancer, and gestational diabetes screening. The conditions (notably the degree of inferential uncertainty) vary across realms like security and medicine in ways that are worth thinking about to guide sensible policy. Still, mass screenings for low-prevalence problems are potentially dangerous, and we should regulate the structure.
Mooring to first principles first is anyway a good idea. The first principles in play here are little things like privacy and autonomy, which form part of the human dignity without which society doesn’t have any meaningful security at all. I tend to let that fall off the text cliff when I’m busy arguing that the liberty-security trade-off we often hear discussed in these contexts is empirically bogus. It is, but I still shouldn’t do that. Critics of dangerously stupid programs don’t have to choose between being right empirically and being right ethically.
H/t Joanna Bryson for many rapid-fire insights.
Forget the risk calculator; it’s a decision map.
Science is a dumpster fire, but what isn’t? Corruption, complexity, and misinformation seem to be conditions of late modern social and political life. So people need better information and other tools for decision-making under conditions of uncertainty. But the information and tools scientists can offer them aren’t neutral. (Neutrality is a myth.) How do we get out of this bind?
There are two ways to give people better information without (so much) risking playing the telephone game with GIGO (Garbage In, Garbage Out). The deep path is through doing better science. The broad path is through communicating what we know about heuristics. Where they meet, they can make each other better through iterative learning.
H/t Patrick Burden for coming up with this “map for problem-solving” name for what I was describing without seeing this important distinction, and for other trenchant technical and conceptual thoughts. Walking people down pathways to solve problems with their own information is eminently more doable than somehow levitating over the science crisis with ongoing collaborative work in a limited number of cases, although doing both would be cool. So that’s the idea.
It’s not Table S-1; it’s Tables S-1A and S-1B.
It’s not false positives that form the core of what the base rate fallacy means for mass screenings for rare problems; it’s the unacceptable choice between too many false positives in one possible policy universe, and too many false negatives in another.
I don’t know what I was thinking, repeatedly (over several years) republishing and extending only half of the NAS polygraph report’s Table S-1: committee co-chair and public statistician Stephen Fienberg’s application of Bayes’s Rule to show why polygraphing all National Lab scientists would be dumb and dangerous. You really have to see the whole thing. And we have to talk not just about how programs like this are doomed to fail by producing excessively large numbers of false positives, but also about how, under a different threshold, they’re doomed to fail by producing too many false negatives. It’s the irreducible tension between the two that matters.
Maybe I already conveyed that when talking about the accuracy-error trade-off; maybe not. But I’m still wondering why (if there was a reason) I thought half the table cut it. For that matter, why do all the confusion matrix/false positive rate how-tos I turn up today seem to offer only one such matrix, and not a contrastive pair like this? A contrastive pair seems like it should be the gold standard for showing the tension at the core, the policy implications of Bayes for many mass screenings, and the tragedy at the heart of all this.
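Here’s what I mean by the whole table, as a minimal sketch: the same hypothetical screening tool applied to the same population at two different thresholds. The population size, number of true threats, and the two operating points below are illustrative stand-ins chosen to echo the structure of the report’s table, not its actual figures.

```python
# Two policy universes for one hypothetical screening tool. All numbers are
# illustrative stand-ins, not the NAS report's actual figures.

def expected_counts(population, true_threats, sensitivity, false_pos_rate):
    """Expected confusion-matrix counts for one threshold (one policy)."""
    innocents = population - true_threats
    tp = round(sensitivity * true_threats)      # threats correctly flagged
    fn = true_threats - tp                      # threats missed
    fp = round(false_pos_rate * innocents)      # innocents wrongly flagged
    tn = innocents - fp                         # innocents correctly cleared
    return {"threats flagged": tp, "threats missed": fn,
            "innocents flagged": fp, "innocents cleared": tn}

population, true_threats = 10_000, 10

# "Table S-1A": threshold tuned to catch most threats.
catch_most = expected_counts(population, true_threats,
                             sensitivity=0.80, false_pos_rate=0.16)

# "Table S-1B": threshold tuned to keep false alarms rare.
few_alarms = expected_counts(population, true_threats,
                             sensitivity=0.20, false_pos_rate=0.004)

print("catch-most-threats threshold:", catch_most)
print("few-false-alarms threshold:  ", few_alarms)
# One universe flags about 1,600 innocent people to catch 8 of 10 threats;
# the other flags about 40 innocents but misses 8 of 10 threats. Same tool,
# same population; only the threshold moved.
```

Either half alone is enough to make mass screening look bad; only the pair shows that no threshold escapes the trade.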
On one hand, that tragedy is worth highlighting. It’s a recurrent structure, perhaps the recurrent structure in world literature across civilizations. That’s because it’s a perennial feature of probability theory and thus of life. From ancient Greek tragedies to the Bible, it’s often said that pride comes before the fall. In classics like Homer’s Iliad and Odyssey, the structure is arete, hubris, ate, nemesis: virtue, arrogance, fatal mistake, divine punishment. It’s also the structure of contemporary efforts to conduct mass screenings using highly accurate technologies with the intention of mitigating rare but serious threats. The virtue takes the form of aiming to fight baddies (spies, terrorists, pedophiles, cancer). The arrogance comes in not seeing that the literal laws of the universe (in the form of probability theory) keep us from being able to do this perfectly, even with shiny new tech. The fatal mistake is applying screening tools to whole populations instead of targeted subgroups. And the divine punishment comes from the properties of the screenings, which then produce excessively large numbers of false positives along with non-negligible numbers of false negatives. There is no escaping math, but we can cause a lot of harm trying.
Denying the irresolvable nature of this tension is humanity grappling with the laws of the universe and failing to reconcile itself with its limits, a failure that reverberates across thousands of years. We deepen our humanity by reflecting on that failure. But we keep making the mistake. Like it’s hard-wired.
On the other hand, maybe the way this strikes me now, as indescribably beautiful, simple, elegant, urgent, and universal, is a little much, and people don’t have time for all that. Maybe it’s better to show only the half of the table that’s probably “on the table” in security screening contexts, to keep it simple. We seem to be living in “suspicious mode” times. No one is proposing that we spend massive resources to catch only 20% of chatting pedophiles in order to protect more innocents. And I would personally be persuaded by either half of this table that mass security screenings for low-prevalence problems are anyway a very bad idea (TM).
But you do want people to see the whole truth, to get a sense of the larger tension, which is about our innate limitations that technology can’t overcome. Only then can they see the nested dolls of the structure: from polygraphs and Chat Control, to mass security screenings for low-prevalence problems, to mass screenings with different degrees of inferential uncertainty, to the well-intentioned hubris of imagining, across time, that we can escape imperfection to beat evil. The stock writing advice is “show, don’t tell.” So I think I need to add back the second half of the table to show how we are trapped.
H/t Stephen Fienberg, whose 2009 interview never gets old.