Seatbelts for Surveillance
Advancing evidentiary standards is like buckling up, and it's the law
With thanks to Moritz Schramm and Jürgen Bering for questions and insights; mistakes are mine alone.
When you are thrown violently (God forbid) from your vehicle, you want to have been wearing a seatbelt. When you find yourself flying off your bike, you’re happy to have on a helmet. Widespread laws mandating these simple, mechanical safeties are life-saving, evidence-based, and a few of my favorite things. How do we design the conceptual equivalent of seatbelts and helmets for mass screenings for low-prevalence problems (mass surveillance, broadly construed) — programs that target entire populations with imperfect tests in order to prevent rare but serious troubles? Why, again, are these programs problematic?
The Probability Theory Problem
Myth: Once we have better tech, accuracy will tip the scales and there will be no possible net harm from such programs.
When it’s acknowledged that we do indeed have to worry about such harm, we usually hear talk of striking a balance. Talk in which people assume that better tech will, one day, if it hasn’t already, tip the scales…
Reality: Statistics says no.
That is, the implications of probability theory challenge this typical techno-solutionist view. According to the laws of mathematics, the accuracy-error dilemma is inescapable. For a fuller recap with a current empirical demonstration of the problem, read this. (Regular readers may wish to skip to the next section.)
In brief, the accuracy-error dilemma is the empirical trade-off in any signal detection problem between sensitivity and specificity, that is, between the rates of true and false positives and negatives. These are empirical outcome spreads that do not map directly onto values like security and liberty. A signal detection problem is the most abstract form of any problem in which we are trying to decide whether a signal is present in some noise or not.
This inescapable trade-off is why we need to worry about massive societal damage to exactly the values we seek to protect through mass surveillance programs, whose end usually comes down to safety when you ask “why” enough times. The ironic danger of excessive safety-seeking presents most starkly under three common conditions: rarity (the problem at issue is low-prevalence), persistent uncertainty (we cannot know for sure whether the test results are right), and secondary screening harms (we do harm trying to find out). Rarity implicates the common cognitive bias known as the base rate fallacy; uncertainty raises the scientific problem of test validation under real-world conditions, which we often cannot solve; and secondary screening harms raise the problem of measuring benefits against harms.
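To make the base rate fallacy concrete, here is a minimal sketch in Python. The numbers are illustrative assumptions, not figures from any particular program; the point is how the share of alarms that are real collapses when the screened-for problem is rare, even with a test that looks highly accurate.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a flagged case is a true positive (Bayes' theorem)."""
    true_alarm_mass = sensitivity * prevalence
    false_alarm_mass = (1 - specificity) * (1 - prevalence)
    return true_alarm_mass / (true_alarm_mass + false_alarm_mass)

# Illustrative numbers only: a test that is right 99% of the time in both directions,
# applied to a problem affecting 1 in 10,000 of those screened.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.99, prevalence=0.0001)
print(f"Share of alarms that are real: {ppv:.2%}")  # ~1%, i.e. roughly 100 false alarms per true one
```

With these assumed inputs, roughly 99 out of every 100 alarms are false. That is the base rate fallacy in a nutshell: a test that is “99% accurate” does not mean a flagged person is 99% likely to be a true positive.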
Under these conditions, such screenings offer what Stephen Fienberg, writing for the National Academy of Sciences on polygraphs, called “an unacceptable choice” between bad options. Talk of striking a balance misses this point: many mass screenings for low-prevalence problems will never work as intended, because the problems they target are too rare, our uncertainty about whether they exist in a given case is too persistent in the real world, and secondary screenings to disambiguate true from false alarms cause too much harm for mass use to accrue net benefit. Thus it is this class of similarly structured signal detection problems, defined in abstract terms, that needs to be regulated in order to protect society from its inherent dangers. What are we talking about?
Examples of such programs range from criminal screenings like Chat Control (for child sexual abuse material) or iBorderCtrl (for fraudulent border-crossing) in the security realm, to mammography and PSA testing (for early cancer detection) in the medical realm. Mass screenings for plagiarism in education also share an identical mathematical structure, as do mass screenings for disinformation in social media posts. I recently reviewed an excellent lay book surveying a wide array of medical case studies and laying out the perverse structural incentives that contribute to regimes of ever more mass surveillance. The medical realm is not sui generis; observers will likely recognize similar incentive-structure problems from their own areas of expertise.
So the generic problem is a dual one of education and incentives. It presents across diverse domains in relation to programs that share the same structure (mass surveillance). Existing legal regimes don’t address this problem, but maybe they could. Here’s one possible way.
The Proportionality/Suitability Solution
Under EU law, as a general principle and as referenced in Article 52 of the EU Charter of Fundamental Rights, authorities must balance their exercise of power against the intended aim. This is called proportionality. Suitability, a key stage in the usual proportionality test, requires measures to be effective in achieving their aim, potentially demanding evidence to support claimed effects. Suitability is legalese for “does this work?”
Focusing on Chat Control as an empirical demonstration, critics have already suggested the proposed program violates Article 7 of the Charter of Fundamental Rights of the EU (privacy), invoking the principle of proportionality. Two legal opinions highlight this:
• This legal opinion by Christopher Vajda KC discusses mass surveillance’s proportionality violation (Article 52, p. 18). Chat Control would violate proportionality by compromising essentially every European’s privacy.
• This legal opinion by Prof. Dr. Ninon Colneric notes the absence of a proportionality test so far in the Chat Control process (p. 18).
These opinions, while contributing important arguments to current legal discourse on this controversial program, neither address suitability nor connect proportionality with probability theory. These two steps remain to be done in legal reform work bridging statistics and systems solutions, if we hope to design seatbelts for mass surveillance. It may be strategically wise to move the focus of the critical discourse from privacy (Article 7) arguments to suitability (Article 52) arguments, lest mass surveillance proponents rhetorically own the very terrain of security which they empirically degrade. The possible implications of such ownership are more than rhetorical.
Everyone cares about what works. Legal safeguards against mass surveillance that are predicated on privacy may be more prone to street-level disregard than those predicated on security, since there is always a gap between legal discourse and real-world practice. Faced with an apparent forced choice, security agencies may understandably do what they think works, not what is legal.
Focusing again on the Chat Control case, the argument is that proponents bear the burden of proof to show for what benefit Europeans would be trading their privacy (and other unknown costs). If such benefit-for-cost can’t be demonstrated — e.g., because the accuracy-error trade-off implies that mass surveillance would cause huge numbers of false positives to swamp existing investigative resources, implying likely net loss — that’s a suitability violation. And if the suitability violation characterizes the whole class of programs of mass surveillance broadly construed, that’s a proportionality problem with wide-reaching implications.
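As a back-of-the-envelope illustration of that swamping effect, consider the sketch below. Every input is an assumption made up for this example, not a figure from the Chat Control proposal or any official source.

```python
# All inputs are illustrative assumptions, not figures from any official source.
messages_scanned_per_day = 5_000_000_000   # assumed EU-wide daily message volume
false_positive_rate = 0.001                # assumed 0.1% of innocent messages flagged
illicit_share = 0.000001                   # assumed 1 in a million messages is actually illicit
sensitivity = 0.9                          # assumed 90% of illicit messages are caught

true_alarms = messages_scanned_per_day * illicit_share * sensitivity
false_alarms = messages_scanned_per_day * (1 - illicit_share) * false_positive_rate

print(f"True alarms per day:  {true_alarms:,.0f}")    # ~4,500
print(f"False alarms per day: {false_alarms:,.0f}")   # ~5,000,000
# Every alarm, true or false, consumes the same finite investigative resources.
```

Under these assumed numbers, investigators would face on the order of a thousand false alarms for every true one, each consuming the same finite follow-up capacity.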
The main issue here is EU law, because the German Constitutional Court primarily applies the EU Charter when EU law harmonizes member state law (as is the case for Chat Control). But a German constitutional violation is also implied. The reason is that, in addition to being a general principle of EU law, proportionality is also “the foundational principle of German law,” flowing from Prussian administrative law to seminal early post-war Constitutional Court rulings (“Proportionality Analysis by the German Federal Constitutional Court,” Andrej Lang, pp. 22-133, in Mordechai Kremnitzer, Talya Steiner, and Andrej Lang’s 2020 Proportionality in Action: Comparative and Empirical Perspectives on the Judicial Practice, Cambridge University Press). Despite its conspicuous absence from the letter of the basic law of the land, the Grundgesetz, proportionality analysis is “the primary mode of constitutional adjudication” (p. 24, emphasis mine). (The Constitutional Court’s power of judicial review, however, is expressly authorized in the Constitution; p. 29.)
So there’s broad European legal agreement that courts have to balance laws’ costs and benefits, and assess their proportionality accordingly; and this means they have to assess whether laws actually work as a logical first step. At first, I thought this principle distinguished the Prussian-Continental legal tradition from its Anglo-American analogue. The Prussian insistence that the world be a logically rule-abiding place implies that security measures which do net damage under the auspices of making people safer would violate the foundational principle of German and EU law. But while that logic is consistent with a culturally Prussian worldview, it also resonates with the Lockean liberal premise that legitimate government is entrusted in the social contract with limited powers to uphold specific rights (life, liberty, and property).
In this way, these two legal traditions share a fundamental balancing principle which logically assumes that, if the government is taking some power (infringing some rights in the process), it needs to be for a specific, legitimate purpose for which it has democratic permission. In both traditions, this central balancing principle, putting slightly different spins on an underlying cost-benefits analysis, assumes that the law actually works for its intended purpose. We might call that an aspirational assumption that the public can ask governments to better uphold through improved mechanisms.
Yes, but How?
European law is thick on proportionality as a bedrock principle, but relatively thin on how, exactly, one is supposed to assess whether or not a law works for its intended purpose. On one hand, widely accepted scientific evidentiary rules offer well-established mechanisms for evaluating the likely impact of proposed interventions before implementation:
• Proponents bear the burden of proof to establish that new interventions will do more good than harm.
• Independent reviewers evaluate the evidence of claimed costs and benefits.
• Relevant data must be public, including information about its production, storage, analysis, and interpretation.
Applying these rules to policymaking may prevent massive societal damages from ill-conceived policies. In the absence of such assessment, new interventions may backfire, causing more harm than they mitigate.
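To picture what such a pre-implementation evaluation could look like under the rules above, here is a minimal sketch of an expected benefit-harm tally that includes secondary screening harms. All rates and weights are placeholder assumptions that independent reviewers would have to argue about case by case, not recommended values.

```python
# A minimal sketch of a pre-implementation benefit-harm tally.
# Every number and weight here is a placeholder assumption for illustration.
def expected_net_benefit(screened, prevalence, sensitivity, specificity,
                         benefit_per_true_positive, harm_per_false_positive,
                         harm_per_secondary_screening):
    true_positives = screened * prevalence * sensitivity
    false_positives = screened * (1 - prevalence) * (1 - specificity)
    secondary_screenings = true_positives + false_positives  # every alarm is followed up
    benefit = true_positives * benefit_per_true_positive
    harm = (false_positives * harm_per_false_positive
            + secondary_screenings * harm_per_secondary_screening)
    return benefit - harm

# Placeholder run: even a generous benefit weight can be outweighed
# when the target is rare and every alarm triggers a costly follow-up.
net = expected_net_benefit(screened=1_000_000, prevalence=0.0001,
                           sensitivity=0.95, specificity=0.99,
                           benefit_per_true_positive=100.0,
                           harm_per_false_positive=1.0,
                           harm_per_secondary_screening=2.0)
print(f"Illustrative expected net benefit: {net:,.0f}")  # negative: the sketch program does net harm
```

The point of the sketch is structural, not numerical: once true positives are rare and every alarm triggers a costly secondary screening, even a generous benefit weight per true positive can fail to outweigh the accumulated harms.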
On the other hand, policymaking in reality generally does not apply these rules. And arguing that it should do so raises a familiar problem: Bias, error, and structural incentives pervade the realm of scientific expertise just as they do other social ecological niches.
In particular, science has a non-neutrality problem. So who’s to say that more reliance on expertise to counter ignorance and perverse incentives with knowledge and better evidentiary standards would, on balance, do anything but privilege industries that can pay for biased experts to spin science in their interests? In fact, couldn’t a higher evidentiary bar for regulation backfire by preventing democratically elected policymakers from countering corruption unless they could find their own, morally and scientifically better experts to fight back? Isn’t this specter of an arms race of hired guns a familiar neoliberal nightmare? You know, depoliticize the political, pretend the important choices are entirely fact-based and value-neutral, and generally keep the public from clobbering corruption by increasing reliance on expertise, when in reality it’s easy for powerful social and political networks to game a system like this — and we should know better, because biased, error-prone, corruptible experts are at least a big part of the problem in the first place? Doesn’t this sort of proposal just add another layer of delegated trust in a complex system rife with that trust and its abuse?
Better, Not Perfect
Yes, but do you have a better idea?
Admittedly, we are trapped in this imperfect, complex psychosocial ecosystem with limited cognitive and structural capacities to move the levers of power in our own lives, much less in the world writ large. And advancing evidentiary standards to hold states accountable for the cost-benefits analyses underpinning rights infringements cannot escape this larger reality.
That said, there’s an encouraging recent history of scientific reforms ameliorating longstanding legal and political problems of bias, error, and perverse structural incentives. While this type of reform is not the only way forward and has its own inescapable limitations, sometimes it works to make the world a better place.
Take, for example, this short list of recent reform efforts to make expertise more rigorous in ways that matter for people’s lives by advancing scientific evidentiary standards:
• “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” Sander Greenland, Wake Forest Law Review, Vol. 39, 2004, No. 2.
• “Scientists rise up against statistical significance,” Valentin Amrhein, Sander Greenland, Blake McShane, Nature, 2019.
• “Strengthening Forensic Science in the United States: A Path Forward,” Committee on Identifying the Needs of the Forensic Sciences Community, National Research Council, 2009. It’s worth situating this report in the context of reviewer Stephen Fienberg’s concurrent 11-year service on the National Academy of Sciences Report Review Committee, which the Innocence Project celebrated as part of Fienberg’s far-reaching, long-standing efforts to advance scientific evidentiary standards in criminal justice contexts.
These and other such efforts have improved countless lives by evolving the legal and scientific discourses about the evidence on which life-changing decisions are often based. Why not try to similarly improve EU and German proportionality analysis by advancing the scientific evidentiary standards on which it’s predicated? It might succeed in at least nominally checking some abuses of power, some of the time. Imperfectly making things better matters to the people who would otherwise be harmed by those abuses, and it is all we can do.