Structures, Agents, and Envisioning the Invisible
How do we get a grip on the unintended consequences of society's best-laid plans, when there's a lot we can't see?
Sometimes, what you can see with your own eyes is the evidence that you need to solve a problem. Like checking the mirror and finding the spinach in your teeth that was causing people to look at you funny. But sometimes, the evidence we see misleads us, because the most important observations are missing from the places we can look.
Survivorship bias hurts inference when we only see the survivors of a process, and don’t ask who or what we don’t see. Wikipedia catalogs survivorship bias under the broader availability heuristic heading. The name change is telling: As a heuristic, availability — asking what you know about something from your own experience and those of your circle — can help people make good decisions. (See, e.g., “How Do People Judge Risks: Availability Heuristic, Affect Heuristic, or Both?” Thorsten Pachur, Ralph Hertwig, and Florian Steinmann, Journal of Experimental Psychology: Applied, 2012, Vol. 18, No. 3, 314-330: Public health campaigns often target common risks, so they might want to encourage people to think about their own experiences.) Availability just needs correcting for the things we don’t see when we use it, like rare events (“black swans”) that we may not have direct information about, or survivorship bias (a form of selection bias). Overconfidence based on first-hand knowledge of some cases (availability) may otherwise lead to incorrect inferences, as in the case of survivorship bias.
The original example of survivorship bias comes from Abraham Wald’s Columbia research on World War II planes (see, e.g., Chapter 7 in Richard McElreath’s inimitable Statistical Rethinking). The problem was, a lot of planes weren’t coming back. Wald figured out that looking at where the planes that did come back were damaged, and armoring them more there, wouldn’t solve the problem. Because to prevent future losses, we care most about the observations we don’t see: The planes that didn’t come back. Those were the ones that took more lethal hits — in more vital places. Those were the places that needed more armor to improve their survival odds.
This is a post about how survivorship bias affects the feedback that structures get on the conditions they set for agents’ choices. We might think of legal regimes, mass interventions, mass preventive interventions, and mass screenings as non-exclusive types of such structures, running from general to specific, with mass screenings for low-prevalence problems as a common subtype. I take a peek at this big social sky from down inside a single rabbit hole, thinking about women’s healthcare practitioners’ perceptions of abortion risks.
First, I draw a confusion matrix of abortion and death — an intervention and a rare outcome we’d like to prevent. Then, I look at what’s going on and who sees it in each cell. Next, I ask if this case reveals anything about unknown unknowns in mass screenings for low-prevalence problems. I admit I don’t know if this is the right set — the largest possible class of problem that shares the same mathematical structure — and examine why it may or may not be. This is all part of my attempt to figure out what this all looks like in the abstract — how broad the set can go, and what’s in the same neighborhood but different and why (aka what scope conditions matter) — by coming at it sideways, from something else that doesn’t seem so big. Mass screenings for low-prevalence problems are a common structure of program that affects entire populations; mass preventive interventions, interventions, and legal regimes are probably progressively more common structures conditioning choice. Abortion is a binary intervention under a legal regime (call it liberal or conservative) governing access; but abortion itself is rare in the West, and maternal death relating to it is still rarer. Finally, I gesture at the larger question implicated here, of how to visualize differences in both knowledge and perception across different kinds of interventions and different types of agents — a puzzle that is the topic of a future post.
Abortion-Death Outcomes
Call abortion and death binary. Some women who seek information about abortions have them, and some don’t. Some of each group live, and some die. There’s a debate in the literature about how long after an abortion we should count possible early deaths, and about whether substantial associations between abortion and early death (especially as the number of abortions a woman has had goes up) reflect causation (iatrogenic harm) or correlation (selection bias). Let’s say death within a year is what we care about. We could also debate whether we want to count women who thought about abortion, sought abortion care, or actually tried to abort/aborted. Let’s pick a middle ground for this thought experiment — say, women who took a step like getting information from a service provider about abortion.
Other details: “Service provider” is intentionally vague here to include illegal abortion procurement, like getting pills off the Internet instead of visiting a medical provider — which is possible in many places where abortion is legal or illegal. If it’s a medical action intentionally ending a pregnancy, it’s an abortion. Assuming all attempts work glosses over the fact that some attempts fail, or are reversed (big topics); this is just a model.
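For concreteness, the four cells discussed below can be laid out in code. Every count here is an invented placeholder — the point is only to fix the structure of the 2x2, not to estimate anything:

```python
# A 2x2 of abortion (yes/no) x death within a year (yes/no), among women
# who took the step of getting abortion information from a service provider.
# All counts are hypothetical placeholders, not data.
matrix = {
    ("abortion", "death"): 5,           # Square 1
    ("no_abortion", "death"): 20,       # Square 2
    ("abortion", "no_death"): 975,      # Square 3
    ("no_abortion", "no_death"): 4000,  # Square 4
}

total = sum(matrix.values())
death_rate = (matrix[("abortion", "death")]
              + matrix[("no_abortion", "death")]) / total
print(f"Cohort: {total}; one-year death rate: {death_rate:.2%}")
```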
What’s Up and Who Sees It?
Square 1: Death yes, abortion yes.
This is the square abortion proponents tend to tell stories about; but they don’t tell all the stories, because selection bias affects who sees what. Proponents focus on maternal deaths from unsafe abortions — which mostly occur in low-resource settings, especially Sub-Saharan Africa — arguing that legal abortion advances women’s health by preventing such deaths. But there’s also a consistent, substantial association (typically about 2x or more) between abortion and suicide.
Leading abortion providers misrepresent abortion to women as health risk-mitigating, when we don’t know if the suicide link reflects causation or correlation. Discourse on abortion frequently labels discussion of such questions as misinformation. But that is itself a political form of misinformation — a dubious game in which “both sides” engage, as they typically do in hyperpolarized discourses. I also wrote about this recently in the context of Covid misinformation discourse. Labeling material that diverges from consensus narratives as misinformation, and censoring it (and sometimes going farther) is a core political problem in the information age.
Back on survivorship bias, abortion proponents often talk about not wanting to see women dying from botched abortions, like women’s healthcare providers did when abortion was illegal. But women’s healthcare providers were likely to see those patients. They were not likely to see dying patients who had abortions and subsequently committed suicide. Those patients are dead already; and if dying takes them some time and medical practitioners observe it, those are likely to be medics and ER doctors and nurses who don’t know their histories. They are not likely to be women’s healthcare practitioners. They are even less likely to be the women’s healthcare practitioners who performed those patients’ abortions, because this is a smaller set. This appears to be an unrecognized selection bias in the literature. E.g., a PubMed search for “abortion death selection bias” doesn’t return any relevant hits.
So we need to be wary of this widely unrecognized survivorship bias. The people who think they are the subject-area experts with the most important expertise here, don’t see the patients who don’t come back. In some cases, that may be because iatrogenic harm was an antecedent condition of those patients’ preventable deaths.
In order to have evidence-based abortion policy prioritizing women’s survival, we may need to estimate net maternal deaths under liberal versus conservative legal regimes — an analysis requiring questionable cross-country and historical comparisons. The more conceptually sound but still practically difficult starting point for such a comparison would be to compare death counts in the year following abortion information-seeking among women who did and did not have abortions. This comparison would still be threatened by possible selection bias. For instance, women whose religious beliefs forbid abortion may also have beliefs forbidding suicide. But in the real world, it may be the best we can do to begin to get a better empirical grip on the effects of abortion on women’s survival, because it would enable within-group comparisons of women who did and didn’t abort after considering it, and women who did and didn’t die in the year after. But sometimes, the real pay-dirt in grappling with causality comes from talking to people, and dead people don’t talk. Moreover, it’s possible that only cross-country and historical comparisons would be feasible extensions to compare these counts across different legal regimes — and these are all apples-to-oranges comparisons. It’s difficult to envision a defensible quasi-experiment (one that wouldn’t violate SUTVA, the stable unit treatment value assumption). Still, estimating such death count difference-in-differences might improve on existing medical and policy discourse that often assumes, on the basis of insufficient evidence, that legal abortion regimes net benefit women’s health.
Or at least we could just say we don’t know what we don’t know.
Square 2: Death yes, abortion no.
Childbirth is medically risky, and is likely the main cause of death among women who don’t have abortions after getting info about them. However, once you survive it, childbirth is also substantially protective against maternal early death, including by suicide (a cross-gender effect well-documented since Durkheim). We don’t know to what extent that reflects selection bias or causal protective factors, but it’s probably a bit of both: Healthy, safe women may have more babies — we all want to be good parents. Plus, illness and stress can adversely affect fertility. But women with infants may also get better social protection from potentially lethal dangers — we’re a cooperative species that shares provision, childcare, and other labor in the interests of our exceptionally biologically expensive and slow-maturing young. So this square includes the shadow of women who might have died but for having a baby, and that’s a counterfactual that doesn’t normally get counted in discussions of abortion’s net effect on women’s health. But those inverse deaths matter in public policy terms, and they get counted in Square 4.
The number in Square 2 is likely larger than the number in Square 1, because deaths from unsafe abortions and from post-abortion suicides are rare events within a category of rare events; abortion is rare in the Western world. So deaths after birth swamp deaths after abortion, but that doesn’t mean that abortion net saves women’s lives. It just means common things swamp rare ones. Beware the base rate fallacy (as I’ve written previously).
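A toy calculation makes the swamping effect visible. Every number below is invented for illustration — including a per-woman death risk after abortion deliberately set higher than after birth — yet the commoner path still produces more absolute deaths:

```python
# Illustrative only: every rate below is invented to show "common swamps rare."
cohort = 100_000               # women who sought abortion information (hypothetical)
p_abort = 0.15                 # share who go on to abort (hypothetical)
p_death_after_abort = 0.0005   # one-year death risk after abortion (hypothetical)
p_death_after_birth = 0.0002   # one-year death risk otherwise (hypothetical)

deaths_after_abort = cohort * p_abort * p_death_after_abort
deaths_after_birth = cohort * (1 - p_abort) * p_death_after_birth

# The per-woman risk after abortion is set 2.5x higher here, yet the larger
# no-abortion group still produces more absolute deaths.
print(deaths_after_abort, deaths_after_birth)
```

Absolute counts tracking group size, not per-woman risk, is exactly why raw death tallies can’t settle whether abortion net saves women’s lives.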
Compare the selection bias valence of Squares 1 and 2: Square 1 includes both deaths from abortion, which are presumably higher under conservative legal regimes, where unsafe abortions incur more risks — plus deaths from suicide post-abortion, which may be higher under liberal legal regimes. But Square 2 includes only deaths abortion might have prevented (e.g., deaths from pregnancy/birth complications).
We still don’t know which way these two squares would swing evidence-based policy prioritizing women’s net survival, because Square 1 includes both deaths from unsafe abortion (possibly higher under conservative legal regimes) and deaths from suicide post-abortion (possibly higher under liberal legal regimes). We just know the selection bias among women’s healthcare providers so far has the valence abortion-life (in part from missing relevant death observations due to survivorship bias) from Square 1, plus now again a selection bias consistent with an abortion-life/women’s health association for women’s healthcare providers from Square 2 (since pregnancy/birth are risky). So women’s healthcare providers see women who die in both squares, but not women who die from suicide following legal abortion. This set of observations cements the association between abortion and net positive women’s health effects — an association colored by selection bias.
Square 3: Death no, abortion yes.
Women’s healthcare providers see the cases where women have safe abortions and don’t die. Another notch for the abortion-women’s health association, and a much bigger group/more frequent observation than deaths from any cause (abortion, suicide, or pregnancy/birth). Common swamps rare — but seeing rare events like deaths in young women may cause women’s healthcare providers to give those observations particular weight.
Square 4: Death no, abortion no.
What about women who get info from a provider about abortions, but don’t have one? Do their regular care providers know? Probably it’s a mix, but women might seek this information from different providers — keeping their normal doctors in the dark. That would create a population of women who went on to (mostly) have babies instead of abortions, and a population of women’s healthcare providers who didn’t know their patients had considered abortion.
Common swamps rare. Death is rarer than survival in reproductive-age women. Pregnancy/birth continuation is more frequent than abortion. Getting information about abortion may be more common than obtaining it. So a lot hinges on how we conceptualize the subgroup of interest here, and then getting data on it would be hard. Just asking people may not be a good way to get such sensitive information as whether pregnant women considered abortion.
The important thing here for my purposes is survivorship bias. If most women who think about abortion have babies instead, and most of their regular women’s healthcare providers don’t know that, then there’s a bias for women’s healthcare providers observing cases where liberal abortion access improves women’s survival, not observing cases where it may hurt them; and for observing cases where abortion happens and goes fine, but not cases where it doesn’t happen and that goes fine, too. This is a lot of different selection biases, all skewing toward the consensus narrative that abortion access net benefits women’s health. But that view is based on a distorted set of observations. In Square 4, restricted abortion access wouldn’t seem to hurt these women (possibly a large number). It may even increase their survival odds (as discussed above) — with those inverse deaths going uncounted. Not observing counterfactual survivors who might have died under more liberal hypothetical regimes thus represents another form of survivorship bias.
Bias Valence Recap
Two types of events represented in this matrix reflect selection biases that, if corrected, would challenge the consensus narrative that liberal abortion regimes necessarily net benefit women’s health. The first is women who die by suicide after an abortion: by definition they don’t survive, and they are likely mostly invisible to the healthcare providers who consider themselves the most relevant subject-area experts. The second is patients who considered abortion but rejected it and lived — a group also likely mostly invisible to those same providers.
Like or Unlike? What’s the Set and Why?
I think a lot about mass screenings for low-prevalence problems, a class of signal detection problems. Things like polygraphs (“lie detectors”), mass digital communications scanning (most recently proposed as Chat Control in the EU, the Online Safety Bill in the UK, and the EARN IT Act in the U.S.), as well as some programs the EU Parliament’s AI Act draft recently proposed banning. My usual rant is that, under conditions of persistent inferential uncertainty, probability theory dooms this type of program to backfire and harm what the program seeks to protect. This is true across diverse contexts like health, security, and information accuracy. Mass screening programs for various low-prevalence cancers and scanning programs for misinformation on digital platforms share the same mathematical structure with mass screening programs for low-prevalence crimes. These are all bad programs that cause a lot of destruction. But the danger they pose to society is poorly understood due to common biases like the base rate fallacy; better statistics education could save a world of suffering. More on this to come.
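The probability-theory point can be made concrete with Bayes’ rule. Even an impressively accurate screener, applied to a low-prevalence problem, flags mostly innocents; the sensitivity, specificity, and base rate below are hypothetical round numbers:

```python
# Positive predictive value of a mass screening under a low base rate.
# All three parameters are hypothetical round numbers.
sensitivity = 0.90   # P(flag | problem present)
specificity = 0.99   # P(no flag | no problem)
prevalence = 0.001   # base rate: 1 in 1,000

p_true_pos = sensitivity * prevalence
p_false_pos = (1 - specificity) * (1 - prevalence)
ppv = p_true_pos / (p_true_pos + p_false_pos)  # Bayes' rule

print(f"P(problem | flag) = {ppv:.1%}")  # under 10%: most flags are false alarms
```

The intuition-defying part is that no realistic improvement in test accuracy rescues the screener; the base rate dominates, which is the base rate fallacy in action.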
But lately I have been wondering whether this is really what I am thinking about when I think about this, or whether it isn’t a still broader category of similar things. Mass preventive interventions for low-prevalence problems could be the broader set. Things like vaccinating or boosting healthy young people against Covid to prevent rare but serious complications. Aren’t screenings just a subset of preventive interventions? Why wouldn’t the larger set work the same way in terms of applying the laws of probability theory to how it affects society? The base rate fallacy sure turns up across these sets. And what about laws? Aren’t they just mostly preventive interventions on populations through choice structures? Probably this goes too far, but I have yet to figure out just how.
So this is the larger research context in which I am wondering about this abortion case study, whether it fits into this class of problems, and why or why not. On one hand, this is a question about how to formalize the structure, probably playing with some cases that don’t fit along the way. Throwing pasta at the conceptual wall to see what sticks. On the other hand, it’s a question about what we care about in social and political terms: What does it make sense for laws to do, what does it not make sense for laws to try to do, and why?
Wait, What Was the Question?
It may seem like a short leap from mass screenings to mass preventive interventions. But take the example of requiring Covid vaccinations or boosters in healthy young people: Shots aren’t binary tests, so modifying the classic confusion matrix (below) to pertain to them is a non-obvious task.
Would a table estimating hypothetical intervention outcomes morph from showing true and false positives and negatives, to — what exactly? There is no neat translation into analogous binary outcomes. Maybe we care about post-vaccine reactions as if those cases were negatives, and infection protection as if those cases were positives. You can still apply Bayes’ rule to estimate those sorts of things, but there are more outcomes than this that we might care about, and anyway the outcome spread’s structure is different.
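One way to see why there’s no neat translation: a screening 2x2 crosses a prediction with a ground truth, so every cell carries a truth value; an intervention 2x2 (with hypothetical outcome labels like reaction and infection) crosses two outcomes, and no cell is “true” or “false”:

```python
from itertools import product

# A screening 2x2 crosses a prediction with a ground truth, so each cell
# is either correct or an error -- hence TP/FP/TN/FN.
screening_cells = ["correct" if predicted == actual else "error"
                   for predicted, actual in product([True, False], repeat=2)]

# An intervention 2x2 (hypothetical labels) crosses two outcomes, e.g.
# post-shot reaction x later infection. "Reaction yes, infection no" is
# neither true nor false; it's just a joint outcome we might weigh differently.
intervention_cells = [(reaction, infection)
                      for reaction, infection in product([True, False], repeat=2)]

print(screening_cells)          # two correct cells, two error cells
print(len(intervention_cells))  # 4 joint outcomes, none with a truth value
```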
So, although it’s a simplification that can promote dangerous distortions, the binary structure of screening tests has useful properties conceptually, visually, and mathematically. Besides, it’s not my doing. We get a real world with binary screening test outcomes across security, health, information, and other contexts, like it or not. Might as well analyze it as a class.
Maybe there is a confusion matrix translation solution to be had here to broaden this class usefully, though. Maybe binary interventions with binary outcomes, like abortion (yes or no) and death (yes or no)? What parts of the standard confusion matrix for mass screenings for low-prevalence problems scale to any such 2x2, and what parts don’t? How do the properties change outside the special, artificially binary class?
And what about the even larger universe of mass interventions? What makes an intervention preventive? Some might argue concealed-carry (“right to carry”) laws protect a preventive intervention (carrying a gun to protect yourself from crime); gun control proponents would point to gun deaths and consider their preferred policy regime preventive, too. Similarly, some liberal abortion proponents might argue “abortion on demand” regimes protect a preventive intervention (preventing birth); some conservative abortion regime proponents might argue more restrictive regimes protect women and children from preventable harms. So what’s the difference between a legal regime that tries to help prevent bad outcomes through screening, one that does the same thing through intervention, and one that does the same thing through making an intervention available to people that they probably won’t use — or through restricting that same intervention (aka many laws)?
Laws and interventions both condition people’s choices by changing choice structures. Preventive interventions like screenings can be written into laws, like mass digital communications scanning for child sexual abuse material (Chat Control and its analogues). So some screenings are formally nested within laws, but not all possible screenings are (in the sense that many mass screenings reflect accepted professional practices, not legal mandates). Are these mass XXX (screenings, preventive interventions, social constructs establishing preventive behavioral norms) for low-prevalence YYY (badnesses like crimes, diseases, and misinformation)? Or are they structures that condition (some more, some less heavily) individual choices around risks that society has an interest in mitigating? Or something else?
What about legal regimes around personal choices? This is a far broader category of thing than mass screenings for low-prevalence problems. But a lot of laws are mass preventive interventions for low-prevalence problems. They’re just problems that are high enough prevalence under conditions of low enough persistent inferential uncertainty that they make sense, like seat belt mandates for car accident death and serious injury prevention, or carding alcohol sales.
We might see mass preventive interventions or screenings as different from just any old law, because any old law tends to include what were once new mass preventive interventions. Could this, too, be a selection bias? Maybe we don’t “see” the survivors at the policy level — the interventions that made sense?
There are a lot of open questions here, and I’m hoping someone will give me some food for thought: Why are mass screenings for low-prevalence problems different from mass preventive interventions for low-prevalence problems, other than that binary screening tests offer more convenient math? Why do laws differ from mass preventive interventions for low-prevalence problems, other than that no one thinks of them that way, and only some of them have outcome spreads we can think about sensibly estimating as binary? When, if ever, does any sort of 2x2 (binary intervention with binary outcomes) play out mathematically like this other, special sort of 2x2 (mass screenings for low-prevalence problems)?
Why Does This Rabbit Hole’s Eye View Matter?
Going back to the abortion-death 2x2 pictured above, survivorship bias skews women’s health narratives about abortion in favor of the consensus story that liberal legal regimes net benefit women’s survival. But there’s insufficient evidence to establish that story. And this case isn’t sui generis in Pseudoscience Swamp.
Similar distortions in what is seen and what is unseen, heard and unheard, affect different actors in different forms of mass screenings for low-prevalence problems (which share this case’s binary outcome spread structure). Future posts will say more about these things. Suffice it to say these sorts of asymmetries and distortions cut differently by actor type (subject, expert, and analyst = 3), the degree of inferential uncertainty in the intervention (high, medium, or low = 3), and the differences between both knowledge (2) and perception (2) across every type of actor and intervention — as well as outcome (true positive, false positive, true negative, and false negative = 4). But drawing that all out would create a 3x3x2x2x4, making 144 cells in one aptly named confusion matrix. And really, some of the values should be ranges (like the base rates underpinning the true/false positive/negative values) or otherwise account for heterogeneity and complexity better than they do in this heuristic. So such a model might be too complex to be useful just for this one, usefully simplifying class of program.
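The cell count is just the product of the dimensions; a quick enumeration (with dimension labels taken from the text) confirms the arithmetic:

```python
from itertools import product

# Dimension labels taken from the text; the matrix size is their product.
actors = ["subject", "expert", "analyst"]           # 3
uncertainty = ["high", "medium", "low"]             # 3
knowledge = ["knows", "doesn't know"]               # 2
perception = ["perceives", "doesn't perceive"]      # 2
outcomes = ["TP", "FP", "TN", "FN"]                 # 4

cells = list(product(actors, uncertainty, knowledge, perception, outcomes))
print(len(cells))  # 3 * 3 * 2 * 2 * 4 = 144
```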
The larger problem remains that we need to better see the differential costs of information asymmetries in mass screenings, preventive interventions, and laws. That’s what a future post will think more about.