Sno' Right and the Seven Dwarves of Risk Assessment
A fairy tale of satisficing and informed consent
You’re still sitting in the waiting room where the doctor sent you to think it over, unsure what to do. Your friend sent you this free online risk calculator that shows relevant frequency count outcomes (Gigerenzer) comparing real-world options (Gardner) based on causal modeling (Pearl, Greenland, etc.) for someone in your subgroup. But you still don’t know what to do; it doesn’t feel right. You can’t just outsource your risk assessment to one shop and trust what it says; that creates a single point of failure in a system full of bias, fraud, and error (science and its trade show imitators). Besides, it’s your decision, and you have to take your own best shot. But you don’t know how.
Luckily, just then, seven dwarves enter the waiting room. They’re here to help you identify different approaches to doing risk assessment, to see what feels best for you in this situation. Clearly it calls for satisficing, making a good-enough decision from limited information; yet, you still want to have an informed say. Maybe the dwarves can help you live the fairy tale dream of having decision-making both ways?
Abdicate
Maybe you don’t have to make a decision at all. Of course, non-decision and thus non-action are also decisions by default in their own right. But it’s not your fault time flows in one direction without stopping!
Imitate
Doing what other people are doing is a good strategy for getting through the ballet recital (trust me). Why not try it in life? Your life choices then won’t be your own. But free will seems to be out of fashion, anyway. (If that bothers you, see this.)
Correlate
Many an academic’s best friend, Correlate looks at the data and draws conclusions. Publishable conclusions! Throw them on the dumpster fire that is contemporary institutional science and skewer up a marshmallow. (By the way, that experiment didn’t pan out, though people disagree about whether it’s been widely misinterpreted or failed to replicate.)
Bloviate
Well-funded by industry and often invited to state dinners, Bloviate can tell you stories. These stories typically involve very high accuracy rates, including for mass surveillance programs like polygraph screenings, social media snooping, and the proposed European telecom legislation known as Chat Control. But if you ask him about false positives, he’ll suddenly get too Sleepy, Grumpy, and Sneezy to answer. That’s because his tale omits what probability theory implies about the accuracy-error trade-off under conditions of inferential uncertainty, and skips the work of mapping hypothetical mathematical outcome spreads onto the practical values and realities we care about. If he understood that, he wouldn’t have a job. Pass the fleur de sel.
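To make the missing false-positive math concrete, here is a minimal sketch of the base-rate arithmetic Bloviate skips. The numbers are made up for illustration (they are not from any real polygraph or Chat Control evaluation): even a screening tool with impressive-sounding accuracy flags mostly innocent people when the thing being screened for is rare.

```python
# A minimal sketch (illustrative numbers only) of why "99% accurate" says little
# about what a positive flag means in mass screening: the positive predictive
# value depends on the base rate of what you are screening for.

def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(genuine case | flagged), via Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical mass-screening scenario: 99% sensitivity, 99% specificity,
# and 1 genuine case per 10,000 people screened.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.99, prevalence=1 / 10_000)
print(f"Chance a flagged person is a genuine case: {ppv:.1%}")        # ~1%
print(f"Chance a flagged person is a false positive: {1 - ppv:.1%}")  # ~99%
```

In that hypothetical, roughly 99 out of every 100 flagged people are false positives, which is the part of the story Bloviate never gets to.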
Mitigate
Mitigate (Tiggie to friends) believes there’s one right answer, and she knows what it is. She likes to promote it in order to mitigate harm in the public interest. Nudging and boosting are her current favorite methods.
There’s nothing wrong with promoting more informed consent and healthier behavior with better information and conditions. For example, showing patients fact boxes that present statistical information in a more easily understandable frequency-count format improves risk comprehension and statistical inference without statistical training. Or making little environmental changes, like giving kids more time to eat a healthier dinner, improves fruit and veggie intake.
Or is there?
Tiggie’s got problems
In Sander Greenland’s recent talk “There’s Not Much Science in Science” (video; slides), he reiterates the need for cognitive science and causality in statistics teaching and practice. Citing leading methodology reformer and Columbia University professor of political science and statistics Andrew Gelman’s blog posts on particularly bad (sometimes outright fraudulent) science and “nudgelords” thereof (1, 2, 3, 4), Greenland reminds us that we’re:
like ants that don’t begin to comprehend the system we’re in. In the words of some, we are apes with brains enlarged to coordinate action in large groups… Those who study and debate cognitive biases exhibit the same cognitive errors as everyone else — even or especially when discussing these problems.
At the individual level, this is where it gets dangerous to be an expert — if you’re not ok with making and admitting your mistakes, because we all make them, then you’re going to hurt people and not learn from it. At the level of decision-making science and the imperative for science communication to help people make better decisions, this is also why it’s dangerous to tell people what to do. There are a few ways this form of communicating risk assessment with the goal of mitigating known net harm (Tiggie) can go bad.
Manipulating people can backfire.
What if Tiggie tells people to do the (probably) right thing for the wrong reason, and it backfires? Like DARE, the famously inaccurate, scare-mongering underage substance use prevention program that accidentally caused increases in kids trying drugs? Or vaccine payouts or negative nudges for the vaccine hesitant, which similarly backfired?
Maybe some people don’t like being told what to do (hi). So that’s a problem for nudging, but not for fact boxes. One might see fact boxes as nudges toward particular decisions. But they don’t tell people what to do; just what they need to know to figure it out.
Still, there’s another problem with both more and less directive sorts of mitigate-minded decision-making aids: They are sand castles built on contemporary science. The foundation is unstable and, in many cases, the entire structure is about to go.
Basing decision aids solely on specific science may reflect suboptimal strategy.
Take, for example, the case of hormonal contraception and breast cancer risk, where correcting mistakes in the underlying medical literature swung the net death balance to zero when comparing the pill and withdrawal for mothers. There’s something to be said for trying to really show women what they’re choosing in life or death terms when they choose a birth control method. So it was fun working this example out. There are other people (Gigerenzer/Harding Center; Hertwig/Science of Boosting) who already do this sort of thing better, and it’s important work.
But this is still a Tiggie strategy for risk assessment and communication. It reflects an optimizing impulse to get it right, and advise thinking people accordingly. Informed consent through better science. A fine fantasy for a better future.
This post, on the other hand, is a fairy tale of satisficing and informed consent — of checking that impulse, because (I fear) it ultimately makes us more vulnerable to our own hubris in the face of our imperfections. Oops. (I should note that Gigerenzer and Hertwig have done plenty of work on fast and frugal heuristics, too.)
So yeah, better to base a fact box on correct science than on incorrect science. But better still, maybe, not to try to magically resolve the science crisis overnight just to get the calculations right, before helping people make better decisions in the larger number of cases where science (being a publicly funded, public service enterprise) owes them that help right now. People don’t have time to wait for that Godot. And we will never stop being human beings who make mistakes when we do science.
Also, my hormonal birth control/breast cancer calculations were a proof of concept. My numbers were metaphors. And if you’re going to use metaphors, then you might as well satisfice. It’s the better strategy.
The two strategies aren’t necessarily mutually exclusive in the context of decision-making tech. Where resources permit, going deep on case studies could help and be helped by going broad with heuristics along another path, with iterative learning from both improving outcomes. This might be especially useful under conditions of epistemic uncertainty. Those conditions are another problem for Tiggie…
There is a lot we don’t know.
This is a problem throughout science and science communication. It’s something leading science reformers talk about a lot, encouraging scientists to more explicitly incorporate uncertainty into their interpretations of their results. Meta-cognition about that interpretation is one level of that conversation.
Bringing that conversation to the problem of misinformation, however, is taboo. It’s uncomfortable to acknowledge that, on one hand, misinformation is a social and political problem; and, on the other hand, labeling something misinformation is itself a political act that reflects our own non-neutrality as human beings stewing in our own cultural soup. But it’s really important to acknowledge this bind, not least because “fake news” legislation can have chilling impacts on press freedom worldwide. And it’s also just good meta-cognition, since this is really a problem not of science per se, but of human nature (fallibility), and cognitive bias imbuing everything we silly humans do.
We are Stupid Midas; everything we touch turns out to grow mold. The kernel of truth in the promise of scientific progress is that we do figure it out as we go, correcting old mistakes, building on cumulative knowledge (sometimes by burning down the house). But the falsehood in it is what Greenland calls “romantic heroic-fantasy science,” which imbues the fact-value distinction with an absolute quality it does not possess.
Example: As I’ve written previously, the experts behind a sea of abortion myth-debunking wrongly promote the consensus view about possible mental health harm from abortion (i.e., there is none) as definitively correct, when it is provably false due to methodological mistakes (i.e., we don’t know, which gets misinterpreted in various ways as knowing). Experts can and should correct their mistakes to prevent possible harm from this systematic, empirical misinformation. That is a doable correction (people just don’t want to do it). But this is still not a solvable problem, because neutrality is a myth.
Shake shake shake: Mix a little perceived manipulation with some empirical and logical errors (especially inattention to causality in the former category and misrepresentation of ignorance as knowledge in the latter), and you get a recipe for mistrust. Mistrust corrodes public culture and is associated with bad outcomes like parental vaccine hesitancy. This is the opposite of what Tiggie is trying to do.
Calculate
So you still want to calculate risks to enable better risk communication to the people science is supposed to be serving. Join the club. Now go relearn how to do science as the infrastructure is still being built to do it better. (Or call me and we’ll relearn it together.)
Calculate doesn’t want to wait. Join the club! But you cannot, in a majority of cases, do good enough risk calculations from the existing scientific and medical literature, even if your orientation to conveying the implications of the information therein is more circumspect. There is just too much work to be done applying the insights of the causal revolution and other recent methodological advances and reforms. There are exceptions. But often, the published scientific record is too shaky a ground on which to build the science communication house.
Take, for example, this pair of recent papers on birth interventions and associated problems. Qiu et al’s paper “Association of Labor Epidural Analgesia, Oxytocin Exposure, and Risk of Autism Spectrum Disorders in Children,” published last month in JAMA Network Open, links epidural and oxytocin with autism. It’s worth noting the unadjusted correlations in Table 3. (Table 2 should not have been published, because it includes 11,374 preterm or postterm births, when what we want is to compare term birth outcomes — apples and apples.) Table 3 reports epidural may increase autism risk by 33-55% (95% compatibility interval 1.33-1.55), oxytocin by 19-35% (95% CI 1.19-1.35), and both 39-66% (95% CI 1.39-1.66). It’s generally accepted that oxytocin predicts epidural because ow, making contractions stronger — out of sync with natural painkillers the body might otherwise produce on its own — hurts, and women then need more labor pain relief; but there may also be an additive effect (more interventions, more risk).
We should not ignore the correlations in the raw data. As a matter of informed consent, women have a right to know that there is uncertainty about the safety of these common labor interventions for their children’s health. It’s the same with the substantial correlations typically seen in unadjusted raw data between maternal antidepressant use and serious pregnancy/offspring harms including autism, and between abortion and suicide. We don’t know if these correlations reflect causation or not, and the possible associations are substantial in all cases.
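If you wanted to put those unadjusted Table 3 intervals into the frequency-count format this post keeps advocating, the arithmetic is simple. Here is a minimal sketch; the 1.5% baseline autism rate is an assumption chosen purely for illustration (it is not a figure from the study), and the intervals describe correlations, not established causal effects.

```python
# A minimal sketch translating the unadjusted Table 3 risk ratios into
# Gigerenzer-style natural frequencies. The baseline autism rate below is an
# illustrative assumption, not a number from the study, and the conversion only
# means something causal if the associations themselves turn out to be causal.

ASSUMED_BASELINE = 0.015  # assumed baseline risk of an autism diagnosis per child (illustration only)

def extra_cases_per_10k(relative_risk, baseline=ASSUMED_BASELINE):
    """Excess diagnoses per 10,000 exposed children, if (and only if) the association were causal."""
    return 10_000 * baseline * (relative_risk - 1)

table3_intervals = {
    "epidural only":       (1.33, 1.55),
    "oxytocin only":       (1.19, 1.35),
    "epidural + oxytocin": (1.39, 1.66),
}

for label, (rr_low, rr_high) in table3_intervals.items():
    low, high = extra_cases_per_10k(rr_low), extra_cases_per_10k(rr_high)
    print(f"{label}: roughly {low:.0f} to {high:.0f} extra diagnoses per 10,000 births")
```

Under that illustrative baseline, the epidural-only interval works out to something like 50 to 80 extra diagnoses per 10,000 births if, and only if, the association is causal, which is exactly the “if” we cannot currently resolve.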
The uncertainty is the story. Not communicating that story is empirically wrong, dishonest, and undermines informed consent. But not communicating that story is the norm, because people often have aversive cognitive and emotional responses to uncertainty. Familiar stories feel better. Do we even have a mythology of uncertainty somewhere, sometime, across civilizations? I think we have mostly mythologies of characters with known flaws whose virtue leads them to hubris and then nemesis. Story structures are satisfying because they key on predictability in various ways.
Not coincidentally, as I’ve written, this Qiu et al piece is a bullshit study re-reporting similar past findings without advancing scientific knowledge. It’s still not clear exactly what knowledge women have a right to know. Again, the uncertainty is the thing.
By failing to diagram causality before adjusting for many covariates that could share a common cause with epidural, oxytocin exposure, and autism risk, researchers may have introduced more bias than they corrected for in the adjusted models. For example, women in worse health may make different reproductive choices, such as having kids later, which affect their subsequent medical care and pregnancy/birth risks and behaviors. This could create selection bias in which health acts both on pregnancy/birth problems that cause pain necessitating pain relief, and on autism risk, with no causal relationship between oxytocin/epidural and autism required to explain these correlations. That would mean that correcting for health as a simple confound could introduce collider bias into the models, producing estimates even more distorted than not correcting for health at all.
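To see how “adjusting” can manufacture bias rather than remove it, here is a toy simulation of collider-stratification bias. It is a deliberately simplified structure (the textbook M-bias setup), not the study’s actual data-generating process: two independent latent factors drive exposure and outcome, the measured covariate is caused by both, the true effect of exposure on outcome is zero, and conditioning on the covariate conjures an association out of nothing.

```python
# A toy simulation (not the study's actual data-generating process) of
# collider-stratification bias. Two independent latent factors drive exposure
# and outcome; the measured covariate is a collider caused by both. The true
# effect of exposure on outcome is zero, yet adjusting for the collider
# creates a nonzero "effect" estimate.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

u1 = rng.normal(size=n)                    # latent factor behind, say, birth complications
u2 = rng.normal(size=n)                    # latent factor behind neurodevelopmental risk
exposure = u1 + rng.normal(size=n)         # toy epidural/oxytocin exposure score
outcome = u2 + rng.normal(size=n)          # toy autism risk score; note: no arrow from exposure
covariate = u1 + u2 + rng.normal(size=n)   # "maternal health" style collider

def coef_on_first(y, *predictors):
    """Coefficient on the first predictor from ordinary least squares with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

print("Unadjusted exposure coefficient:", round(coef_on_first(outcome, exposure), 3))             # ~0.0
print("'Adjusted' exposure coefficient:", round(coef_on_first(outcome, exposure, covariate), 3))  # ~-0.2
```

With the structure above, the unadjusted estimate sits near zero and the “adjusted” one near -0.2: an association created entirely by the adjustment. Real maternal-health variables are messier than this, but the direction of the warning is the same.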
Equally importantly for informed consent, the practical implications of this study seem to directly conflict with those of another recent study in this realm. Muller et al’s “Induction of labour at 39 weeks and adverse outcomes in low-risk pregnancies according to ethnicity, socioeconomic deprivation, and parity: A national cohort study in England,” published last month in PLOS Medicine, links induction of labor (IOL) at 39 weeks with better perinatal outcomes — potentially lethal stuff like sepsis and necrotizing enterocolitis, as well as the likes of mere broken baby bones and IV fluids (Table 2). I’m not talking about the effect sizes, because the analyses are invalid…
As previously noted, the practical mismatch that then arises between these two sets of recent findings is that IOL is usually done with oxytocin, which often leads to epidural. And adverse perinatal outcomes are among the stressors commonly associated with substantially increased autism risk. So while the former study suggests common labor interventions may put baby brains at risk of permanent neurodevelopmental harm, the latter suggests the same interventions may protect them. What do we tell heavily pregnant women who want above all to keep their babies safe? “Sorry, science is a dumpster fire; call back next week?”
Notably, the same basic methodological problem affects this second study. No causal diagramming meant no identification of colliders, so the adjusted model may have again introduced more bias than it corrected for (collider-stratification bias). In this case, the adjustments were relatively minimal; but this bias story might still be consistent with the findings. The biggest possible effect sizes (Table 3) were for women who hadn’t had a baby before, non-white women, and socioeconomically deprived women — groups generally at heightened risk for birth complications. The adjusted model corrected for parity, race, and deprivation as simple covariates, when they might share common cause (e.g., maternal health problems) with the independent variable (IOL at 39 weeks) and the dependent variable (adverse perinatal outcomes) of interest. In other words, maternal health (exact same logic as in critiquing the other paper) could contribute causally to confounds included in the adjusted model, like socioeconomic background and maternal age — e.g., maybe sicker women make less money (disability being disabling) and have kids later, because kids are expensive, and maybe you wait until you can afford one or your clock is running out anyway. Maternal health could also causally contribute to IOL at 39 weeks, as well as to adverse perinatal outcomes. Even correcting for “purely” demographic factors like this could introduce collider-stratification bias in this context, potentially reversing the sign of the true effect.
The authors of this study didn’t only muck up their adjusted analysis in this way; they also poisoned their unadjusted analysis before doing anything else by excluding “women only from the group who had IOL and birth at 39 weeks if they had premature rupture of membranes, placental abruption, pregnancy-induced hypertension or pre-eclampsia, eclampsia, or amniotic fluid abnormalities.” These exclusions must be made from both groups. Otherwise, the comparison is outrageously bad and totally invalid for the same reason that both studies’ adjusted analyses are suspect: collider bias. This analysis should not have been published. But hey, this is the scientific literature we have, not the scientific literature we want.
The structure of the problem with these two studies is the same. It’s the one Sander Greenland keeps talking about:
we should ask what meaning, if any, we can assign to the P-values, ‘statistical significance’ declarations, ‘confidence’ intervals, and posterior probabilities that are used to decide what and how to present (or spin) discussions of analyzed data…. So-called ‘inferential’ statistics can only become contextually interpretable when derived explicitly from causal stories about the real data generator (such as randomization), and can only become reliable when those stories are based on valid and public documentation of the physical mechanisms that generated the data. Absent these assurances, traditional interpretations of statistical results become pernicious fictions that need to be replaced by far more circumspect descriptions of data and model relations. — “Connecting simple and precise P-values to complex and ambiguous realities,” Sander Greenland, Scandinavian Journal of Statistics. 2023; 50:899-914.
As leading science reformer Richard McElreath reiterates, until we start diagramming causality first and running statistical analyses later, the uncertainty is the story. This is especially important for dealing with selection bias, which can cause the sign of the true effect to reverse. An effect and its opposite: that’s a lot of uncertainty.
Circumspect interpretation is all well and good, but you’re in a waiting room and you want to make your best decision. So are the two practical implications these two studies suggest — fewer and more labor interventions for better long-term neurodevelopmental outcomes — mutually exclusive? Not necessarily. Maybe the types of women who were induced at 39 weeks had some unmeasured indication that it was a good idea. After all, this was a cohort study, not a randomized trial. So maybe there is a long-term child neurodevelopment benefit to inducing labor at 39 weeks when something suggests it’s a good idea, but there’s a cost to doing it otherwise. That might make sense of these recent studies with apparently opposing implications. But it still wouldn’t produce a calculation helping anyone decide when to consent to IOL at 39 weeks.
Entirely remaking science to have greater integrity, better methods, and more appropriate public infrastructure enabling transparency and correction would still not solve the problem of calculations like this being hard or often wrong. Because this is also a problem of interpretation. Science reformers addressing the problem of bias in science are not suggesting that we can solve the problem, at all. As Greenland quips:
“DATA SAY NOTHING AT ALL! Data are markings on paper or bits in computer media that just sit there… If you hear the data speaking, seek psychiatric care immediately!”
So yes, we can try to calculate more and better, as best we can from the imperfect literature we have, and be circumspect about the practical implications of these calculations. But this wouldn’t necessarily help people in complex real-life decision-making situations, which is science’s public service imperative. There is too much unknown.
What now? The science crisis meets science communication and says “you’re a silly parrot; have a nice day”? Unsatisfying!
Last dwarf, you’re our only hope!
Simulate
Let’s not throw out the baby (Simulate) with the bathwater (Tiggie’s problems). Consider some cool recent research on simulation by Hertwig and colleagues. In a randomized controlled trial with 300 German chronic noncancer pain patients, Odette Wegwarth, Wolf-Dieter Ludwig, Claudia Spies, Erika Schulte, and Ralph Hertwig studied “The Role of Simulated-Experience and Descriptive Formats on Perceiving Risks of Strong Opioids” (Patient Educ Couns, 2022 Jun;105(6):1571-1580). They found both fact box and simulation interventions improved the accuracy of patients’ perceptions of the risks and benefits of strong opioids, as well as their desire to change behavior — but the fact box produced patients better-informed on the numbers, while the simulation caused a lot more actual behavioral change (stopping or reducing opioids, starting an alternative therapy).
So if helping people enact choices advancing their own health is the goal, simulations may beat fact boxes. Maybe this is only true for risks that tend to play out over time though, like increasingly impactful harms of opioid use (i.e., addiction). There are also other scope conditions that we need to talk about, when we talk about fostering informed consent.
One is the quality of the evidence and, relatedly, the amount and type of uncertainty it contains. It’s well-established that opioids, like smoking, are highly addictive and extremely dangerous — contributing to the majority of drug-related deaths worldwide. Germany has avoided a U.S.-style opioid crisis, in part because it was a totally predictable problem. (And in part because U.S.-style corporate corruption and Chinese fentanyl trafficking lit the fire and stoked the flames.)
But the evidence on other medical interventions is not so hot. For example, the medical and public health establishment universally promotes “exclusive breastfeeding” as the best infant feeding method. This set of practices, however, turns out to be a surprisingly recent invention based on historical ignorance and bad science. It causes common and preventable harm to newborns, including possible permanent neurodevelopmental damage and death. You wouldn’t know the science backing this norm was dangerous bullshit unless you really dug into it, thought about causes, and maybe listened to some moms (hah). You would think, like experts worldwide do, that you knew “best,” and try to promote it, whether people (well, women) like it or not.
Notably, people who need to make decisions can’t improve their decision-making in cases like this by using fast and frugal decision trees, which rely on the trustworthiness of credible-seeming sources. Because, we’re not supposed to say it out loud — pandemic! public health!! national security!!! — but those sources (like the WHO) are often wrong.
So what if we stopped trying to be right (Calculate) or helping people be right (Tiggie) — and started helping them, help themselves be less wrong in their own decision-making processes, based on what we know about usual mistakes (Simulate)? Instead of a risk calculator, then (which I was fantasizing about previously before realizing I was wrong and wronger), imagine a decision-making map that simulates conditions for correcting for common mistakes.
Some are cognitive biases, like confirmation bias (seeing evidence that supports what you believe, but not evidence that might disconfirm it), anchoring bias (being unduly influenced by a first-come, first-cognitive-serve reference point), and congruence bias (failure to test alternatives). Others might be better thought of as heuristics, or shortcuts that we know can improve decision-making — with prompts to correct for the vulnerabilities they themselves can introduce (like the black swan problem for the availability heuristic: availability being about your experiences and those of your immediate network; black swans being rare outcomes most people haven’t experienced or heard about, which leads to under-weighting when we use the availability heuristic). Still others might be meta-scientific corrections for common scientific mistakes, like a bit of code that checks publications for statistical significance testing misuse (a minimal sketch follows below). And still others should certainly bring direct experience to bear on decision-making. I keep returning to the example of infant feeding: If your baby is screaming with hunger and your breasts don’t seem to have much milk, could being cued to recognize the empirical data you yourself observe help you to buck bad medical and social advice in order to make better feeding choices for your child? I hope so.
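Here is what that last kind of meta-scientific correction could look like in its crudest form. This is a hedged sketch, not a real tool: it re-derives a reported p-value from a reported t-statistic and degrees of freedom and flags mismatches. Existing tools such as the statcheck R package do this far more thoroughly; the regex, tolerance, and sample sentence below are simplifications invented for illustration.

```python
# A minimal sketch of a checker for misreported significance tests: find
# "t(df) = x, p = y" patterns in text, recompute the two-sided p-value from the
# reported statistic, and flag mismatches. Real tools (e.g., the statcheck R
# package) cover many more test types and reporting styles.

import re
from scipy.stats import t as t_dist

PATTERN = re.compile(r"t\((\d+)\)\s*=\s*(-?\d+\.?\d*),\s*p\s*=\s*(\d*\.\d+)")

def check_reported_t_tests(text, tolerance=0.005):
    """Yield (reported_p, recomputed_p, consistent) for each reported t-test found."""
    for df, t_value, reported_p in PATTERN.findall(text):
        recomputed_p = 2 * t_dist.sf(abs(float(t_value)), int(df))  # two-sided p-value
        consistent = abs(recomputed_p - float(reported_p)) <= tolerance
        yield float(reported_p), recomputed_p, consistent

# Made-up sentence for illustration: the reported p does not match the statistic.
sample = "The effect was significant, t(28) = 1.70, p = .03."
for reported, recomputed, ok in check_reported_t_tests(sample):
    print(f"reported p = {reported}, recomputed p = {recomputed:.3f}, consistent: {ok}")
```

Run on that made-up sentence, the check flags the reported p = .03 as inconsistent with a recomputed two-sided p of about .10, the kind of quiet mismatch that flips a result from “significant” to not.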
This is a big quiver of different types of arrows. It’s just the outline of a direction where we might find some solutions to the problem of science communication under conditions of imperfect meta-cognition about imperfect cognition embedded in imperfect social and political structures. The end result should help people grappling with uncertainty both in outcomes and in the underlying knowledge base. Otherwise, this sort of tool risks performing what Andrew Gelman calls “uncertainty laundering,” in which limited evidence summaries like p-values or compatibility (confidence) intervals get glamorized as truth indicators. That’s not what science communication should do.
We’re going to lack perfect information and make mistakes. We just want them to be…
Mistakes We Can Use
In the classic fairy tale, Snow White (you, the patient in the waiting room) finds a home away from home with the Seven Dwarves (different decision-making strategies), where she works for her keep until the Wicked Witch in disguise (reality) comes back for her with a poisoned apple. You know the story: the protagonist is doomed to fall for it, take a bite of the apple (flawed knowledge), and keel over, apparently dead, until a handsome prince rescues her with a kiss.
Wait, no, sorry — that was Sleeping Beauty. Turns out Snow White was actually saved by a servant carrying her glass coffin, tripping — dislodging the poisoned apple bite, and waking her up. Oops.
We’re not going to stop making mistakes in life or in science. But maybe we can take a page from Snow White, and try to make more mistakes we can use. And make more use of the mistakes that we know we make.
Science communication (like humanity) is trapped in this classic story structure (doomed protagonist evades villain, until she doesn’t). Because science is done by humans (flawed reasoners with limited meta-cognition for the social systems of which we’re a part). And I am only picking on science because I am a scientist; the same criticisms can be made of the parrot problem in fields where the knowledge base until recently was more correctly termed junk science, like law enforcement (e.g., forensics). Cargo-cult thinking similarly abounds around technology and technology-mediated decision-making practices across fields, and we might want to distinguish that from science for similar reasons. Or maybe we should say Science (the ideal) versus science or Science’ (the practice) — but this distinction would seem to fall into the heroic fantasy science trap. We are all just people living in culture, making culture, being made by culture, trying to make it good, remaking mistakes, remaking corrections, remaking attempts to make decisions about complex problems like migration half as well as wild geese.