Lately I've been thinking about the big picture of modern society struggling to make sense of rapid change and rapidly growing complexity, much of it under the guise of science, and much of that not so scientific after all. In the course of reading and learning from dialogue with statistician Sander Greenland about this, in the context of my breastfeeding research after it took me down the statistical significance rabbit hole, I got interested in the case of birth defect risks from mothers taking antidepressants in pregnancy, an example he's used lately. True to his reputation, SG was incredibly generous with his time and expertise as I wound my way through select corridors of the methodology literature, eventually diverting from my initial obsession to this closer look at another instance of possible common and preventable harm to infants and children from their mothers' medical choices. Choices based on bad information, based on bad science. The usual disclaimer about mistakes being my own applies. The result, "Dubious Science: Downplaying the Risks of Antidepressants in Pregnancy," is published today on Mad in America: Science, Psychiatry and Social Justice, a platform and body of work I appreciate tremendously.
Similarly, I was extremely lucky to work with editor, medical journalist, and Mad in America publisher Bob Whitaker on this piece. I couldn't have dreamed up a better exercise in developmental editing for communicating science to a lay audience. And boy did I need one.
Still do. One of the things I'm still trying to figure out is: who is the Bob Whitaker of women's health? I need that person's editing to complete a piece on abortion that's been living partly in approximately 500 open tabs on my computer for over six months (no joke). And with my breastfeeding research (book manuscript, proposals…), I have this problem on steroids. Maybe Santa will bring me executive function for Christmas. Or whatever special magic I need to pull off the big-picture synthesis that's been brewing across these topics, possibly on spin science in women's health.
Back in the context of the day, here are a few of the many things that are still bothering me about the research literature on antidepressants and pregnancy. This is a monster post because this is a monster of a disturbing literature, and I’m barely scratching the surface. I tried ordering points from most general-interest to research methods nerdiest. But, being me — often struggling to know which narrative thread is king, which unit of analysis is paramount, and which book proposal to work on — I have to start by saying something about the level of focus…
More Offenders
At a glance, the two articles my MIA piece focuses on critiquing, Brown et al and Suarez et al, are examples of the worst offenders. But there are many other examples of recent spin science on the risks of antidepressants in pregnancy that make similar moves. Here are three:
In a 2017 JAMA Original Investigation, "Associations of Maternal Antidepressant Use During the First Trimester of Pregnancy With Preterm Birth, Small for Gestational Age, Autism Spectrum Disorder, and Attention-Deficit/Hyperactivity Disorder in Offspring," Sujan et al found the usual, substantial possible effects, including up to over a doubling of autism risk, but made them disappear with modeling and restricted subgroup analyses. Three of the authors reported taking money from pharmaceutical companies.
In a 2013 New England Journal of Medicine article, "Use of selective serotonin reuptake inhibitors during pregnancy and risk of autism," Hviid et al again found up to over a doubling of autism risk, and reported finding no "significant association" in their adjusted model results. While technically true, this common misuse of statistical significance testing dismisses potentially practically significant results.
In a 2013 Clinical Epidemiology article, Sørensen et al also found the usual effects of up to over a doubling of autism risk associated with antidepressant use, shrank them with problematic modeling, and shrank them further by restricting the analysis to look only within families. One of the authors reported taking money from two pharmaceutical companies.
It shouldn't be normal for researchers to take money from pharmaceutical companies and then publish papers based on questionable and nontransparent modeling that turns findings of substantial possible risks of preventable harm into purported evidence that said harm is nonexistent. But this is the world we live in. Black-box hand-waving away of serious possible risks, even to particularly vulnerable groups, is part of normal science. It's not that there is nothing to see here, or even that leading medical journal editors are failing to see it. They see something, they say something; but the problems don't get solved (h/t McElreath). Do not adjust your sets…
Overlooked Preconception Risks
All these research teams (Brown et al, Hviid et al, Sørensen et al, Suarez et al, and Sujan et al) found that autism and other risks may also have been heightened for children whose mothers took antidepressants before pregnancy. Possible preconception risks of taking SSRIs are potentially consistent with other evidence: oocytes (the immature egg cells within ovarian follicles that develop into mature eggs) take six months to develop. Serotonin regulates cell cleavage, a key part of oocyte maturation. And we know from other research on preconception risks from factors like famine and folic acid deficiency that something that can matter for brain development for life may matter months before conception. More research is needed, but in the meantime, the available evidence invokes the precautionary principle. Serotonergic antidepressants are not proven safe in the preconception period.
Men Don’t Get A Free Pass
Reality isn’t as blindly sexist as people. Women and men make babies together. Women and men both take antidepressants sometimes. Women and men should both be accurately informed about possible risks of taking antidepressants when they're thinking about having children. But men get even less fair warning…
This is because most studies assessing neurodevelopmental risks associated with antidepressant use ignore fathers. But Sujan et al looked, and found paternal first-trimester antidepressant dispensations were associated with up to more than doubled ADHD risk, up to around a 60% autism risk increase, and possible increased risks of preterm birth and small for gestational age (Table 4). The authors then interpreted this as evidence that the antidepressant-neurodevelopmental risk connection was not causal. The evidence does not establish that.
To the contrary, the link could instead suggest that fathers taking antidepressants before conception may impose substantial risks on their offspring. The authors didn't analyze fathers' preconception antidepressant use, but first-trimester antidepressant use likely correlates with it. So, far from clearing antidepressants' name, these findings are consistent with what we know about environmental conditions' effects on sperm generally, and SSRIs' potential to damage semen quality and sperm DNA integrity in particular. As in the case of antidepressant use during pregnancy and various risks, we cannot know from this evidence whether substantial possible links between antidepressant use and developmental problems are causal; but there are reasons to suspect they may be.
As always, this story is part of other stories. One of the larger, little-told stories here is that there are a number of relatively widespread male lifestyle choices that could adversely affect children's long-term health through preconception damage. Psychiatric and other prescription drug-taking is a big one. Fasting diets are another that, as far as I can tell, have not been studied in this context but should be, given what the evidence from a number of natural experiments suggests about preconception famine exposures. Recent research on aspartame exposure in mice triggering transgenerationally transmissible anxiety also comes to mind; food additives are basically another source of drug exposure that we might not often consider.
There are a lot of men and women who are making what they think are healthy preconception choices. In reality, they’re doing experiments on their kids, and possibly on their grandkids, because we don’t have adequate safety data on a lot of common behaviors and exposures. And they don’t know it.
The Pink Ghetto
Why does it matter so much that we worry about men, too, if the research gives cause for concern, but we don’t really know what’s going on here? One of my concerns is that people should get better information on which to base better choices as a matter of free will (bracketing whether it exists; it probably doesn’t really, but that’s too depressing to fathom). Another is that, well, I love men. And I see a lot of evidence that, in some ways, they’re actually the weaker sex (e.g., in terms of vulnerability to nutritional, infectious disease, and other environmental stressors), and need special protection that they’re not getting enough of. But another, strategic concern I have with “doing women's issues” is that treating issues, like possible long-term harm to children from parental antidepressant use, as “women’s issues” has the potential to ghettoize them.
To put this in context, it’s typical for concerns and professions that get feminized to be taken less seriously and valued less in literal terms. On one hand, women’s issues are societal issues. Whether women have good health, good lives, good sex, healthy relationships, happy families, and successful careers — or not — has a huge impact on societal outcomes like birth rates, not to mention everybody else’s well-being.
On the other hand, the more women go into a field, the lower the salaries tend to go, and that’s just one example of the pink ghetto. This matters here insofar as focusing solely on whether mothers take antidepressants before or during pregnancy, as opposed to whether fathers do too, feminizes the topic of possible substantial risks to children’s long-term health from parents’ antidepressant (and other drug) use. It’s not solely a women’s issue, empirically. And it may well be strategically costly to women to mischaracterize it as such.
I don't want to say that talking about women's issues as women's issues hurts women because society values women less. But what if it's true? This would seem to create a dilemma: We have to communicate with other people about reality somehow. There is no perfect way to do it. Every path involves mistakes. Choices about level of focus, framing. Interpretation. Story-telling. Spin.
No one escapes the Matrix, the social fabric, the problematic web in which we’re all embedded, like it or not. You can’t effectively fight for women without calling it fighting for women, because you need other people to think and work better. But you can’t call it fighting for women without devaluing the cause, and fighting for women less effectively. So how do you fight for women?
Probably by fighting for everyone. But you can't do everything. “Choose your battles” is a cliché for a reason. Besides, you find more actual allies when you just say what you’re up to.
One possible solution to this problem is to bring a women’s issues lens to broader work. A feminist lens that insists the personal is political, perspective is pervasive, and knowledge is power in more ways than one. This is part of why I’ve argued that we need more feminism in research methods.
Correlation Doesn’t (Not) Equal Causation
The problematic research literature on antidepressants in pregnancy is part of a much larger problem: a widespread phenomenon of bad science in which it’s common to spin results from shoddy, complex, and nontransparent modeling as showing evidence of no effect, when the crude (unadjusted) estimates show a quite substantial possible one. Then, researchers often wave their hands and say there’s not an effect, when their own research shows there may be a big one.
It’s like they think they’re doing good science if they disprove a finding. See the correlation, destroy the correlation. Wax on, wax off. This sounds very much like an attempted pop version of some criticisms of the cult of statistical significance, because it is.
But this is not just about misuse of statistical significance testing versus fuller interpretation of results in terms of confidence intervals — aka painting the target, not the bullseye. It’s about the interplay between faulty causal inferences and exegetical evidence, which is about power shaping knowledge more generally.
There are lots of problems with the rejection of the idea that correlation may imply causation. Four seem particularly relevant here. They come straight from the “statistical methodology greatest hits” playlist on repeat in my methods nerd groupie head. I think maybe they’re all criticisms of different misinterpretations of the intro stats textbook truism “correlation doesn’t equal causation.” So I'm wondering who has already introduced the apparently necessary corollary: correlation doesn’t equal not causation, either.
Or, in meme parlance:
1. Correlation
2. Do not pass Go. Do not collect $200. Go back to the beginning and think about causality in a linear way first, with time factored in.
3. …
4. Causation!
(Note the traditional item 4 “Profit!” has been replaced by “Causation!” No coincidence. Public interest science has an incentives problem. Namely, lack of them. Different topic.)
1. Absence of Evidence Isn’t Evidence of Absence
There is no right way to spin absence of evidence establishing causality as evidence of absence of that causality. But that's exactly what Suarez et al, one of the two studies my MIA piece critiques, does. That study concludes with the hand-wave: "The results of this cohort study suggest that antidepressant use in pregnancy does not increase the risk of neurodevelopmental disorders in children." In fact, the evidence this study reported cannot tell us whether antidepressant use in pregnancy caused the substantial possible risk increases it found, or not. After baking in numerous questionable analytical choices (more on this below), the researchers turned evidence into something resembling absence of evidence, and then misrepresented it as evidence of absence.
The main point here goes back to causation. Correlation doesn't equal causation. Correlation doesn't equal not causation. Not correlation doesn't equal not causation. And correlation isn't best understood as a binary in the first place. With or without adjustment, if you can interpret a range of most likely correlation coefficients (aka effect estimates in the 95% compatibility or confidence interval) as absence of evidence of an effect, that tells you something about the uncertainty in the evidence we have. Not something binary and definitive about the effect.
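To make this concrete, here's a minimal sketch in Python with invented counts (not from any of the studies discussed): a risk ratio whose 95% interval crosses 1 gets reported as "no significant association," even though the same interval remains compatible with nearly a tripling of risk.

```python
# Minimal sketch with hypothetical cohort counts (illustrative only).
from math import exp, log, sqrt

cases_exposed, n_exposed = 12, 2000        # hypothetical exposed group
cases_unexposed, n_unexposed = 40, 10000   # hypothetical unexposed group

rr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

# Wald 95% CI on the log risk-ratio scale (standard epidemiology formula)
se = sqrt(1/cases_exposed - 1/n_exposed + 1/cases_unexposed - 1/n_unexposed)
lower, upper = exp(log(rr) - 1.96 * se), exp(log(rr) + 1.96 * se)

print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# -> roughly RR = 1.50, 95% CI 0.79 to 2.85. The interval crosses 1, so the
#    result gets called "not statistically significant," yet it is compatible
#    with anything from a slight protective effect to nearly a tripling of
#    risk. That's uncertainty, not evidence of absence.
```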
You know the old TV ad tag line about men’s hair replacement products? “I’m not just the CEO; I’m a customer.” Well, uncertainty isn't just the question in good science. It’s also the answer.
2. Causality Isn’t a Magic Word
Over email, Suarez contended, "Our conclusions do not make any causal statements, and only state our interpretation of the results." This conflicts with the study's conclusion, which states, again: "The results of this cohort study suggest that antidepressant use in pregnancy itself does not increase the risk of neurodevelopmental disorders in children." Suarez's hedge evokes epidemiologist Miguel Hernán's warning that "scientific euphemisms do not improve causal inference from observational data."
We're interested in correlation because we're interested in causation. Interpreting results as suggesting something does or doesn't increase risk makes an implicit causal argument. Avoiding saying the magic word doesn't change that.
It’s also notable that we cannot say, based on available data, what Suarez et al do explicitly say about causality: While there are big associations between mothers using antidepressants in pregnancy and their exposed children having neurodevelopmental disorders, “it is unlikely that these associations are causal.” The evidence doesn’t establish whether or not the link is causal. Rather, it gives cause for concern that it may be.
Suarez et al's mistakes here show how causality gets (ab)used as a magic word in spin science in two senses: (1) in hedging games where researchers make implicit causal claims through euphemisms. Here, "causal" is magic in the sense of "not saying the magic word." And (2) in making explicit causal claims that rely on the apparently objective language of statistics but are not established by the evidence. Here, "causal" is a magic word because we can't examine the logic behind the claim: it either doesn't exist or is faulty, but we are supposed to be quiet about that, because it must be in the stats. Which brings us deeper into those stats…
3. Black-Box Statistics
There are some results that are just so wrong, you look at them and laugh. At least, before remembering other people may believe them, and die. A favorite, recent example is studies purporting to show that smoking protects people against bad COVID-19 outcomes like death.
Spoiler alert: No, smoking doesn’t protect you against COVID-19. That’s obvious bullshit that responsible scientists shouldn’t permit to harm lay people who don’t know any better. But we’re not supposed to say that as scientists! The preferred, more diplomatic and professional way of saying this is to stay within what the evidence establishes.
That looks like this: "One possibility is that by controlling for factors that are influenced by smoking, we may be distorting any causal relationship between smoking and COVID-19 risk; in statistical terms, this is known as collider bias." So writes Sir David Spiegelhalter in Covid by Numbers. He goes on to explain that a simple model adjusting only for demographics (age, sex, deprivation, and ethnicity) showed the positive link we would expect to see between smoking and COVID-19 death.
In other words, sophisticated models produced trash. Big, fancy, flashy, "kitchen-sink" (as in, "throwing in everything but the…") quantitative tools that purported to account for everything that mattered risked introducing more bias than they corrected for. It's not even worth naming names, because these models are everywhere. They get you published, so who cares if they kill people? Not a bunch of researchers, apparently.
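For readers who want to see the machinery, here's a toy simulation of collider bias (invented numbers, not the actual data Spiegelhalter describes). Smoking and COVID-19 are unrelated by construction; both just make you more likely to land in the tested group. Restrict the analysis to tested people, and smoking suddenly looks protective.

```python
# Toy collider-bias simulation (illustrative only; all numbers invented).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

smoker = rng.random(n) < 0.15   # smoking status
covid = rng.random(n) < 0.05    # infection, independent of smoking here

# Getting tested (the collider) is more likely for smokers (e.g., other
# respiratory symptoms) and for people who actually have COVID-19.
tested = rng.random(n) < (0.05 + 0.20 * smoker + 0.50 * covid)

def covid_rate(mask):
    return covid[mask].mean()

# Whole population: no association, by construction (ratio ~ 1.0)
print(covid_rate(smoker) / covid_rate(~smoker))

# Tested people only: smoking looks strongly "protective" (ratio well below 1)
print(covid_rate(tested & smoker) / covid_rate(tested & ~smoker))
```

The "sophisticated" analysis restricted to (or adjusted for) testing doesn't uncover a hidden truth; it manufactures an artifact.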
There’s got to be an intuitive analogy for this that can be keyed into for a popular audience. Like, would you trust a guy who showed up for your first date at a neighborhood coffee shop dressed in a suit and tie, parking a Ferrari by a perfectly good train station? Except he also gains career advancement from racking up dates? Ok, this analogy needs work.
But there is something to it: simpler, more straightforward statistical analyses are possibly better on the whole than souped-up ones. Just like simpler, more straightforward relationships are possibly better than complicated ones.
The German CCC may have hit on a better way of saying this in its successful activism against e-voting machines, accusing proponents of promoting a "'culture of expertism,' where voters were dependent on someone else having determined the reliability of the voting system." We could say much the same thing about researchers using black-box statistics promoting such a culture, where readers are dependent on someone else having run a model.
But then we have to justify why transparency in science is on a par with transparency in a democracy, where the people who vote have to be able to understand how their votes got counted. Is the logic that the research subjects have a right to understand how their data produced the result? Or that the people the research will affect have that right? Or just other scientists? There is a culture of expertism in research, and saying that's a bad thing may require a little more justification. In fact, criticisms of cultures of expertism probably work best when you make them with more expertise than the other side, but don't say that. (Oops.) Is this one of those social games someone needs to clue me in about?
I'm still wondering how to convince people that black-box statistics make bad science. I'm not sure the expertism argument is relevant, in the sense that the solution to the collider bias problem is to diagram out causality using DAGs in the pre-scientific phase of a causal project. DAGs (directed acyclic graphs) are one-way causal diagrams incorporating time. DAGs are really easy once you understand them; but calling them simple is not going to resonate with most people, who've never heard of them. So the problem with black-box statistics may actually be that they're usually not expert enough, as opposed to that they rely on expertism per se.
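For the curious, here's a minimal sketch of what "diagramming the causality first" can look like in code, using hypothetical, oversimplified variable names rather than any particular study's diagram. A DAG is just one-way arrows with no loops; once the arrows are drawn, checking acyclicity and spotting candidate colliders is mechanical.

```python
# Toy causal diagram (hypothetical, oversimplified variables), using networkx.
import networkx as nx

dag = nx.DiGraph([
    ("maternal_depression", "antidepressant_use"),
    ("maternal_depression", "child_outcome"),   # confounding path
    ("antidepressant_use", "child_outcome"),    # effect of interest
    ("antidepressant_use", "healthcare_contact"),
    ("child_outcome", "healthcare_contact"),    # healthcare_contact is a collider
])

# A DAG must be acyclic: every arrow points one way, forward in time.
assert nx.is_directed_acyclic_graph(dag)

# Candidate colliders: nodes with two or more arrows pointing in.
print([node for node in dag.nodes if dag.in_degree(node) >= 2])
# -> ['child_outcome', 'healthcare_contact']
# The outcome is trivially a collider of its causes; the one to worry about is
# healthcare_contact. Restricting to, or adjusting for, people who show up in
# the health system can open a spurious path instead of closing one.
```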
This is why it’s also going to be a problem, in the asymmetrical arms race between spin science and methods reform, when bad research includes pretty pictures of DAGs that aren’t necessarily well thought-out and, in some cases, aren’t really DAGs at all — and still excludes things like data and code access, preventing critics from easily running alternate models ourselves. Because that will still be black-box statistics, but then we may often be left saying merely “this is black-box” and not also “see, it’s entirely missing the essential first step.”
Then again, maybe it is enough to say that science must be transparent as a matter of first principles and methods alike. Maybe it is enough to poke holes in individual instances of bad causal diagrams. Probably it will take five years, and then we'll be at the point where someone has built a typology of common DAG mistakes (if they haven't already). Then, researchers can still ignore the methods literature to publish spin science, typically with zero consequences; but at least we'll have a nicely organized way of thinking about this.
Right now, we're still at the first stage of the arms race with DAGs and black-box statistics. Check out Figures 1 and 2 in Hartwig et al's recent antidepressants-ADHD study. The figures show birthdate of offspring causing antidepressant exposure.
This is not a causal relationship. It doesn't make sense. Nothing mysterious about it. Birthdate of offspring doesn't causally contribute to antidepressant exposure. This is not a DAG. It's a pictorial placeholder, made with the appropriate software, for what would have been the appropriate first step before modeling.
So far, we have: (1) correlation is not causation, and thinking about both as binaries is wrong, in part because "absence of evidence isn't evidence of absence," so don't throw out the evidentiary baby with the messy-results bathwater; (2) causality isn't a magic word, so be careful about implicit and explicit causal claims from correlation, including euphemisms from which causation will be inferred; and (3) black-box modeling that makes raw (aka crude, unadjusted) correlations attenuate or reverse needs to be grounded in more rigorous causal logic, and be more transparent, than it usually is, and we should be wary of dismissing correlations that can be analyzed away like this on the basis of trust in experts. Because…
4. Fake Neutrality
I keep writing and thinking about science’s non-neutrality, which in some cases it seems fair to call explicitly fake neutrality. Corporate corruption is such a huge problem in drug research. As I wrote previously, Greenland points out that methods mistakes often help spin stories, with scientists tending to put out the preferred causal narratives of powerful social networks under the umbrella of objectivity while doing a lot of hidden interpretative work under the guise of apparently neutral statistics. In other words, spin science constructs fake neutrality that bolsters abuses of power. This form of knowledge is power dressed up as objectivity, and we can do better.
For example, in the Suarez et al article my Mad in America blog critiques, there are a number of questionable analytical choices embedded in the main analyses. All of them likely diluted the effect. We don’t know how much, because the data and code aren’t open-access, so it’s not possible to run different models to see how the results change. Here are a few of the most questionable such choices, with debt to SG for noting the first two:
a) Aggregating different drug classes. Serotonergic drugs (as opposed to tricyclics) present the biggest risks; Suarez et al's own analyses show this, and other research in the same field (like Brown et al's) accounts for it by looking specifically at serotonergic drug effects. And yet Suarez et al still aggregate unrelated drug types, diluting the effect (a toy arithmetic sketch of this kind of dilution follows this list).
Specifically, their Figure 3, as well as Tables e5-12 in the supplement, shows effect dilution through inappropriate drug class aggregation. This dilution affects the analyses in Figures 2 and 4, including their main findings, which appear in Figure 2 on the adjusted and HDPS-adjusted model lines.
b) Looking directly away from the earlier exposure window of greatest concern. Teratogens (substances that can harm embryos and fetuses) are widely recognized as most potentially toxic in early pregnancy. Suarez et al don't include "early" pregnancy analyses by class or drug in their article or supplement. This looks like effect dilution again, although we can't see it as clearly because they didn't report these analyses.
c) Excluding children of mothers who took antidepressants in both early and late pregnancy from their analyses. This group might be expected to suffer increased risks from increased drug exposure, and so excluding them may have biased estimates toward zero. Again, this would be an invisible effect dilution in comparison to real-world effects.
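Here, as promised, is the toy arithmetic sketch of dilution (invented numbers, not Suarez et al's data): pool a higher-risk drug class with a lower-risk one, and the "any antidepressant" estimate lands in between, closer to the null than the class of greatest concern.

```python
# Illustrative arithmetic only: how pooling drug classes dilutes the estimate.
baseline_risk = 0.010  # hypothetical outcome risk among unexposed children

# Hypothetical exposed groups, each compared with the same unexposed baseline
groups = {
    "serotonergic": {"n": 6000, "risk": 0.018},   # class-specific RR = 1.8
    "other classes": {"n": 4000, "risk": 0.010},  # class-specific RR = 1.0
}

for name, g in groups.items():
    print(f"{name}: RR = {g['risk'] / baseline_risk:.2f}")

# Pooled "any antidepressant" estimate: a size-weighted average of the risks
total_n = sum(g["n"] for g in groups.values())
pooled_risk = sum(g["n"] * g["risk"] for g in groups.values()) / total_n
print(f"pooled: RR = {pooled_risk / baseline_risk:.2f}")
# -> class-specific RRs of 1.80 and 1.00 collapse to a pooled RR of 1.48; the
#    more the lower-risk class contributes, the more the signal shrinks.
```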
Baking in effect dilution through questionable analytical choices while taking money from pharmaceutical companies doesn't just look bad. It is questionable science with serious possible consequences. Why should we give pharmaceutical-funded researchers the benefit of the doubt that their work is neutral if they say so, when their own analyses suggest that they knowingly structured their statistics in such a way as to reduce estimates of possible harm from pharmaceutical drugs to unborn children?
Correlation isn’t causation, but we still need to follow the data while using common sense. Much as it’s impossible for any human being to do that from a position of perfect neutrality. Much as intelligent, well-intentioned people can disagree about how to best do that.