Sociopolitical limits of U.S. research on guns and drugs
Leading British epidemiologist and Cochrane Collaboration author/editor Tom Jefferson proposed on the excellent Trust the Evidence that someone should conduct “a large, well-funded case-control” study on firearm shootings and psychotropic drugs in minors.
Perhaps Dr. Jefferson is unaware of the Dickey Amendment — which has, since 1997, prohibited federally funded research from advocating gun control. Originally it covered only CDC funding, but in 2012 it was extended to the NIH. This strongly discourages U.S. research into topics such as the one Jefferson proposes — raising the question of who would fund such a study, given that the major candidate public health institutions are effectively barred from doing so.
Congress’s now-routine political suppression of scientific research likely costs lives. The lives it costs are likely disproportionately black, brown, and poor. Yet U.S. academics have not taken to the streets in large numbers to protest this. It is an accepted part of the contemporary American sociopolitical landscape.
They have, however, recently protested in large numbers over diversity research cuts. To some, this looks corrupt. Venal. Self-interested.
We could try to tell a kinder story in which it’s merely stupid. “This is water” and all.
Not everyone is inclined to bend that way. As Vinay Prasad recently noted at Sensible Medicine:
there is no credible evidence that funding upper class and upper middle class academics to study diversity has helped poor black kids writ large. It has however given a lot of academics something to be smugly morally righteous about. It's like giving Generals more money to study the rations of soldiers and then watching the Generals get private drivers and chefs and the soldiers eat the same canned sardines. — “Is more science better than less science?” (April 9, 2025)
Except, in this canteen, friendly fire sometimes kills soldiers and civilians alike. We’re not allowed to ask questions about when or why. Instead, Generals respond by commissioning a new study on the psychological effects of fish oil. When a new President cuts its funding, its investigators protest. Intermittent friendly fire continues between fish casseroles.
UK experts promote bowel cancer screening — but are they recommending the right tests?
The UK National Screening Committee’s news on medical mass screenings for low-prevalence problems normally aligns with my understanding of the evidence. But today I got something that surprised me: an email linking to a blog post entitled “Wealth of evidence highlights the benefits of bowel cancer screening.”
Didn’t I just write about the latest evidence on the general (in)effectiveness of mass cancer screenings? Yes: twice. But what I wrote was incomplete…
The most up-to-date evidence comes from a recent meta-analysis (Bretthauer et al, JAMA Internal Medicine, 2023) finding what many previous such analyses also found: that the data don’t establish a mortality benefit from most such screenings, casting doubt on their utility. What’s going on here?
On colorectal cancer screenings, Bretthauer et al looked at colonoscopy, sigmoidoscopy, and fecal occult blood testing (FOBT). They found:
The only screening test with a significant lifetime gain was sigmoidoscopy (110 days; 95% CI, 0-274 days). There was no significant difference following… colonoscopy (37 days; 95% CI, -146 to 146 days) or FOBT screening every year or every other year (0 days; 95% CI, -70.7 to 70.7 days)… The findings of this meta-analysis suggest that current evidence does not substantiate the claim that common cancer screening tests save lives by extending lifetime, except possibly for colorectal cancer screening with sigmoidoscopy.
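To make those intervals concrete, here is a minimal sketch of my own. The numbers are just the ones quoted above, converted from days to months; nothing below comes from the paper beyond those figures.

```python
# My own arithmetic on the figures quoted above (not from Bretthauer et al):
# reading each 95% CI as a range of plausible average lifetime gained or lost.
estimates = {
    # test: (point estimate, CI lower, CI upper), all in days
    "sigmoidoscopy": (110, 0, 274),
    "colonoscopy": (37, -146, 146),
    "FOBT (annual or biennial)": (0, -70.7, 70.7),
}

DAYS_PER_MONTH = 30.4

for test, (point, lo, hi) in estimates.items():
    print(f"{test}: best estimate {point:+.0f} days; the data are compatible with "
          f"anything from {lo / DAYS_PER_MONTH:+.1f} to {hi / DAYS_PER_MONTH:+.1f} months on average")
```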
The important takeaway here is that there are three possible colorectal cancer screening tests. Two out of three may not work, or may increase or decrease lifespan by a practically unimportant amount. So, surely, the NHS recommends the one test that demonstrably lowers colorectal cancer mortality according to high-quality evidence?
Nope. Instead, the current NHS screening program includes sending out a fecal blood test kit — but the one they send out is a newer type. The faecal immunochemical test, or FIT, kit is more sensitive and specific than the older guaiac-based test. There is evidence, for instance from this 2024 nested case-control study, that FIT screening substantially reduces mortality from left-sided colorectal cancers (95% CI 0.48-0.71).
It appears Bretthauer et al examined data for an outdated fecal blood testing method, and then reported that it was ineffective. That seems like a misleading oversimplification that could cost lives.
Maybe, as usual, the NSC has it right: according to a 2015 recommendation, the NHS sends out a FIT kit and recommends follow-up colonoscopy for abnormal results. On one hand, we don’t have randomized trial evidence showing this yields a big mortality reduction. On the other hand, it’s non-invasive, maximizes participation, and makes sense: a suspicious test result can be followed up with a secondary screening to disambiguate possible problems.
That said, one could argue that we’d prefer to see high-quality mortality reduction evidence before doing mass screening plus invasive follow-up testing on large numbers of people who are probably fine. We only have this evidence for colorectal cancer screening with sigmoidoscopy, not fecal blood testing or colonoscopy. John Mandrola and Vinay Prasad had an excellent piece at Sensible Medicine dealing with this in the context of the NordICC trial, which established that a modest mortality benefit from colonoscopy screening invitation was possible but not certain (“Screening Colonoscopy Misses the Mark in its First Real Test,” 2022).
Specifically, the NordICC trial investigators reported:
In intention-to-screen analyses, the risk of colorectal cancer at 10 years was 0.98% in the invited group and 1.20% in the usual-care group, a risk reduction of 18% (risk ratio, 0.82; 95% confidence interval [CI], 0.70 to 0.93). The risk of death from colorectal cancer was 0.28% in the invited group and 0.31% in the usual-care group (risk ratio, 0.90; 95% CI, 0.64 to 1.16). The number needed to invite to undergo screening to prevent one case of colorectal cancer was 455 (95% CI, 270 to 1429). The risk of death from any cause was 11.03% in the invited group and 11.04% in the usual-care group (risk ratio, 0.99; 95% CI, 0.96 to 1.04)… “the risk of colorectal cancer at 10 years was lower among participants who were invited to undergo screening colonoscopy than among those who were assigned to no screening” (Bretthauer et al, “Effect of Colonoscopy Screening on Risks of Colorectal Cancer and Related Death,” NEJM 2022).
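As a sanity check on how those numbers hang together, here is a minimal sketch of my own, using only the two absolute risks quoted above, showing where the number needed to invite comes from:

```python
# Minimal sketch (my own arithmetic, not from the NordICC paper): how the
# "number needed to invite" follows from the two reported absolute risks.
risk_invited = 0.0098      # 10-year colorectal cancer risk, invited group (0.98%)
risk_usual_care = 0.0120   # 10-year colorectal cancer risk, usual-care group (1.20%)

absolute_risk_reduction = risk_usual_care - risk_invited   # 0.0022, i.e. 0.22 percentage points
number_needed_to_invite = 1 / absolute_risk_reduction      # ~455 invitations per case prevented
relative_risk = risk_invited / risk_usual_care             # ~0.82, the reported risk ratio

print(f"ARR = {absolute_risk_reduction:.4f}")
print(f"NNT (invite) ≈ {number_needed_to_invite:.0f}")
print(f"RR ≈ {relative_risk:.2f}")
```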
In other words, Bretthauer 2022 was the expert on the evidence he and his colleagues went on to review in 2023. Is it strange that he seems to have changed his focus from emphasizing lower colorectal cancer risk associated with screening to emphasizing no established mortality benefit from screening colonoscopy?
Maybe not. Maybe he just thought about it some more and wanted to see the net mortality effect numbers on a bunch of similarly structured programs. But then why not distinguish between the fecal blood tests in the more recent analysis, to highlight the practically important uncertainty about whether population-level FIT + colonoscopy does indeed make sense (in terms of increasing lifespan or decreasing colorectal cancer, cancer, and/or all-cause deaths)?
I don’t know. But this seems like a good case study of a mass screening for a low-prevalence problem where the appropriate population-level policy, based on the best available evidence, is relatively complex and ambiguous, requiring interpretation and judgment. We can’t just measure even a simple outcome like deaths in the context of colorectal cancer screening policies and say what makes sense (apparently).
Life and death are binary. Health doesn’t have to be traded off against some other value we also want (unlike security, which trades off against liberty). If the evidence is still exegetical in this sort of context, how can we ever expect to do net cost-benefit analyses for anything? If we can’t, what does evidence-based policy even mean?
Cochrane breastfeeding-depression review authors respond to my critique, repeating their mistakes and doubling down
As my last links post noted and I explained in shorter form on X, two recent reviews of breastfeeding support interventions report evidence suggesting the interventions may be practically ineffective — or even harmful. Yet both reviews misinterpret their core statistical evidence as suggesting that the interventions are effective and beneficial. This reflects confirmation bias in the expected direction of the dominant infant feeding paradigm, “breast is best.”
Under Cochrane’s unusually transparent process, I was able to post a comment on the Cochrane review (Lenells et al, which I blogged about here) explaining its mistakes, after having warned Cochrane about the pre-registered review’s bias two years ago.
The authors posted their required response yesterday. It fails to meet basic standards of logical reasoning, transparency, and accountability.
The study’s last author, University of Toronto nursing professor and leading “breast is best” proponent Cindy-Lee Dennis, replied:
First, there was no misinterpretation of the core findings… There was no hiding of possible harm… there was no confirmation bias… we did not assume a one-way causal link… we took heed of all the reviewers’ comments… we agree that mothers deserve evidence-based care and full information. Thus, we took the time and energy to rigorously assess the effect (benefits and harms) of breastfeeding support interventions, in comparison to standard perinatal care, on postpartum depression – the most frequent form of maternal morbidity following childbirth. To give mothers better evidence-based care, we require larger-scale studies.
This reply completely fails to recognize the problems with the review that I raised. It reads like accountability theater — a process that checked the boxes without addressing the critique’s substance. Giving the authors the benefit of the doubt means assuming this is because they did not understand the criticisms, so I will explain again, in greater detail, what went wrong and how to fix it.
As I wrote previously, Lenells et al’s reported evidence is consistent with a substantial possible immediate increase in postpartum depression risk associated with breastfeeding interventions. This risk should lead researchers to question the wisdom of the interventions — the opposite of calling for “larger-scale studies,” as Dennis does in her reply.
In other words, the authors doubled down instead of correcting their mistakes. It doesn’t make sense to call for testing your preferred intervention on a larger population if your evidence actually suggests that it may backfire, causing exactly what you want to prevent. So what’s going on here?
Lenells et al misused statistical significance tests, a common methodological mistake against which scientists have risen up en masse in the name of reform to improve science and protect the public interest (see “Scientists rise up against statistical significance,” Nature, by Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories, Comment, March 2019). My original comment cited one of the most prominent academic journal articles explaining this problem (which is a matter of inappropriate dichotomization of ambiguous results) and its remedy (fully interpreting the 95% compatibility intervals to estimate possible effects in terms of lower and upper bounds).
Correcting this mistake would mean, for instance, writing that the reviewed interventions may have immediately reduced postpartum depression risk by up to 77% — or increased it by up to 170% — and acknowledging this substantial possible risk increase in the Summary, Background, and Conclusion sections, instead of acknowledging only possible benefits. This selective presentation of ambiguous evidence as only showing possible benefit and not also showing substantial possible harm misleads readers. The direction in which it misleads is not random.
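To be concrete about what that kind of interval-based reporting involves, here is a minimal sketch. The interval bounds below are hypothetical, chosen only so the output matches the percentages above; they are not taken from Lenells et al.

```python
# Minimal sketch (hypothetical numbers chosen to match the percentages above,
# not taken from Lenells et al): turning the bounds of a 95% compatibility
# interval for a risk ratio into plain-language "best case / worst case" statements.
ci_lower, ci_upper = 0.23, 2.70    # assumed 95% CI bounds for the risk ratio

best_case = (1 - ci_lower) * 100   # ~77% lower risk in the intervention group
worst_case = (ci_upper - 1) * 100  # ~170% higher risk in the intervention group

print(f"The data are compatible with anything from a {best_case:.0f}% reduction "
      f"to a {worst_case:.0f}% increase in postpartum depression risk.")
```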
Rather, this continued misinterpretation remains consistent with the confirmation bias I warned about two years ago, and that one would expect from researchers like Dennis who have vested reputational and other interests in “breast is best” ideology. Dennis famously wrote a thesis developing a scale on breastfeeding self-efficacy. This measure assumes that women’s perceptions of their breastfeeding ability are related to psychological factors such as stress, social feedback, and confidence, and that these psychosocially influenced perceptions are a more important determinant of clinically important infant feeding outcomes than women’s actual breastfeeding ability. Dennis has since published extensively on breastfeeding under the operative assumption that “breast is best.” Her oeuvre recognizes neither the evidence for common and preventable harm to neonates from breastfeeding insufficiencies, nor the possible implications of such harm — and, by extension, current infant feeding guidance — for maternal mental health.
This is not to say that Dennis is aware of her confirmation bias and is knowingly repeating her methodological mistakes while denying it. Rather, it would be consistent with what we know about such bias for her perceptions of the evidence to be distorted by her prior beliefs and incentives, and for her awareness of that distortion to itself be biased (limited).
To be clear, this isn’t a personal indictment. It’s a reflection on a pervasive human problem. Strong beliefs + professional incentives + ambiguous data = a perfect recipe for unintentional bias.
But this correction isn’t just about statistical significance testing misuse and the common cognitive distortions it frequently serves. It’s about more than the resultant dangerous science (mis)communication that may harm vulnerable patients while promoting its proponents’ financial and professional interests. It’s about the so-called science crisis…
This is just one of a pair of recent bad breastfeeding intervention reviews that I critiqued. As such, it’s one more case study in how difficult (sometimes impossible) it is to correct the scientific record, even when it’s demonstrably wrong and clear mechanisms for correction exist. A recent links post had another example of this sort of problem, dealing with bogus claims of established racial bias in polygraphy. Another publication wouldn’t even allow an online comment on its blatantly p-hacked abortion-suicide article that media outlets loved — and those outlets declined to correct their coverage even after I explained why it was at best demonstrably wrong, and at worst fraudulent. I’ve heard from other scientists, high and low, who’ve similarly not been able to get the record corrected.
So it’s no secret that science is not self-correcting. This raises bigger questions:
Where’s the editorial oversight? Why did Cochrane publish this non-response? The usual answer is that scientific journal editors don’t do their jobs when it comes to correcting methods mistakes because they’re incentivized against retractions. But in this case, the necessary correction to the online review would appear to be relatively low-cost or even cost-free. It would just take a little time to read and write, and it would make the authors unhappy. So what? Why not correct the record? Did no one read and understand the correction, or what? It doesn’t really seem plausible that Cochrane editors aren’t up on statistical significance test interpretation. What’s going on here?
What happens when accountability systems fail? Should I try to trigger this one again? I have young kids who need me, and my time to read, think, and write is scarce and precious. If I spend some more of my finite time on this earth reiterating the substance of my previous comment with as much patience and goodwill as I can muster, should I expect it to trigger more meaningful accountability than it did the first time? Why?
Maybe the right “lesson” here is that blogging is a better investment than fancier publication, even though I’ve been beating myself up again lately for just posting here and moving on. Sure, it doesn’t “count” for me professionally in any meaningful way (it’s not a publication credit, least of all an academic one). But it makes me happy to read, write, make the communicative effort, and keep doing that, getting better along the way.
People are going to be wrong whether I correct them or not, and this makes me unhappy if I think I should “fix” it. But I guess someone being wrong on the Internet isn’t my problem? Maybe the most important thing is to stay curious, keep learning in public, and try to help others think more clearly. (What they do with that good-faith effort is their prerogative.)

Has the Cochrane heuristic lost its value?

Cochrane — once considered the gold standard in evidence-based medicine — is facing a credibility crisis. Since the expulsion of co-founder and leading medical methodologist Peter Gøtzsche, its reputation for rigor and transparency has been tarnished. Science doesn’t need to be perfect, but it does need to be self-correcting. If Cochrane isn’t willing and able to correct basic methodological mistakes that change the meaning of the evidence it reviews and the decisions that evidence supports, and that could impact relevant health outcomes, then it’s no longer what it claims to be on three out of three counts — “Trusted Evidence. Informed Decisions. Better Health.”
One of the great things about having little kids is that they give me perspective on this kind of thing. Cochrane is wrong. Science is broken. I’m right, it matters, and, apparently, rational, real-world incentives are all wrong for fixing this sort of thing. I could get really exercised about all this.
But, at the end of the day, all I really care about is smelling my little baby’s head and putting on my four-year-old’s latest favorite tunes for a dance party. Who knew? I have a new superpower as a mom. And it’s, ironically, caring less about the problems of the world, about which I have sometimes cared too deeply to be an effective scientist and writer. This enables me to have the perspective to just say my piece and move on. Life is short, and my tykes are shorter.
Can we please have an AI auto-response to bad media coverage of mass medical screenings for low-prevalence problems?
Lately, I often wonder how many of my health and methods critiques could be automated to save time and aggravation (see above), particularly when it comes to my main interest in mass screenings for low-prevalence problems. Recent coverage does not fail to stoke this desire. The original headline, which seems to have been changed pre-publication but still lives on in the URL (a common editorial slip), was “Should you get full-body MRI scan?” (“A Full-Body MRI Scan Could Save Your Life. Or Ruin It,” Time Magazine, Matt Fuchs, April 9, 2025; h/t Paul Maidowski).
The answer is “No.”
The Time coverage compares remarkably poorly to the gold-standard Harding Center Fact Boxes on similarly structured screenings (e.g., mammography for early breast cancer detection, PSA testing for early prostate cancer detection, and non-invasive prenatal testing for Down’s syndrome and other trisomies). I have recently been brainstorming how to improve this risk communication model, but it remains head and shoulders above the sort of standard coverage Time offers here. Giving people frequency-format outcomes — which decision science shows improve Bayesian statistical intuitions — and explaining in those terms what the possible costs and benefits are for average people should be the norm, but it is not.
Instead, Time emphasizes the “black swan” possibility that the scan might save your life, with no recognition whatsoever of the possible serious physical harms of secondary screenings (e.g., infection or chronic pain from resultant surgery) that are overwhelmingly likely to be medically unnecessary. So the average reader would have no idea that full-body MRI scans’ net effect is overwhelmingly likely to be overdiagnosis, overtreatment — and resultant, serious, and fully preventable harm to a large number of people.
Instead, the problem of iatrogenesis is framed as a psychological one — that having a suspicious result might make people worry for a while. Meanwhile, the possibility of the screening saving your life is personalized with the only anecdote in the essay (about a guy whose life it saved). This is medical journalism at its worst.
Unnecessary medical testing often produces unnecessary, invasive further testing or intervention — sometimes causing irreversible damage. By learning to recognize common cognitive distortions, such as base rate bias, that the mathematical structure of mass screenings for low-prevalence problems invites, people can protect their health (and that of their loved ones) better than excessive testing and intervention can.
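For anyone who wants the frequency-format version of that point, here is a minimal sketch with made-up illustrative numbers: a hypothetical 1-in-1,000 condition and a hypothetical 90%-sensitive, 90%-specific test. Nothing here is from the Time article or any particular scan.

```python
# Minimal frequency-format sketch with made-up illustrative numbers (not from the
# Time article or any specific test): why a "positive" finding from a mass screen
# for a low-prevalence problem is usually a false alarm.
population = 100_000
prevalence = 1 / 1_000    # assumed: 1 in 1,000 people actually has the condition
sensitivity = 0.90        # assumed: the test flags 90% of true cases
specificity = 0.90        # assumed: the test clears 90% of healthy people

sick = population * prevalence                 # 100 people
healthy = population - sick                    # 99,900 people
true_positives = sick * sensitivity            # 90 flagged, correctly
false_positives = healthy * (1 - specificity)  # 9,990 flagged, incorrectly

ppv = true_positives / (true_positives + false_positives)
print(f"Of {true_positives + false_positives:.0f} people flagged, "
      f"only {true_positives:.0f} are true cases (PPV ≈ {ppv:.1%}).")
```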
Teaching people to recognize the dangers in screenings of this structure might do more for public health than trying to correct every piece of bad coverage. Though many admirable efforts have been made in this vein (e.g., Dashiell Young-Saver’s Bayes’ Rule lesson plan in the NYT about prenatal diagnostics). And many other considerations affect such programs (e.g., test iterativity, qualitative use differences, and multiple causal mechanisms).