USPSTF promotes breastfeeding support interventions despite evidence suggesting inefficacy and possible harm
The U.S. Preventive Services Task Force (USPSTF) recently issued a new Recommendation Statement promoting primary care behavioral counseling to support breastfeeding (“Primary Care Behavioral Counseling Interventions to Support Breastfeeding,” US Preventive Services Task Force Recommendation Statement, JAMA Network, April 8, 2025; h/t José Luis Díaz Rosselló). This guidance lacks evidentiary basis.
As I’ve written previously, a March JAMA Pediatrics review by D’Hollander et al. concluded lactation consultant interventions are “effective.” But this conclusion was not supported by the evidence the authors reported, which established no practically important effects on breastfeeding duration or continuation. Their findings also suggested the interventions may increase child overweight risk by up to 146%. No proven benefit plus substantial possible harm does not justify a blanket recommendation.
Similarly, a February Cochrane review by Lenells et al. found psychosocial breastfeeding interventions may increase the immediate risk of postpartum depression by up to 170% — a risk omitted from the summary and conclusions in a bald act of statistical misinterpretation and confirmation bias about which I had warned Cochrane years prior. (Cochrane censored my follow-up comment when the authors denied without engaging the substance of this criticism.)
Both reviews reflect the same classic pattern of spin science: Researchers with vested interests in promoting the interventions they study misreported both the absence of any practically important proven efficacy and the possibility of substantial harm as if these findings supported their preferred narrative that the interventions are effective and net beneficial. Consumers of these reviews, like USPSTF, then appear to accept their pseudoscientific conclusions uncritically. Ideology drives policy with a patina of evidentiary basis that, scratched with a bit of methodological knowledge, falls off.
Pointing this sort of thing out appears to have no effect whatsoever on the workings of the world. And basically no one pays for me to write about this stuff; I have six paid subscribers, if you must know. But maybe it enables some particularly savvy consumers to better defend themselves against potentially harmful nonsense. Or maybe it just helps me feel like I have kept some kind of solid grip on some piece of reality in a world full of lies, broken promises, bad policies, and dead houseplants. (Isn’t the planting half the fun anyway?)
One mildly reassuring read on what’s going on here is that maybe USPSTF really didn’t bother reading the recent literature. Their recommendation would already have been drafted when these reviews were published, and it’s basically the same as their 2016 recommendation. They probably think nothing has changed in what we “know” about this subject, because the causal revolution still hasn’t reached most of infant feeding science. So they don’t know that we don’t know what they think we know: breastfeeding confers no proven benefits to child or maternal health, and apparent associated benefits may instead reflect selection effects (e.g., healthier moms make healthier babies and are better able to lactate).
Reporting on proposed U.S. missile defense “Golden Dome” misses the central problem
President Trump plans to start and finish constructing a “Golden Dome” U.S. missile defense system akin to Israel’s Iron Dome in three years for $175 billion. Critics say the project will take much more time and money, if it’s possible at all.
The program shares a mathematical structure with many others: Mass screenings for low-prevalence problems incur the accuracy-error trade-off, dooming them to fail under common conditions of rarity, persistent uncertainty, and secondary screening harms. One such program was the Reagan-era “Star Wars” U.S. missile defense program, widely mocked for attempting the impossible.
Proponents argue deterrence changes the net effects of the system: such a program can work by deterring attacks even if it doesn’t work by detecting, predicting, assessing, and intercepting incoming missiles. Critics counter that the strategic behavior and information effects may cut the other way, and that resource reallocation concerns must factor in, too. So such a missile defense system could fail to advance national security on all four possible causal pathways, with harms from misclassification (false positives and false negatives), sabotage and evasion, signaling, and resource misallocation.
In any event, one core issue is statistical, not technological — a point reporting thus far seems to miss. For instance, a recent CBS News video says around minute 3 that the issue with the 1980s proposal was technological and that that era’s obstacles have been overcome. But this reflects a typical misunderstanding of shiny new programs of this structure: you can’t escape math with technology.
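To make the statistical point concrete, here is a minimal sketch of the base-rate arithmetic (the prevalence, sensitivity, and specificity figures are hypothetical, chosen only for illustration): even a very accurate detector produces mostly false alarms when genuine threats are rare among the objects screened.

```python
# Minimal sketch: positive predictive value under a low base rate.
# All numbers are hypothetical, chosen only to illustrate the arithmetic.

def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of alarms that correspond to real threats (Bayes' rule)."""
    true_alarms = prevalence * sensitivity
    false_alarms = (1 - prevalence) * (1 - specificity)
    return true_alarms / (true_alarms + false_alarms)

# Suppose 1 in 100,000 tracked objects is a genuine incoming warhead,
# and the detector is 99% sensitive and 99% specific.
ppv = positive_predictive_value(prevalence=1e-5, sensitivity=0.99, specificity=0.99)
print(f"Share of alarms that are real: {ppv:.4%}")  # ~0.10% of alarms are true threats
```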
Another core issue is causal. Overall, it’s not clear that we know how to model equilibrium effects well enough to estimate the strategic-behavior and information lines of a causal model in which classification accuracy, strategy, information, and resource reallocation all play a role. Without tools to model each of those components, we may not know how to do a net policy-effect analysis on a program of this structure. So perhaps we should think hard about whether such programs may backfire before spending potentially trillions of dollars on them.
UK NSC touts NHS AAA screening program effectiveness — define “effective”
Recently, the normally stellar UK National Screening Committee announced “NHS AAA Screening Programme proves its effectiveness” (Harriet Strachan, Phil Gardner and Sophie Mitra, UK National Screening Committee Blog, May 8, 2025). The headline misleads: the program has been shown to reduce AAA deaths, not all-cause mortality.
In Overdiagnosed: Making People Sick in the Pursuit of Health (H. Gilbert Welch, Lisa M. Schwartz, and Steven Woloshin, MDs; Beacon Press, 2011; review), Welch et al. suggest there’s no easy answer to the question of abdominal aortic aneurysm screening efficacy (pp. 113-115). The screening helps a very small number of men (1.5 men per 1,000 screened over five years). It reduces AAA deaths but doesn’t provably reduce all-cause mortality, according to the then-best available data from four trials.
Moreover, Welch et al. write that AAA screening has downsides, including requiring a major operation to repair an aneurysm in more cases than the number of lives it saves (suggesting some operations are clinically unnecessary). It also triggers an ongoing cycle of follow-up testing for 55 of every 1,000 men screened (overwhelmingly false positives). That false-positive identification in turn may have considerable quality-of-life implications: one man said he was afraid to let his grandchildren sit on his lap.
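A rough per-1,000 tally using the figures cited above from Welch et al. can make the trade-off concrete (a sketch of the arithmetic, not a formal decision analysis):

```python
# Rough per-1,000 tally of AAA screening outcomes, using the figures cited
# above from Welch et al. (2011). A sketch of the arithmetic, not a decision analysis.

screened = 1_000
aaa_deaths_averted = 1.5     # men helped per 1,000 screened over five years
follow_up_testing = 55       # men entering ongoing surveillance, overwhelmingly false positives

print(f"Per {screened:,} men screened over five years:")
print(f"  AAA deaths averted:           {aaa_deaths_averted}")
print(f"  Men in ongoing follow-up:     {follow_up_testing}")
print(f"  Follow-ups per death averted: {follow_up_testing / aaa_deaths_averted:.0f}")
# Roughly 37 men enter surveillance, with its anxiety and downstream treatment
# risks, for every AAA death averted -- and all-cause mortality benefit remains unproven.
```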
To know if this screening program is effective, we might want to use the standard many argue is appropriate for such programs: All-cause mortality benefit. If researchers don’t want to use that standard, that’s fine. But they should acknowledge its role in the current discourse all the same. It’s the higher standard to which other experts hold programs of this structure, and it would be useful to know why it’s been rejected here.
If it’s not possible to assess all-cause mortality change from the UK dataset at issue here, which is presumably the case because it’s observational not experimental, then we still might expect experts to say that, and to highlight the absolute risk of AAA death (i.e., acknowledge its rarity).
We would also expect them to acknowledge the costs of the screening in terms of (over)treatment risks such as infection or other complication from surgery, and patient quality of life. Substantial costs to overwhelmingly false positive subjects are a major reason why many programs of this structure are doomed to fail.
Overall, this looks like a methodologically poor-quality assessment from UK NSC in an area where it normally excels: assessing mass screenings for low-prevalence problems. The author of the assessment itself, Jonothan Earnshaw, relates his long history as an AAA screening proponent in a video presentation on the review before saying, “so it’s a particular pleasure to me, as it will be to everyone who’s been involved in the screening program, to find out if it’s all been worth it.”
Earnshaw is the former national director of the NHS AAA Screening Programme. It should have been clear to the NSC that he had a conflict of interest and was not an appropriate party to conduct this review. NSC should in the future seek to clearly identify such conflicts, assign an independent reviewer to conduct future such program efficacy assessments, and publish an updated AAA screening program efficacy assessment by such a neutral expert.
Medication (aka medical or chemical) abortion associated with substantial possible risks, recent report claims
There’s a safety signal in medication abortion data that pro-choice activists would suppress, so researchers had to put it out as a policy paper instead of a peer-reviewed article. So say Jamie Bryan Hall and Ryan T. Anderson, writing for the EPPC (Ethics & Public Policy Center) in an FAQ published this week on the Center’s website (“Frequently Asked Questions About the Largest Study on Chemical Abortion,” May 7, 2025).
This argument has face plausibility. Last winter, there was a volley of retractions of abortion papers that critics charged was politically motivated and without valid basis. No invalidating errors were reported. That was only the latest battle in a long-running war in science between mostly victorious pro-choice activists and mostly defeated pro-lifers who dissent from the preferred narrative of powerful sociopolitical networks that abortion is safe and good for women.
Part of the legacy of that war is that some good dissenting science doesn’t get published. See, e.g., Geneviève Bloch-Torrico’s 2014 psychology doctoral dissertation (University of Montréal) finding abortion is associated with statistically and substantially increased risk of completed or attempted suicide, though we don’t know if the link is causal. (And my fuller, more recent review situating it in the literature.)
And the bad pro-choice science that does get published gets held to a considerably lower standard than its pro-life analogues. See, for example:
Three prominent examples of expert misrepresentation in that review: the 2008 Report of the APA Task Force “Mental Health and Abortion,” the 2018 NAS Consensus Study Report “The Safety and Quality of Abortion Care in the United States” (highlights, full book) and Steinberg’s 2019 Lancet Psychiatry article.
Zandberg et al’s widely reported, blatantly p-hacked abortion restrictions-suicide finding.
Rampant statistical significance testing misuse in misinterpreting the famous Turnaway study data as showing abortion doesn’t increase depression risk.
More statistical significance testing misuse in claiming abortion with pills is as safe as surgical abortion when it may carry far higher risks.
McIntosh and Vitale’s 2023 research integrity article on abortion science that itself lacked accuracy and integrity, resulting in a correction when it should have resulted in a retraction.
The people who get the mic in science here are pro-choice activists and abortion providers with vested financial, psychological, social, and professional interests in promoting abortion, denying evidence that it may harm women (including their own patients and subjects), and legitimating their actions through arguments which sometimes go under the name of science but are not very scientific.
So it’s not crazy to just do a good analysis of this and put it on the Internet. Scientific publishing is dying anyway. And good riddance. But that’s a topic for a different post.
Back to this new report: Is it good? Is it right? How strong is the claimed safety signal, how bad is the possible harm, and what sources of bias are in play, pushing/pulling in what directions?
Hall & Anderson find that 10.93% of women who use mifepristone for abortion experience a serious adverse event within 45 days. So whether there is a safety signal here depends on what those events are. Only around 1% are infections and around 3% are hemorrhages. The modal event is an ER visit related to the abortion (4.73%).
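A minimal sketch of that breakdown, using the percentages reported in the FAQ and cited in this post (the listed categories are illustrative and need not be exhaustive or mutually exclusive, so they are not expected to sum to the headline rate):

```python
# Sketch of the serious-adverse-event rates reported in the EPPC FAQ and cited
# in this post. Categories are illustrative; they need not be exhaustive or
# mutually exclusive, so they are not expected to sum to the headline rate.

total_serious = 10.93  # % of mifepristone abortions with any serious adverse event within 45 days

reported_rates = {
    "ER visit related to the abortion": 4.73,
    "Hemorrhage": 3.31,
    "Infection": 1.34,
    "Sepsis": 0.10,
}

for event, rate in reported_rates.items():
    print(f"{event:35s} {rate:5.2f}%  ({rate / total_serious:5.1%} of the headline rate)")

# The modal event behind the 10.93% figure is the ER visit itself; worst-case
# events such as sepsis are roughly two orders of magnitude rarer.
```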
Pro-choice researchers look at this data and say: There is no safety signal, because the events are not really serious ones. But pro-life researchers look at it and say: There is a safety signal, because the women are saying there is one by going to the ER this frequently, no matter for what abortion-related reason. It’s related, and they’re going. That’s a statement.
One possible source of bias in this data is that ER visits are horrible. They’re time-consuming, expensive, and you only go if you think you might be dying or someone makes you (or maybe that’s just me). And if you have to go because you had an abortion, you might rather die than tell a bunch of strangers that (and some women have). Not everyone will feel this way, but stigma, distress, and legal-ethical concerns probably generate an under-reporting bias if we think of going to the ER as the woman reporting a serious adverse event in this dataset. This possible bias would tend to skew the numbers lower than they would otherwise be, weakening the safety signal.
Another possible source of bias is that abortion providers have incentives to advise patients to not go to the ER. They will probably be fine. If they are experiencing heavy bleeding or bleeding for a long time (both more common in medical than surgical abortions), it is probably not a hemorrhage. If they’re in a lot of pain (more likely for medical abortions than surgical ones, and more likely in medical abortions the later the pregnancy’s gestational age), it’s probably not signaling a dangerous medical problem.
So probably more women concerned about the most likely post-abortion problems, bleeding and pain, would have gone to the ER with what would have been logged as serious adverse events, if the same people they’re likely to consult with about this choice weren’t also the same people who prescribed them the medication in the first place. Again, this possible bias would tend to skew the numbers down, weakening the signal.
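A toy calculation shows how such under-reporting would deflate the observed rate; the reporting fractions below are hypothetical assumptions, purely for illustration:

```python
# Toy calculation: if some share of women experiencing a serious post-abortion
# event never present to the ER (stigma, cost, provider reassurance), the observed
# rate understates the true rate. The reporting fractions below are hypothetical.

observed_rate = 10.93  # % with a logged serious adverse event within 45 days, as reported

for reporting_fraction in (1.0, 0.8, 0.6):
    implied_true_rate = observed_rate / reporting_fraction
    print(f"If {reporting_fraction:.0%} of true events reach the ER, "
          f"the implied true rate is {implied_true_rate:.1f}%")
```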
This is not to say that the pro-life researchers are right. But rather that their dataset appears likely to underestimate the true safety signal if there is one due to these biases. To deny that is to argue that the women are wrong.
This puts pro-choice researchers in the ironic position of asserting that women who go to the ER because they are having what they deem to be a serious adverse post-abortion event, are wrong. This does not look like believing, supporting, or empowering women, to me.
On the other hand, there are also possible sources of bias here that would work the other way, distorting what could be noise or a small safety signal into a false-positive safety signal, or a bigger one than really exists. The obvious one is interpreter bias: pro-life researchers reading the data to tell a story that conforms with their prior expectations (confirmation bias).
No one is neutral; but Hall & Anderson aren’t even trying. Or rather, they don’t see policy-neutrality as relevant to their role as analysts. That’s silly, because scientific evidence tends to be complex and involve interpretation of ambiguous realities. Bias in research matters.
So how do we mediate interpretive differences between pro-choice and pro-life analysts? It would be interesting to survey a representative sample of women who had medication abortions and ask about their complications and how they experienced them. I can’t think of a better way to mediate between these two factions on this sort of data, especially if we consider that it should be the women themselves, and not warring factions with predetermined policy interests and aligned sociopolitical biases, who get to tell us how to interpret the evidence here.
Lacking that kind of evidence, going to the ER is a direct enough report from the women that something serious went wrong for them that we should take this analysis seriously as possibly indicating a safety signal in the medication abortion data. But it could be better.
Here are a few ways the report’s accuracy and neutrality could be improved:
Some of the language is value-laden. For instance, “Women deserve better than the abortion pill.” This language could be excised to let the reader draw his own conclusions based on the evidence presented. That said, it’s no more policy-oriented and biased toward its preferred narrative in this regard than leading pro-choice researchers’ recent scientific articles and leading medical journals’ usual editorials and news stories framing abortion as medical care or a human right.
It should cite modal adverse events as examples instead of only the most extreme ones. For instance: “10.93 percent of women experience sepsis, infection, hemorrhaging, or another serious adverse event within 45 days following a mifepristone abortion.” Figure 1 suggests sepsis occurs in 0.10% of cases. So there’s a two-orders-of-magnitude difference between how often this particular worst-case serious adverse event occurs and how often any serious adverse event occurs. That makes leading with it, and citing similarly low-probability but high-impact events while omitting modal ones, misleading. It’s not clear from the report, but the sentence should probably read: “10.93 percent of women experience so much more pain and bleeding than they expect that they visit the ER within 45 days of a mifepristone abortion; rarely, women experience more serious adverse events like sepsis (0.10%), infection (1.34%), or hemorrhaging (3.31%).”
Nowhere are pain and length of bleeding (the modal medical-abortion problems) mentioned. These are points where women might go to the ER thinking something is horribly wrong, but pro-choice experts might dismiss them as not experiencing serious adverse events. ERs might treat those symptoms (e.g., with pain medication or additional misoprostol), placing them under the biggest, vaguest categories in Figure 1: ER visit related to the abortion (4.74%) and other abortion-specific complications (5.68%).
The authors could acknowledge that other researchers analyzing the same dataset could make different, defensible analytical choices, resulting in different conclusions. Otherwise, they’ve set themselves up for a counter-attack framed as a replication failure. But we might expect different analysts making different, defensible choices to get different results in an ordinary dataset, and for this problem to be compounded in a dataset dealing with an emotive, hyperpolarized topic like abortion.
So is there a safety signal in de-identified patient data on medical abortion? It depends on how you define “safety signal.” Arguably, if you believe what women are saying when they walk into an ER, the answer based on this study is yes. Whether researchers with different policy views who make different analytical choices replicate it or not. Now someone should make the good-faith survey research effort to ask the women more about what happened.
“Avoidable costs” for whom?
Rummaging around on the TU Berlin website, I ran across this terrible study purporting to measure avoidable costs in healthcare: “Quantifying Low-Value Care in Germany: An Observational Study Using Statutory Health Insurance Data From 2018 to 2021,” Hildebrandt et al., Value in Health. The authors report: “The 3 indicators—antibiotics for RTIs [respiratory tract infections], free T3/T4 level testing for hypothyroidism, and benzodiazepines for older persons—account for 82% (broad) of all low-value services measured in this analysis (narrow 75%).” But they don’t entertain counter-arguments regarding the fuller thyroid assessment they characterize as an avoidable cost. Maybe sometimes it is an avoidable cost, and sometimes it isn’t. In which case, if we must dichotomize, it isn’t.
Measuring only TSH (thyroid-stimulating hormone), as the authors propose, to monitor patients already diagnosed with hypothyroidism is insufficient to support patient quality of life or health. There’s a big range of “normal” T3 and T4. One patient might require a higher dose than another to function normally: sleeping, regulating temperature, thinking clearly, pooping. Without the free thyroid hormone information from the fuller (standard) assessment, a clinician wouldn’t know how much leeway there is to titrate the dosage.
If clinicians don’t use the information from the standard assessment to actively make patients’ quality of life better: shame on them. Don’t blame the protocol, blame the practitioner.
Moreover, a range of conditions can cause inadequate conversion of T4 to T3, requiring further investigation and possibly different treatment. If someone complained of hypothyroid symptoms (e.g., fatigue, cognitive fog) but had normal TSH and T4, you would still need to check T3 to see if they had a conversion problem requiring further diagnostics. These symptoms are vague enough that it’s probably easier (and more reliable) to check all three typical thyroid function tests in known hypothyroid patients than to run through questions about every possible symptom each time you see a patient with at least one complex, chronic condition like hypothyroidism.
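A minimal sketch of the monitoring logic described above, assuming hypothetical placeholder reference ranges (this is an illustration of the argument, not clinical guidance):

```python
# Sketch of the monitoring logic argued for above: TSH alone can look "normal"
# while free T3 flags an under-treated patient or a T4-to-T3 conversion problem.
# Reference ranges below are hypothetical placeholders, not clinical guidance.

TSH_RANGE = (0.4, 4.0)    # mIU/L, placeholder
FT4_RANGE = (0.8, 1.8)    # ng/dL, placeholder
FT3_RANGE = (2.3, 4.2)    # pg/mL, placeholder

def in_range(value, lo_hi):
    lo, hi = lo_hi
    return lo <= value <= hi

def assess(tsh, free_t4, free_t3, symptomatic):
    """Return a coarse flag for a known-hypothyroid patient on replacement therapy."""
    if not in_range(tsh, TSH_RANGE):
        return "Adjust dose: TSH out of range"
    if in_range(free_t4, FT4_RANGE) and not in_range(free_t3, FT3_RANGE):
        return "Possible T4-to-T3 conversion problem: investigate further"
    if symptomatic:
        return "Symptoms despite normal labs: look beyond TSH alone"
    return "No change indicated on these labs"

# A TSH-only protocol would report "no change" for the first patient below,
# because her TSH is normal even though her free T3 is low.
print(assess(tsh=2.1, free_t4=1.2, free_t3=1.9, symptomatic=True))
print(assess(tsh=2.1, free_t4=1.2, free_t3=3.0, symptomatic=False))
```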
Limiting thyroid assessments to TSH, as the authors propose, would also risk permanently damaging fetuses exposed in early pregnancy, by under-medicating women whose diagnosed hypothyroidism recurs or worsens during pregnancy (as it often does). Fuller assessment can catch trends in thyroid function sooner and point to better diagnostic and treatment strategies, improving patient quality of life and, in some cases, life-or-death outcomes.
Improperly managed hypothyroidism in pregnancy can cause problems including miscarriage and lower IQ in offspring. Early treatment makes a difference for child IQ, while evidence doesn’t establish that later treatment does. Thus, it seems maximally important to diagnose and fully treat hypothyroidism as early as possible in pregnancy, including before many women even know they’re pregnant. Checking only TSH would unnecessarily lengthen the diagnosis and treatment process.
The first author of the associated PLOS One paper, Carolina Pioch, has previously worked for Techniker Krankenkasse (German public health insurance; “Selecting indicators for the measurement of low-value care using German claims data: A three-round modified Delphi panel,” Feb. 2025). The senior author of both papers, Verena Vogt, holds an endowed professorship of the National Association of Statutory Health Insurance Physicians. It appears that both of these authors may have incentives that align with the cost-saving interests of insurers like TK.
It would certainly save TK a lot of money if doctors stopped checking free T3 and T4 along with TSH. It’s less clear what it would, in turn, cost patients and their families.
If your goal is to get a(nother) job with the statutory health insurance company, this is a fine paper. If it’s to promote patient quality of life and health, or (for some irrational reason) to do good science, it’s terrible. It disproportionately devalues women’s quality of life, given the gendered prevalence of hypothyroidism. If TK stops covering standard thyroid measurement on the basis of this study, it will save a lot of money and hurt a lot of women who can’t do anything about it.
Science is culture
These are all recent examples of biases and perverse incentives shaping science and scientific discourse. We can defend ourselves against spin science by considering the source, digging into the evidence, and identifying where interpretation and data don’t match.
But no one has time to do this for every topic they need to know something about to make an important life, safety, or health choice for themselves or their loved ones. We have to delegate some trust somewhere. And yet, almost no one pays for science criticism, even though everyone has an interest in it.
It’s no one’s job to check the causal diagrams (or just notice when they’re still missing), audit the trade-offs, or ask who benefits from the usual distortions. But if no one does, policy becomes a cargo cult: dressed up in data, empty of wisdom. The cloak of evidentiary validity goes to the powerful, or to the highest bidder.
So what would better science look like? The usual answers point to transparency in disclosing potential conflicts of interest, emulating target trials, diagramming causality before running statistical analyses, and pre-registering study/analysis plans. But these measures don’t address the pervasive, inescapable nature of bias and perverse incentives.
As Doug Altman famously wrote: “We need less research, better research, and research done for the right reasons.”
The contemporary corollary might be: We need researchers who know no one is neutral. But who are trying anyway.