More research suggests possible birth defects risk of Covid vaccination in pregnancy
Consistent with previous findings, a new study by Sharma et al and the CDC COVID‐19 Vaccine Pregnancy Registry team reports substantial possible birth defects risk associated with Covid vaccination in pregnancy (“COVID‐19 Vaccination During Pregnancy and Birth Defects: Results From the CDC COVID‐19 Vaccine Pregnancy Registry, United States 2021–2022,” Birth Defects Res, May 2025, 117(5):e2474).
We wouldn’t expect to see a statistically significant effect here due to power problems: birth defects are individually uncommon. With only 5209 participants vaccinated in the first trimester, three different vaccines, and 94 different types of birth defects, we would expect any practically important safety signal to be clouded by uncertainty about whether it really exists, or not. These findings are consistent with that possible safety signal.
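To make the power problem concrete, here’s a minimal sketch of the arithmetic for a single hypothetical rare defect. The background prevalence (10 per 10,000) and the equal-size comparison group are my assumptions for illustration, not the registry’s actual numbers:

```python
# Rough power check for one hypothetical rare birth defect:
# assumed background prevalence 10/10,000, 5209 exposed participants,
# an equal-size comparison group, and alpha = .05 two-sided.
from math import sqrt
from scipy.stats import norm

n = 5209                  # first-trimester vaccinated participants
p0 = 10 / 10_000          # assumed background prevalence
p1 = 2 * p0               # hypothetical doubling of risk
z_alpha = norm.ppf(0.975)

print(f"expected baseline cases: {n * p0:.1f}")   # ~5.2

# Standard two-proportion normal-approximation power formula:
pbar = (p0 + p1) / 2
se_null = sqrt(2 * pbar * (1 - pbar) / n)
se_alt = sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n)
power = norm.cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)
print(f"approximate power to detect a doubling: {power:.0%}")  # ~26%
```

Roughly a quarter power to detect even a doubled risk of one defect, before any accounting for testing dozens of defect categories across three vaccines.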
Table 2 reports major birth defects by organ system. Many of the compatibility intervals are wide and include the possibility of no effect, prevalence decrease, and prevalence increase. There’s noise here, and we have to think about causality: There’s no reason to expect Covid vaccines to decrease birth defects, and plenty of reasons to suspect that such effects probably reflect random noise plus healthy user bias. Plus, some of the birth defects that vaccination appears to potentially increase (e.g., hip dysplasia) don’t make obvious causal sense; hip dysplasia commonly results from a relatively big baby being stuck in a relatively small womb. So in this case, we might suspect healthy-user-with-tall-daddy bias.
More to the point, this is a terrible way to present these results. It makes the reader do the work, line by line, of comparing 95% CIs for the vaccinated versus unvaccinated estimates. This comparison should be done for us. It’s the operative one: it tells us what the possible birth defect risk increase is.
To take just one example, looking for a group with larger numbers (because they’re easier to interpret) and practical importance, there were 145 cases of congenital heart defects among selected participants. The biggest subgroup numbers here are 26 atrial septal defects and 50 ventricular septal defects, both of which can require open heart surgery. There’s also some vague causal plausibility on its face: Covid vaccines incur cardiac risks for adults, so maybe they do for fetuses, too? Congenital heart defects are also the most common group of birth defects, so something that causes birth defects generally will likely cause congenital heart defects specifically.
We don’t know from this evidence if vaccines were associated with increased risks for these birth defects, or not. Extrapolating out from the relatively small study population to show 95% CIs in comprehensible units (a defensible analytic choice), Sharma et al estimate the atrial septal defect prevalence per hypothetical 10,000 in the vaccinated population to be 8.6-19.2, versus 10.6-12.4 in the general population. So there could be a decrease, no effect, or a practically important increase in the risk. Similarly for ventricular septal defects, the 95% CIs include 22.6-38.3 out of 10,000 vaccinated versus 35.9-39.2 out of 10,000 unvaccinated. Again, vaccination could confer a decreased risk (but we don’t have a causal story explaining why), no effect, or increased risk (e.g., from 36 to 38 — a small difference, but one that matters if it’s your baby who needs open heart surgery).
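If you want to check this kind of translation yourself, here’s a minimal sketch of the exact (Clopper-Pearson) prevalence interval per 10,000. The denominator below is a hypothetical placeholder I chose because it roughly reproduces the reported interval; it is not the registry’s actual N:

```python
# Exact (Clopper-Pearson) 95% CI for a prevalence, scaled per 10,000.
from scipy.stats import beta

def prevalence_ci_per_10k(cases, n, alpha=0.05):
    lo = beta.ppf(alpha / 2, cases, n - cases + 1) if cases > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, cases + 1, n - cases)
    return 10_000 * lo, 10_000 * hi

# 26 atrial septal defect cases out of a hypothetical 20,000 births:
print(prevalence_ci_per_10k(26, 20_000))  # roughly (8.5, 19.0)
```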
Sharma et al acknowledge the importance of incorporating timing as well as statistical power limitations:
Birth defects included in the timing analysis had to meet the following criteria: (1) at least four cases were reported in Group 1 [early pregnancy] as this was the minimum number of cases required to have sufficient statistical power to detect a significant increase in risk from Group 3 [later pregnancy] with 95% confidence, and (2) the origin of the birth defect is established to occur during the period of embryogenesis (first 8 weeks after conception). Mean gestational age at first vaccination conferring registry eligibility was 2.7 (SD 4.4), 12.1 (2.0), and 25.3 (6.8) for Group 1, Group 2, and Group 3, respectively.
Though they didn’t do this sort of comparison for the reader for all defects, as arguably should have been done, the authors did it for this well-defined selection in Table 3. This makes it easier to see, for instance, that when women were vaccinated early versus later in pregnancy, their babies’ risk of birth defects including orofacial clefts, congenital heart defects, gastrointestinal defects, and genitourinary defects may have substantially increased. The comparator, however, has changed from vaccinated versus unvaccinated (Table 2) to vaccinated earlier versus later in pregnancy (Table 3); we would expect this to dilute any possible increased risk from vaccination. Another dilution effect may come from lumping women who were vaccinated before 8 weeks of pregnancy (the riskiest period for birth defects) together with women who were vaccinated between 8 and 14 weeks. And again, despite these analytical choices, which we would expect to skew the results toward finding no effect even if there is one, the table reflects uncertain but substantial possible risks.
There are different valid ways to interpret and act on this uncertainty. For instance, we might want to ask and estimate what the risks are of Covid infection versus vaccination.
But some people are more willing and able to isolate than others, making that an incorrect comparison for them. The analytical choices embedded in doing a net effect analysis here are complex, and the evidence with which to generate hypothetical estimates is limited.
Public health institutions can’t simply hand down ambiguous evidence in full to doctors and expect them to discuss research like this in detail with their patients. That’s not how medical encounters work.
So institutions tend to instead make general recommendations in line with evidence like this to minimize harm. When it comes to vulnerable groups like pregnant women and fetuses, the global norm is to err on the side of not intervening when an intervention may incur net harm and we don’t know. That means changing U.S. Covid vaccination norms from boosting pregnant women, to not.
The authors, however, downplay these implications of their findings. This expected spin rises to the level of frank misinterpretation when they write “Prevalences did not differ by the timing of vaccination for seven defects examined.” The issue is that early pregnancy is the riskiest time for birth defects, and early pregnancy vaccination may indeed have produced elevated risks in comparison with later pregnancy vaccination according to their analysis.
A corrected statement would read: “Prevalences may have differed by vaccination timing, but due to limited statistical power, we don’t know.” For instance, there were 4 cases of gastrointestinal defects (esophageal atresia or stenosis) in the early vaccination group compared to 2 in the later group. Table 3 reports that this translates to an adjusted prevalence ratio with a 95% CI of .90-26.83. In other words, there may be a small protective effect of vaccination (though we have no causal story for this), no effect, or a substantial increased risk. There are so few cases that we can’t tell.
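For intuition about why 4 versus 2 cases can’t settle anything, here’s a minimal sketch of a crude (unadjusted) risk ratio with a Katz log-method CI. The group sizes are hypothetical stand-ins, not the registry’s actual denominators:

```python
# Crude risk ratio with a Katz log-method 95% CI.
from math import exp, log, sqrt

def risk_ratio_ci(a, n1, b, n2, z=1.96):
    rr = (a / n1) / (b / n2)
    se = sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

# 4 cases among a hypothetical 1500 early-vaccinated women
# vs. 2 cases among a hypothetical 1500 later-vaccinated women:
print(risk_ratio_ci(4, 1500, 2, 1500))  # ~ (2.0, 0.37, 10.9)
```

A point estimate of 2 with a CI running from roughly a third to roughly eleven: that’s what a handful of cases buys you.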
In some places, Sharma et al get it closer to right, as when they conclude “While there was no strong evidence of associations between vaccination and specific defects, statistical power was low.” But again here, a more technically accurate statement could be made about these findings: “This evidence is consistent with possible risk decreases, no effect, or substantial risk increases, and we don’t have enough data to know.” The emphasis should arguably be on the substantial possible risk increase, not the lack of conclusive evidence, if we want to be able to act on the best available evidence to do no harm.
Building on a safety foundation of sand in maternal RSV vaccination in pregnancy
The latest effort to justify RSV vaccination in pregnancy presents results from a small prospective cohort study lacking randomization, evaluating the widely critiqued proxy of antibody levels (not clinical outcomes), and dismissing paramount safety and comparative efficacy concerns without evidence.
Jasset et al claim: “Information about how maternal RSV vaccine timing impacts transplacental transfer is particularly important in light of recent real-world safety data suggesting no clear association between Abrysvo vaccine administration and preterm birth17” (“Enhanced placental antibody transfer efficiency with longer interval between maternal respiratory syncytial virus vaccination and birth,” Am J Obstet Gynecol, June 2025; 232(6):554.e1-554.e15).
The cited study by Son et al misinterpreted its statistical significance test results. As I wrote previously, Son et al’s adjusted compatibility intervals actually show possible increased risk of preterm birth associated with vaccination. They were not alone: Madhi et al and Dugdale et al similarly reported results consistent with substantial possible preterm birth risk increases associated with maternal RSV vaccination in pregnancy, but misinterpreted their statistical significance test results, downplaying possible preterm birth risk increases of up to 46% for babies of vaccinated moms (Madhi et al), up to 147% in low- and middle-income countries (Dugdale et al), and up to 34% when accounting for possible immortal time bias (Son et al).
Evidence linking maternal RSV vaccination in pregnancy to preterm birth risk has since continued to accumulate.
So Jasset et al are technically correct in saying that Son et al’s data suggest “no clear association.” But that doesn’t mean no association. It means we don’t know, which should prompt clinicians to err on the side of caution and adhere to the principle of first doing no harm.
Uncertainty about possible net harm should halt, not justify, novel medical interventions, particularly in vulnerable populations. When it comes to protecting infants against RSV, gambling on this uncertainty is particularly problematic given the likely safer and more effective alternative to maternal vaccination in the form of infant monoclonal antibody treatment with nirsevimab.
Bad proxies, misinterpreting uncertainty about serious possible risks as a green light to intervene in a vulnerable population, and ignoring a possibly safer and more effective alternative — this article hits several common hallmarks of medical literature that promotes the vaccine consensus without grappling with the clinical meaning of the complex, ambiguous underlying evidence.
It also ignores its own evidence that maternal RSV vaccination in pregnancy may increase preterm birth risk. Descriptive statistics presented in the unnumbered table show that 8 of 142 vaccinated mothers gave birth prematurely (~5.63%), compared to 1 of 20 unvaccinated (~5%).
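Running those numbers through a crude risk ratio shows how little they can rule out. This is my back-of-envelope calculation on the reported counts, not an analysis the authors present:

```python
# Crude risk ratio for 8/142 vaccinated vs. 1/20 unvaccinated preterm births.
import numpy as np
from statsmodels.stats.contingency_tables import Table2x2

table = Table2x2(np.array([[8, 142 - 8], [1, 20 - 1]]))
print(round(table.riskratio, 2))               # ~1.13
print(np.round(table.riskratio_confint(), 2))  # ~[0.15, 8.54]
```

A point estimate near 1 with a CI from roughly 0.15 to 8.5 is consistent with strong protection, no effect, or a manyfold risk increase. These descriptive data settle nothing, in either direction.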
Attrition also raises concerns that these analyses may be troubled by selection bias. It remains unclear why only 29 of 162 participating mother-infant dyads had capillary blood collection at age two months for the study. The authors don’t say how prematurity was addressed in their analysis. Premature infants are at increased risk of RSV hospitalization, RSV vaccination in pregnancy may increase premature birth risk, and prematurity increases other morbidity/mortality risks. So it’s impossible to get what we most need to know from this study, which is how vaccine timing may have affected infant safety.
Until randomized trials establish net benefit, RSV vaccination has no place in routine prenatal care.
Statistical significance testing misuse in interpreting inconclusive randomized trial results on low molecular weight heparin and early-onset fetal growth restriction
González et al misreport results from a 49-patient randomized trial on low molecular weight heparin and early-onset fetal growth restriction as indicating the prophylactic dose does not prolong gestation (“Treatment of early-onset fetal growth restriction with low molecular weight heparin does not prolong gestation: a randomized clinical trial,” Am J Obstet Gynecol, June 2025, 232(6):552.e1-552.e10).
A full interpretation of the relevant compatibility (aka confidence) intervals is not as straightforward as it usually is in such cases of statistical significance testing misuse, because the reported results also appear to contain two concerning inconsistencies:
Forty-nine patients were included (23 in the low molecular weight heparin group and 26 in the placebo group). In the low molecular weight heparin group, the median prolongation of pregnancy was 42 days, while in the placebo group it was 41.5 days (median difference 0.5 days [95% confidence interval -22.7 to 6.3] (P=.667)) and in the low molecular weight heparin group, the median gestational age at delivery was 35.1 weeks, while in the placebo group, it was 34.6 weeks (median difference 0.5 weeks [95% confidence interval -3.4 to 1.2] (P=.639)).
The first inconsistency is in the first reported CI, which spans -22.7 to 6.3 days. This interval is hard to square with the reported median difference of .5 days and likely reflects a reporting error that requires correction. (We would expect the point estimate to fall near the middle of such an interval; instead, .5 sits far off-center, close to the upper edge, suggesting a sign or transcription error.)
The second inconsistency appears in reporting the results both as days of pregnancy prolongation and as gestational age at delivery. The trial’s preregistration indicates gestational age is the primary endpoint. Its secondary endpoints were listed as: “efficacy of low molecular weight heparin in reducing neonatal morbidity,” “demonstrate that low molecular weight heparin improves the pro-angiogenic and anti-inflammatory profile,” and “efficacy of low molecular weight heparin in reducing thrombotic and ischemic placental lesions.” According to its preregistration, we would expect to see results also in terms of these outcomes, and not in terms of pregnancy days, which is redundant with gestational age at delivery.
More concerning still, this small-N study was underpowered to detect a clinically important outcome difference between treatment and control groups on its primary endpoint. Most patients with early-onset fetal (aka intrauterine) growth restriction — 80.3% — deliver at >34 weeks. Subtracting that from 100% gives around 20% delivering at ≤34 weeks. Conjecturing that the treatment group might have a substantially lower such incidence — say, around 10% — this sample size calculator suggests that such a trial would need 398 subjects to have sufficient statistical power to detect a treatment effect with the conventional .05 likelihood of a type I error (false positive) and 80% power (i.e., a 20% type II error, or false negative, rate). Again, it had only 49 subjects, a low number that should have alerted editors to its power problem. The misrepresentation of this trial’s underpowered results as definitive likely reflects a combination of cognitive distortion in the direction of uncertainty aversion, and authors responding to perverse incentives, generating publication bias.
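The power arithmetic is easy to check. Here’s a minimal sketch using the standard pooled two-proportion formula under the assumptions above (20% vs. 10% incidence, alpha .05 two-sided, 80% power):

```python
# Sample size for comparing two proportions (normal approximation).
from math import ceil, sqrt
from scipy.stats import norm

p1, p2 = 0.20, 0.10           # assumed control vs. treatment incidence
z_a = norm.ppf(1 - 0.05 / 2)  # type I error rate .05, two-sided
z_b = norm.ppf(0.80)          # 80% power (20% type II error rate)

pbar = (p1 + p2) / 2
n_per_group = ((z_a * sqrt(2 * pbar * (1 - pbar))
                + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
               / (p1 - p2) ** 2)
print(ceil(n_per_group) * 2)  # 398 total, versus the trial's 49
```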
Moreover, the primary endpoint was poorly defined. The clinically important difference for this outcome is arguably not whether some women deliver later after 34 weeks with the treatment, but whether some women who would otherwise deliver before 34 weeks deliver later instead. In other words, we care more about preventing preterm births, the more preterm they are, because with greater prematurity comes greater risk. The outcome needs to be measured and the results reported in a way that lets us see that.
Abortion may increase future preterm birth risk, but researchers tout it as safe
Van Gils et al report the latest finding in a large body of evidence suggesting abortion may increase risks to future pregnancies including preterm birth (“Subsequent risk for preterm birth following second trimester medical termination of pregnancy,” Am J Obstet Gynecol, 2025 May 19:S0002-9378(25)00317-5; full-text). In an Amsterdam University Medical Center cohort study, they looked at women who had abortions using mifepristone and/or misoprostol from 2008-2023, and then had a subsequent pregnancy.
The outcome measure was spontaneous preterm birth < 37 weeks. On one hand, this is a defensible outcome measure because it’s so much riskier for infants to be born before that point than after. On the other hand, it would have been better practice to also include analyses using pregnancy length or gestational age at birth as a continuous variable, because dichotomizing continuous variables risks losing information for no reason (see Altman’s classic and other canonical sources here).
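A quick simulation makes Altman’s point concrete. The effect size and group sizes below are invented for illustration, not drawn from Van Gils et al:

```python
# Power of analyzing gestational age as continuous vs. dichotomized at 37 weeks.
import numpy as np
from scipy.stats import fisher_exact, ttest_ind

rng = np.random.default_rng(0)
sims = 2000
hits_cont = hits_dich = 0
for _ in range(sims):
    a = rng.normal(39.0, 2.0, 200)  # hypothetical comparison group (weeks)
    b = rng.normal(38.5, 2.0, 200)  # hypothetical exposed group, slightly earlier
    hits_cont += ttest_ind(a, b).pvalue < 0.05
    counts = [[(a < 37).sum(), (a >= 37).sum()],
              [(b < 37).sum(), (b >= 37).sum()]]
    hits_dich += fisher_exact(counts)[1] < 0.05
print(f"power, continuous outcome: {hits_cont / sims:.0%}")        # ~70%
print(f"power, dichotomized at <37 weeks: {hits_dich / sims:.0%}")  # ~40%
```

Same data, same underlying effect; throwing away the continuous information costs a big chunk of power.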
Of 1,438 abortion patients, 1,033 had a subsequent known pregnancy, of which only 986 outcomes were available and of those, 962 were singletons. (Multiples are excluded since their preterm birth risk is higher.) Of subsequent singleton pregnancies carried beyond 16 weeks:
rates of spontaneous preterm birth < 37 weeks were higher following an IPI < 3 months compared to 12-24 months (6.8% vs 3.2% aOR 2.2 95% CI 0.69-7.4, p-value 0.2), and higher for a GA >20 weeks at mTOP compared to < 12+0 - 15+6 weeks (5.9% vs 2.6% aOR 2.2 95% CI 0.92 - 5.4, p-value 0.07), though both not statistically significant. However, when gestational age at mTOP was included as a continues [sic] variable (in weeks) in a linear regression model, a significant positive association with subsequent spontaneous preterm birth was found (B=0.56, R2=0.31, p=0.04).
The authors repeatedly misinterpret these findings as demonstrating abortion safety with respect to preterm birth, saying: “This is the largest cohort to date and supports the safety of second-trimester mTOP [medical termination of pregnancy] in relation to sPTB [spontaneous preterm birth],” “Second-trimester medical termination of pregnancy can be considered safe with regards to subsequent spontaneous preterm birth risk,” and “In conclusion, second-trimester mTOP can be considered safe with regards to subsequent sPTB risk and should be the preferred method if appropriate, considering the increased risk associated with sTOP.”
This last claim is particularly interesting, as it implies we should be comparing increased spontaneous preterm birth risk from medical abortion versus surgical abortion. But the authors do not provide a compilation of other studies’ ranges of results associating subsequent preterm birth with surgical abortion, or any other figure or citation for that comparison. (Please let me know if someone has already made a figure like this.)
If that’s the correct comparison, then it’s important to see these ranges to know what to conclude from these results. As I’ve written previously, KC et al find 4-36% higher preterm birth risk associated with surgical versus medical abortion, as well as higher odds of low birthweight. On the lower end of the compatibility interval, this is next to nothing; on the upper end, it’s moderate.
But it’s arguably an incorrect comparison anyway, since accurately informing women about the possible risks of abortion, including to subsequent pregnancies, might change their behavior. If we care about informed consent and healthy patients, then we don’t just want to know what the comparative preterm birth risk is of surgical versus medical abortion. We want patients to know that this risk may exist, and let them adjust their decisions accordingly; this may prevent some iatrogenic harms, because some women may then forego abortion. Others may have it earlier, under more uncertainty, rather than risk having it later. And a lot of couples would probably wait longer between pregnancies to try again if they knew that pregnancies spaced closer to abortions may incur heightened risks such as preterm birth.
This dimension is especially important in the context of second-trimester abortions, which result disproportionately from prenatal screening test results. Parents want to have healthy children, and they should have the opportunity to weigh the risks of having kids with possible chromosomal abnormalities (for instance) versus having premature babies without them.
Spinning these results as demonstrating safety when they actually suggest possible risk makes a few common methods mistakes, including statistical significance testing misuse (here, misinterpreting non-significant results as reflecting no risk when they reflect substantial possible risk) and nullism (here, misinterpreting no proven causal link as proof of no risk). From using an incomplete, debatable outcome variable, to misinterpreting results on it in these ways, and finally not providing the suggested operative comparison, there are four common methods mistakes here. That they happen to serve the preferred narrative of powerful sociopolitical networks — that abortion is safe, and access to medical abortion should be increased including in later gestation (where their analysis finds a statistically significant association with subsequent preterm birth risk) — is not an anomaly.
Rather, it fits the pattern of spin science in which researchers interpret complex, ambiguous data in line with simple stories that get them published and don’t make enemies. It would be awkward for university researchers involved in reproductive medical care to contest the consensus that abortion is safe, to critique the substance of current consent protocols which tend to not acknowledge or explicitly deny serious possible risks to women’s mental health or future pregnancies, and to thus suggest that their colleagues (and sometimes they themselves) may have violated fundamental principles of informed consent along with the duty to first do no harm. Saying this sort of thing might cost professionals in this field friends, funding, and feeling good about themselves.
It’s not just about vaccines or abortion. The problem of spin science is vast, spanning every domain in our complex, fast-changing modern societies. Just because scientists aspire to be rational, doesn’t mean we can stand outside of the psychosocial webs in which we all live, free of bias and unshaped by incentives. There’s no exit from being limited human beings, even (or perhaps especially) when we’re doing science.
In the context of abortion, we see a vast array of evidence of possible harm to women and subsequent pregnancies, a lot of common methods mistakes in the service of denying that harm on the part of leading pro-choice researchers and abortion providers, and a longstanding conversation about bias in the scientific and popular discourse, where one set of standards seems to apply to researchers who support the consensus story, and another to those who dissent. It would be better for women and families if doctors and scientists could try to adhere to higher standards of transparency and neutrality. Acknowledging that we don’t know (and will likely never know) if abortion causes substantial harms, such as an approximately 2x increased suicide risk, as well as increased future preterm birth risk, would be a good start.
Antidepressants may heighten preterm birth risk
New findings show possible increased risk of preterm delivery from antidepressants compared to counseling (“Comparative effectiveness of treating prenatal depression with counseling versus antidepressants in relation to preterm delivery,” Li et al, Am J Obstet Gynecol, May 2025;232(5):494.e1-494.e9). In a cohort study of over 82,000 pregnant women at Kaiser Permanente, depression was associated with a substantially increased risk of preterm birth, with 95% CIs suggesting as little as 24% and as much as 60% increased risk.
Mental health counseling was associated with as little as 4% and as much as 29% risk reduction in preterm birth. The authors do not appear to recognize that, on the lower bound, this is effectively nothing. They argue that there’s a dose-responsive relationship here, with four or more counseling visits associated with a 27-55% reduced preterm birth risk (95% CI .45–0.73). This could reflect confounding: someone with the resources to attend more counseling visits may also have more money, time, or social support, all of which may independently protect against preterm birth.
Antidepressants were associated with a 6-61% increased risk of preterm birth. Again the authors report a dose-responsive pattern, writing “a longer duration of use was associated with an even higher risk.”
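As an aside, here’s the trivial conversion I’m using throughout this post to restate a ratio-scale CI (RR, OR, or HR) as a percent-change range; the 1.06-1.61 interval is the one implied by the 6-61% figure above:

```python
# Restate a ratio-scale 95% CI as a percent-change range.
def pct_change(lo, hi):
    fmt = lambda r: f"{abs(r - 1):.0%} {'increase' if r > 1 else 'decrease'}"
    return f"{fmt(lo)} to {fmt(hi)}"

print(pct_change(0.45, 0.73))  # counseling dose-response: 55% to 27% decrease
print(pct_change(1.06, 1.61))  # antidepressants: 6% to 61% increase
```

Either way, the intervals, and not just the point estimates, are the findings.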
These associations could, however, at least in part reflect shared underlying causes of both depression and preterm birth. Women with worse depression could opt for medication treatment at higher rates and/or for longer periods. And inflammation and poor glucose metabolism may influence both mental health and pregnancy outcomes. This raises the question of whether non-pharmacological treatments targeting those pathways, such as the Mediterranean diet and exercise, may better treat depression while protecting pregnancies.
In addition, these results join a wide array of evidence suggesting maternal and paternal preconception exposure to common types of antidepressants (SSRIs and SNRIs) may carry substantial risks to subsequent pregnancies, as may antidepressants in pregnancy. These risks implicate a separate causal mechanism, stemming from the role of serotonin in fetal development. Broadly, this is another area in which we would expect bias and corruption to shape spin science downplaying these possible risks, as I’ve written previously.
Osteoporosis dive: coffee, soy, and statistical significance testing misuse in a denosumab meta-analysis
A friend asked me to check on the osteoporosis literature for her this weekend. First, she was concerned about a possible relationship between drinking espresso and osteoporosis, which I didn’t find supported. At least, it’s not one of the lifestyle factors typically mentioned (e.g., smoking). Nothing much came up when I searched for espresso and osteoporosis in PubMed. When I broadened to coffee or caffeine, I found some evidence of a moderate protective effect from coffee and tea (which I suspect may reflect confounding and anyway is inconclusive), one study finding a dose-responsive effect (so drinking a lot of coffee may increase hip fracture risk; but results have been inconclusive, mixed, and dependent on study design), and another suggesting there could be a genetic factor. The concern seems to be about caffeine and not coffee specifically, and there are many possible mechanisms whereby caffeine could be bad for bone; but one of the big things we’ve learned in the past 30 years in medicine is that seeing a mechanism doesn’t necessarily translate to an in vivo effect.
Enjoy your coffee.
Succumbing to my Guardian habit with my morning brew, I was then reminded that there are some gigantic country-level differences in breast and prostate cancer and hip fracture risks. One possible contributor is diet, with soy consumption much higher and many risks much lower in several Asian countries including Japan. The soy story checks out at least a little bit in the medical literature, with Taku et al’s meta-analysis reporting “Soy isoflavones may prevent postmenopausal osteoporosis and improve bone strength thus decreasing risk of fracture in menopausal women by increasing lumbar spine BMD and decreasing bone resorption marker urine deoxypyridinoline.” (Here is Harvard’s soy resource website that I’ve linked to previously, which also confirms and elaborates on this.)
Eat your soy snacks.
My friend also asked: “Are the new osteoporosis drugs safe? Prolia has been mentioned…”
Prolia is the drug denosumab. We don’t appear to know if it yields a net benefit. The problem is that, according to critics including University of Helsinki surgery professor Teppo Järvinen, there has not been a randomized trial comparing different approaches to fracture treatment in the elderly. That’s one of the reasons why osteopenia/osteoporosis features as a case study in Welch et al’s excellent Overdiagnosed (review). Another is that DXA scan readings, commonly used to diagnose the conditions, can be off by as much as 20% — especially in patients with lower bone mass densities. These complexities and ambiguities highlight questions about causality: dissenting researchers led by Pekka Kannus claimed falling, not osteoporosis, causes most fractures in the elderly, and proposed reorienting medicine from diagnosing and medicating osteoporosis, to fall prevention measures (e.g., exercise and hip protectors).
As for denosumab, a 2014 meta-analysis by Zhou et al concluded:
denosumab treatment significantly decreased the risk of non-vertebral fracture but increased the risk of SAE [serious adverse event] related to infection in the postmenopausal women with osteoporosis or low BMD [bone mass density]. However, no difference between the safety of denosumab and bisphosphonates was found.
The language (“significantly”) flags typical statistical significance testing misuse. The authors say they found an effect when their results are also consistent with no effect at all. We can’t tell from this evidence. They should say that.
The full interval estimates suggest the drug may reduce fracture risk by 0-26% (95% CI .74–1.00), and may increase serious infection risk by 0-52% (95% CI 1.00–1.52).
It looks suspicious that both intervals have a bound sitting exactly at 1. This is what allowed the authors to call their results “significant.” There are perverse incentives to do that (e.g., publication), and sometimes analysts try different, defensible analyses until they get such results (p-hacking).
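You can see how borderline these results are by back-calculating approximate p-values from the reported intervals, following Altman and Bland’s BMJ method. Since the point estimates aren’t quoted above, I take each interval’s geometric midpoint as the implied estimate:

```python
# Approximate p-value recovered from a ratio-scale 95% CI
# (Altman & Bland, "How to obtain the P value from a confidence
# interval," BMJ 2011).
from math import log, sqrt
from scipy.stats import norm

def p_from_ratio_ci(lo, hi):
    est = sqrt(lo * hi)                    # implied geometric-midpoint estimate
    se = (log(hi) - log(lo)) / (2 * 1.96)  # SE on the log scale
    return 2 * norm.sf(abs(log(est) / se))

print(round(p_from_ratio_ci(0.74, 1.00), 3))  # non-vertebral fracture: ~0.05
print(round(p_from_ratio_ci(1.00, 1.52), 3))  # serious infection: ~0.05
```

Both land at p ≈ .05, the bare edge of “significance,” consistent with the p-hacking worry above.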
Wikipedia also flags joint and muscle pain as common side effects, as well as possible increased risk of osteonecrosis of the jaw (as with bisphosphonates). It’s an injectable given once every six months, so it’s not clear whether you’re stuck with any side effects until the dose wears off. There is no established all-cause mortality benefit.
However, as usual, the evidence is complex and ambiguous. Newer analyses find stronger evidence of a denosumab benefit. Chotiyarnwong et al’s 2020 analysis pooled five placebo-controlled trials and found “treatment was associated with a lower incidence of non‐fracture‐related falls (p = 0.02).” Again, it would be nice to see less reliance on statistical significance (what the p refers to), and more translation into practically meaningful effect sizes, like number needed to treat. Instead, we get a supplementary figure showing the interaction between the drug and age. Gold standard risk communication, it’s not.
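Here’s the kind of translation I mean. The event rates below are hypothetical placeholders, not figures from Chotiyarnwong et al:

```python
# Number needed to treat (NNT) from two event rates:
# the reciprocal of the absolute risk reduction.
def nnt(control_rate, treated_rate):
    return 1 / (control_rate - treated_rate)

# e.g., if falls occurred in 5% of placebo patients and 4% of treated patients:
print(round(nnt(0.05, 0.04)))  # treat ~100 patients to prevent one fall
```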
The falls finding raises questions about mechanism and endpoint relevance: How could a drug decrease non-fracture-related falls? Why is that an interesting endpoint? What outcome do we really care about here? And wait a minute, didn’t critics charge there were no placebo-controlled trials?
The question of endpoints is one that warrants more in-depth exploration in a future post. Suffice it to say, we might want all-cause mortality to be the default endpoint in a lot of health contexts, because otherwise we are stuck comparing quality of life on metrics that are very difficult to define. In theory, we care about fracture risk because it spikes that mortality risk. And bone density because it predisposes to fractures during falls. But something else (e.g., fatigue, poor balance, muscle weakness) might be the primary proximate cause of falls, making treating that (e.g., with exercise and protein) a priority.
It must be said that it doesn’t really make sense that an osteoporosis drug would decrease fall risk. We should question whether this finding might be due to confounding, and look for a straight causal line that makes sense in these sorts of analyses. For instance, does denosumab improve BMD and thus decrease risk of fractures?
Kendler et al 2022 argue that it does and “data continue to demonstrate that the benefits of osteoporosis therapy far outweigh the risks in patients at high risk of fracture.” They cite Cummings et al 2009’s report on a randomized trial (FREEDOM) comparing the drug to a placebo injection among 7868 women aged 60-90 with non-severe osteoporosis, finding denosumab may reduce vertebral fracture risk by 59-74% (95% CI .26-.41), hip fracture risk by 3-63% (95% CI .37-.97), and nonvertebral fracture risk by 5-33% (95% CI .67-.95), along with Simon et al’s 2013 wrist substudy finding a substantial possible wrist fracture risk reduction.
These are persuasive results, but they’re from just one randomized trial that uses placebo injections instead of an alternate, non-drug active treatment. At best, we need to see these results replicated to know (something closer to) the truth.
The fact is that the balance of this evidence is contested in the medical literature. The drug may substantially reduce fracture risks. But it may not. And it may also risk serious iatrogenic harm (e.g., from infections).
The placebo question is similarly complicated. Experts may disagree about what a properly controlled trial looks like here. Should it compare a different active intervention, like fall prevention via exercise, head-to-head with a drug treatment? Or just pit existing standard drug treatments (e.g., bisphosphonates versus denosumab)? Or use saline injections?
The point here is not that I have the answer and it’s to pit different treatments that we have good reason to suspect might work (though that would be my preference, because I put a higher value on helping patients than on proving someone’s pet theory; and I think this was the moral intuition on which Järvinen’s post was predicated). Rather, it’s that different well-informed, reasonable people can read this evidence base differently, as is often the case.
Anyone who tells you that the answer is clear, is failing to acknowledge this empirical reality. They may have perverse incentives to prescribe an expensive drug, such as higher rates of reimbursement for more expensive treatments. They may not have really read the underlying evidence, which I still haven’t grappled with properly myself; because to do so for every important decision in complex modern lives simply takes a prohibitive amount of time and attention. And, most likely, there’s overarching uncertainty about the net effect given possible fracture risk reduction benefits versus infection risk costs.
We never seem to have enough data to know what we want to know, and that’s not an accident. It’s a product in part of a medical research system fraught with waste, fraud, and abuse due to perverse incentives.
Copper IUDs may or may not be better emergency contraception than hormonal ones
Researchers ran an underpowered trial (N = 630) from which we still can’t tell which IUD works better as emergency contraception (“Estimating emergency contraception efficacy with levonorgestrel and copper intrauterine devices,” Nourse et al, Contraception, 2025 May 12:110946). In line with the typical uncertainty aversion, Nourse et al misinterpreted their findings to conclude they show copper IUDs are more effective. We don’t know that from this trial. There was only one unintended pregnancy in the whole trial. So what it really established is that both IUDs worked so well as emergency contraception that we’re not sure if the copper IUD worked better, or not.
Table 2 is a train wreck of trying to make something out of a difference from which no conclusive statements about comparative efficacy can be made: zero pregnancies out of 318 women who got a copper IUD for emergency contraception, versus one pregnancy out of 312 women who got a levonorgestrel one. Nourse et al translate that zero versus one into proportion of pregnancies prevented, making it look like a bigger difference than it is. Then they make it look even bigger by translating it into pregnancy risk difference. That is silly. We should ignore those numbers, because they are not meaningful.
However, they are useful for enabling us to see that (as expected) the 95% CIs hug zero by a good margin, clarifying that the advantage could be substantial or small. It could also go the other way (hormonal IUDs are reported in some studies to have greater contraceptive efficacy than copper ones), or the two options could be equally effective. We just don’t know from this evidence.
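The exact intervals on the raw counts make the point. This is my calculation using the Clopper-Pearson method; with zero events, the quick “rule of three” (3/n) gives nearly the same upper bound:

```python
# Exact (Clopper-Pearson) 95% CIs for each arm's pregnancy proportion.
from statsmodels.stats.proportion import proportion_confint

print(proportion_confint(0, 318, method="beta"))  # copper: (0, ~0.0115)
print(proportion_confint(1, 312, method="beta"))  # levonorgestrel: (~0.0001, ~0.0177)
print(3 / 318)                                    # rule of three: ~0.0094
```

The two intervals overlap almost entirely; nothing in these counts separates the devices.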
The authors’ claim that “The levonorgestrel 52 mg IUD for emergency contraception prevented 93% to 96% of expected pregnancies using an established pregnancy risk method, while copper IUD users experienced 100% pregnancy prevention” obscures the reality of uncertainty, a reality they had perverse incentives (publication bias) to obscure.
It’s of practical importance for women to know that we don’t know which, if either, of these options is more effective as emergency contraception. A huge proportion of women with copper IUDs (around 50%) have them removed within a year of insertion due to adverse effects like increased menstrual cramping and bleeding. The tolerability of the hormonal IUD is much higher, since it often decreases this cramping and bleeding instead of increasing it. So it could be a better bet for medium- to long-term contraception.
At the same time, it’s plausible that this decreased tolerability relates causally to what may make copper IUDs more effective than their hormonal counterparts as emergency contraception. But the causal story could also work the other way: Hormonal IUDs could be more effective because they don’t just make the uterus inhospitable to a pregnancy, but also change the hormonal milieu in a way that copper IUDs don’t. We just don’t know. More research is needed — a cliché in science that’s conspicuously absent here.
New resources for rogue methodologist ninjas
So whose job is it to tell patients when medical interventions risk net harm? I’ve suggested previously that we need something like a nonprofit medical information service to at least try to overcome the neutrality problem — a rogue methodologist ninja helpline.
In cases of overt malfeasance threatening lives, a new institutional structure now supports whistleblowers and metascientists in collaborating on practically important reform: last week, science reformer James Heathers announced The Medical Evidence Project to identify major errors in meta-analyses that support harmful guidelines. Run under the auspices of the Center for Scientific Integrity — the parent non-profit of Retraction Watch — the organization aims to build tools to “find, expurgate, and correct dangerous papers.”
Color me supportive but skeptical: It takes both scientific and social knowledge to do this kind of critique. I doubt this task can be automated, because it requires meta-perspective on the interpretive components of science. And I’ve had Retraction Watch tell me that they wouldn’t cover a specific case of p-hacking because that would be like taking sides; but now they’re apparently fine with that.
Similarly, COSIG, a collection of science integrity guides, has been published on the excellent Open Science Framework. Ironically, its tagline is about inclusivity:
Anyone can do post-publication peer review.
Anyone can be a steward of the scientific literature.
Anyone can do forensic metascience.
Anyone can sleuth.
But its first link is to “PubPeer commenting best practices.” PubPeer requires an institutional affiliation to comment. In other words, not anyone — not even anyone with PubMed-indexed peer-reviewed articles to their name — can do it.
So COSIG doesn’t appear to recognize substantial gatekeeping that exists within scientific integrity work. It also doesn’t recognize the most common methodological mistakes I see in the medical literature (e.g., misinterpretation of statistical significance and publication bias test results), or the ones others have long noted.
In addition, neither The Medical Evidence Project nor COSIG acknowledges that you can already just say what’s wrong with published scientific papers on the Internet for anyone to read (as I regularly do). We don’t have to wait for miscreants or publishers to stop responding to perverse incentives to deny, distract, or degrade critics. We can’t fix the broken system and don’t have to waste our time trying. Just tell the others so they can try to protect themselves. Maybe crying “fire!” in a burning building is good enough?
The problem isn’t that people don’t do this already, or need to know the guidelines for doing it better. The problem is that there’s already so much information being published so fast, with gatekeepers perversely incentivized to let the wrong stories in and keep the right ones out, that it’s very difficult if not impossible to correct the scientific record. If you get caught up in the game of trying to do so within the game of scientific publishing, it’s probably going to cost you a lot of time and aggravation. If you don’t, your audience is probably going to be limited to the people who are interested in reading contrarian narratives. Most people prefer to have their confirmation biases, well, confirmed.
In my view, the point of critiquing science openly and regularly with this kind of post is that it’s the only way to read the literature. To know what the evidence says, you have to correct other people’s methods mistakes, which commonly run in the direction of expected biases. Once you’ve done that, you might as well tell other people what you found: it takes minimally more time, might help someone, and is fun. (You might also catch your own mistakes that way… Or someone else might!)
Doing anything else is a probable waste of time that incurs substantial opportunity costs; journal editors, for instance, often ignore complaints, since retracting articles can generate bad things for them, from losing face to losing lawsuits. So there should just be more rogue methodologist ninjas openly critiquing science as par for the course of reading what’s new.
Not because we’re going to put out the dumpster fire that is science that way. Just because we should be doing the work anyway, to protect ourselves and our loved ones. No one can do it all. We have to share our work.
If you’re new here…
Please consider signing up, telling a friend, or supporting this writing with a paid subscription. I’m also always looking for ways to further my long-standing research interest in mass screenings for low-prevalence problems, improve my science communication, and skill up in tech. At this point, though, the main obstacle to all these things is limited work time that cannot be expanded due to the inexorable cuteness of my children.