Cough First and Ask Questions Later?
Bias in science and science communication on masks and respirators
We often don’t know what we’d like to know, but it doesn’t feel great to say so. We prefer to solve problems, tell stories, and hear stories we already know and agree with that confirm we’re right about how to solve problems (confirmation bias). It feels good.
Science and science communication pander to these preferences. Not because scientists and science journalists are evil; we all have to make a living somehow, and you don’t usually get grants or clicks for focusing on how monumentally ignorant and stupid humans are. Rather, because confirmation bias feels so good to us all. And so it can even look like a public service to stoke it, because there is no exit from the psychosocial swamp.
Everyone is busy and wants executive summaries from science for practice, but the truth is that we usually don’t know what to tell the people who need to know. Add to this time-pressured recipe for erroneous uncertainty denial (aka overconfidence bias) the fact that most people have a preferred story to which they fit the available evidence. Working against these biases is hard. So hard that bias in science and science communication is normal, and we shouldn’t expect to fully correct it by attaining some mythological, objective perspective from which we can declare that we did the job right, but they (our opponents) were biased.
Recent critical discussion of the widely reported Cochrane masks/respirators study illustrates how scientists and science communicators often deny these realities of the limits of consciousness, cognition, and meta-cognition. The previous post introduced the problem of epistemological limits in terms of bias plaguing science, including the science of bias, focusing on the case study of self-reflexive cognitive bias and paradox. This post continues the same argument through a second case study: misinformation in misinformation discourse about misinformation discourse about masks and respirators — that is, bias in the bias-spotters, three layers in. (I’ve also written previously on misinformation in misinformation discourse in Covid, abortion, abortion reversal, and mammography contexts, noting how misinformation is a political designation, and how the same problems that plague scientific discourse writ large — especially ambiguity aversion — also plague this sub-discourse.)
Bias in Science and Science Communication on Masks and Respirators
The hyperpolarized information environment around Covid extends to science and science communication on masks and respirators for preventing transmission. Briefly, the scientific evidence on masks and respirators for preventing Covid transmission contains a lot of ambiguity and uncertainty, but proponents and opponents of particular policy regimes often misrepresent it as unambiguous and certain — just as interested parties misrepresent most other kinds of science (see, e.g., the many citations here of leading statistics reformer Sander Greenland). These accents of social and political interests inflect science, science communication, and critical reflection on both. Here’s how.
For context, we’re looking at bias up to four layers into the discourse, where the first layer is the consensus, establishment, pro-masks/respirators position. The second layer is a critical (of consensus) scientific article (Jefferson et al 2023). The third layer is a critical (of Jefferson et al) Scientific American essay (Oreskes 2023). And the fourth layer is a critical (of Oreskes) blog (Demasi 2023).
Image: Dennis Jarvis, 2018, Russian Matryoshka, Creative Commons Attribution-Share Alike 2.0 Generic license, Wikimedia Commons.
On Jan. 30, 2023, Tom Jefferson et al published a Cochrane review entitled “Do physical measures such as hand-washing or wearing masks stop or slow down the spread of respiratory viruses?” To which the correct answer is: we don’t know, because it’s hard to study, our data are messy, and so we don’t have the data we need to answer that question. The predictability and uselessness of this answer make the question itself questionable.
Bracketing the question of the question, the authors correctly concluded that we don’t know. Specifically on masks and respirators, they reported:
Medical or surgical masks
Seven studies took place in the community, and two studies in healthcare workers. Compared with wearing no mask, wearing a mask may make little to no difference in how many people caught a flu‐like illness (9 studies; 3507 people); and probably makes no difference in how many people have flu confirmed by a laboratory test (6 studies; 3005 people). Unwanted effects were rarely reported, but included discomfort.
N95/P2 respirators
Four studies were in healthcare workers, and one small study was in the community. Compared with wearing medical or surgical masks, wearing N95/P2 respirators probably makes little to no difference in how many people have confirmed flu (5 studies; 8407 people); and may make little to no difference in how many people catch a flu‐like illness (5 studies; 8407 people) or respiratory illness (3 studies; 7799 people). Unwanted effects were not well reported; discomfort was mentioned.
This report may appear neutral, but it is biased in a way that violates the commonly shared liberal democratic values of equality and freedom. It is also bad science. The problem is that it fails to consider mechanisms. This failure makes it possible to ignore widely available information relevant to the mechanisms whereby we might expect masks and respirators to work. That information implicates protected categories (sex, race/ethnicity, religion).
Who Mentioned What?
Female health professionals have described more than discomfort from their PPE, with one frontline NHS worker saying “PPE is designed for a 6 foot 3 inch bloke built like a rugby player” (quoted anonymously in Alexandra Topping’s April 24, 2020 Guardian article “Sexism on the Covid-19 frontline”). The problem is big and precedes Covid: “A 2016 survey conducted by the trade union Prospect, the TUC, and others found that just 29% of female respondents were using PPE designed for women, and 57% said their PPE hampered their work” (Topping again). Society has largely ignored affected women. For instance, the UK’s recent Covid-19 Inquiry revealed that then-PM Boris Johnson:
asked the former chief executive of the NHS in England, Simon Stevens, about reports that female frontline healthcare workers were struggling with PPE that had been designed for men. Stevens is said to have ‘reassured’ the prime minister that there was ‘no problem.’
It is unclear what science Stevens was following here… report after report over decades has found that while PPE is usually marketed as gender-neutral, the vast majority has in fact been designed around a male body, and therefore neither fits nor protects women. In fact, more often than not, it’s a hindrance: a 2017 TUC report found that only 5% of female emergency service workers said that their PPE never got in the way of their job (Caroline Criado Perez, “Another truth from the Covid inquiry: women were being ignored over ill-fitting PPE long before the pandemic,” The Guardian, 3 Nov. 2023).
So PPE fit is a gender equality issue, because ill fit is common and unequally distributed, disproportionately affecting women; and because powerful men ignored women who raised concerns about this while risking their lives to help others during a pandemic. It’s also a racial/ethnic equality issue for analogous reasons. See, e.g., “The role of fit testing N95/FFP2/FFP3 masks: a narrative review,” by A. Regli, A. Sommerfield, and B. S. von Ungern-Sternberg (Anaesthesia, 2020) and “The influence of gender and ethnicity on facemasks and respiratory protective equipment fit: a systematic review and meta-analysis,” by Jagrati Chopra, Nkemjika Abiakam, Hansung Kim, Cheryl Metcalf, Peter Worsley, and Ying Cheong (BMJ Glob Health, 2021; 6(11)).
But, as one might expect:
‘Structural racism’ within the NHS saw some black and minority ethnic healthcare workers not being listened to, believed or responded to when they raised concerns about issues including ill-fitting face masks in the pandemic, the Covid inquiry has been told (“NHS ‘structural racism’ saw ethnic minority staff not heard on PPE, inquiry told,” Aine Fox, The Standard, Oct. 6, 2023).
These concerns are particularly serious, as they may be implicated in deaths:
Ethnic minorities were significantly more likely than white British people to both catch and die of Covid.
Bangladeshi, Pakistani and Caribbean backgrounds were particularly affected.
The first 10 doctors to die of Covid-19 in the UK all belonged to ethnic minorities - as did 85% of those who died in the first year of the pandemic (“Covid inquiry: Minority doctors less forthright about poor PPE, BMA says,” Ashitha Nagesh, BBC, Oct. 5, 2023).
Ill-fitting PPE also raises issues of religious freedom and equality, because beards and headscarves can affect fit.
Causal Mechanisms Matter
The point is that “standard” masks/respirators are made for white males, and disproportionately do not fit women and people of other races/ethnicities. This is not a niche or irrelevant concern; it is central to whether these things work. If PPE doesn’t fit many or most of the people wearing it — half the population, for instance — then assessing how well that ill-fitting PPE works in the aggregate tells us little about PPE’s possible efficacy for its intended purpose. Critical thinking requires that first, we think about how this stuff is supposed to work (e.g., forcing air through a filter, in the case of respirators). Then, we should assess how well it’s working by this mechanism — in this case, actually forcing air through a filter by sealing on the user’s face. Next, if a bunch of PPE doesn’t fit and so is likely not to activate this mechanism, we should fix that before assessing its efficacy and drawing conclusions about the utility of norms or policies requiring mask or respirator use.
Running statistical analyses on the existing efficacy evidence on PPE that very often doesn’t fit properly is like running statistical analyses on the existing efficacy evidence on condoms with holes poked through them. Researchers announce they don’t work, and are uncomfortable (a repurposed analogy and quote from Rose McDermott’s polygraph interview). But this is not a valid conclusion to draw about condom efficacy. It just proves you shouldn’t poke holes in condoms.
There is a more closely analogous real-world problem with higher condom failure rates in India due to smaller average penis size (see Wired 2006, Slate 2006, The Times of India 2023). There could be better data on this, of course. For some reason, getting a representative sample of all the men in the world to line up for penis measurement for science poses difficulties.
The point is that condoms should fit the penises they’re on. Otherwise, we might not expect them to work very well. Studies showing poor condom efficacy for poorly fitting condoms don’t show that condoms don’t work. They show that thinking about causal mechanisms before running statistical analyses makes sense.
Similarly, masks and respirators should fit on faces so that air has to go through them, if you want them to possibly work to prevent infectious disease transmission. They often don’t. Scientists who publish reports on a topic have a responsibility to be familiar enough with the relevant scientific and popular literature on it to spot relevant problems such as this.
So saying that “discomfort was mentioned” in studies reporting inconclusive PPE efficacy — without acknowledging that this discomfort relates in some proportion of cases to fit, that fit systematically varies by gender and race/ethnicity among other factors, and that fit influences efficacy — is like saying that discomfort was mentioned in studies on Indian condom failure without saying that the condoms didn’t fit. It omits the thinking about causal logic that should precede statistical analyses. The alternative is putting “science before statistics,” in leading science reformer Richard McElreath’s parlance — doing better science.
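To make the fit-dilution logic concrete, here is a minimal simulation sketch. All the numbers (baseline risk, fitted-respirator risk, poor-fit rate) are invented for illustration, not drawn from the fit-testing literature; the point is only the arithmetic of aggregation.

```python
import random

random.seed(42)

# Hypothetical numbers, purely for illustration: suppose a respirator
# halves infection risk when it seals properly, but provides no benefit
# when it fits poorly, and suppose poor fit is common.
BASELINE_RISK = 0.10   # infection risk with no effective protection
FITTED_RISK = 0.05     # infection risk with a well-sealed respirator
POOR_FIT_RATE = 0.70   # assumed share of wearers whose PPE doesn't seal

N = 100_000  # participants per arm

def infections(mask_arm: bool) -> int:
    count = 0
    for _ in range(N):
        if mask_arm and random.random() > POOR_FIT_RATE:
            risk = FITTED_RISK    # respirator seals: mechanism activated
        else:
            risk = BASELINE_RISK  # no respirator, or respirator doesn't seal
        count += random.random() < risk
    return count

print(f"control arm infection rate: {infections(False) / N:.3f}")
print(f"respirator arm infection rate: {infections(True) / N:.3f}")
# With 70% poor fit, the aggregate contrast shrinks from a 50% risk
# reduction (the mechanism's effect) to roughly 15% (about 0.085 vs.
# 0.100), which a modestly sized trial can easily report as "little
# to no difference" without the mechanism ever being tested.
```

Under these made-up assumptions, such a trial is not measuring whether the mechanism works; it is mostly measuring how often the mechanism is activated.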
Cochrane Report Reception
We still don’t know what we want to know about masks: Do they work in Covid-like pandemic contexts? Technically, Jefferson et al were right; but, arguably, their empirically correct answer addressed the wrong question with fairly worthless data, and reflected (at best) ignorance of the evidence they were responsible for knowing. It’s predictable and fairly meaningless that we don’t know what we want to know here by typical Cochrane standards. And it’s well-established that ill-fitting PPE is a common and unequally distributed problem. Did the media leap on these issues with the review’s question and answer?
Of course not. As Harvard science history professor Naomi Oreskes wrote in Scientific American (“What Went Wrong with a Highly Publicized COVID Mask Analysis?” Nov. 1, 2023):
The media reduced these [findings] to the claim that masks did not work. Under a headline proclaiming “The Mask Mandates Did Nothing,” New York Times columnist Bret Stephens wrote that “the mainstream experts and pundits ... were wrong” and demanded that they apologize for the unnecessary bother they had caused. Other headlines and comments declared that “Masks Still Don't Work,” that the evidence for masks was “Approximately Zero,” that “Face Masks Made ‘Little to No Difference,’” and even that “12 Research Studies Prove Masks Didn't Work.”
Dichotomania strikes again. As Cochrane Library Editor-in-Chief Karla Soares-Weiser wrote in a statement responding to the media coverage of the report:
Many commentators have claimed that a recently-updated Cochrane Review shows that ‘masks don’t work,’ which is an inaccurate and misleading interpretation… Given the limitations in the primary evidence, the review is not able to address the question of whether mask-wearing itself reduces people's risk of contracting or spreading respiratory viruses.
“Absence of evidence,” in leading medical research methodologist Doug Altman’s parlance, “is not evidence of absence.” The main problem with Jefferson’s Cochrane report, then, is arguably that it asked the wrong question, developed an irrelevant (to the real world) answer accordingly, and put out information that was likely to be misinterpreted in predictably oversimplifying, black-and-white, uncertainty-denying ways. As it was and continues to be.
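A toy calculation illustrates Altman’s point. The trial numbers below are invented for illustration; what matters is the width of the resulting confidence interval, not the particular values.

```python
import math

# Hypothetical, underpowered trial: 20 infections among 1,500 mask
# wearers vs. 25 among 1,500 controls (made-up numbers).
a, n1 = 20, 1500   # mask arm: events, total
b, n2 = 25, 1500   # control arm: events, total

rr = (a / n1) / (b / n2)
# Approximate standard error of log(RR), then a 95% Wald interval
se = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)
lo = math.exp(math.log(rr) - 1.96 * se)
hi = math.exp(math.log(rr) + 1.96 * se)
print(f"risk ratio {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# Prints roughly: risk ratio 0.80, 95% CI (0.45, 1.43). The data are
# compatible with a 55% risk reduction AND with a 43% risk increase.
# "No significant difference" here is absence of evidence, not
# evidence of absence.
```

A result like this licenses “we can’t tell,” not “masks don’t work”: exactly the distinction the media coverage erased.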
Oreskes’ Cochrane Criticism
Oreskes goes further, arguing that “The Cochrane Library, a trusted source of health information, misled the public by prioritizing rigor over reality” (her essay’s title and subtitle). Having had many a misleading title, subtitle, quote, or interpretation attributed to me by intelligent, well-intentioned reporters and editors, I wonder whether Oreskes herself chose that subtitle. Either way, it’s dangerously wrong. The problem is exactly the opposite: Jefferson et al’s review lacks scientific rigor.
By falsely equating “rigor” with privileging evidence from randomized trials over other forms of evidence, it hands an undeserved rhetorical advantage to exactly the form of evidence evaluation Oreskes criticizes. Oreskes’ error here is substantive as well as rhetorical: rigor requires grappling with the causal logic of the effect of interest, something Jefferson et al fail to do. By letting Cochrane rhetorically own the terrain of “rigor” — instead of critiquing the equation of evidence generated by one particular method with quality — Oreskes’ otherwise excellent essay risks contributing to the problem of randomized-trial worship in medicine and beyond that Greenland, Yalom, Oreskes herself, and others have critiqued.
That said, her main criticism of the study’s popular misinterpretation holds. Oreskes tells how:
The study's lead author, Tom Jefferson of the University of Oxford, promoted the misleading interpretation. When asked about different kinds of masks, including N95s, he declared, “Makes no difference—none of it.” In another interview, he called mask mandates scientifically baseless.
Recently Jefferson has claimed that COVID policies were “evidence-free,” which highlights a second problem: the classic error of conflating absence of evidence with evidence of absence. The Cochrane finding was not that masking didn't work but that scientists lacked sufficient evidence of sufficient quality to conclude that they worked. Jefferson erased that distinction, in effect arguing that because the authors couldn't prove that masks did work, one could say that they didn’t work. That’s just wrong.
Again, Oreskes is right. It’s just that introducing this argument under the heading “rigor over reality” does it a disservice. Rigor is exactly what’s missing from logical fallacies like the one Oreskes points out here. Through this mislabeling, Oreskes set her argument up to be attacked by people claiming the mantle of rigor for randomized-trial worship. And so it was…
The Ugly Duckling: Observational Evidence and Causal Inference
In her Dec. 6, 2023 Substack post “Did Cochrane’s study on masks get it wrong?”, Maryanne Demasi interviews leading medical methodologist and expelled Cochrane Collaboration co-founder Peter C. Gøtzsche about the Cochrane masks review, Oreskes’ essay, and the practically relevant scientific evidence on masks in the context of Covid. Demasi has done a lot of solid critical science journalism, e.g., on statins. This essay was not her best work.
In it, Demasi uncritically quotes Gøtzsche arguing against Oreskes that “There is not an absence of evidence. There is evidence from randomised trials, including those trying to prevent influenza transmission, and it shows that masks just don’t work.” This statement makes the “absence of evidence = evidence of absence” logical error that Oreskes highlighted. Whether this is an honest mistake, or a cynical weaponization of Oreskes’ erroneous classification of randomized-trial worship under the heading of “rigor,” is not knowable from the outside. Either way, it works by ignoring the substance of her argument.
Oreskes critiques Cochrane’s approach as “methodological fetishism,” and cites observational evidence that masks work:
…there is strong evidence that masks do work to prevent the spread of respiratory illness. It just doesn't come from RCTs. It comes from Kansas. In July 2020 the governor of Kansas issued an executive order requiring masks in public places. Just a few weeks earlier, however, the legislature had passed a bill authorizing counties to opt out of any statewide provision. In the months that followed, COVID rates decreased in all 24 counties with mask mandates and continued to increase in 81 other counties that opted out of them.
Another study found that states with mask mandates saw a significant decline in the rate of COVID spread within just days of mandate orders being signed. The authors concluded that in the study period—March 31 to May 22, 2020—more than 200,000 cases were avoided, saving money, suffering and lives.
Similarly, Demasi writes:
The CDC has published multiple observational studies in its Morbidity and Mortality Weekly Report (MMWR), which has substantial influence on US health policy and is widely cited as evidence of mask effectiveness.
But an analysis by Høeg et al, published in Am J Med found that “MMWR publications pertaining to masks drew positive conclusions about mask effectiveness >75% of the time despite only 30% testing masks and <15% having statistically significant results.”
It’s valid to note that the evidence on masks is ambiguous and uncertain. It is not valid to mischaracterize it as showing they don’t work, dismiss observational evidence out of hand, and then conclude from this evidence summary — as Demasi does — that “They knew all along…,” implying a homogeneous opponent who knowingly pushed a definite claim (masks work) that has since been proven wrong (masks don’t work). Each element of this is wrong and polarizing. The evidence is exegetical, and we should just say that. Part of this exegesis is broadening what counts as good evidence. A lot of very important real-world problems are hard or impossible to study with randomized trials. And a lot of randomized trials are vulnerable to the very sorts of selection biases they’re designed to protect against.
So Oreskes pushes for a more holistic approach to evaluating epidemiological evidence as a whole, while Demasi parrots Gøtzsche’s randomized trial worship. The irony that the supposedly more rigorous method fails to consider causal mechanisms here — and thus fails to uphold a minimum standard of logical rigor — is lost on the method’s proponents. This may reflect, in part, a generational divide.
The ugly duckling of observational studies is undergoing a methodological rebirth as the causal revolution filters down through social and medical sciences. When we recognize that selection bias plagues observational and experimental evidence alike, and think visually and systematically about a generative causal model before estimating correlative effects that we only care about because (secretly or not) we think they are causal, then it makes sense to allow that both forms of data can produce valid inferences. Conversely, when we pretend the highly artificial floating worlds of randomized trials are the only ones worth looking to for evidence, we risk ignoring the very selection bias randomized trials are supposed to guard against — and missing real-world insights from lay people who are the real experts in what we need to know to model causality, to do better science.
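Here is a minimal sketch of that claim in code, using a toy generative model whose every number is invented for illustration: a lurking “cautiousness” trait drives both masking and other risk reduction, confounding the naive observational contrast, while stratifying on the confounder that the assumed causal model names recovers the true effect.

```python
import random

random.seed(0)

# Toy generative model (all values invented): cautious people both mask
# more AND have lower baseline exposure, confounding the naive
# mask-infection association in observational data.
N = 200_000
TRUE_MASK_EFFECT = 0.5  # in this toy world, masks halve infection risk

rows = []
for _ in range(N):
    cautious = random.random() < 0.5
    masked = random.random() < (0.8 if cautious else 0.2)
    base = 0.05 if cautious else 0.15  # caution lowers baseline risk
    risk = base * (TRUE_MASK_EFFECT if masked else 1.0)
    rows.append((cautious, masked, random.random() < risk))

def rate(masked, cautious=None):
    sel = [r for r in rows
           if r[1] == masked and (cautious is None or r[0] == cautious)]
    return sum(r[2] for r in sel) / len(sel)

# Naive (confounded) contrast vs. contrasts within strata of caution
print(f"naive risk ratio: {rate(True) / rate(False):.2f}")
for c in (False, True):
    print(f"risk ratio given cautious={c}: "
          f"{rate(True, c) / rate(False, c):.2f}")
# The naive ratio comes out near 0.27, exaggerating the benefit;
# stratifying on the modeled confounder recovers roughly the true 0.5.
```

The observational data are not the problem; analyzing them without a causal model is.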
Elsewhere, I argue that this is what happened more than once when infant feeding researchers conducted randomized trials on breastfeeding without listening to mothers, who frequently report insufficient breastmilk, or to babies, who frequently cry inconsolably from starvation in the two full days before most mothers’ mature milk comes in. By placing regard for one particular set of methods over regard for other people’s stories and well-being, researchers can do serious preventable harm. Conversely, by listening to people about their experiences instead of working in a top-down fashion, we can often do better (more rigorous) science. One methodological reason is that knowing more about people’s relevant experiences helps us envision the causal generative processes that rigor demands lie behind the statistical analyses everyone cares so much about.
It may not sound as fancy as “statistical significance.” But listening to ordinary people is science, too. Emphasizing the value of one particular methodological toolkit above all else, by contrast, reflects a bias in favor of that method. And that’s not the only bias potentially reflected here…
“Objectivity” Has Baggage
The way in which Demasi and Gøtzsche talk about Oreskes’ work risks implicitly invoking sexist stereotypes for no substantive pay-off. Demasi writes:
“It’s clear that Oreskes lacks scientific objectivity,” says Gøtzsche in a stinging rebuke. “Oreskes is actually arguing that the researchers should have lowered their standards and relied on weaker evidence in their review.”
Again, Gøtzsche’s “stinging rebuke” actually repeated the very logical error Oreskes’ essay called out (equating absence of evidence with evidence of absence). Gøtzsche and Demasi thus exhibit apparent bias here by missing Oreskes’ point. Pointing this out is awkward, both because everyone is vulnerable to bias (including me in making this sort of criticism — a self-reflexive vulnerability inherent in bias research and critical reflection on science generally), and because I would rather have a better discourse than one where opinion or perspective = bias = lack of scientific objectivity = dismissal of opponents’ substantive points. Everyone has opinions, everyone comes from somewhere, everyone is plagued by bias and error. And a lot of us biased people also have a point.
On one hand, maybe Gøtzsche is just taking the bait Oreskes was wrong (rhetorically and substantively) to put in her essay by calling Cochrane’s standards more “rigorous” than more methodologically pluralistic ones. That requires a strange definition of “rigor” that doesn’t attend to causal mechanisms and so doesn’t mean “better scientifically.” Most scientists would probably agree that more rigorous science concerns itself with causal logic, instead.
On the other hand, women are traditionally stereotyped as being more emotional and personal than men, for reasons beyond our control. For example, menstrual cycles make our emotional states generally more changeable over the month, and pregnancies show the world facets of our personal lives and choices, whether we want them to or not. To flat-out ignore a female scientist’s substantive argument, as Gøtzsche did here, while saying that she lacks objectivity in making it, is to attack her for exhibiting signs of that human perspective which is innate and inescapable in us all — as if Gøtzsche (her prominent male opponent) had achieved a perfect objectivity which she herself, on no specified evidence, had failed to attain. This sort of ad hominem attack has no place in scientific discourse, particularly coming from a senior methodologist.
Objectivity does not exist, invoking its absence may proxy for ad hominem attacks including gendered dismissals, and anyway, we should aspire as reasonable, decent people to treat other people’s arguments with the basic respect of entertaining them on their logical basis (ignoring them is the opposite). Scientific discourse should be about logic and facts — you know, about science. This is true both because and regardless of the fact that objectivity is an unattainable aspiration in science as in other human realms.
A self-reflexive paradox presents again here: I want the focus in this discourse to be on its substance. So I articulate what I perceive to be the gendered nature of Gøtzsche’s criticisms of Oreskes, in order to suggest that focusing on objectivity/bias is a distraction from that substance. But this, in turn, draws more focus away from the substance and potentially onto my own gendered perspective. This risks defeating my main purpose, which is drawing the focus back on to the science.
Get me out of here! There may be no objectivity, but there sure is bias in science and science communication. What’s a methodologist to do?
Exit: Forget Objectivity?
We could, Greenland suggests, forget objectivity.
Values influence choice of methodology and thus influence every risk assessment and inference. To deal with this inescapable reality, we need to replace vague and unattainable calls for objectivity with more precise operational qualities. (“Transparency and disclosure, neutrality and balance: shared values or just shared words?” Sander Greenland, 2012, J Epidemiol Community Health; p. 967; full text).
Why is calling for more objectivity in science so bad, and what can we do instead?
treating ‘objectivity’ as a cure for or opposite of bias is misguided. Consider this definition of objective conduct: ‘Objective: expressing or dealing with facts or conditions as perceived, without distortion by personal feelings, prejudices or interpretations’. Perceptual distortion can be negligible in physics, but is predominant in health science. It arises not only from personal prejudices, but also from cognitive biases and values built into methodologies that investigators follow and teach. Thus perceptual objectivity in the ordinary-English sense is an unrealistic goal in scientific research even though it is valued and thus claimed by most researchers. Worse, claims of ‘objectivity’ are often simply denial (to oneself as well as to others) of subjectivity and values in one’s assessments and methodology.
The problem with ‘objectivity’ is that it is too complex, ill-defined and unattainable (if not pretentious) to take as a claim or goal. In response, we can replace dubious claims and goals of objectivity with more precise, operational and approachable characteristics. Two highly valued characteristics of this sort are transparency (openness) and neutrality (fairness, balance, symmetry), which appear in analytical jurisprudence and arbitration. Reports that fail to disclose facts expected by readers might be viewed as violating transparency, although the bias implications of this violation may be subtle. Methodological behaviours or guidelines that fail to treat competing hypotheses or bias directions symmetrically may be taken as violating neutrality (p. 968).
But examining a case study reveals that, in practice, yet “more precision is needed in explicating and implementing such values” (p. 967). In other words, replacing objectivity with transparency and neutrality makes the idea more operationalizable, but doesn’t offer an exit from problems of interpretation. There is no exit.
So his proposed alternatives, he admits, have problems. Greater transparency in conflict of interest (COI) disclosures — as in the development of a central registry of current ones — could bury important information under irrelevant information. It’s also unenforceable. And it remains exegetical what exactly constitutes a COI. One person’s transparency requirement could be another’s political litmus test for publishing science.
Neutrality runs into similar problems. One person’s neutrality could be another’s values packed into disputed facts. Such disputes come up not infrequently in hyperpolarized debates, like those dealing with Covid or abortion. Building on this discussion, the next post in this series deals with bias in research integrity work dealing with abortion research. We’re stuck with imperfect solutions to our imperfection problem — but we can still do better by analyzing where we’ve gone discernibly wrong.